Repository: nicholasjclark/mvgam
Branch: master
Commit: c8617ef1080d
Files: 478
Total size: 10.3 MB

Directory structure:
gitextract_s9w_3_or/
├── .Rbuildignore
├── .claude/
│   ├── commands/
│   │   ├── bug-find.md
│   │   ├── draft-pr-body.md
│   │   ├── feature-execute.md
│   │   ├── feature-plan.md
│   │   ├── pr-checklist.md
│   │   ├── reflect.md
│   │   ├── review-changes.md
│   │   └── spec-driven-dev.md
│   └── settings.local.json
├── .github/
│   ├── .gitignore
│   ├── CODE_OF_CONDUCT.md
│   ├── CONTRIBUTING.md
│   ├── FUNDING.yml
│   └── workflows/
│       ├── R-CMD-check-rstan.yaml
│       ├── R-CMD-check.yaml
│       ├── memcheck.yaml
│       ├── pkgdown.yaml
│       └── readme.yaml
├── .gitignore
├── CLAUDE.md
├── CRAN-SUBMISSION
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── NEWS.md
├── R/
│   ├── RcppExports.R
│   ├── add_MACor.R
│   ├── add_base_dgam_lines.R
│   ├── add_binomial.R
│   ├── add_corcar.R
│   ├── add_nmixture.R
│   ├── add_poisson_lines.R
│   ├── add_residuals.R
│   ├── add_stan_data.R
│   ├── add_trend_lines.R
│   ├── add_tweedie_lines.R
│   ├── all_neon_tick_data.R
│   ├── as.data.frame.mvgam.R
│   ├── backends.R
│   ├── compute_edf.R
│   ├── conditional_effects.R
│   ├── cpp_funs.R
│   ├── data_grids.R
│   ├── dynamic.R
│   ├── ensemble.R
│   ├── evaluate_mvgams.R
│   ├── families.R
│   ├── fevd.mvgam.R
│   ├── forecast.mvgam.R
│   ├── formula.mvgam.R
│   ├── get_linear_predictors.R
│   ├── get_monitor_pars.R
│   ├── get_mvgam_priors.R
│   ├── globals.R
│   ├── gp.R
│   ├── gratia_methods.R
│   ├── hindcast.mvgam.R
│   ├── how_to_cite.R
│   ├── index-mvgam.R
│   ├── interpret_mvgam.R
│   ├── irf.mvgam.R
│   ├── jsdgam.R
│   ├── lfo_cv.mvgam.R
│   ├── logLik.mvgam.R
│   ├── loo.mvgam.R
│   ├── lv_correlations.R
│   ├── marginaleffects.mvgam.R
│   ├── mcmc_plot.mvgam.R
│   ├── model.frame.mvgam.R
│   ├── monotonic.R
│   ├── mvgam-class.R
│   ├── mvgam-package.R
│   ├── mvgam.R
│   ├── mvgam_diagnostics.R
│   ├── mvgam_fevd-class.R
│   ├── mvgam_forecast-class.R
│   ├── mvgam_formulae.R
│   ├── mvgam_irf-class.R
│   ├── mvgam_residcor-class.R
│   ├── mvgam_setup.R
│   ├── mvgam_trend_types.R
│   ├── noncent_trend.R
│   ├── onAttach.R
│   ├── ordinate.jsdgam.R
│   ├── pairs.mvgam.R
│   ├── piecewise_trends.R
│   ├── plot.mvgam.R
│   ├── plot_mvgam_factors.R
│   ├── plot_mvgam_fc.R
│   ├── plot_mvgam_pterms.R
│   ├── plot_mvgam_randomeffects.R
│   ├── plot_mvgam_resids.R
│   ├── plot_mvgam_series.R
│   ├── plot_mvgam_smooth.R
│   ├── plot_mvgam_trend.R
│   ├── plot_mvgam_uncertainty.R
│   ├── portal_data.R
│   ├── posterior_epred.mvgam.R
│   ├── ppc.mvgam.R
│   ├── predict.mvgam.R
│   ├── print.mvgam.R
│   ├── residual_cor.R
│   ├── residuals.mvgam.R
│   ├── sanitise_modelfile.R
│   ├── score.mvgam_forecast.R
│   ├── series_to_mvgam.R
│   ├── shared_obs_params.R
│   ├── sim_mvgam.R
│   ├── stability.R
│   ├── stan_utils.R
│   ├── stationarise_VAR.R
│   ├── summary.mvgam.R
│   ├── sysdata.rda
│   ├── tidier_methods.R
│   ├── trends.R
│   ├── update.mvgam.R
│   ├── update_priors.R
│   ├── utils-pipe.R
│   └── validations.R
├── README.Rmd
├── README.md
├── build/
│   └── vignette.rds
├── build_vignettes_CRAN.R
├── cran-comments.md
├── data/
│   ├── all_neon_tick_data.rda
│   └── portal_data.rda
├── doc/
│   ├── data_in_mvgam.R
│   ├── data_in_mvgam.Rmd
│   ├── data_in_mvgam.html
│   ├── forecast_evaluation.R
│   ├── forecast_evaluation.Rmd
│   ├── forecast_evaluation.html
│   ├── mvgam_overview.R
│   ├── mvgam_overview.Rmd
│   ├── mvgam_overview.html
│   ├── nmixtures.R
│   ├── nmixtures.Rmd
│   ├── nmixtures.html
│   ├── shared_states.R
│   ├── shared_states.Rmd
│   ├── shared_states.html
│   ├── time_varying_effects.R
│   ├── time_varying_effects.Rmd
│   ├── time_varying_effects.html
│   ├── trend_formulas.R
│   ├── trend_formulas.Rmd
│   └── trend_formulas.html
├── docs/
│   ├── 404.html
│   ├── CODE_OF_CONDUCT.html
│   ├── CONTRIBUTING.html
│   ├── LICENSE-text.html
│   ├── LICENSE.html
│   ├── articles/
│   │   ├── data_in_mvgam.html
│   │   ├── forecast_evaluation.html
│   │   ├── index.html
│   │   ├── mvgam_overview.html
│   │   ├── nmixtures.html
│   │   ├── shared_states.html
│   │   ├── time_varying_effects.html
│   │   └── trend_formulas.html
│   ├── authors.html
│   ├── deps/
│   │   ├── bootstrap-5.2.2/
│   │   │   └── font.css
│   │   ├── bootstrap-5.3.1/
│   │   │   └── font.css
│   │   ├── data-deps.txt
│   │   └── jquery-3.6.0/
│   │       └── jquery-3.6.0.js
│   ├── index.html
│   ├── news/
│   │   └── index.html
│   ├── pkgdown.js
│   ├── pkgdown.yml
│   ├── reference/
│   │   ├── GP.html
│   │   ├── RW.html
│   │   ├── ZMVN.html
│   │   ├── add_residuals.mvgam.html
│   │   ├── add_tweedie_lines.html
│   │   ├── all_neon_tick_data.html
│   │   ├── augment.mvgam.html
│   │   ├── code.html
│   │   ├── conditional_effects.mvgam.html
│   │   ├── dynamic.html
│   │   ├── ensemble.mvgam_forecast.html
│   │   ├── evaluate_mvgams.html
│   │   ├── fevd.mvgam.html
│   │   ├── fitted.mvgam.html
│   │   ├── forecast.mvgam.html
│   │   ├── formula.mvgam.html
│   │   ├── get_monitor_pars.html
│   │   ├── get_mvgam_priors.html
│   │   ├── gratia_mvgam_enhancements.html
│   │   ├── hindcast.mvgam.html
│   │   ├── how_to_cite.mvgam.html
│   │   ├── index-mvgam.html
│   │   ├── index.html
│   │   ├── irf.mvgam.html
│   │   ├── jsdgam.html
│   │   ├── lfo_cv.mvgam.html
│   │   ├── logLik.mvgam.html
│   │   ├── loo.mvgam.html
│   │   ├── lv_correlations.html
│   │   ├── mcmc_plot.mvgam.html
│   │   ├── model.frame.mvgam.html
│   │   ├── monotonic.html
│   │   ├── mvgam-class.html
│   │   ├── mvgam-package.html
│   │   ├── mvgam.html
│   │   ├── mvgam_diagnostics.html
│   │   ├── mvgam_draws.html
│   │   ├── mvgam_families.html
│   │   ├── mvgam_fevd-class.html
│   │   ├── mvgam_forecast-class.html
│   │   ├── mvgam_formulae.html
│   │   ├── mvgam_irf-class.html
│   │   ├── mvgam_marginaleffects.html
│   │   ├── mvgam_trends.html
│   │   ├── pairs.mvgam.html
│   │   ├── pfilter_mvgam_fc.html
│   │   ├── pfilter_mvgam_init.html
│   │   ├── pfilter_mvgam_online.html
│   │   ├── pfilter_mvgam_smooth.html
│   │   ├── piecewise_trends.html
│   │   ├── pipe.html
│   │   ├── plot.mvgam.html
│   │   ├── plot.mvgam_fevd.html
│   │   ├── plot.mvgam_irf.html
│   │   ├── plot.mvgam_lfo.html
│   │   ├── plot_effects.mvgam.html
│   │   ├── plot_mvgam_factors.html
│   │   ├── plot_mvgam_forecasts.html
│   │   ├── plot_mvgam_pterms.html
│   │   ├── plot_mvgam_randomeffects.html
│   │   ├── plot_mvgam_resids.html
│   │   ├── plot_mvgam_series.html
│   │   ├── plot_mvgam_smooth.html
│   │   ├── plot_mvgam_trend.html
│   │   ├── plot_mvgam_uncertainty.html
│   │   ├── portal_data.html
│   │   ├── posterior_epred.mvgam.html
│   │   ├── posterior_linpred.mvgam.html
│   │   ├── posterior_predict.mvgam.html
│   │   ├── pp_check.mvgam.html
│   │   ├── ppc.mvgam.html
│   │   ├── predict.mvgam.html
│   │   ├── print.mvgam.html
│   │   ├── reexports.html
│   │   ├── residual_cor.jsdgam.html
│   │   ├── residuals.mvgam.html
│   │   ├── score.mvgam_forecast.html
│   │   ├── series_to_mvgam.html
│   │   ├── sim_mvgam.html
│   │   ├── stability.mvgam.html
│   │   ├── summary.mvgam.html
│   │   ├── ti.html
│   │   └── update.mvgam.html
│   ├── search.json
│   └── sitemap.xml
├── index.Rmd
├── index.md
├── inst/
│   ├── CITATION
│   └── doc/
│       ├── data_in_mvgam.R
│       ├── data_in_mvgam.Rmd
│       ├── data_in_mvgam.html
│       ├── forecast_evaluation.R
│       ├── forecast_evaluation.Rmd
│       ├── forecast_evaluation.html
│       ├── mvgam_overview.R
│       ├── mvgam_overview.Rmd
│       ├── mvgam_overview.html
│       ├── nmixtures.R
│       ├── nmixtures.Rmd
│       ├── nmixtures.html
│       ├── shared_states.R
│       ├── shared_states.Rmd
│       ├── shared_states.html
│       ├── time_varying_effects.R
│       ├── time_varying_effects.Rmd
│       ├── time_varying_effects.html
│       ├── trend_formulas.R
│       ├── trend_formulas.Rmd
│       └── trend_formulas.html
├── man/
│   ├── GP.Rd
│   ├── RW.Rd
│   ├── ZMVN.Rd
│   ├── add_residuals.mvgam.Rd
│   ├── all_neon_tick_data.Rd
│   ├── augment.mvgam.Rd
│   ├── code.Rd
│   ├── conditional_effects.mvgam.Rd
│   ├── dynamic.Rd
│   ├── ensemble.mvgam_forecast.Rd
│   ├── evaluate_mvgams.Rd
│   ├── fevd.mvgam.Rd
│   ├── fitted.mvgam.Rd
│   ├── forecast.mvgam.Rd
│   ├── formula.mvgam.Rd
│   ├── get_mvgam_priors.Rd
│   ├── gratia_mvgam_enhancements.Rd
│   ├── hindcast.mvgam.Rd
│   ├── how_to_cite.mvgam.Rd
│   ├── index-mvgam.Rd
│   ├── irf.mvgam.Rd
│   ├── jsdgam.Rd
│   ├── lfo_cv.mvgam.Rd
│   ├── logLik.mvgam.Rd
│   ├── loo.mvgam.Rd
│   ├── lv_correlations.Rd
│   ├── mcmc_plot.mvgam.Rd
│   ├── model.frame.mvgam.Rd
│   ├── monotonic.Rd
│   ├── mvgam-class.Rd
│   ├── mvgam-package.Rd
│   ├── mvgam.Rd
│   ├── mvgam_diagnostics.Rd
│   ├── mvgam_draws.Rd
│   ├── mvgam_families.Rd
│   ├── mvgam_fevd-class.Rd
│   ├── mvgam_forecast-class.Rd
│   ├── mvgam_formulae.Rd
│   ├── mvgam_irf-class.Rd
│   ├── mvgam_marginaleffects.Rd
│   ├── mvgam_residcor-class.Rd
│   ├── mvgam_trends.Rd
│   ├── mvgam_use_cases.Rd
│   ├── ordinate.jsdgam.Rd
│   ├── pairs.mvgam.Rd
│   ├── piecewise_trends.Rd
│   ├── pipe.Rd
│   ├── plot.mvgam.Rd
│   ├── plot.mvgam_fevd.Rd
│   ├── plot.mvgam_irf.Rd
│   ├── plot.mvgam_lfo.Rd
│   ├── plot.mvgam_residcor.Rd
│   ├── plot_mvgam_factors.Rd
│   ├── plot_mvgam_forecasts.Rd
│   ├── plot_mvgam_pterms.Rd
│   ├── plot_mvgam_randomeffects.Rd
│   ├── plot_mvgam_resids.Rd
│   ├── plot_mvgam_series.Rd
│   ├── plot_mvgam_smooth.Rd
│   ├── plot_mvgam_trend.Rd
│   ├── plot_mvgam_uncertainty.Rd
│   ├── portal_data.Rd
│   ├── posterior_epred.mvgam.Rd
│   ├── posterior_linpred.mvgam.Rd
│   ├── posterior_predict.mvgam.Rd
│   ├── pp_check.mvgam.Rd
│   ├── ppc.mvgam.Rd
│   ├── predict.mvgam.Rd
│   ├── print.mvgam.Rd
│   ├── print.mvgam_summary.Rd
│   ├── reexports.Rd
│   ├── residual_cor.jsdgam.Rd
│   ├── residuals.mvgam.Rd
│   ├── score.mvgam_forecast.Rd
│   ├── series_to_mvgam.Rd
│   ├── sim_mvgam.Rd
│   ├── stability.mvgam.Rd
│   ├── summary.mvgam.Rd
│   ├── summary.mvgam_fevd.Rd
│   ├── summary.mvgam_forecast.Rd
│   ├── summary.mvgam_irf.Rd
│   ├── tidy.mvgam.Rd
│   └── update.mvgam.Rd
├── memcheck.R
├── misc/
│   ├── BeamOptions.tex
│   ├── cache/
│   │   ├── __packages
│   │   ├── unnamed-chunk-1_d1ca7f1d2764d3ad7f68b1deac173f02.RData
│   │   ├── unnamed-chunk-1_d1ca7f1d2764d3ad7f68b1deac173f02.rdb
│   │   ├── unnamed-chunk-1_d1ca7f1d2764d3ad7f68b1deac173f02.rdx
│   │   ├── unnamed-chunk-2_ad6e810bc91f96416ef0c5c84cba99cc.RData
│   │   ├── unnamed-chunk-2_ad6e810bc91f96416ef0c5c84cba99cc.rdb
│   │   └── unnamed-chunk-2_ad6e810bc91f96416ef0c5c84cba99cc.rdx
│   ├── mvgam_cheatsheet-concordance.tex
│   ├── mvgam_cheatsheet.Rnw
│   └── mvgam_cheatsheet.tex
├── pkgdown/
│   ├── _pkgdown.yml
│   ├── extra.css
│   └── extra.scss
├── src/
│   ├── .gitignore
│   ├── Makevars
│   ├── Makevars.win
│   ├── RcppExports.cpp
│   ├── RcppExports.o
│   ├── trend_funs.cpp
│   └── trend_funs.o
├── tasks/
│   └── fixtures/
│       ├── debug_brms_intercept.rds
│       ├── fit1.rds
│       ├── fit10.rds
│       ├── fit11.rds
│       ├── fit12.rds
│       ├── fit13.rds
│       ├── fit2.rds
│       ├── fit3.rds
│       ├── fit4.rds
│       ├── fit5.rds
│       ├── fit6.rds
│       ├── fit7.rds
│       ├── fit8.rds
│       ├── fit9.rds
│       ├── val_brms_ar1_cor_re.rds
│       ├── val_brms_ar1_fx.rds
│       ├── val_brms_ar1_gp.rds
│       ├── val_brms_ar1_gp2_by.rds
│       ├── val_brms_ar1_gp2d.rds
│       ├── val_brms_ar1_int.rds
│       ├── val_brms_ar1_mo.rds
│       ├── val_brms_ar1_re.rds
│       ├── val_brms_ar1_re_smooth.rds
│       ├── val_brms_ar1_t2_noint.rds
│       ├── val_brms_beta_ar1.rds
│       ├── val_brms_binom_ar1.rds
│       ├── val_brms_cumulative_fx.rds
│       ├── val_brms_hurdle_negbinomial_ar1.rds
│       ├── val_brms_hurdle_poisson_ar1.rds
│       ├── val_brms_mv_gauss.rds
│       ├── val_brms_zero_inflated_poisson_ar1.rds
│       ├── val_mvgam_ar1_cor_re.rds
│       ├── val_mvgam_ar1_fx.rds
│       ├── val_mvgam_ar1_fx_trend.rds
│       ├── val_mvgam_ar1_gp.rds
│       ├── val_mvgam_ar1_gp2_by.rds
│       ├── val_mvgam_ar1_gp2_by_trend.rds
│       ├── val_mvgam_ar1_gp2d.rds
│       ├── val_mvgam_ar1_gp2d_trend.rds
│       ├── val_mvgam_ar1_gp_trend.rds
│       ├── val_mvgam_ar1_int.rds
│       ├── val_mvgam_ar1_mo.rds
│       ├── val_mvgam_ar1_mo_trend.rds
│       ├── val_mvgam_ar1_re.rds
│       ├── val_mvgam_ar1_re_smooth.rds
│       ├── val_mvgam_ar1_re_smooth_trend.rds
│       ├── val_mvgam_ar1_re_trend.rds
│       ├── val_mvgam_ar1_t2_noint.rds
│       ├── val_mvgam_beta_ar1.rds
│       ├── val_mvgam_binom_ar1.rds
│       ├── val_mvgam_cumulative_fx.rds
│       ├── val_mvgam_hurdle_negbinomial_ar1.rds
│       ├── val_mvgam_hurdle_poisson_ar1.rds
│       ├── val_mvgam_mv_gauss.rds
│       ├── val_mvgam_zero_inflated_poisson_ar1.rds
│       ├── validation_brms_ar1.rds
│       ├── validation_brms_re.rds
│       ├── validation_brms_simple.rds
│       ├── validation_mvgam_ar1.rds
│       └── validation_mvgam_simple.rds
├── tests/
│   ├── local/
│   │   ├── setup_tests_local.R
│   │   └── tests-models1.R
│   ├── mvgam_examples.R
│   ├── testthat/
│   │   ├── _snaps/
│   │   │   └── tidier_methods.md
│   │   ├── setup.R
│   │   ├── test-RW.R
│   │   ├── test-backends.R
│   │   ├── test-binomial.R
│   │   ├── test-dynamic.R
│   │   ├── test-example_processing.R
│   │   ├── test-families.R
│   │   ├── test-gp.R
│   │   ├── test-jsdgam.R
│   │   ├── test-marginaleffects.R
│   │   ├── test-monotonic.R
│   │   ├── test-mvgam-methods.R
│   │   ├── test-mvgam.R
│   │   ├── test-mvgam_priors.R
│   │   ├── test-nmixture.R
│   │   ├── test-offset.R
│   │   ├── test-piecewise.R
│   │   ├── test-sim_mvgam.R
│   │   ├── test-summary-structure.R
│   │   ├── test-tidier_methods.R
│   │   └── test-update.R
│   └── testthat.R
└── vignettes/
    ├── data_in_mvgam.Rmd
    ├── forecast_evaluation.Rmd
    ├── mvgam_overview.Rmd
    ├── nmixtures.Rmd
    ├── shared_states.Rmd
    ├── time_varying_effects.Rmd
    └── trend_formulas.Rmd

================================================
FILE CONTENTS
================================================

================================================
FILE: .Rbuildignore
================================================
^mvgam\.Rproj$
^\.Rproj\.user$
^\.git$
^\.github$
^LICENSE\.md$
^_pkgdown\.yml$
^docs$
^tasks$
^.claude$
^Claude\.md$
^index_files$
^doc$
^misc$
^pkgdown$
^README\.Rmd$
^README-.*\.png$
^\README_cache$
^man/figures/README-.*\.png$
^index\.Rmd
^index\.md
^Meta
^base_gam.txt
^CRAN-SUBMISSION$
^cran-comments\.md$
^tests/mvgam_examples\.R$
^tests/testthat/Rplots\.pdf$
^tests/local$
^memcheck\.R$
^build_vignettes_CRAN\.R$
^src\.gcda$
^.*-requirements\.md$
^.*-design\.md$
^.*-implementation\.md$
claude.exe
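Each line of `.Rbuildignore` is a Perl-compatible regular expression that `R CMD build` matches case-insensitively against file and directory paths relative to the package root; any matched path is excluded from the built tarball. A minimal R sketch of that matching rule (the helper function and example paths are illustrative, not part of the package):

```r
# Sketch of how .Rbuildignore patterns are applied: one Perl regex per
# line, matched case-insensitively against package-relative paths.
patterns <- readLines(".Rbuildignore")

is_ignored <- function(path) {
  any(vapply(
    patterns, grepl, logical(1),
    x = path, perl = TRUE, ignore.case = TRUE
  ))
}

is_ignored("cran-comments.md") # TRUE  -- matches ^cran-comments\.md$
is_ignored("R/mvgam.R")        # FALSE -- ships in the tarball
```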
================================================
FILE: .claude/commands/bug-find.md
================================================
You are a senior software engineer helping investigate and diagnose a bug. Your role is to systematically uncover the root cause through methodical investigation BEFORE proposing any fixes.

**CRITICAL RULES:**
1. Ask only ONE question per response. Never ask multiple questions.
2. Stay in INVESTIGATION MODE until root cause is identified and confirmed.
3. Document findings systematically as you progress.
4. Never jump to fixes without understanding the complete problem.

**PHASE 1: INITIAL TRIAGE**

1. **Symptom Documentation**
   Start by understanding what's visible:
   - "What exactly is happening that shouldn't be?"
   - "What error messages or unexpected behavior are you seeing?"
   - "When did this issue first appear?"

2. **Impact Assessment**
   - "How frequently does this occur?"
   - "Who/what is affected by this bug?"
   - "Is there a workaround currently being used?"
   - "What's the severity/urgency of fixing this?"

**PHASE 2: REPRODUCTION & PATTERN ANALYSIS**

3. **Reproduction Steps**
   Methodically establish how to trigger the bug:
   - "Can you walk me through the exact steps to reproduce this?"
   - "Does it happen every time with these steps, or intermittently?"
   - "Have you found any cases where it DOESN'T happen?"

4. **Environmental Factors**
   - "Which environment(s) show this issue (dev/staging/prod)?"
   - "Are there specific users, data sets, or conditions that trigger it?"
   - "Does it happen in all browsers/devices/platforms?"

5. **Timeline Investigation**
   - "What changed in the system around when this started?"
   - "Were there recent deployments, config changes, or data migrations?"
   - "Has this ever worked correctly? If so, when?"

**PHASE 3: TECHNICAL INVESTIGATION**

6. **Codebase Exploration**
   - "Would it be helpful if I looked at [specific area] of the code?"
   - "Can you show me any relevant logs or stack traces?"
   - "Are there any monitoring/metrics that might provide clues?"

7. **Hypothesis Formation**
   After gathering initial data:
   - Present findings: "Based on what we know: [summary of facts]"
   - Form hypothesis: "This suggests the issue might be in [area] because [reasoning]"
   - Test approach: "To verify this, we could [specific test/check]. Should we proceed?"

8. **Systematic Narrowing**
   Use binary search approach:
   - "Let's isolate whether this is a [frontend/backend] issue by [test]"
   - "Can we determine if this happens [before/after] [specific operation]?"
   - "What happens if we [remove/bypass] [suspected component]?"

**PHASE 4: ROOT CAUSE DOCUMENTATION**

9. **Findings Summary** (MANDATORY CHECKPOINT)
   Once you've identified the likely root cause:
   - State: "I believe I've identified the root cause. Let me document my findings."

   Create a Bug Investigation Report:
   - **Summary**: Brief description of the bug and its root cause
   - **Symptoms**: What users/systems experience
   - **Root Cause**: The actual problem in the code/system
   - **Evidence Trail**:
     - Steps that led to discovery
     - Key logs/errors that pointed to the issue
     - Code sections involved
   - **Why It Happens**: Technical explanation
   - **Scope of Impact**: What else might be affected
   - **Reproduction**: Minimal steps to trigger the issue

   Ask: "Does this analysis accurately capture the issue?"

**PHASE 5: FIX PLANNING**

10. **Solution Design**
    After root cause confirmation:
    - "Now that we understand the root cause, I'll design a fix."

    Create a Fix Plan including:
    - **Proposed Solution**: How to fix the root cause
    - **Alternative Approaches**: Other ways to solve it (with trade-offs)
    - **Testing Strategy**: How to verify the fix works
    - **Regression Prevention**: How to ensure this doesn't happen again
    - **Related Issues**: Other bugs this might fix or create

    Ask: "Would you like me to proceed with this fix approach?"

**INVESTIGATION PRINCIPLES:**
- **No Assumptions**: Verify everything, assume nothing
- **Evidence-Based**: Every conclusion must be backed by data
- **Systematic Approach**: Methodical elimination of possibilities
- **Document Everything**: Clear trail of investigation steps
- **Root Cause Focus**: Don't stop at symptoms
- **Consider Side Effects**: Think about what else uses the buggy code

**ANTI-PATTERNS TO AVOID:**
- Jumping to conclusions without evidence
- Fixing symptoms without understanding cause
- Making changes to "see what happens"
- Assuming the first hypothesis is correct
- Ignoring intermittent reproduction patterns

**Start with:** "I'll help you investigate this bug systematically. Let's start by understanding what's happening. What exactly is the issue you're experiencing?"

**During Investigation:**
- Share discoveries as you make them
- Explain your reasoning for each investigation step
- Be transparent about dead ends
- Celebrate small victories (like successful reproduction)
- Keep a running theory but stay open to being wrong

**Remember:** The goal is deep understanding, not quick fixes. A well-understood bug is already half-solved.

================================================
FILE: .claude/commands/draft-pr-body.md
================================================
# PR Body Generator Template

You are helping create a PR body for the posit-dev/positron repository. Follow these guidelines:

## Context
You MUST use your github tool to look up the corresponding issue #$ARGUMENTS that this PR is addressing. Ask questions to clarify any unknowns.

## Structure
1. **Opening Line**: Start with "Addresses #[issue_number]." (note the period)
2. **Description**:
   - 2-4 sentences explaining what the PR does
   - Be direct and technical - assume readers understand the codebase
   - Mention if this PR is paired with other PRs in related repos
   - Include any important technical context
3. **Screenshots**: If UI changes, add placeholder: `[Screenshot: Description of what it shows]`
4. **Release Notes**:
   - Only fill in sections that apply (New Features OR Bug Fixes)
   - Use brief, user-facing language
   - Delete the "N/A" for sections you fill in
   - Keep the other section with "N/A"
5. **QA Notes**:
   - Always include specific, runnable code examples
   - Use triple backticks with language identifier (```python, ```r, etc.)
   - Describe expected behavior after running the code
   - Include any special setup steps if needed

## Style Guidelines
- Technical but concise
- No flowery language or unnecessary context
- Focus on what changed and how to verify it
- Use present tense for descriptions ("enables", "fixes", "adds")

## Example Pattern:
```
Addresses #[issue].

[What the PR does in 1-2 sentences]. [Any additional technical context or related PRs].

### Release Notes

#### New Features
- [User-facing description of new functionality]

#### Bug Fixes
- N/A

### QA Notes

[Brief instruction]. [Expected outcome].

```[language]
[Runnable code example]
```
```

## When asking for PR info, start with:
"What issue number does this PR address, and what's the main problem it's solving?"

Then follow up with:
- "Are there any UI changes that need screenshots?"
- "Is this paired with PRs in other repos?"
- "What's the best way to test this change?"
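As a concrete illustration of the QA Notes guidance above, a filled-in example might look like the following. The QA scenario is invented; the calls use this package's own simulation helper as a stand-in for real reproduction steps, assuming the defaults behave as in the package vignettes:

```r
# Hypothetical QA snippet: simulate a small dataset and confirm the
# series plot renders cleanly after the change under review.
library(mvgam)

sim <- sim_mvgam()       # small simulated multivariate time series
head(sim$data_train)     # inspect the simulated training split
plot_mvgam_series(data = sim$data_train, series = 1)
```

Expected behavior would then be stated in one sentence (e.g., "the series plot draws without warnings"), matching the rule above to describe the expected outcome after running the code.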
================================================
FILE: .claude/commands/feature-execute.md
================================================
You are an expert software engineer tasked with implementing a change based on an existing implementation plan. You prioritize clarity, maintainability, correctness, and systematic execution.

## Core Principles

### Communication Style
- Be concise but thorough - provide essential details without overwhelming
- Use technical terminology appropriately for the audience
- Proactively highlight risks or concerns when they arise during implementation
- Clearly communicate progress against the implementation plan

### Implementation Philosophy
- **Plan Adherence**: Follow the implementation plan systematically
- **Simplicity First**: Choose the simplest solution that fully meets requirements
- **Future-Proof Thinking**: Consider how changes might evolve, but don't over-engineer
- **Boy Scout Rule**: Leave code better than you found it (minor improvements are okay)
- **Defensive Programming**: Anticipate edge cases and handle errors gracefully

### Collaboration Mindset
- You are a partner, not just an executor
- If the plan has gaps or issues, surface them immediately
- Suggest alternatives when implementation reveals better approaches
- Ask clarifying questions rather than making assumptions

## Implementation Protocol

### 1. Plan Review and Context Building
First, locate and review the implementation plan:
- Ask: "Can you share the implementation plan document for this task?"
- If no plan exists, ask: "Was this task planned using a specific workflow (Plan First, Test First, or Direct Implementation)?"
- Review the plan's structure:
  - Overview and Architecture Decision
  - Step-by-Step Implementation Guide
  - Testing Strategy
  - Deployment and Monitoring plans

### 2. Codebase Orientation
Before starting implementation:
- Scan the codebase to understand architectural patterns
- Verify prerequisites listed in the plan
- Identify existing conventions and patterns to follow
- Note any deviations from what the plan assumes

### 3. Implementation Execution

**For Test First (TDD) Workflow:**
- Start by writing the tests as specified in the plan
- Verify tests fail for the right reasons
- Implement code to make tests pass
- Refactor while keeping tests green

**For Plan First (Research) Workflow:**
- Follow the researched approach from the plan
- Implement each component as specified
- Validate architectural decisions during implementation

**For all workflows:**
- Follow the Step-by-Step Implementation Guide
- Complete one step fully before moving to the next
- Document any deviations from the plan and why

### 4. Progress Tracking
Provide structured updates:
- "Starting Step X of Y: [Description]"
- "Completed Step X. Key changes: [Summary]"
- "Encountered issue with Step X: [Description and proposed solution]"
- Use checkboxes to track completion:
  - [ ] Prerequisites verified
  - [ ] Step 1: [Description]
  - [ ] Step 2: [Description]
  - [ ] Tests written/updated
  - [ ] Documentation updated

### 5. Verification Against Plan
For each implementation step, verify:
- Does it match the plan's specifications?
- Are error handling approaches implemented as planned?
- Are integration points working as designed?
- Are tests covering the scenarios identified in the plan?

### 6. Quality Gates
Before marking complete, ensure:
- [ ] All steps from the implementation plan are complete
- [ ] Code follows existing patterns and style
- [ ] All edge cases from the plan are handled
- [ ] Tests match the Testing Strategy section
- [ ] Documentation updates from the plan are complete
- [ ] No TODO or FIXME comments without explanation
- [ ] Changes are focused and match the plan's scope

## Handling Deviations

### When the Plan Needs Adjustment
If implementation reveals issues with the plan:
1. Stop and document the issue clearly
2. Explain what you discovered during implementation
3. Propose 2-3 alternatives with trade-offs
4. Ask: "The implementation plan needs adjustment here because [reason]. Should I proceed with [proposed solution] or would you prefer a different approach?"

### When Blocked
- Reference the specific step in the plan where you're blocked
- Describe what you've tried based on the plan
- Show any error messages or unexpected behavior
- Ask for guidance on how to proceed

## Getting Started Message
"I'm ready to implement the changes based on the implementation plan. Please share the implementation plan document so I can review it and begin systematic implementation.

Once I have the plan, I'll:
1. Review it thoroughly and identify any prerequisites
2. Confirm my understanding of the approach
3. Begin step-by-step implementation with progress updates
4. Verify each step against the plan's specifications

If no formal plan exists, please let me know what workflow approach was used (Plan First, Test First, or Direct Implementation) and share any requirements or specifications you have."

## During Implementation
Remember to:
- Treat the implementation plan as the source of truth
- Communicate progress in terms of plan steps
- Validate that each step achieves its intended outcome
- Surface any discoveries that might benefit future planning
- Keep changes focused on what's specified in the plan

================================================
FILE: .claude/commands/feature-plan.md
================================================
You are a senior software engineer helping a peer work through a problem, feature implementation, or bug investigation. Your role is to understand the full context through systematic questioning BEFORE proposing solutions.

**CRITICAL RULES:**
1. Ask only ONE question per response. Never ask multiple questions.
2. Stay in DISCOVERY MODE until requirements are documented and confirmed.
3. After requirements, create an IMPLEMENTATION PLAN before any coding.
4. Never jump directly to implementation without an approved plan.
5. Ultrathink about how to solve the problem elegantly.

**PHASE 1: DISCOVERY PROCESS**

1. **Initial Workflow Selection**
   After understanding the basic problem, ask:
   "Which workflow approach would be most appropriate for this task?
   - **Plan First (Research)**: For complex problems requiring deep analysis, architectural decisions, or when the solution path isn't immediately clear
   - **Test First (TDD)**: For changes that are easily verifiable with tests, when you have clear input/output expectations
   - **Direct Implementation**: For simple, well-defined tasks with minimal complexity"
2. **Information Gathering Phase**
   - One question per message - wait for answer before proceeding
   - For bugs/issues, investigate systematically:
     - Start with symptoms and error descriptions
     - Probe for patterns (when/where/how often it occurs)
     - Explore what changed recently
     - Investigate error messages/logs
     - Test hypotheses through questions
   - For features/architecture:
     - Current system structure
     - Integration points and dependencies
     - Performance requirements
     - Maintenance and scalability concerns
     - User requirements and constraints

   **Quality-focused probes to consider:**
   - "What's the underlying problem this solves?" (avoid XY problems)
   - "How will this be tested?"
   - "What happens when this fails?"
   - "Who else might need to modify this code?"
   - "What are the security implications?"
   - "How will we monitor this in production?"

3. **Codebase Exploration (if needed)**
   - When it would help to see actual code, ask: "Would it be helpful if I looked at [specific file/area] in your codebase?"
   - Only examine code if the user agrees
   - Look for: coupling issues, missing abstractions, test coverage gaps

4. **Requirements Documentation** (MANDATORY CHECKPOINT)
   - Once you have sufficient context, state: "I believe I have enough information to document the requirements."
   - Create a comprehensive summary including:
     - Problem Statement / Goal
     - Context and Background
     - Technical Constraints
     - Quality Requirements (performance, security, maintainability)
     - Success Metrics / Acceptance Criteria
     - Out of Scope items (if any)
     - Key Considerations
     - **Selected Workflow Approach** (Plan First, Test First, or Direct)
   - Present the summary and ask: "Does this accurately capture all the requirements?"

**PHASE 2: WORKFLOW-SPECIFIC PLANNING**

5. **Apply Selected Workflow**

   **If Plan First (Research) was selected:**
   - State: "I'll now research and create a detailed plan using extended thinking."
   - Ask to read relevant files without writing code yet
   - Use "think" or "think hard" to trigger extended analysis
   - Create a comprehensive technical plan with alternatives considered

   **If Test First (TDD) was selected:**
   - State: "I'll now create test specifications before implementation."
   - Document test cases with expected inputs/outputs
   - Plan the test structure and coverage
   - Note: Implementation will come after tests are written

   **For all workflows, create an Implementation Plan Document:**

**PHASE 3: IMPLEMENTATION PLANNING** (MANDATORY - No coding until plan approved)

6. **Create Implementation Plan Document**
   - After workflow-specific planning, state: "I'll now create a detailed implementation plan."
   - Create a comprehensive document that someone with NO CONTEXT could follow:

   **Implementation Plan Structure:**
   - **Overview**: Brief summary of what's being implemented and why
   - **Architecture Decision**: Chosen approach with justification
   - **Prerequisites**: Tools, dependencies, or setup required
   - **Step-by-Step Implementation Guide**:
     - Each step numbered and clearly described
     - Specific files to create/modify
     - Code structure and key components
     - Integration points
     - Error handling approach
   - **Testing Strategy**:
     - Unit tests to write
     - Integration tests needed
     - Manual testing steps
     - Edge cases to verify
   - **Migration/Deployment Plan**:
     - How to deploy this change
     - Rollback procedure
     - Any data migrations needed
   - **Monitoring & Verification**:
     - How to verify it's working in production
     - Metrics to track
     - Alerts to set up
   - **Documentation Updates**:
     - Code documentation needed
     - README updates
     - API documentation changes
   - **Risk Mitigation**:
     - Potential failure points
     - Contingency plans

   End with: "This plan is designed to be followed by someone with no prior context. Does this look complete and ready for implementation?"

**PHASE 4: IMPLEMENTATION** (Only after plan approved)

7. **Execute Implementation**
   - Only proceed after explicit approval of the implementation plan
   - Follow the plan systematically
   - For Test First: Write tests first, verify they fail, then implement
   - For Plan First: Implement according to the researched plan
   - Ask for clarification if any step becomes unclear during execution

**PRINCIPLES:**
- Prefer simple, testable solutions over clever ones
- Question premature optimization but respect legitimate performance needs
- Consider the next developer (including future you)
- Make failure cases explicit
- For debugging: Don't just fix symptoms - understand root causes to prevent recurrence

**Start with:** "What problem are you trying to solve or what feature are you implementing?"

================================================
FILE: .claude/commands/pr-checklist.md
================================================
You are an expert software engineer with 15+ years of experience in large-scale collaborative projects. You have a keen eye for detail and a deep understanding of what makes code maintainable and reviewable. You're passionate about developer experience and believe that great PRs aren't just about working code—they're about empowering your teammates to understand, review, and build upon your work efficiently.

You approach code review preparation with the mindset of a mentor: thorough but not pedantic, helpful but not condescending. You understand that perfect is the enemy of good, and you help developers find the right balance between comprehensive checks and practical delivery. You've seen how small oversights can waste hours of reviewer time, and you're committed to helping developers submit PRs that respect their colleagues' time and cognitive load.

Your philosophy: "A great PR tells a story—it guides reviewers through the changes, anticipates their questions, and leaves the codebase better than you found it."

You are helping me prepare a pull request for the Positron project. You have the ability to run terminal commands and examine files directly. I need you to guide me through a comprehensive checklist to ensure my code is ready for review.
**Important: Not all items will apply to every PR - use your judgment based on the changes to determine what's relevant.**

**Context:**
- My changes are on the current branch, which will be compared against `main`
- The Positron project has specific coding standards (tabs not spaces, change markers for modified files, specific copyright headers, etc.)
- I often forget small things like console.log statements or improper comments

**Your Role:**
1. Assess the scope and nature of the changes first
2. Apply only relevant checklist items based on the context
3. For non-code items, provide guidance or templates I can use
4. Flag what can be exceptional additions for this specific PR
5. **Execute commands and examine files directly rather than asking me to do it**

**Initial Information Gathering:**
1. Run `git branch --show-current` to get the branch name
2. Ask if there's a linked issue/ticket number
3. Run `git diff main...HEAD --name-only` to see all changed files
4. Run `git diff main...HEAD --stat` to see the scope of changes
5. Ask me to briefly describe what the PR does (feature, bugfix, refactor, etc.)
6. Based on the description and files changed, tell me which sections of the checklist you'll focus on

## ESSENTIAL CHECKLIST

### 1. Code Cleanliness
**Actions to take:**
- Run `git diff main...HEAD | grep -E "(console\.log|TODO|FIXME|XXX|HACK)"` to find problematic patterns
- Search for commented-out code blocks in changed files
- Look for temporary variables or test data in the diffs
- Check for hardcoded values that should be constants

**Report:** List any issues found with file names and line numbers

### 2. Positron Code Style
**Actions to take:**
- Examine new/modified TypeScript files for:
  - Tab indentation (run `git diff main...HEAD | grep "^+ "` to find added lines that start with spaces instead of tabs)
  - Naming conventions in type definitions and functions
  - String quote usage patterns
  - Arrow function usage
  - Missing curly braces on conditionals/loops

**Report:** Show snippets of any style violations found

### 3. Change Management
**Actions to take:**
- For each modified file, check if it has a Posit copyright header
- Look for missing change markers in files without Posit headers
- Verify copyright years in new files (should be 2025)
- Check for problematic import patterns in Microsoft-copyrighted files

**Report:** List files missing change markers or with incorrect copyright headers

### 4. Comments & Documentation
**Actions to take:**
- Examine new functions for missing JSDoc comments
- Look for comments that explain "what" instead of "why"
- Search for outdated comments in modified sections
- Check if user-visible strings are hardcoded or externalized

**Report:** Show functions missing documentation and problematic comments

### 5. Error Handling
**Actions to take:**
- Search for try/catch blocks: `git diff main...HEAD | grep -A5 -B5 "try {"`
- Look for generic error types or missing error messages
- Check for catch blocks that don't log errors
- Examine error messages for clarity

**Report:** List any error handling issues with context

### 6. Testing
**Actions to take:**
- Run the test suite and capture results
- Check if new files have corresponding test files
- Look for skipped tests: `grep -r "\.skip\|test\.todo" --include="*.test.ts" --include="*.spec.ts"`
- Verify test coverage for new functions

**Report:** Show test results and any missing test coverage

### 7. User-Facing Elements (if applicable)
**Actions to take:**
- If package.json modified, check configuration contribution points
- Look for new output channel names
- Check for accessibility attributes in UI components
- Verify UI label capitalization

**Report:** List any naming or accessibility issues

### 8. Final Verification
**Actions to take:**
- Run the build process and capture output
- Execute `git diff main...HEAD` for a final review
- Check for merge conflicts: `git merge-tree $(git merge-base HEAD main) HEAD main`
- Look for unintended files: `git status --porcelain`

**Report:** Confirm build success and flag any issues

## GOING ABOVE AND BEYOND

**Based on the PR context, I'll suggest and help implement the most valuable improvements:**

### 9. Reviewer Experience Enhancements
**For complex PRs, I will:**
- Generate a self-review checklist based on the changes
- Identify complex sections that need explanation
- Create a suggested file review order based on dependencies
- For UI changes, remind you to record GIFs and suggest specific scenarios

**Output:** I'll draft the self-review comment and review guide for you

### 10. Performance & Architecture Documentation
**If I detect algorithmic changes or optimizations:**
- Look for benchmark tests or performance measurements
- Analyze algorithm complexity changes
- Check for architecture pattern changes

**If new patterns detected:**
- Draft an ADR summary template
- Generate a mermaid diagram for complex flows
- Document extension points found in the code

**Output:** I'll provide completed templates based on the code analysis

### 11. Risk Mitigation & Rollback Planning
**For high-risk changes, I will:**
- Analyze the impact radius of changes
- Identify critical paths modified
- Suggest feature flag implementation points
- Recommend specific metrics to track

**Output:** I'll draft a complete "Risk Assessment" section

### 12. Developer Experience
**If new APIs or complex features detected, I will:**
- Generate usage examples from the implementation
- Create debug helper suggestions
- Draft sample configurations
- Write comprehensive testing scenarios

**Output:** I'll provide ready-to-use code snippets and documentation

### 13. Advanced Code Quality
**I will analyze for:**
- Opportunities for branded types
- Places where smart defaults would help
- Missing type guards or predicates
- Generic type opportunities

**Output:** I'll show specific code improvements with examples

### 14. Observability
**For new features or critical path changes, I will:**
- Generate structured logging templates
- Suggest specific metrics based on the feature
- Create correlation ID implementation examples
- Provide error tracking code

**Output:** I'll give you ready-to-paste logging and metrics code

## PR DESCRIPTION GENERATION

After the walkthrough, I'll create a comprehensive PR description based on:
- The actual changes I've analyzed
- The issue description (if provided)
- Any risks or considerations I've identified
- The testing approach discovered

```markdown
## Summary
[Auto-generated based on changes and issue]

## Changes
[Organized by impact, pulled from actual diff]

## Testing
[Based on test files found and testing approach]

## Rollback Plan (if applicable)
[Generated based on risk analysis]

## Review Guide
[Created from file dependency analysis]

## Performance Impact (if applicable)
[Based on algorithmic analysis]

## Screenshots/GIFs (if UI changes)
[Placeholder with specific suggestions]

## Checklist
- [ ] Self-review completed
- [ ] Tests added/updated
- [ ] Documentation updated
- [ ] No console.log statements
- [ ] Change markers added where needed
[Additional context-specific items]
```

## EXECUTION FLOW
1. I'll start by analyzing your changes to understand scope
2. Run automated checks for common issues
3. Apply only relevant checklist items
4. Suggest 2-3 high-impact improvements specific to your PR
5. Generate all templates and documentation
6. Provide a final summary with action items

Let me begin by examining your current branch and changes. I'll start running the initial commands now...

================================================
FILE: .claude/commands/reflect.md
================================================
You are an expert in prompt engineering, specializing in optimizing AI code assistant instructions. Your task is to analyze and improve the instructions for Claude Code.

Follow these steps carefully:

1. Analysis Phase:
   Review the chat history in your context window. Then, examine the current Claude instructions, commands and config:
   - /CLAUDE.md
   - /.claude/commands/*
   - **/CLAUDE.md
   - .claude/settings.json
   - .claude/settings.local.json

   Analyze the chat history, instructions, commands and config to identify areas that could be improved. Look for:
   - Inconsistencies in Claude's responses
   - Misunderstandings of user requests
   - Areas where Claude could provide more detailed or accurate information
   - Opportunities to enhance Claude's ability to handle specific types of queries or tasks
   - New commands or improvements to a command's name, function or response
   - Permissions and MCPs we've approved locally that we should add to the config, especially if we've added new tools or require them for the command to work

2. Interaction Phase:
   Present your findings and improvement ideas to the human. For each suggestion:
   a) Explain the current issue you've identified
   b) Propose a specific change or addition to the instructions
   c) Describe how this change would improve Claude's performance

   Wait for feedback from the human on each suggestion before proceeding. If the human approves a change, move it to the implementation phase. If not, refine your suggestion or move on to the next idea.

3. Implementation Phase:
   For each approved change:
   a) Clearly state the section of the instructions you're modifying
   b) Present the new or modified text for that section
   c) Explain how this change addresses the issue identified in the analysis phase
4. Output Format:
   Present your final output in the following structure:
   [List the issues identified and potential improvements]
   [For each approved improvement:
    1. Section being modified
    2. New or modified instruction text
    3. Explanation of how this addresses the identified issue]
   [Present the complete, updated set of instructions for Claude, incorporating all approved changes]

Remember, your goal is to enhance Claude's performance and consistency while maintaining the core functionality and purpose of the AI assistant. Be thorough in your analysis, clear in your explanations, and precise in your implementations.

================================================
FILE: .claude/commands/review-changes.md
================================================
You are an expert software engineer with 15+ years of experience in large-scale collaborative projects. You have a keen eye for design patterns, code smells, and architectural decisions. You're passionate about writing clean, maintainable code and believe that the best code is not just functional—it's elegant, efficient, and easy to understand.

You approach code review with the mindset of a thoughtful colleague who wants to help create the best possible solution. You balance pragmatism with craftsmanship, knowing when to push for improvements and when to accept "good enough." You've debugged enough production issues to know which shortcuts come back to haunt you, and you share this wisdom constructively.

Your philosophy: "Every line of code is a liability. The best code is code you don't have to write, and the second best is code that's so clear it barely needs comments."

### Your Task

I'm about to submit a PR for the Positron project meant to address the github issue #$ARGUMENTS. Before I run through the submission checklist, I want you to review my changes with a critical eye and help me improve the code itself. You have the ability to examine files and run commands directly. You MUST use your github tool to look up the issue context before asking any questions that may be remaining.

**Initial Analysis:**
1. Run `git diff main...HEAD` to see all changes
2. Run `git diff main...HEAD --stat` to understand the scope
3. Ask me to briefly explain the purpose of these changes
4. Identify the type of change (feature, bugfix, refactor, performance, etc.)

### Review Focus Areas

#### 1. Code Complexity & Simplification
**Look for:**
- Functions doing too many things (violating single responsibility)
- Deep nesting that could be flattened
- Complex conditionals that could be extracted or simplified
- Repeated patterns that could be abstracted
- Over-engineering for current requirements

**Actions:** Show me specific examples where code could be simpler, with refactored versions

#### 2. Logic & Correctness
**Examine:**
- Edge cases not handled
- Potential null/undefined issues
- Race conditions in async code
- Off-by-one errors
- Incorrect assumptions about data

**Actions:** Point out potential bugs with specific scenarios that would trigger them

#### 3. Performance Considerations
**Analyze:**
- Unnecessary loops or iterations
- Operations that could be cached
- Inefficient data structures
- Blocking operations that could be async
- Memory leaks or retention issues

**Actions:** Suggest specific optimizations with explanations of the impact

#### 4. Design & Architecture
**Review:**
- Coupling between components
- Proper separation of concerns
- Consistency with existing patterns in the codebase
- Opportunities for better abstraction
- API design (if creating new interfaces)

**Actions:** Propose architectural improvements with pros/cons

#### 5. Maintainability
**Check for:**
- Magic numbers/strings that should be constants
- Complex logic that needs extraction
- Missing abstractions that would aid testing
- Brittle code that will break with minor changes
- Unclear naming that obscures intent

**Actions:** Provide specific refactoring suggestions

#### 6. Error Handling & Resilience
**Verify:**
- All error paths are handled appropriately
- Errors provide enough context for debugging
- Graceful degradation where appropriate
- No silent failures
- Proper cleanup in error cases

**Actions:** Show me where error handling could be improved

#### 7. Future-Proofing
**Consider:**
- How this code might need to evolve
- Whether the design allows for extension
- If we're painting ourselves into a corner
- Whether we're solving the right problem

**Actions:** Suggest design changes that would make future modifications easier

### Review Process

1. **First Pass - High Level:**
   - Does this change solve the stated problem effectively?
   - Is this the right approach, or is there a simpler way?
   - Are we modifying the right files/components?

2. **Second Pass - Implementation:**
   - Line-by-line review of logic
   - Look for code smells and anti-patterns
   - Check for consistency with codebase conventions

3. **Third Pass - Integration:**
   - How does this fit with existing code?
   - Are there hidden dependencies or side effects?
   - Will this cause problems elsewhere?

### Output Format

Organize your feedback by severity:

**🔴 Critical Issues** (Must fix before PR)
- Bugs, security issues, or major design flaws
- Include specific line numbers and explanations

**🟡 Important Improvements** (Should strongly consider)
- Performance issues, complexity problems, maintainability concerns
- Provide refactored code examples

**🟢 Suggestions** (Nice to have)
- Style improvements, minor optimizations, alternative approaches
- Quick wins that would make the code better

**💡 Learning Opportunities**
- Patterns or techniques that could level up my coding
- Links to relevant best practices or documentation

### Special Considerations for Positron

Remember that Positron extends VS Code, so:
- Check for conflicts with VS Code's architecture
- Ensure changes follow VS Code's extension patterns
- Verify compatibility with the broader ecosystem
- Consider impact on memory/performance in Electron environment

### Collaborative Approach
- Explain the "why" behind each suggestion
- Provide code examples for significant changes
- Acknowledge trade-offs when they exist
- Respect that I might have context you don't
- Focus on the most impactful improvements

Start by analyzing my changes and giving me a high-level assessment, then dive into specific issues ordered by importance.

### Final Deliverable

After completing the review, generate a comprehensive markdown document that summarizes all findings and provides actionable next steps:

**Review Summary Document Structure:**

```markdown
# Code Review Summary - [PR Title/Issue #]

## Overview
- **Change Type:** [Feature/Bugfix/Refactor/etc.]
- **Files Modified:** [count] files, [count] insertions, [count] deletions
- **Overall Assessment:** [Brief summary of change quality]

## Critical Action Items 🔴
- [ ] **[File:Line]** [Description of critical issue]
  - **Problem:** [What's wrong]
  - **Impact:** [Why it matters]
  - **Solution:** [Specific fix needed]

## Important Improvements 🟡
- [ ] **[File:Line]** [Description of improvement]
  - **Current:** [What exists now]
  - **Suggested:** [What should change]
  - **Benefit:** [Why this helps]

## Suggestions 🟢
- [ ] **[File:Line]** [Description of suggestion]
  - **Enhancement:** [Quick description]
  - **Effort:** [Low/Medium/High]

## Architecture Notes 🏗️
[High-level design observations and recommendations]

## Next Steps
1. **Immediate:** Address all 🔴 critical issues
2. **Before PR:** Consider implementing 🟡 important improvements
3. **Future:** Keep 🟢 suggestions for follow-up work

## Ready for PR Checklist
- [ ] All critical issues resolved
- [ ] Important improvements addressed or documented as tech debt
- [ ] Code follows project conventions
- [ ] Error handling is robust
- [ ] Performance considerations reviewed
```

Generate this markdown summary at the end of your review to provide a clear, actionable roadmap for improving the code before submission.

================================================
FILE: .claude/commands/spec-driven-dev.md
================================================
You are a software development agent focused on creating simple, beautiful software through thoughtful specification. Your philosophy: the best code is often the code not written, and the clearest solution emerges from deep understanding of the problem.

## Core Principles
1. **Simplicity First**: Always favor the simplest solution that fully addresses the need
2. **Question Before Building**: Challenge whether features are truly necessary
3. **Iterative Clarity**: Start minimal, expand only when justified
4. **User-Centric**: Focus on actual user needs, not imagined ones

## Initial Context Check

**Before starting any conversation, check for existing project documents:**

1. Look for files matching these patterns:
   - `*-requirements.md`
   - `*-design.md`
   - `*-implementation.md`

2. If documents exist, provide a brief summary:
   ```
   I found existing project documents:

   📋 **Requirements**: `<feature-name>-requirements.md`
   - Problem: [One sentence summary of the problem]
   - Solution: [One sentence summary of the minimal solution]
   - Status: [Approved/Draft/Needs Review]

   🎨 **Design**: `<feature-name>-design.md`
   - Approach: [One sentence summary]
   - Components: [List main components]
   - Status: [Approved/Draft/Needs Review]

   📝 **Implementation**: `<feature-name>-implementation.md`
   - Increments: [Number of increments planned]
   - Current: [Which increment we're on]
   - Status: [In Progress/Planned/Complete]

   Would you like to continue from [next phase] or revisit any existing work?
   ```

3. **Assess current project state**:
   - If specification documents exist, provide status summary and offer to continue/revise
   - If project CLAUDE.md exists, incorporate its development practices and build commands
   - If in active development context, adapt workflow to complement existing work

4. **Integrate with established patterns**:
   - Use existing code style guidelines from project documentation
   - Respect established architectural patterns and service dependencies
   - Follow existing testing and build processes

5. If no documents exist, proceed with: "I don't see any existing project documents. Let's start by understanding the problem you're trying to solve."
## Development Process

### Phase 1: Problem Understanding & Requirements
Before documenting anything, engage in discovery through focused, single questions:
- What problem are we actually solving?
- Who experiences this problem and how often?
- What's the simplest possible solution?
- What can we NOT build and still succeed?

Build understanding iteratively—one question at a time—before creating documentation. Then create a focused requirements document:

```markdown
# Requirements: [Feature Name]

## Problem Statement
[One clear sentence describing the core problem]

## Minimal Solution
[The simplest thing that could possibly work]

## Users & Use Cases
- Primary User: [Who] needs [what] because [why]
- Use Case: [Specific scenario with concrete example]

## Success Criteria
- [ ] [Observable, measurable outcome]
- [ ] [Another specific criterion]

## Non-Goals
[What we're explicitly NOT doing and why]

## Status
- Created: [Date]
- Status: [Draft/Approved]
- Next Step: Design specification
```

**Output**: Generate `<feature-name>-requirements.md`

### Phase 2: Design Specification
Only after requirements approval, design the simplest viable solution:

```markdown
# Design: [Feature Name]

## Approach
[2-3 sentences on the solution strategy]

## Components
[Only essential components, each with clear single responsibility]

### Component Name
- Purpose: [One sentence]
- Interface: [Minimal public API]
- Dependencies: [What it needs, kept minimal]

## Data Flow
[Simple diagram or description of how data moves]

## Error Handling
[Only handle likely errors, fail fast for unexpected ones]

## What We're Not Doing
[Complexity we're avoiding and why]

## Status
- Created: [Date]
- Status: [Draft/Approved]
- Next Step: Implementation planning
```

**Output**: Generate `<feature-name>-design.md`

### Phase 3: Implementation Roadmap
Break work into small, complete increments:

```markdown
# Implementation: [Feature Name]

## Increments
Each increment should be shippable and add value.

### Increment 1: [Core Functionality]
- [ ] Task: [Specific, small change]
  - Files: [What to modify]
  - Validates: [Which requirement]
  - Complete when: [Definition of done]

### Increment 2: [Enhancement]
[Only if truly needed after Increment 1 is live]

## Status
- Created: [Date]
- Current Increment: [1/2/etc]
- Overall Progress: [Not Started/In Progress/Complete]
```

**Output**: Generate `<feature-name>-implementation.md`

## Working Method

### Conversation Style
**Always ask one focused question at a time.** This helps users think clearly and provide specific answers without feeling overwhelmed. Build understanding iteratively through a natural conversation.

### During Problem Understanding:
1. **Ask "Why?" repeatedly**: Get to the root need
2. **Challenge scope**: "Do we really need this?"
3. **Seek the 80/20**: What 20% of effort delivers 80% of value?
4. **Consider alternatives**: Including non-technical solutions
5. **Define "good enough"**: Perfect is the enemy of done

### During Design:
1. **Start with the naive approach**: Why won't the simple solution work?
2. **Add complexity only when forced**: Document why it's necessary
3. **Design for deletion**: Make components easy to remove
4. **Embrace constraints**: They force creative simplicity
5. **Show your work**: Explain rejected alternatives

### During Planning:
1. **First make it work**: Function before form
2. **Then make it right**: Refactor with working tests
3. **Finally, only if needed, make it fast**: Measure first
4. **Each step deployable**: No long-running branches
5. **Learn and adjust**: Each increment informs the next

## Deliverables

### File Naming Convention
For each project, generate three markdown files with consistent naming:
1. **Requirements**: `<feature-name>-requirements.md`
2. **Design**: `<feature-name>-design.md`
3. **Implementation**: `<feature-name>-implementation.md`

Where `<feature-name>` is a kebab-case identifier (e.g., `user-notifications`, `order-tracking`, `auth-refresh`).

### Phase Transitions

#### When starting fresh:
- After completing requirements discovery, say: "I'll now create the requirements document as `<feature-name>-requirements.md`"
- After requirements approval, say: "Great! I'll now create the design document as `<feature-name>-design.md`"
- After design approval, say: "Excellent! I'll create the implementation plan as `<feature-name>-implementation.md`"

#### When resuming existing work:
- If only requirements exist: "I've reviewed the requirements in `<feature-name>-requirements.md`. Shall we proceed with the design phase?"
- If requirements and design exist: "I've reviewed both requirements and design documents. Ready to create the implementation plan?"
- If all documents exist: "All project documents are in place. Would you like to review progress or continue with implementation?"

## Quality Markers
Good specifications have:
- **Brevity**: If it's longer than a page, it's probably too complex
- **Clarity**: A junior developer could implement it
- **Focus**: Solves one problem well
- **Flexibility**: Doesn't over-specify implementation details
- **Justification**: Every complexity has a reason

## Question Flow Examples

### Understanding the Problem
Instead of asking multiple questions at once, follow this pattern:
1. "What specific problem led you to need this feature?"
2. [After response] "Who is experiencing this problem?"
3. [After response] "How frequently does this occur?"
4. [After response] "What happens currently when they encounter this?"
5. [After response] "What would success look like for them?"

### Challenging Complexity
When complexity creeps in:
1. "What's the core need behind that requirement?"
2. [After response] "Could we achieve 80% of the value with something simpler?"
3. [After response] "What would break if we didn't include that?"
4. [After response] "Is that risk acceptable for the first version?"

### Design Decisions
When exploring solutions:
1. "What's the most straightforward way to solve this?"
2. [After response] "What makes you think that won't work?"
3. [After response] "Is that constraint real or assumed?"
4. [After response] "What if we tried [simpler alternative]?"

## Example Interactions

### Starting Fresh
**User**: "We need a notification system that..."
**Agent**: "Before we dive into specifications, let me understand the core need. What specific problem are notifications solving for your users?"
[Continue with single questions as before...]

### Resuming Existing Work
**User**: "Can you help me with the notification system?"
**Agent**: "I found existing project documents:

📋 **Requirements**: `delivery-notifications-requirements.md`
- Problem: Customers miss important delivery updates
- Solution: SMS alerts 30 minutes before delivery
- Status: Approved

Would you like to continue with the design phase, or should we revisit the requirements?"

**User**: "Let's continue with design."
**Agent**: "Great! Based on the approved requirements, I'll now create the design document as `delivery-notifications-design.md`. Let me start with the simplest approach..."

## Remember
- The best feature is often the one we don't build
- Simple solutions require deep understanding
- Every line of code is a liability
- When in doubt, do less but do it well
- Beautiful software is simple software that delights users

The goal isn't to create comprehensive documentation—it's to think deeply enough that the solution becomes obvious and simple.

================================================
FILE: .claude/settings.local.json
================================================
{
  "permissions": {
    "allow": [
      "WebFetch(domain:github.com)",
      "Bash(R CMD Rd2pdf:*)",
      "Bash(git pull:*)",
      "Bash(git stash:*)"
    ],
    "deny": []
  }
}

================================================
FILE: .github/.gitignore
================================================
*.html

================================================
FILE: .github/CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.

Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at codeofconduct@posit.co.
All complaints will be reviewed and investigated promptly and fairly. All community leaders are obligated to respect the privacy and security of the reporter of any incident. ## Enforcement Guidelines Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct: ### 1. Correction **Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community. **Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested. ### 2. Warning **Community Impact**: A violation through a single incident or series of actions. **Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban. ### 3. Temporary Ban **Community Impact**: A serious violation of community standards, including sustained inappropriate behavior. **Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban. ### 4. Permanent Ban **Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals. **Consequence**: A permanent ban from any sort of public interaction within the community. ## Attribution This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.1, available at <https://www.contributor-covenant.org/version/2/1/code_of_conduct.html>. Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/inclusion). For answers to common questions about this code of conduct, see the FAQ at <https://www.contributor-covenant.org/faq>. Translations are available at <https://www.contributor-covenant.org/translations>. [homepage]: https://www.contributor-covenant.org ================================================ FILE: .github/CONTRIBUTING.md ================================================ # Contributing to mvgam This document outlines how to propose a change to mvgam. For a detailed discussion on contributing to this and other open source R packages, please see the [development contributing guide](https://rstd.io/tidy-contrib) and our [code review principles](https://code-review.tidyverse.org/). ## Fixing typos You can fix typos, spelling mistakes, or grammatical errors in the documentation directly using the GitHub web interface, as long as the changes are made in the _source_ file. This generally means you'll need to edit [roxygen2 comments](https://roxygen2.r-lib.org/articles/roxygen2.html) in an `.R`, not a `.Rd` file. You can find the `.R` file that generates the `.Rd` by reading the comment in the first line. ## Bigger changes If you want to make a bigger change, it's a good idea to first file an issue and make sure someone from the team agrees that it’s needed.
If you’ve found a bug, please file an issue that illustrates the bug with a minimal [reprex](https://www.tidyverse.org/help/#reprex) (this will also help you write a unit test, if needed). See the tidyverse guide on [how to create a great issue](https://code-review.tidyverse.org/issues/) for more advice. ### Pull request process * Fork the package and clone onto your computer. If you haven't done this before, we recommend using `usethis::create_from_github("nicholasjclark/mvgam", fork = TRUE)`. * Install all development dependencies with `devtools::install_dev_deps()`, and then make sure the package passes R CMD check by running `devtools::check()`. If R CMD check doesn't pass cleanly, it's a good idea to ask for help before continuing. * Create a Git branch for your pull request (PR). We recommend using `usethis::pr_init("brief-description-of-change")`. * Make your changes, commit to git, and then create a PR by running `usethis::pr_push()`, and following the prompts in your browser. The title of your PR should briefly describe the change. The body of your PR should contain `Fixes #issue-number`. * For user-facing changes, add a bullet to the top of `NEWS.md` (i.e. just below the first header). Follow the style described in <https://style.tidyverse.org/news.html>. ### Code style * New code should follow the tidyverse [style guide](https://style.tidyverse.org) where possible. You can use the [styler](https://CRAN.R-project.org/package=styler) package to apply these styles, but please don't restyle code that has nothing to do with your PR. * We use [roxygen2](https://cran.r-project.org/package=roxygen2), with [Markdown syntax](https://cran.r-project.org/web/packages/roxygen2/vignettes/rd-formatting.html), for documentation. * We use [testthat](https://cran.r-project.org/package=testthat) for unit tests. Contributions with test cases included are easier to accept. ## Code of Conduct Please note that the mvgam project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this project you agree to abide by its terms. ## Roadmap The mvgam package is in a stable state of development, with some degree of active subsequent development as envisioned by the primary authors. ================================================ FILE: .github/FUNDING.yml ================================================ github: nicholasjclark ================================================ FILE: .github/workflows/R-CMD-check-rstan.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/master/examples # Need help debugging build failures?
Start at https://github.com/r-lib/actions#where-to-find-help on: push: branches: [main, master] pull_request: branches: [main, master] name: R-CMD-check-rstan jobs: R-CMD-check: runs-on: ${{ matrix.config.os }} name: ${{ matrix.config.os }} (${{ matrix.config.r }}) strategy: fail-fast: false matrix: config: - {os: macos-latest, r: 'release'} - {os: windows-latest, r: 'release'} # use 4.0 or 4.1 to check with rtools40's older compiler - {os: windows-latest, r: 'oldrel-4'} - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} - {os: ubuntu-latest, r: 'release'} # Use a CRAN-like environment to emulate CRAN submission checks env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} NOT_CRAN: false R_KEEP_PKG_SOURCE: yes steps: - uses: actions/checkout@v4 - uses: n1hility/cancel-previous-runs@v2 with: token: ${{ secrets.GITHUB_TOKEN }} - uses: r-lib/actions/setup-pandoc@v2 - uses: r-lib/actions/setup-r@v2 with: r-version: ${{ matrix.config.r }} rtools-version: ${{ matrix.config.rtools }} http-user-agent: ${{ matrix.config.http-user-agent }} use-public-rspm: true # Install some suggests packages - uses: r-lib/actions/setup-r-dependencies@v2 with: dependencies: NA extra-packages: | BH RcppEigen knitr extraDistr lubridate wrswoR tweedie corpcor splines2 ggrepel ggpp ggarrow scoringRules matrixStats xts collapse rmarkdown ggplot2 rjags coda testthat usethis rcmdcheck - name: Ensure install works run: | install.packages('mvgam', repos = "http://cran.rstudio.com") shell: Rscript {0} - uses: r-lib/actions/check-r-package@v2 with: build_args: 'c("--no-manual", "--no-build-vignettes")' args: 'c("--no-examples", "--no-manual", "--as-cran", "--ignore-vignettes")' ================================================ FILE: .github/workflows/R-CMD-check.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/master/examples # Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help on: push: branches: [main, master] pull_request: branches: [main, master] name: R-CMD-check jobs: R-CMD-check: runs-on: ${{ matrix.config.os }} name: ${{ matrix.config.os }} (${{ matrix.config.r }}) strategy: fail-fast: false matrix: config: # Cmdstan install not working on win-latest release; check back later - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} - {os: ubuntu-latest, r: 'release'} env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} R_KEEP_PKG_SOURCE: yes steps: - uses: actions/checkout@v4 - uses: n1hility/cancel-previous-runs@v2 with: token: ${{ secrets.GITHUB_TOKEN }} - uses: r-lib/actions/setup-pandoc@v2 - uses: r-lib/actions/setup-r@v2 with: r-version: ${{ matrix.config.r }} rtools-version: ${{ matrix.config.rtools }} http-user-agent: ${{ matrix.config.http-user-agent }} use-public-rspm: true # As cmdstanr is not yet on CRAN, configure the action to only install the # 'Depends' packages by default and then manually specify the 'Suggests' # packages that are needed for R CMD CHECK - uses: r-lib/actions/setup-r-dependencies@v2 with: dependencies: NA extra-packages: | knitr extraDistr lubridate wrswoR tweedie corpcor splines2 scoringRules matrixStats xts collapse rmarkdown ggplot2 rjags ggrepel ggpp ggarrow coda stan-dev/cmdstanr testthat usethis rcmdcheck devtools - name: Build Cmdstan run: | install.packages('mvgam', repos = "http://cran.rstudio.com") cmdstanr::check_cmdstan_toolchain(fix = TRUE) cmdstanr::install_cmdstan() shell: Rscript {0} - name: Install colorspace manually run: | install.packages("colorspace", repos = "https://cran.rstudio.com/") shell: Rscript {0} - uses: r-lib/actions/check-r-package@v2 with: build_args: 'c("--no-manual", "--no-build-vignettes")' args: 'c("--no-manual", "--as-cran", "--ignore-vignettes")' - name: Run dontrun examples run: | devtools::run_examples(run_dontrun = TRUE, fresh = FALSE) shell: Rscript {0} ================================================ FILE: .github/workflows/memcheck.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/master/examples # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help on: push: branches: [main, master] pull_request: branches: [main, master] name: memcheck jobs: memcheck: runs-on: ubuntu-latest env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} NOT_CRAN: false steps: - uses: actions/checkout@v2 - uses: n1hility/cancel-previous-runs@v2 with: token: ${{ secrets.GITHUB_TOKEN }} - uses: actions/checkout@v2 - name: apt install dependency run: | sudo apt-get update sudo apt-get -y install valgrind - uses: r-lib/actions/setup-r@v2 with: use-public-rspm: true - uses: r-lib/actions/setup-r-dependencies@v2 with: dependencies: NA extra-packages: | devtools testthat - name: Memory check run: | R -d valgrind -f memcheck.R - name: Configure GH run: | git config --global user.name "$GITHUB_ACTOR" git config --global user.email "$GITHUB_ACTOR@users.noreply.github.com" - name: Install Air latest shell: bash run: | curl -LsSf https://github.com/posit-dev/air/releases/latest/download/air-installer.sh | sh - name: Air version shell: bash run: | echo "" echo "Formatting R code with $(air --version)" echo "" - name: Format R code using Air shell: bash run: air format . - name: Commit any Air formatting changes shell: bash run: | if find .
-type f \( -name '*.r' -o -name '*.R' \) -exec git add -u {} +; then echo "Staged modified R files" git commit -a -m '`air format` (GitHub Actions)' git push else echo "No changes found in any R files" fi ================================================ FILE: .github/workflows/pkgdown.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/master/examples # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help on: push: branches: [main, master] pull_request: branches: [main, master] release: types: [published] workflow_dispatch: name: pkgdown jobs: pkgdown: runs-on: ubuntu-latest # Only restrict concurrency for non-PR jobs concurrency: group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }} env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} R_KEEP_PKG_SOURCE: yes steps: - uses: actions/checkout@v3 - uses: r-lib/actions/setup-pandoc@v2 - uses: r-lib/actions/setup-r@v2 with: use-public-rspm: true - uses: r-lib/actions/setup-r-dependencies@v2 with: dependencies: NA extra-packages: | local::. remotes knitr extraDistr lubridate gratia wrswoR tweedie ggrepel ggpp ggarrow corpcor splines2 scoringRules matrixStats xts collapse rmarkdown stan-dev/cmdstanr usethis - name: Build Cmdstan and install development brms version run: | cmdstanr::check_cmdstan_toolchain(fix = TRUE) cmdstanr::install_cmdstan() remotes::install_github('paul-buerkner/brms') remotes::install_version("pkgdown", version = "2.0.9") shell: Rscript {0} - name: Build site run: pkgdown::build_site_github_pages(lazy = TRUE, run_dont_run = TRUE, new_process = FALSE, install = FALSE) shell: Rscript {0} - name: Deploy to GitHub pages 🚀 if: github.event_name != 'pull_request' uses: JamesIves/github-pages-deploy-action@v4.4.1 with: clean: false branch: gh-pages folder: docs ================================================ FILE: .github/workflows/readme.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/master/examples # Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help on: push: branches: master name: render-rmarkdown jobs: render-rmarkdown: runs-on: ubuntu-latest env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} steps: - name: Checkout repo uses: actions/checkout@v4 with: fetch-depth: 0 - uses: r-lib/actions/setup-pandoc@v2 - uses: r-lib/actions/setup-r@v2 with: use-public-rspm: true - uses: r-lib/actions/setup-r-dependencies@v2 with: extra-packages: | rmarkdown knitr gratia patchwork remotes splines2 extraDistr stan-dev/cmdstanr - name: Install mvgam run: Rscript -e 'remotes::install_github("nicholasjclark/mvgam", upgrade_dependencies = FALSE)' - name: Build Cmdstan run: | cmdstanr::check_cmdstan_toolchain(fix = TRUE) cmdstanr::install_cmdstan() shell: Rscript {0} - name: Render README run: Rscript -e 'rmarkdown::render("README.Rmd", output_format = "md_document")' - name: Render pkgdown index run: Rscript -e 'rmarkdown::render("index.Rmd", output_format = "md_document")' - name: Commit results run: | git config --local user.name "$GITHUB_ACTOR" git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com" git commit README.md index.md man/figures/README*.png -m 'Re-build README.Rmd' || echo "No changes to commit" git push origin || echo "No changes to commit" ================================================ FILE: .gitignore ================================================ *.Rproj* .Rhistory .RData .Ruserdata .Rprofile Meta .Rproj.user /Meta/ desktop.ini ^cran-comments\.md$ ^src\.gcda$ claude.exe ================================================ FILE: CLAUDE.md ================================================ # CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. ## Package Overview mvgam is an R package for fitting Multivariate Dynamic Generalized Additive Models. The package enables Bayesian forecasting and analysis of multivariate time series data using flexible GAM frameworks. It can handle various data types (counts, proportions, continuous values) with complex temporal dynamics, missing data, and seasonality, building custom Stan models that provide robust Bayesian inference.
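A minimal end-to-end sketch of that workflow (illustrative only: the `y` and `season` columns follow the simulated data returned by `sim_mvgam()`, and the smooth basis, trend model and backend shown here are example choices, not package defaults):

```r
library(mvgam)

# Simulate three Poisson count series with shared seasonality
dat <- sim_mvgam(T = 60, n_series = 3, family = poisson())

# Fit a State-Space GAM: cyclic seasonal smooth in the observation
# model, AR(1) latent trends, sampled with Stan via cmdstanr
fit <- mvgam(
  y ~ s(season, bs = "cc", k = 8),
  trend_model = AR(p = 1),
  family = poisson(),
  data = dat$data_train,
  newdata = dat$data_test,
  backend = "cmdstanr"
)

summary(fit)                  # estimates and convergence diagnostics
plot(fit, type = "forecast")  # probabilistic forecasts for each series
```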
## Development Commands ### Testing - `R CMD check` - Full package check (used in CI) - `testthat::test_check("mvgam")` - Run all tests via testthat - `devtools::test()` - Run tests interactively during development ### Building and Documentation - `devtools::document()` - Generate documentation from roxygen2 comments - `pkgdown::build_site()` - Build package website - `devtools::build()` - Build package tarball - `devtools::install()` - Install package locally for development ### Package Structure - Uses standard R package structure with DESCRIPTION, NAMESPACE, and man/ directories - Source code organized in `R/` directory with provider-specific files - Vignettes in vignettes/ directory demonstrate key features - Tests in `tests/testthat/` ## Architecture ### Key Design Patterns **S3 Type System**: Uses S3 for structured objects - Maintains compatibility with R's statistical modeling ecosystem - Supports method inheritance and specialization **Layered Architecture Pattern**: Uses clear separation of concerns across multiple layers: - Interface Layer: User-facing functions (mvgam(), forecast(), plot()) provide clean APIs - Model Specification Layer: Formula processing, trend model constructors (RW(), VAR(), GP()), family definitions - Code Generation Layer: Translates R specifications into Stan/JAGS model code - Computational Backend Layer: Interfaces with Stan/JAGS for MCMC sampling - Post-processing Layer: Methods for analysis, diagnostics, and visualization **Modular Component System**: Modular design where different components can be mixed and matched: - Trend Modules: Independent implementations of different temporal dynamics (Random Walk, AR, VAR, Gaussian Process, CAR) - Family Modules: Separate observation model implementations for different distributions - Backend Modules: Pluggable computational backends (Stan via rstan/cmdstanr, JAGS) - Visualization Modules: Modular plotting system with specialized functions for different aspects **Bayesian Workflow Integration Pattern**: Designed around the complete Bayesian modeling workflow: - Model Building: Formula specification, prior setup, trend model selection - Fitting: MCMC sampling with convergence monitoring - Checking: Posterior predictive checks, residual analysis, diagnostic plots - Inference: Parameter summarization, uncertainty quantification - Prediction: Forecasting with proper uncertainty propagation - Evaluation: Cross-validation, scoring rules, model comparison ## Key Files ### Core Model Functions - `R/mvgam.R` - Main model fitting function that: - Validates and processes GAM formulas for observation and trend processes - Sets up Stan/JAGS model code generation - Runs MCMC sampling and returns fitted model objects - Trend model constructors in `R/mvgam_trend_types.R` (`RW()`, `AR()`, `VAR()`, `GP()`, `CAR()`): - Define temporal dynamics specifications - Configure stationarity constraints and correlation structures ### Prediction & Forecasting - `R/forecast.mvgam.R` - Generates in-sample and out-of-sample forecasts: - Respects temporal dynamics for proper time series forecasting - Supports multiple prediction types (response, trend, link) - Returns structured forecast objects with uncertainty quantification - `R/predict.mvgam.R` - General prediction treating trends as random effects ### Visualization Suite - `R/plot.mvgam.R` - Main plotting method with multiple types: - Series plots, residual diagnostics, smooth functions, forecasts - Calls specialized functions: `plot_mvgam_forecasts()`, `plot_mvgam_series()`, 
`plot_mvgam_trend()` ### Model Analysis - `R/summary.mvgam.R` - Parameter estimates and convergence diagnostics - `R/ppc.mvgam.R` - Posterior predictive checks using bayesplot - `R/residuals.mvgam.R` - Dunn-Smyth residuals for model checking - `R/loo.mvgam.R` - Approximate leave-one-out cross-validation ### Family Support - Extensive distribution families in `R/families.R`: - Standard: gaussian, poisson, binomial, Gamma - Extended: negative binomial, beta, Student-t, Tweedie - Special: N-mixture models for imperfect detection ### Testing and Quality - `tests/testthat/` - Test suite - `vignettes/` - Documentation and examples - `.github/workflows/` - CI/CD with R CMD check, pkgdown building and valgrind check ## Development Notes ### Testing Strategy - Separate test files for each major component - Prioritize internal mvgam objects (i.e. `mvgam:::mvgam_example1`) for testing ### File Management - Specification documents (`*-requirements.md`, `*-design.md`, `*-implementation.md`) should be automatically added to `.Rbuildignore` - Any temporary development files should be excluded from package builds - When creating new specification files, always update `.Rbuildignore` to prevent inclusion in built package ### Code Organization - Provider files should follow consistent naming pattern - Utility functions should be grouped by purpose (`utils-*.R`) - Standalone imports should minimize external dependencies ### Documentation - Roxygen2 comments for all exported functions - tidyverse styling (https://style.tidyverse.org/) for all R and roxygen code - Vignettes demonstrate in-depth use cases - pkgdown site provides comprehensive documentation - Examples demonstrate simpler use cases ================================================ FILE: CRAN-SUBMISSION ================================================ Version: 1.1.0 Date: 2024-04-18 23:09:30 UTC SHA: 3d852f1f92b4d6d10ed64dc212fd6b0ebf933bca ================================================ FILE: DESCRIPTION ================================================ Package: mvgam Title: Multivariate (Dynamic) Generalized Additive Models Version: 1.1.595 Date: 2026-01-19 Authors@R: c(person("Nicholas J", "Clark", email = "nicholas.j.clark1214@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0001-7131-3301")), person("KANK", "Karunarathna", role = c("ctb"), comment = c("ARMA parameterisations and factor models", ORCID = "0000-0002-8995-5502")), person("Sarah", "Heaps", role = c("ctb"), comment = c("VARMA parameterisations", ORCID = "0000-0002-5543-037X")), person("Scott", "Pease", role = c("ctb"), comment = c("broom enhancements", ORCID = "0009-0006-8977-9285")), person("Matthijs", "Hollanders", role = c("ctb"), comment = c("ggplot visualizations", ORCID = "0000-0003-0796-1018"))) Description: Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2023) <doi:10.1111/2041-210X.13974>.
URL: https://github.com/nicholasjclark/mvgam, https://nicholasjclark.github.io/mvgam/ BugReports: https://github.com/nicholasjclark/mvgam/issues License: MIT + file LICENSE Depends: R (>= 3.6.0) Imports: brms (>= 2.21.0), methods, mgcv (>= 1.8-13), insight (>= 0.19.1), marginaleffects (>= 0.29.0), Rcpp (>= 0.12.0), rstan (>= 2.29.0), posterior (>= 1.0.0), loo (>= 2.3.1), rstantools (>= 2.1.1), bayesplot (>= 1.5.0), ggplot2 (>= 3.5.0), mvnfast, purrr, dplyr, magrittr, rlang, generics, tibble (>= 3.0.0), patchwork (>= 1.2.0) Encoding: UTF-8 LazyData: true Roxygen: list(markdown = TRUE) RoxygenNote: 7.3.2 Suggests: scoringRules, matrixStats, cmdstanr (>= 0.5.0), tweedie, splines2, extraDistr, corpcor, wrswoR, ggrepel, ggpp, ggarrow, xts, lubridate, knitr, collapse, rmarkdown, rjags, coda, runjags, usethis, testthat, colorspace Enhances: gratia (>= 0.9.0), tidyr Additional_repositories: https://mc-stan.org/r-packages/ LinkingTo: Rcpp, RcppArmadillo VignetteBuilder: knitr ================================================ FILE: LICENSE ================================================ YEAR: 2021 COPYRIGHT HOLDER: Nicholas Clark ================================================ FILE: LICENSE.md ================================================ # MIT License Copyright (c) 2021 Nicholas Clark Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
================================================ FILE: NAMESPACE ================================================ # Generated by roxygen2: do not edit by hand S3method(Predict.matrix,mod.smooth) S3method(Predict.matrix,moi.smooth) S3method(add_residuals,mvgam) S3method(as.array,mvgam) S3method(as.data.frame,mvgam) S3method(as.matrix,mvgam) S3method(as_draws,mvgam) S3method(as_draws_array,mvgam) S3method(as_draws_df,mvgam) S3method(as_draws_list,mvgam) S3method(as_draws_matrix,mvgam) S3method(as_draws_rvars,mvgam) S3method(augment,mvgam) S3method(coef,mvgam) S3method(conditional_effects,mvgam) S3method(ensemble,mvgam_forecast) S3method(fevd,mvgam) S3method(find_predictors,mvgam) S3method(find_predictors,mvgam_prefit) S3method(fitted,mvgam) S3method(forecast,mvgam) S3method(formula,mvgam) S3method(formula,mvgam_prefit) S3method(get_coef,mvgam) S3method(get_data,mvgam) S3method(get_data,mvgam_prefit) S3method(get_predict,mvgam) S3method(get_vcov,mvgam) S3method(hindcast,mvgam) S3method(how_to_cite,mvgam) S3method(irf,mvgam) S3method(lfo_cv,mvgam) S3method(logLik,mvgam) S3method(log_posterior,mvgam) S3method(loo,mvgam) S3method(loo_compare,mvgam) S3method(mcmc_plot,mvgam) S3method(model.frame,mvgam) S3method(model.frame,mvgam_prefit) S3method(neff_ratio,mvgam) S3method(nuts_params,mvgam) S3method(ordinate,jsdgam) S3method(pairs,mvgam) S3method(plot,mvgam) S3method(plot,mvgam_conditional_effects) S3method(plot,mvgam_fevd) S3method(plot,mvgam_forecast) S3method(plot,mvgam_irf) S3method(plot,mvgam_lfo) S3method(plot,mvgam_residcor) S3method(posterior_epred,mvgam) S3method(posterior_linpred,mvgam) S3method(posterior_predict,mvgam) S3method(pp_check,mvgam) S3method(ppc,mvgam) S3method(predict,mvgam) S3method(print,how_to_cite) S3method(print,mvgam) S3method(print,mvgam_conditional_effects) S3method(print,mvgam_prefit) S3method(print,mvgam_summary) S3method(print,mvgammodel) S3method(residual_cor,jsdgam) S3method(residual_cor,mvgam) S3method(residuals,mvgam) S3method(rhat,mvgam) S3method(score,mvgam_forecast) S3method(set_coef,mvgam) S3method(smooth.construct,mod.smooth.spec) S3method(smooth.construct,moi.smooth.spec) S3method(stability,mvgam) S3method(stancode,mvgam) S3method(stancode,mvgam_prefit) S3method(standata,mvgam_prefit) S3method(summary,mvgam) S3method(summary,mvgam_fevd) S3method(summary,mvgam_forecast) S3method(summary,mvgam_irf) S3method(summary,mvgam_prefit) S3method(tidy,mvgam) S3method(update,jsdgam) S3method(update,mvgam) S3method(variables,mvgam) export("%>%") export(AR) export(CAR) export(GP) export(PW) export(RW) export(VAR) export(ZMVN) export(add_residuals) export(as_draws) export(as_draws_array) export(as_draws_df) export(as_draws_list) export(as_draws_matrix) export(as_draws_rvars) export(augment) export(avg_predictions) export(bernoulli) export(beta_binomial) export(betar) export(code) export(compare_mvgams) export(comparisons) export(conditional_effects) export(datagrid) export(drawDotmvgam) export(dynamic) export(ensemble) export(eval_mvgam) export(eval_smoothDothilbertDotsmooth) export(eval_smoothDotmodDotsmooth) export(eval_smoothDotmoiDotsmooth) export(fevd) export(forecast) export(get_data) export(get_mvgam_priors) export(get_predict) export(gp) export(hindcast) export(how_to_cite) export(hypotheses) export(irf) export(jsdgam) export(lfo_cv) export(lognormal) export(loo) export(loo_compare) export(lv_correlations) export(mcmc_plot) export(mvgam) export(nb) export(neff_ratio) export(nmix) export(nuts_params) export(ordinate) export(plot_comparisons) 
export(plot_mvgam_factors) export(plot_mvgam_fc) export(plot_mvgam_pterms) export(plot_mvgam_randomeffects) export(plot_mvgam_resids) export(plot_mvgam_series) export(plot_mvgam_smooth) export(plot_mvgam_trend) export(plot_mvgam_uncertainty) export(plot_predictions) export(plot_slopes) export(posterior_epred) export(posterior_linpred) export(posterior_predict) export(pp_check) export(ppc) export(predictions) export(prior) export(prior_) export(prior_string) export(residual_cor) export(rhat) export(roll_eval_mvgam) export(s) export(score) export(series_to_mvgam) export(set_prior) export(sim_mvgam) export(slopes) export(stability) export(stancode) export(standata) export(student) export(student_t) export(t2) export(te) export(ti) export(tidy) export(tweedie) export(variables) importFrom(Rcpp,evalCpp) importFrom(bayesplot,color_scheme_get) importFrom(bayesplot,color_scheme_set) importFrom(bayesplot,log_posterior) importFrom(bayesplot,neff_ratio) importFrom(bayesplot,nuts_params) importFrom(bayesplot,pp_check) importFrom(brms,bernoulli) importFrom(brms,beta_binomial) importFrom(brms,brm) importFrom(brms,brmsterms) importFrom(brms,conditional_effects) importFrom(brms,dbeta_binomial) importFrom(brms,do_call) importFrom(brms,dstudent_t) importFrom(brms,get_prior) importFrom(brms,gp) importFrom(brms,logm1) importFrom(brms,lognormal) importFrom(brms,mcmc_plot) importFrom(brms,ndraws) importFrom(brms,pbeta_binomial) importFrom(brms,prior) importFrom(brms,prior_) importFrom(brms,prior_string) importFrom(brms,pstudent_t) importFrom(brms,qstudent_t) importFrom(brms,rbeta_binomial) importFrom(brms,read_csv_as_stanfit) importFrom(brms,rstudent_t) importFrom(brms,set_prior) importFrom(brms,stancode) importFrom(brms,standata) importFrom(brms,student) importFrom(generics,augment) importFrom(generics,forecast) importFrom(generics,tidy) importFrom(ggplot2,aes) importFrom(ggplot2,facet_wrap) importFrom(ggplot2,geom_bar) importFrom(ggplot2,ggplot) importFrom(ggplot2,labs) importFrom(ggplot2,scale_colour_discrete) importFrom(ggplot2,scale_fill_discrete) importFrom(ggplot2,theme_classic) importFrom(grDevices,devAskNewPage) importFrom(grDevices,hcl.colors) importFrom(grDevices,rgb) importFrom(graphics,abline) importFrom(graphics,axis) importFrom(graphics,barplot) importFrom(graphics,box) importFrom(graphics,boxplot) importFrom(graphics,bxp) importFrom(graphics,hist) importFrom(graphics,layout) importFrom(graphics,legend) importFrom(graphics,lines) importFrom(graphics,par) importFrom(graphics,plot) importFrom(graphics,points) importFrom(graphics,polygon) importFrom(graphics,rect) importFrom(graphics,rug) importFrom(graphics,title) importFrom(grid,arrow) importFrom(grid,unit) importFrom(insight,find_predictors) importFrom(insight,get_data) importFrom(insight,get_predictors) importFrom(loo,is.loo) importFrom(loo,loo) importFrom(loo,loo_compare) importFrom(magrittr,"%>%") importFrom(marginaleffects,avg_predictions) importFrom(marginaleffects,comparisons) importFrom(marginaleffects,datagrid) importFrom(marginaleffects,get_coef) importFrom(marginaleffects,get_predict) importFrom(marginaleffects,get_vcov) importFrom(marginaleffects,hypotheses) importFrom(marginaleffects,plot_comparisons) importFrom(marginaleffects,plot_predictions) importFrom(marginaleffects,plot_slopes) importFrom(marginaleffects,predictions) importFrom(marginaleffects,set_coef) importFrom(marginaleffects,slopes) importFrom(methods,cbind2) importFrom(mgcv,Predict.matrix) importFrom(mgcv,Rrank) importFrom(mgcv,bam) importFrom(mgcv,betar) 
importFrom(mgcv,gam.control) importFrom(mgcv,gam.side) importFrom(mgcv,get.var) importFrom(mgcv,initial.sp) importFrom(mgcv,interpret.gam) importFrom(mgcv,nb) importFrom(mgcv,s) importFrom(mgcv,smooth.construct) importFrom(mgcv,smoothCon) importFrom(mgcv,t2) importFrom(mgcv,te) importFrom(mgcv,ti) importFrom(parallel,clusterExport) importFrom(parallel,setDefaultCluster) importFrom(parallel,stopCluster) importFrom(posterior,as_draws) importFrom(posterior,as_draws_array) importFrom(posterior,as_draws_df) importFrom(posterior,as_draws_list) importFrom(posterior,as_draws_matrix) importFrom(posterior,as_draws_rvars) importFrom(posterior,rhat) importFrom(posterior,variables) importFrom(rlang,missing_arg) importFrom(rlang,parse_expr) importFrom(rlang,warn) importFrom(rstantools,posterior_epred) importFrom(rstantools,posterior_linpred) importFrom(rstantools,posterior_predict) importFrom(stats,.getXlevels) importFrom(stats,Gamma) importFrom(stats,acf) importFrom(stats,as.dist) importFrom(stats,as.formula) importFrom(stats,binomial) importFrom(stats,coef) importFrom(stats,complete.cases) importFrom(stats,cor) importFrom(stats,cov) importFrom(stats,cov2cor) importFrom(stats,dbeta) importFrom(stats,dbinom) importFrom(stats,density) importFrom(stats,dgamma) importFrom(stats,dlnorm) importFrom(stats,dnbinom) importFrom(stats,dnorm) importFrom(stats,dpois) importFrom(stats,drop.terms) importFrom(stats,ecdf) importFrom(stats,fitted) importFrom(stats,formula) importFrom(stats,frequency) importFrom(stats,gaussian) importFrom(stats,hclust) importFrom(stats,is.ts) importFrom(stats,lag) importFrom(stats,lm) importFrom(stats,logLik) importFrom(stats,mad) importFrom(stats,make.link) importFrom(stats,median) importFrom(stats,model.frame) importFrom(stats,model.matrix) importFrom(stats,model.offset) importFrom(stats,na.fail) importFrom(stats,na.omit) importFrom(stats,na.pass) importFrom(stats,pacf) importFrom(stats,pbeta) importFrom(stats,pbinom) importFrom(stats,pgamma) importFrom(stats,plnorm) importFrom(stats,plogis) importFrom(stats,pnorm) importFrom(stats,poisson) importFrom(stats,ppois) importFrom(stats,predict) importFrom(stats,printCoefmat) importFrom(stats,qbinom) importFrom(stats,qcauchy) importFrom(stats,qlogis) importFrom(stats,qnorm) importFrom(stats,qqline) importFrom(stats,qqnorm) importFrom(stats,quantile) importFrom(stats,rbeta) importFrom(stats,rbinom) importFrom(stats,reformulate) importFrom(stats,residuals) importFrom(stats,rgamma) importFrom(stats,rlnorm) importFrom(stats,rnbinom) importFrom(stats,rnorm) importFrom(stats,rpois) importFrom(stats,runif) importFrom(stats,sd) importFrom(stats,setNames) importFrom(stats,start) importFrom(stats,terms) importFrom(stats,terms.formula) importFrom(stats,time) importFrom(stats,ts) importFrom(stats,update) importFrom(stats,update.formula) importFrom(utils,getFromNamespace) importFrom(utils,head) importFrom(utils,lsf.str) importFrom(utils,tail) useDynLib(mvgam, .registration = TRUE) ================================================ FILE: NEWS.md ================================================ # mvgam 1.1.595 ## New functionalities * Restructured `summary.mvgam()` to now return an object of class `mvgam_summary` that can be re-used for later purposes, or that can be printed with `print.mvgam_summary()` (#119) * Added a new function `ordinate.jsdgam()` to plot two-dimensional ordinations of site and species scores from latent factor models estimated in `jsdgam()` * `residual_cor()` now supports models fitted with `mvgam()` in which latent factors were used 
or in which correlated dynamic processes were used * Added a `summary.mvgam_forecast()` function to compute and return prediction intervals of posterior hindcasts and forecasts in a `data.frame` format. This will make it easier for users to create their own custom plots of hindcast and forecast distributions (#108) * Added a [`mvgam_use_cases`](https://nicholasjclark.github.io/mvgam/reference/mvgam_use_cases.html) help file to provide links to online resources that discuss how to use 'mvgam' in practice ## Changing defaults * The `forecast()` method is now imported from 'generics' to help avoid conflict issues with other forecasting packages * Deprecated the `incl_dynamics` argument in the `loo()` and `loo_compare()` functions to ensure better consistency in log-likelihood and resulting LOO estimates from models with different observation families * Changed default `type` in `conditional_effects()` to `expected` to match behaviour of 'brms' ## Bug fixes * Bug fix to ensure forecast scores are properly computed when plotting objects of class `mvgam_forecast` if only a single out-of-sample observation was included in `newdata` (#111) * Bug fix to ensure offsets supplied with `offset(...)` in formulae are correctly incorporated when using `gp()` terms * Bug fix to ensure piecewise trends are correctly predicted when using `process_error = TRUE` in `predict()` * Bug fix to ensure variance of continuous time autoregressive processes (using `CAR()`) scales appropriately with time lags (#107) * Bug fix to ensure `summary.mvgam()` uses the correct `max_treedepth` value when checking Stan diagnostics, rather than always assuming the default of 10 (thanks to @StefanoMezzini for reporting) * Bug fix to ensure `NA` residual values are handled properly when plotting residuals (this occurs because response values are allowed to be missing; thanks to @StefanoMezzini for reporting) ## Deprecations * Altered the structure of objects of class `mvgam_forecast` so that the `train_times` and `test_times` slots now contain lists of length `n_series`. This allows for continuous time data to be better handled, where some series may have been sampled at different timepoints # mvgam 1.1.51 ## New functionalities * Changed default priors for scale parameters (i.e.
process errors `"sigma"` and observation errors `"sigma_obs"`) to inverse gammas to provide more sensible prior regularisation away from zero * Improved messaging in `summary()` for better guidance on how to investigate poor HMC sampler behaviours * Converted several more plotting functions to return `ggplot` objects in place of base R plots for broader customisation * Added four new `type`s to the `pp_check()` function to allow more targeted investigations of randomized quantile residual distributions * Added a `plot.mvgam_residcor()` function for nicer plotting of estimated residual correlations from `jsdgam` objects * Added `summary()` functions to calculate useful posterior summaries from objects of class `mvgam_irf` and `mvgam_fevd` (see `?irf` and `?fevd` for examples) * Improved efficiency of `nmix()` models with some slight restructuring of the model objects (#102) ## Bug fixes * Bug fix to ensure piecewise trends are extrapolated the correct number of timepoints when forecasting using the `forecast()` function # mvgam 1.1.4 ## New functionalities * Added the `how_to_cite.mvgam()` function to generate a scaffold methods description of fitted models, which can hopefully make it easier for users to fully describe their programming environment * Improved various plotting functions by returning `ggplot` objects in place of base plots (thanks to @mhollanders #38) * Added the brier score (`score = 'brier'`) as an option in `score.mvgam_forecast()` for scoring forecasts of binary variables when using `family = bernoulli()` (#80) * Added `augment()` function to add residuals and fitted values to an mvgam object's observed data (thanks to @swpease #83) * Added support for approximate `gp()` effects with more than one covariate and with different kernel functions (#79) * Added function `jsdgam()` to estimate Joint Species Distribution Models in which both the latent factors and the observation model components can include any of mvgam's complex linear predictor effects. Also added a function `residual_cor()` to compute residual correlation, covariance and precision matrices from `jsdgam` models. See `?mvgam::jsdgam` and `?mvgam::residual_cor` for details * Added a `stability.mvgam()` method to compute stability metrics from models fit with Vector Autoregressive dynamics (#21 and #76) * Added functionality to estimate hierarchical error correlations when using multivariate latent process models and when the data are nested among levels of a relevant grouping factor (#75); see `?mvgam::AR` for an example * Added `ZMVN()` error models for estimating Zero-Mean Multivariate Normal errors; convenient for working with non time-series data where latent residuals are expected to be correlated (such as when fitting Joint Species Distribution Models); see `?mvgam::ZMVN` for examples * Added a `fevd.mvgam()` method to compute forecast error variance decompositions from models fit with Vector Autoregressive dynamics (#21 and #76) ## Deprecations * Arguments `use_stan`, `jags_path`, `data_train`, `data_test`, `adapt_delta`, `max_treedepth` and `drift` have been removed from primary functions to streamline documentation and reflect the package's mission to deprecate 'JAGS' as a suitable backend. 
Both `adapt_delta` and `max_treedepth` should now be supplied in a named `list()` to the new argument `control` ## Bug fixes * Bug fix to ensure `marginaleffects::comparisons` functions appropriately recognise internal `rowid` variables * Updates to ensure `ensemble` provides appropriate weighting of forecast draws (#98) * Not necessarily a "bug fix", but this update removes several dependencies to lighten installation and improve efficiency of the workflow (#93) * Fixed a minor bug in the way `trend_map` recognises levels of the `series` factor * Bug fix to ensure `lfo_cv` recognises the actual times in `time`, just in case the user supplies data that doesn't start at `t = 1`. Also updated documentation to better reflect this * Bug fix to ensure `update.mvgam` captures any `knots` or `trend_knots` arguments that were passed to the original model call # mvgam 1.1.3 ## New functionalities * Allow intercepts to be included in process models when `trend_formula` is supplied. This breaks the assumption that the process has to be zero-centred, adding more modelling flexibility but also potentially inducing nonidentifiabilities with respect to any observation model intercepts. Thoughtful priors are a must for these models * Added `standata.mvgam_prefit`, `stancode.mvgam` and `stancode.mvgam_prefit` methods for better alignment with 'brms' workflows * Added 'gratia' to *Enhancements* to allow popular methods such as `draw()` to be used for 'mvgam' models if 'gratia' is already installed * Added an `ensemble.mvgam_forecast()` method to generate evenly weighted combinations of probabilistic forecast distributions * Added an `irf.mvgam()` method to compute Generalized and Orthogonalized Impulse Response Functions (IRFs) from models fit with Vector Autoregressive dynamics ## Deprecations * The `drift` argument has been deprecated. 
It is now recommended for users to include parametric fixed effects of "time" in their respective GAM formulae to capture any expected drift effects ## Bug fixes * Added a new check to ensure that exception messages are only suppressed by the `silent` argument if the user's version of 'cmdstanr' is adequate * Updated dependency for 'brms' to version >= '2.21.0' so that `read_csv_as_stanfit` can be imported, which should future-proof the conversion of 'cmdstanr' models to `stanfit` objects (#70) # mvgam 1.1.2 ## New functionalities * Added options for silencing some of the 'Stan' compiler and modeling messages using the `silent` argument in `mvgam()` * Moved a number of packages from 'Depends' to 'Imports' for simpler package loading and fewer potential masking conflicts * Improved efficiency of the model initialisation by tweaking parameters of the underlying 'mgcv' `gam` object's convergence criteria, resulting in much faster model setups * Added an option to use `trend_model = 'None'` in State-Space models, increasing flexibility by ensuring the process error evolves as white noise (#51) * Added an option to use the non-centred parameterisation for some autoregressive trend models, which speeds up mixing most of the time * Updated support for multithreading so that all observation families (apart from `nmix()`) can now be modeled with multiple threads * Changed default priors on autoregressive coefficients (AR1, AR2, AR3) to enforce stationarity, which is a much more sensible prior in the majority of contexts ## Bug fixes * Fixed a small bug that prevented `conditional_effects.mvgam()` from handling effects with three-way interactions # mvgam 1.1.1 ## New functionalities * Changed indexing of an internal c++ function after Prof Brian Ripley’s email: Dear maintainer, Please see the problems shown on https://cran.r-project.org/web/checks/check_results_mvgam.html. Please correct before 2024-05-22 to safely retain your package on CRAN. 
The CRAN Team # mvgam 1.1.0 * First release of `mvgam` to CRAN ================================================ FILE: R/RcppExports.R ================================================ # Generated by using Rcpp::compileAttributes() -> do not edit by hand # Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 #' @noRd ar3_recursC <- function(drift, ar1, ar2, ar3, linpreds, errors, last_trends, h) { .Call(`_mvgam_ar3_recursC`, drift, ar1, ar2, ar3, linpreds, errors, last_trends, h) } #' @noRd var1_recursC <- function(A, linpreds, errors, drift, last_trends, h) { .Call(`_mvgam_var1_recursC`, A, linpreds, errors, drift, last_trends, h) } #' @noRd varma_recursC <- function(A, A2, A3, theta, linpreds, errors, drift, last_trends, h) { .Call(`_mvgam_varma_recursC`, A, A2, A3, theta, linpreds, errors, drift, last_trends, h) } ================================================ FILE: R/add_MACor.R ================================================ #' Function to add moving average processes and/or #' correlated process errors to an existing Stan model file #' When adding MA for univariate trends, 'error' needs to take same form #' as trend / LV (array[n] vector[n_lv]) so it can be #' extracted in the same way #' @noRd add_MaCor = function( model_file, model_data, data_train, data_test, add_ma = FALSE, add_cor = FALSE, trend_model = 'VAR1', drift = FALSE ) { if (inherits(trend_model, 'mvgam_trend')) { trend_char <- ma_cor_additions(validate_trend_model( trend_model ))$trend_model } else { trend_char <- trend_model } if (trend_char == 'ZMVN') { # Update transformed data if (any(grepl('[n_lv] sigma;', model_file, fixed = TRUE))) {} else { if (any(grepl('transformed data {', model_file, fixed = TRUE))) { model_file[grep('transformed data {', model_file, fixed = TRUE)] <- paste0( 'transformed data {\n', 'vector[n_series] trend_zeros = rep_vector(0.0, n_series);' ) } else { model_file[grep('parameters {', model_file, fixed = TRUE)[1]] <- paste0( 'transformed data {\n', 'vector[n_series] trend_zeros = rep_vector(0.0, n_series);\n', '}\nparameters {' ) } } model_file <- readLines(textConnection(model_file), n = -1) # Update parameters block if (any(grepl('[n_lv] sigma;', model_file, fixed = TRUE))) { model_file[grep('[n_lv] sigma;', model_file, fixed = TRUE)] <- paste0( model_file[grep('[n_lv] sigma;', model_file, fixed = TRUE)], '\n\n', '// correlated latent residuals\n', 'array[n] vector[n_lv] LV_raw;\n', 'cholesky_factor_corr[n_lv] L_Omega;' ) starts <- grep("matrix[n, n_lv] LV;", model_file, fixed = TRUE) - 1 ends <- starts + 1 model_file <- model_file[-(starts:ends)] } else { model_file[grep( 'vector[n_series] sigma;', model_file, fixed = TRUE )] <- paste0( 'vector[n_series] sigma;\n\n', '// correlated latent residuals\n', 'array[n] vector[n_series] trend_raw;\n', 'cholesky_factor_corr[n_series] L_Omega;' ) starts <- grep("matrix[n, n_series] trend;", model_file, fixed = TRUE) - 1 ends <- starts + 1 model_file <- model_file[-(starts:ends)] } model_file <- readLines(textConnection(model_file), n = -1) # Update transformed parameters block if (any(grepl('[n_lv] sigma;', model_file, fixed = TRUE))) { model_file[grep('transformed parameters {', model_file, fixed = TRUE)] <- paste0( 'transformed parameters {\n', paste0( 'matrix[n, n_lv] LV;\n', '// LKJ form of covariance matrix\n', 'matrix[n_lv, n_lv] L_Sigma;' ) ) model_file[grep('// derived latent states', model_file, fixed = TRUE)] <- paste0( '// correlated residuals\n', '\nL_Sigma = diag_pre_multiply(sigma, L_Omega);\n', 'for (i in 1:n) {\n', 'LV[i, 1:n_lv] = 
to_row_vector(LV_raw[i]);\n', '}\n', '// derived latent states' ) } else { model_file[grep('transformed parameters {', model_file, fixed = TRUE)] <- paste0( 'transformed parameters {\n', paste0( 'matrix[n, n_series] trend;\n', '// LKJ form of covariance matrix\n', 'matrix[n_series, n_series] L_Sigma;' ) ) last <- grep('model {', model_file, fixed = TRUE) for (i in last:(last - 5)) { last <- i if (trimws(model_file[i]) != '}') {} else { break } } model_file[last] <- paste0( '// correlated residuals\n', '\nL_Sigma = diag_pre_multiply(sigma, L_Omega);\n', 'for (i in 1:n) {\n', 'trend[i, 1:n_series] = to_row_vector(trend_raw[i]);\n', '}\n}' ) } model_file <- readLines(textConnection(model_file), n = -1) # Update model block if (any(grepl('[n_lv] sigma;', model_file, fixed = TRUE))) { starts <- grep( "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);", model_file, fixed = TRUE ) - 1 ends <- grep( "LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]] + LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]], sigma[j]);", model_file, fixed = TRUE ) + 2 model_file <- model_file[-(starts:ends)] model_file[starts] <- paste0( '// residual error correlations\n', 'L_Omega ~ lkj_corr_cholesky(2);\n', 'for (i in 1:n){\n', 'LV_raw[i] ~ multi_normal_cholesky(trend_mus[ytimes_trend[i, 1:n_lv]], L_Sigma);\n', '}\n', model_file[starts] ) } else { starts <- grep("// trend estimates", model_file, fixed = TRUE) ends <- grep( "trend[2:n, s] ~ normal(trend[1:(n - 1), s], sigma[s]);", model_file, fixed = TRUE ) + 1 model_file <- model_file[-(starts:ends)] model_file[starts] <- paste0( '// residual error correlations\n', 'L_Omega ~ lkj_corr_cholesky(2);\n', 'for (i in 1:n){\n', 'trend_raw[i] ~ multi_normal_cholesky(trend_zeros, L_Sigma);\n', '}\n', model_file[starts] ) } model_file <- readLines(textConnection(model_file), n = -1) # Update generated quantities if (any(grepl('[n_lv] sigma;', model_file, fixed = TRUE))) { model_file[grep('// posterior predictions', model_file, fixed = TRUE)] <- paste0( '// computed error covariance matrix\n', 'cov_matrix[n_lv] Sigma = multiply_lower_tri_self_transpose(L_Sigma);\n', '// posterior predictions' ) } else { model_file[grep('// posterior predictions', model_file, fixed = TRUE)] <- paste0( '// computed error covariance matrix\n', 'cov_matrix[n_series] Sigma = multiply_lower_tri_self_transpose(L_Sigma);\n', '// posterior predictions' ) } model_file <- readLines(textConnection(model_file), n = -1) } if (trend_char %in% c('RW', 'AR1', 'AR2', 'AR3')) { if (any(grepl('ytimes_trend', model_file))) { remove_trendmus <- FALSE } else { remove_trendmus <- TRUE } # Update transformed data if (any(grepl('[n_lv] sigma;', model_file, fixed = TRUE))) { if (any(grepl('transformed data {', model_file, fixed = TRUE))) { model_file[grep('transformed data {', model_file, fixed = TRUE)] <- paste0( 'transformed data {\n', 'vector[n_lv] trend_zeros = rep_vector(0.0, n_lv);' ) } else { model_file[grep('parameters {', model_file, fixed = TRUE)[1]] <- paste0( 'transformed data {\n', 'vector[n_lv] trend_zeros = rep_vector(0.0, n_lv);\n', '}\nparameters {' ) } } else { if (any(grepl('transformed data {', model_file, fixed = TRUE))) { model_file[grep('transformed data {', model_file, fixed = TRUE)] <- paste0( 'transformed data {\n', 'vector[n_series] trend_zeros = rep_vector(0.0, n_series);' ) } else { model_file[grep('parameters {', model_file, fixed = TRUE)[1]] <- paste0( 'transformed data {\n', 'vector[n_series] trend_zeros = rep_vector(0.0, n_series);\n', '}\nparameters {' ) } } model_file <- 
readLines(textConnection(model_file), n = -1) # Update parameters block if (any(grepl('[n_lv] sigma;', model_file, fixed = TRUE))) { if (add_cor) { model_file[grep('[n_lv] sigma;', model_file, fixed = TRUE)] <- paste0( model_file[grep('[n_lv] sigma;', model_file, fixed = TRUE)], '\n', 'cholesky_factor_corr[n_lv] L_Omega;' ) } model_file[grep('matrix[n, n_lv] LV;', model_file, fixed = TRUE)] <- paste0( 'matrix[n, n_lv] LV;\n', if (add_ma) { paste0( '// ma coefficients\n', if (add_cor) { 'matrix[n_lv, n_lv] theta;' } else { 'vector[n_lv] theta;' } ) } else { NULL }, '\n// dynamic error parameters\n', 'vector[n_lv] error[n];' ) model_file <- readLines(textConnection(model_file), n = -1) end <- grep('matrix[n, n_lv] LV;', model_file, fixed = TRUE) start <- end - 1 model_file <- model_file[-c(start:end)] } else { if (add_cor) { model_file[grep( 'vector[n_series] sigma;', model_file, fixed = TRUE )] <- paste0( 'vector[n_series] sigma;\n', 'cholesky_factor_corr[n_series] L_Omega;' ) } model_file[grep( 'matrix[n, n_series] trend;', model_file, fixed = TRUE )] <- paste0( 'matrix[n, n_series] trend;\n', if (add_ma) { paste0( '// ma coefficients\n', if (add_cor) { 'matrix[n_series, n_series] theta;' } else { 'vector[n_series] theta;' } ) } else { NULL }, '\n// dynamic error parameters\n', 'vector[n_series] error[n];' ) model_file <- readLines(textConnection(model_file), n = -1) end <- grep('matrix[n, n_series] trend;', model_file, fixed = TRUE) start <- end - 1 model_file <- model_file[-c(start:end)] } model_file <- readLines(textConnection(model_file), n = -1) # Update transformed parameters if (any(grepl('[n_lv] sigma;', model_file, fixed = TRUE))) { model_file[grep( 'matrix[n, n_series] trend;', model_file, fixed = TRUE )] <- paste0( 'matrix[n, n_series] trend;\n', if (add_cor) { paste0( 'vector[n_lv] LV[n];\n', if (add_ma) { 'vector[n_lv] epsilon[n];\n' } else { NULL }, '// LKJ form of covariance matrix\n', 'matrix[n_lv, n_lv] L_Sigma;\n', '// computed error covariance matrix\n', 'cov_matrix[n_lv] Sigma;' ) } else { paste0( 'matrix[n, n_lv] LV;\n', if (add_ma) { 'matrix[n, n_lv] epsilon;' } else { NULL } ) } ) if (add_cor) { if (trend_char %in% c('AR1', 'RW')) { if (any(grep('// derived latent states', model_file, fixed = TRUE))) { to_modify <- grep( '// derived latent states', model_file, fixed = TRUE ) } else { to_modify <- grep( '// derived latent trends', model_file, fixed = TRUE ) } model_file[to_modify] <- paste0( '// derived latent states\n', 'LV[1] = ', 'trend_mus[ytimes_trend[1, 1:n_lv]] + error[1];\n', if (add_ma) { 'epsilon[1] = error[1];\n' }, 'for (i in 2:n) {\n', if (add_ma) { paste0( '// lagged error ma process\n', 'epsilon[i] = theta * error[i - 1];\n', '// full ARMA process\n' ) } else { '// full AR process\n' }, 'LV[i] = ', if (drift) { 'drift * (i - 1) + ' } else { NULL }, 'trend_mus[ytimes_trend[i, 1:n_lv]] + ', if (trend_char == 'AR1') { 'ar1 .* ' } else { NULL }, '(LV[i - 1] - trend_mus[ytimes_trend[i - 1, 1:n_lv]])', if (add_ma) { '+ epsilon[i] + error[i];\n' } else { '+ error[i];\n' }, '}\n' ) } if (trend_char == 'AR2') { if (any(grep('// derived latent states', model_file, fixed = TRUE))) { to_modify <- grep( '// derived latent states', model_file, fixed = TRUE ) } else { to_modify <- grep( '// derived latent trends', model_file, fixed = TRUE ) } model_file[to_modify] <- paste0( '// derived latent states\n', 'LV[1] = ', 'trend_mus[ytimes_trend[1, 1:n_lv]] + error[1];\n', if (add_ma) { paste0( 'epsilon[1] = error[1];\n', 'epsilon[2] = theta * error[1];\n' ) } else { NULL 
}, 'LV[2] = ', if (drift) { 'drift + ' } else { NULL }, 'trend_mus[ytimes_trend[2, 1:n_lv]] + ', 'ar1 .* (LV[1] - trend_mus[ytimes_trend[1, 1:n_lv]]) + ', if (add_ma) { 'epsilon[2] + error[2];\n' } else { 'error[2];\n' }, 'for (i in 3:n) {\n', if (add_ma) { paste0( '// lagged error ma process\n', 'epsilon[i] = theta * error[i - 1];\n', '// full ARMA process\n' ) } else { '// full AR process\n' }, 'LV[i] = ', if (drift) { 'drift * (i - 1) + ' } else { NULL }, 'trend_mus[ytimes_trend[i, 1:n_lv]] + ', 'ar1 .* (LV[i - 1] - trend_mus[ytimes_trend[i - 1, 1:n_lv]]) + ', 'ar2 .* (LV[i - 2] - trend_mus[ytimes_trend[i - 2, 1:n_lv]]) + ', if (add_ma) { 'epsilon[i] + error[i];\n' } else { 'error[i];\n' }, '}\n' ) } if (trend_char == 'AR3') { if (any(grep('// derived latent states', model_file, fixed = TRUE))) { to_modify <- grep( '// derived latent states', model_file, fixed = TRUE ) } else { to_modify <- grep( '// derived latent trends', model_file, fixed = TRUE ) } model_file[to_modify] <- paste0( '// derived latent states\n', 'LV[1] = ', 'trend_mus[ytimes_trend[1, 1:n_lv]] + error[1];\n', if (add_ma) { paste0( 'epsilon[1] = error[1];\n', 'epsilon[2] = theta * error[1];\n', 'epsilon[3] = theta * error[2];\n' ) } else { NULL }, 'LV[2] = ', if (drift) { 'drift + ' } else { NULL }, 'trend_mus[ytimes_trend[2, 1:n_lv]] + ', 'ar1 .* (LV[1] - trend_mus[ytimes_trend[1, 1:n_lv]]) + ', if (add_ma) { 'epsilon[2] + error[2];\n' } else { 'error[2];\n' }, 'LV[3] = ', if (drift) { 'drift * 2 + ' } else { NULL }, 'trend_mus[ytimes_trend[3, 1:n_lv]] + ', 'ar1 .* (LV[2] - trend_mus[ytimes_trend[2, 1:n_lv]]) + ', 'ar2 .* (LV[1] - trend_mus[ytimes_trend[1, 1:n_lv]]) + ', if (add_ma) { 'epsilon[3] + error[3];\n' } else { 'error[3];\n' }, 'for (i in 4:n) {\n', if (add_ma) { paste0( '// lagged error ma process\n', 'epsilon[i] = theta * error[i - 1];\n', '// full ARMA process\n' ) } else { '// full AR process\n' }, 'LV[i] = ', if (drift) { 'drift * (i - 1) + ' } else { NULL }, 'trend_mus[ytimes_trend[i, 1:n_lv]] + ', 'ar1 .* (LV[i - 1] - trend_mus[ytimes_trend[i - 1, 1:n_lv]]) + ', 'ar2 .* (LV[i - 2] - trend_mus[ytimes_trend[i - 2, 1:n_lv]]) + ', 'ar3 .* (LV[i - 3] - trend_mus[ytimes_trend[i - 3, 1:n_lv]]) + ', if (add_ma) { 'epsilon[i] + error[i];\n' } else { 'error[i];\n' }, '}\n' ) } } else { if (trend_char %in% c('AR1', 'RW')) { if (any(grep('// derived latent states', model_file, fixed = TRUE))) { to_modify <- grep( '// derived latent states', model_file, fixed = TRUE ) } else { to_modify <- grep( '// derived latent trends', model_file, fixed = TRUE ) } model_file[to_modify] <- paste0( '// derived latent states\n', 'for(j in 1:n_lv){\n', 'LV[1, j] = ', 'trend_mus[ytimes_trend[1, j]] + error[1, j];\n', 'epsilon[1, j] = error[1, j];\n', 'for(i in 2:n){\n', '// lagged error ma process\n', 'epsilon[i, j] = theta[j] * error[i-1, j];\n', '// full ARMA process\n', 'LV[i, j] = ', if (drift) { 'drift[j] * (i - 1) + ' } else { NULL }, 'trend_mus[ytimes_trend[i, j]] + ', if (trend_char == 'AR1') { 'ar1[j] * ' } else { NULL }, '(LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]) + ', 'epsilon[i, j] + error[i, j];\n', '}\n}' ) } if (trend_char == 'AR2') { if (any(grep('// derived latent states', model_file, fixed = TRUE))) { to_modify <- grep( '// derived latent states', model_file, fixed = TRUE ) } else { to_modify <- grep( '// derived latent trends', model_file, fixed = TRUE ) } model_file[to_modify] <- paste0( '// derived latent states\n', 'for(j in 1:n_lv){\n', 'LV[1, j] = ', 'trend_mus[ytimes_trend[1, j]] + error[1, j];\n', 
'epsilon[1, j] = error[1, j];\n', 'epsilon[2, j] = theta[j] * error[1, j];\n', 'LV[2, j] = ', if (drift) { 'drift[j] + ' } else { NULL }, 'trend_mus[ytimes_trend[2, j]] + ', 'ar1[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]) + ', 'epsilon[2, j] + error[2, j];\n', 'for(i in 3:n){\n', '// lagged error ma process\n', 'epsilon[i, j] = theta[j] * error[i-1, j];\n', '// full ARMA process\n', 'LV[i, j] = ', if (drift) { 'drift[j] * (i - 1) + ' } else { NULL }, 'trend_mus[ytimes_trend[i, j]] + ', 'ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]) + ', 'ar2[j] * (LV[i - 2, j] - trend_mus[ytimes_trend[i - 2, j]]) + ', 'epsilon[i, j] + error[i, j];\n', '}\n}' ) } if (trend_char == 'AR3') { if (any(grep('// derived latent states', model_file, fixed = TRUE))) { to_modify <- grep( '// derived latent states', model_file, fixed = TRUE ) } else { to_modify <- grep( '// derived latent trends', model_file, fixed = TRUE ) } model_file[to_modify] <- paste0( '// derived latent states\n', 'for(j in 1:n_lv){\n', 'LV[1, j] = ', 'trend_mus[ytimes_trend[1, j]] + error[1, j];\n', 'epsilon[1, j] = error[1, j];\n', 'epsilon[2, j] = theta[j] * error[1, j];\n', 'epsilon[3, j] = theta[j] * error[2, j];\n', 'LV[2, j] = ', if (drift) { 'drift[j] + ' } else { NULL }, 'trend_mus[ytimes_trend[2, j]] + ', 'ar1[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]) + ', 'epsilon[2, j] + error[2, j];\n', 'LV[3, j] = ', if (drift) { 'drift[j] * 2 + ' } else { NULL }, 'trend_mus[ytimes_trend[3, j]] + ', 'ar1[j] * (LV[2, j] - trend_mus[ytimes_trend[2, j]]) + ', 'ar2[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]) + ', 'epsilon[3, j] + error[3, j];\n', 'for(i in 4:n){\n', '// lagged error ma process\n', 'epsilon[i, j] = theta[j] * error[i-1, j];\n', '// full ARMA process\n', 'LV[i, j] = ', if (drift) { 'drift[j] * (i - 1) + ' } else { NULL }, 'trend_mus[ytimes_trend[i, j]] + ', 'ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]) + ', 'ar2[j] * (LV[i - 2, j] - trend_mus[ytimes_trend[i - 2, j]]) + ', 'ar3[j] * (LV[i - 3, j] - trend_mus[ytimes_trend[i - 3, j]]) + ', 'epsilon[i, j] + error[i, j];\n', '}\n}' ) } } if (add_cor) { model_file[grep('lv_coefs = Z;', model_file, fixed = TRUE)] <- paste0( 'L_Sigma = diag_pre_multiply(sigma, L_Omega);\n', 'Sigma = multiply_lower_tri_self_transpose(L_Sigma);\n', 'lv_coefs = Z;' ) } } else { model_file[grep('transformed parameters {', model_file, fixed = TRUE)] <- paste0( 'transformed parameters {\n', if (add_cor) { paste0( 'vector[n_series] trend_raw[n];\n', 'matrix[n, n_series] trend;\n', if (add_ma) { 'vector[n_series] epsilon[n];\n' } else { NULL }, '// LKJ form of covariance matrix\n', 'matrix[n_series, n_series] L_Sigma;\n', '// computed error covariance matrix\n', 'cov_matrix[n_series] Sigma;' ) } else { paste0( 'matrix[n, n_series] trend;\n', if (add_ma) { 'matrix[n, n_series] epsilon;' } else { NULL } ) } ) if (add_cor) { if (trend_char %in% c('AR1', 'RW')) { if (any(grepl('= mu_raw[', model_file, fixed = TRUE))) { insert_line <- max(grep('= mu_raw[', model_file, fixed = TRUE)) } else if (any(grepl('= b_raw[', model_file, fixed = TRUE))) { insert_line <- max(grep('= b_raw[', model_file, fixed = TRUE)) } model_file[insert_line] <- paste0( model_file[insert_line], '\n// derived latent states\n', 'trend_raw[1] = ', 'error[1];\n', if (add_ma) { 'epsilon[1] = error[1];\n' } else { NULL }, 'for (i in 2:n) {\n', if (add_ma) { paste0( '// lagged error ma process\n', 'epsilon[i] = theta * error[i - 1];\n', '// full ARMA process\n' ) } else { paste0('// full AR process\n') }, 
'trend_raw[i] = ', if (drift) { 'drift * (i - 1) + ' } else { NULL }, if (trend_char == 'AR1') { 'ar1 .* ' } else { NULL }, 'trend_raw[i - 1] + ', if (add_ma) { 'epsilon[i] + error[i];\n' } else { 'error[i];\n' }, '}\n' ) } if (trend_char == 'AR2') { if (any(grepl('= mu_raw[', model_file, fixed = TRUE))) { insert_line <- max(grep('= mu_raw[', model_file, fixed = TRUE)) } else if (any(grepl('= b_raw[', model_file, fixed = TRUE))) { insert_line <- max(grep('= b_raw[', model_file, fixed = TRUE)) } model_file[insert_line] <- paste0( model_file[insert_line], '\n// derived latent states\n', 'trend_raw[1] = ', 'error[1];\n', if (add_ma) { paste0( 'epsilon[1] = error[1];\n', 'epsilon[2] = theta * error[1];\n' ) } else { NULL }, 'trend_raw[2] = ', if (drift) { 'drift + ' } else { NULL }, 'ar1 .* trend_raw[1] + ', if (add_ma) { 'epsilon[2] + error[2];\n' } else { 'error[2];\n' }, 'for (i in 3:n) {\n', if (add_ma) { paste0( '// lagged error ma process\n', 'epsilon[i] = theta * error[i - 1];\n', '// full ARMA process\n' ) } else { '// full AR process\n' }, 'trend_raw[i] = ', if (drift) { 'drift * (i - 1) + ' } else { NULL }, 'ar1 .* trend_raw[i - 1] + ', 'ar2 .* trend_raw[i - 2] + ', if (add_ma) { 'epsilon[i] + error[i];\n' } else { 'error[i];\n' }, '}\n' ) } if (trend_char == 'AR3') { if (any(grepl('= mu_raw[', model_file, fixed = TRUE))) { insert_line <- max(grep('= mu_raw[', model_file, fixed = TRUE)) } else if (any(grepl('= b_raw[', model_file, fixed = TRUE))) { insert_line <- max(grep('= b_raw[', model_file, fixed = TRUE)) } model_file[insert_line] <- paste0( model_file[insert_line], '\n// derived latent states\n', 'trend_raw[1] = ', 'error[1];\n', if (add_ma) { paste0( 'epsilon[1] = error[1];\n', 'epsilon[2] = theta * error[1];\n', 'epsilon[3] = theta * error[2];\n' ) } else { NULL }, 'trend_raw[2] = ', if (drift) { 'drift + ' } else { NULL }, 'ar1 .* trend_raw[1] + ', if (add_ma) { 'epsilon[2] + error[2];\n' } else { 'error[2];\n' }, 'trend_raw[3] = ', if (drift) { 'drift * 2 + ' } else { NULL }, 'ar1 .* trend_raw[2] + ', 'ar2 .* trend_raw[1] + ', if (add_ma) { 'epsilon[3] + error[3];\n' } else { 'error[3];\n' }, 'for (i in 4:n) {\n', if (add_ma) { paste0( '// lagged error ma process\n', 'epsilon[i] = theta * error[i - 1];\n', '// full ARMA process\n' ) } else { '// full AR process\n' }, 'trend_raw[i] = ', if (drift) { 'drift * (i - 1) + ' } else { NULL }, 'ar1 .* trend_raw[i - 1] + ', 'ar2 .* trend_raw[i - 2] + ', 'ar3 .* trend_raw[i - 3] + ', if (add_ma) { 'epsilon[i] + error[i];\n' } else { 'error[i];\n' }, '}\n' ) } } else { if (trend_char %in% c('AR1', 'RW')) { if (any(grepl('= mu_raw[', model_file, fixed = TRUE))) { insert_line <- max(grep('= mu_raw[', model_file, fixed = TRUE)) } else if (any(grepl('= b_raw[', model_file, fixed = TRUE))) { insert_line <- max(grep('= b_raw[', model_file, fixed = TRUE)) } model_file[insert_line] <- paste0( model_file[insert_line], '\nfor(j in 1:n_series){\n', 'trend[1, j] = ', 'error[1, j];\n', 'epsilon[1, j] = error[1, j];\n', 'for(i in 2:n){\n', '// lagged error ma process\n', 'epsilon[i, j] = theta[j] * error[i-1, j];\n', '// full ARMA process\n', 'trend[i, j] = ', if (drift) { 'drift[j] * (i - 1) + ' } else { NULL }, if (trend_char == 'AR1') { 'ar1[j] * ' } else { NULL }, 'trend[i - 1, j] + ', 'epsilon[i, j] + error[i, j];\n', '}\n}' ) } if (trend_char == 'AR2') { if (any(grepl('= mu_raw[', model_file, fixed = TRUE))) { insert_line <- max(grep('= mu_raw[', model_file, fixed = TRUE)) } else if (any(grepl('= b_raw[', model_file, fixed = TRUE))) { 
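# --------------------------------------------------------------------------
# Illustrative sketch (editorial addition, not part of the generated Stan
# code): every AR + MA branch assembled above writes the same recursion,
# e.g. for one series with AR(1) dynamics and a moving-average error term,
#   epsilon[i] = theta * error[i - 1]
#   trend[i]   = drift * (i - 1) + ar1 * trend[i - 1] + epsilon[i] + error[i]
# The standalone R simulation below mirrors that recursion; all parameter
# values are hypothetical, and local() keeps the demo names from leaking.
local({
  set.seed(1)
  n_t <- 50
  ar1 <- 0.7; theta <- 0.3; drift <- 0; sigma <- 0.5
  error <- rnorm(n_t, 0, sigma)
  epsilon <- numeric(n_t)
  trend <- numeric(n_t)
  trend[1] <- error[1]
  epsilon[1] <- error[1]
  for (i in 2:n_t) {
    epsilon[i] <- theta * error[i - 1] # lagged error ma process
    trend[i] <- drift * (i - 1) + ar1 * trend[i - 1] + epsilon[i] + error[i]
  }
  head(trend)
})
# --------------------------------------------------------------------------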
insert_line <- max(grep('= b_raw[', model_file, fixed = TRUE)) } model_file[insert_line] <- paste0( model_file[insert_line], '\nfor(j in 1:n_series){\n', 'trend[1, j] = ', 'error[1, j];\n', 'epsilon[1, j] = error[1, j];\n', 'epsilon[2, j] = theta[j] * error[1, j];\n', 'trend[2, j] = ', if (drift) { 'drift[j] + ' } else { NULL }, 'ar1[j] * trend[1, j] + ', 'epsilon[2, j] + error[2, j];\n', 'for(i in 3:n){\n', '// lagged error ma process\n', 'epsilon[i, j] = theta[j] * error[i-1, j];\n', '// full ARMA process\n', 'trend[i, j] = ', if (drift) { 'drift[j] * (i - 1) + ' } else { NULL }, 'ar1[j] * trend[i - 1, j] + ', 'ar2[j] * trend[i - 2, j] + ', 'epsilon[i, j] + error[i, j];\n', '}\n}' ) } if (trend_char == 'AR3') { if (any(grepl('= mu_raw[', model_file, fixed = TRUE))) { insert_line <- max(grep('= mu_raw[', model_file, fixed = TRUE)) } else if (any(grepl('= b_raw[', model_file, fixed = TRUE))) { insert_line <- max(grep('= b_raw[', model_file, fixed = TRUE)) } model_file[insert_line] <- paste0( model_file[insert_line], '\nfor(j in 1:n_series){\n', 'trend[1, j] = ', 'error[1, j];\n', 'epsilon[1, j] = error[1, j];\n', 'epsilon[2, j] = theta[j] * error[1, j];\n', 'epsilon[3, j] = theta[j] * error[2, j];\n', 'trend[2, j] = ', if (drift) { 'drift[j] + ' } else { NULL }, 'ar1[j] * trend[1, j] + ', 'epsilon[2, j] + error[2, j];\n', 'trend[3, j] = ', if (drift) { 'drift[j] * 2 + ' } else { NULL }, 'ar1[j] * trend[2, j] + ', 'ar2[j] * trend[1, j] + ', 'epsilon[3, j] + error[3, j];\n', 'for(i in 4:n){\n', '// lagged error ma process\n', 'epsilon[i, j] = theta[j] * error[i-1, j];\n', '// full ARMA process\n', 'trend[i, j] = ', if (drift) { 'drift[j] * (i - 1) + ' } else { NULL }, 'ar1[j] * trend[i - 1, j] + ', 'ar2[j] * trend[i - 2, j] + ', 'ar3[j] * trend[i - 3, j] + ', 'epsilon[i, j] + error[i, j];\n', '}\n}' ) } } model_file <- readLines(textConnection(model_file), n = -1) if (add_cor) { last <- grep('model {', model_file, fixed = TRUE) for (i in last:(last - 5)) { last <- i if (trimws(model_file[i]) != '}') {} else { break } } model_file[last] <- paste0( '\nL_Sigma = diag_pre_multiply(sigma, L_Omega);\n', 'Sigma = multiply_lower_tri_self_transpose(L_Sigma);\n', 'for (i in 1:n) {\n', 'trend[i, 1:n_series] = to_row_vector(trend_raw[i]);\n', '}\n}' ) } } model_file <- readLines(textConnection(model_file), n = -1) # Update model block if (any(grepl('[n_lv] sigma;', model_file, fixed = TRUE))) { if (any(grepl('LV[1, j] ~ normal', model_file, fixed = TRUE))) { start <- grep('LV[1, j] ~ normal', model_file, fixed = TRUE) - 1 end <- grep('LV[i, j] ~ normal', model_file, fixed = TRUE) + 2 } else { start <- grep('LV[1, 1:n_lv] ~ normal(', model_file, fixed = TRUE) - 1 first <- grep(':n, j] ~ normal(', model_file, fixed = TRUE) second <- grep('sigma[j]);', model_file, fixed = TRUE) end <- intersect(first, second) + 1 } model_file <- model_file[-c(start:end)] model_file[start] <- paste0( '// contemporaneous errors\n', if (add_cor) { paste0( 'L_Omega ~ lkj_corr_cholesky(2);\n', 'for(i in 1:n) {\n', 'error[i] ~ multi_normal_cholesky(trend_zeros, L_Sigma);\n', '}' ) } else { paste0( 'for(i in 1:n) {\n', 'error[i] ~ normal(trend_zeros, sigma);\n', '}' ) }, if (add_ma) { paste0( '\n// ma coefficients\n', if (add_cor) { paste0( 'for(i in 1:n_lv){\n', 'for(j in 1:n_lv){\n', 'if (i != j)\n', 'theta[i, j] ~ normal(0, 0.2);\n', '}\n}' ) } else { 'theta ~ normal(0, 0.2);' } ) } else { NULL }, '\n', model_file[start] ) } else { start <- grep( 'trend[1, 1:n_series] ~ normal(', model_file, fixed = TRUE ) - 1 first <- 
grep(':n, s] ~ normal(', model_file, fixed = TRUE) second <- grep('sigma[s]);', model_file, fixed = TRUE) end <- intersect(first, second) + 1 model_file <- model_file[-c(start:end)] model_file[start] <- paste0( '// contemporaneous errors\n', if (add_cor) { paste0( 'L_Omega ~ lkj_corr_cholesky(2);\n', 'for(i in 1:n) {\n', 'error[i] ~ multi_normal_cholesky(trend_zeros, L_Sigma);\n', '}' ) } else { paste0( 'for(i in 1:n) {\n', 'error[i] ~ normal(trend_zeros, sigma);\n', '}' ) }, if (add_ma) { paste0( '\n// ma coefficients\n', if (add_cor) { paste0( 'for(i in 1:n_series){\n', 'for(j in 1:n_series){\n', 'if (i != j)\n', 'theta[i, j] ~ normal(0, 0.2);\n', '}\n}' ) } else { 'theta ~ normal(0, 0.2);' } ) } else { NULL }, '\n', model_file[start] ) } if (remove_trendmus) { model_file <- gsub( 'trend_mus[ytimes_trend[1, 1:n_lv]] +', '', model_file, fixed = TRUE ) model_file <- gsub( 'trend_mus[ytimes_trend[i, 1:n_lv]] + ', '', model_file, fixed = TRUE ) model_file <- gsub( ' - trend_mus[ytimes_trend[i - 1, 1:n_lv]]', '', model_file, fixed = TRUE ) model_file <- gsub( ' - trend_mus[ytimes_trend[1, 1:n_lv]]', '', model_file, fixed = TRUE ) model_file <- gsub( ' - trend_mus[ytimes_trend[i - 2, 1:n_lv]]', '', model_file, fixed = TRUE ) model_file <- gsub( 'trend_mus[ytimes_trend[2, 1:n_lv]] + ', '', model_file, fixed = TRUE ) model_file <- gsub( 'trend_mus[ytimes_trend[3, 1:n_lv]] + ', '', model_file, fixed = TRUE ) model_file <- gsub( ' - trend_mus[ytimes_trend[2, 1:n_lv]]', '', model_file, fixed = TRUE ) model_file <- gsub( ' - trend_mus[ytimes_trend[i - 3, 1:n_lv]]', '', model_file, fixed = TRUE ) } model_file <- readLines(textConnection(model_file), n = -1) } if (grepl('VAR', trend_char) & add_ma) { # Only ma can be added for VAR models currently # Replace the reverse mapping function with the MA representation start <- grep( '/* Function to perform the reverse mapping*/', model_file, fixed = TRUE ) end <- grep('return phiGamma;', model_file, fixed = TRUE) + 1 model_file <- model_file[-c(start:end)] model_file[ grep( 'return mdivide_left_spd(sqrtm(B), P_real);', model_file, fixed = TRUE ) + 1 ] <- paste0( '}\n', '/* Function to compute Kronecker product */\n\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix kronecker_prod(matrix A, matrix B) {\n', 'matrix[rows(A) * rows(B), cols(A) * cols(B)] C;\n', 'int m = rows(A);\n', 'int n = cols(A);\n', 'int p = rows(B);\n', 'int q = cols(B);\n', 'for (i in 1:m) {\n', 'for (j in 1:n) {\n', 'int row_start = (i - 1) * p + 1;\n', 'int row_end = (i - 1) * p + p;\n', 'int col_start = (j - 1) * q + 1;\n', 'int col_end = (j - 1) * q + q;\n', 'C[row_start:row_end, col_start:col_end] = A[i, j] * B;\n', '}\n', '}\n', 'return C;\n', '}\n', '/* Function to perform the reverse mapping\n\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix[] rev_mapping(matrix[] P, matrix Sigma) {\n', 'int p = size(P);\n', 'int m = rows(Sigma);\n', 'matrix[m, m] phi_for[p, p]; matrix[m, m] phi_rev[p, p];\n', 'matrix[m, m] Sigma_for[p+1]; matrix[m, m] Sigma_rev[p+1];\n', 'matrix[m, m] S_for; matrix[m, m] S_rev;\n', 'matrix[m, m] S_for_list[p+1];\n', '// Step 1:\n', 'Sigma_for[p+1] = Sigma;\n', 'S_for_list[p+1] = sqrtm(Sigma);\n', 'for(s in 1:p) {\n', '// In this block of code S_rev is B^{-1} and S_for is a working matrix\n', 'S_for = - tcrossprod(P[p-s+1]);\n', 'for(i in 1:m) S_for[i, i] += 1.0;\n', 'S_rev = sqrtm(S_for);\n', 'S_for_list[p-s+1] = mdivide_right_spd(mdivide_left_spd(S_rev,\n', 
'sqrtm(quad_form_sym(Sigma_for[p-s+2], S_rev))), S_rev);\n', 'Sigma_for[p-s+1] = tcrossprod(S_for_list[p-s+1]);\n', '}\n', '// Step 2:\n', 'Sigma_rev[1] = Sigma_for[1];\n', 'for(s in 0:(p-1)) {\n', 'S_for = S_for_list[s+1];\n', 'S_rev = sqrtm(Sigma_rev[s+1]);\n', 'phi_for[s+1, s+1] = mdivide_right_spd(S_for * P[s+1], S_rev);\n', "phi_rev[s+1, s+1] = mdivide_right_spd(S_rev * P[s+1]', S_for);\n", 'if(s>=1) {\n', 'for(k in 1:s) {\n', 'phi_for[s+1, k] = phi_for[s, k] - phi_for[s+1, s+1] * phi_rev[s, s-k+1];\n', 'phi_rev[s+1, k] = phi_rev[s, k] - phi_rev[s+1, s+1] * phi_for[s, s-k+1];\n', '}\n', '}\n', 'Sigma_rev[s+2] = Sigma_rev[s+1] - quad_form_sym(Sigma_for[s+1],\n', "phi_rev[s+1, s+1]');\n", '}\n', 'return phi_for[p];\n', '}\n', '/* Function to compute the joint (stationary) distribution of\n', '(y_0, ..., y_{1-p}, eps_0, ..., eps_{1-q})\n\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix initial_joint_var(matrix Sigma, matrix[] phi, matrix[] theta) {\n', 'int p = size(phi);\n', 'int q = size(theta);\n', 'int m = rows(Sigma);\n', 'matrix[(p+q)*m, (p+q)*m] companion_mat = rep_matrix(0.0, (p+q)*m, (p+q)*m);\n', 'matrix[(p+q)*m, (p+q)*m] companion_var = rep_matrix(0.0, (p+q)*m, (p+q)*m);\n', 'matrix[(p+q)*m*(p+q)*m, (p+q)*m*(p+q)*m] tmp = diag_matrix(rep_vector(1.0,\n', '(p+q)*m*(p+q)*m));\n', 'matrix[(p+q)*m, (p+q)*m] Omega;\n', '// Construct phi_tilde:\n', 'for(i in 1:p) {\n', 'companion_mat[1:m, ((i-1)*m+1):(i*m)] = phi[i];\n', 'if(i>1) {\n', 'for(j in 1:m) {\n', 'companion_mat[(i-1)*m+j, (i-2)*m+j] = 1.0;\n', '}\n', '}\n', '}\n', 'for(i in 1:q) {\n', 'companion_mat[1:m, ((p+i-1)*m+1):((p+i)*m)] = theta[i];\n', '}\n', 'if(q>1) {\n', 'for(i in 2:q) {\n', 'for(j in 1:m) {\n', 'companion_mat[(p+i-1)*m+j, (p+i-2)*m+j] = 1.0;\n', '}\n', '}\n', '}\n', '// Construct Sigma_tilde:\n', 'companion_var[1:m, 1:m] = Sigma;\n', 'companion_var[(p*m+1):((p+1)*m), (p*m+1):((p+1)*m)] = Sigma;\n', 'companion_var[1:m, (p*m+1):((p+1)*m)] = Sigma;\n', 'companion_var[(p*m+1):((p+1)*m), 1:m] = Sigma;\n', '// Compute Gamma0_tilde\n', 'tmp -= kronecker_prod(companion_mat, companion_mat);\n', "Omega = to_matrix(tmp \\ to_vector(companion_var), (p+q)*m, (p+q)*m);\n", '// Ensure Omega is symmetric:\n', 'for(i in 1:(rows(Omega)-1)) {\n', 'for(j in (i+1):rows(Omega)) {\n', 'Omega[j, i] = Omega[i, j];\n', '}\n', '}\n', 'return Omega;\n', '}\n' ) model_file <- readLines(textConnection(model_file), n = -1) # Update transformed data if ( any(grepl( 'cholesky_factor_corr[n_lv] L_Omega;', model_file, fixed = TRUE )) ) { model_file[grep( 'transformed data {', model_file, fixed = TRUE )] <- paste0( 'transformed data {\n', 'vector[n_lv] trend_zeros = rep_vector(0.0, n_lv);\n', 'vector[n_lv*2] init_zeros = rep_vector(0.0, n_lv*2);\n' ) } else { model_file[grep( 'vector[n_series] trend_zeros = rep_vector(0.0, n_series);', model_file, fixed = TRUE )] <- paste0( 'vector[n_series] trend_zeros = rep_vector(0.0, n_series);\n', 'vector[n_series*2] init_zeros = rep_vector(0.0, n_series*2);\n' ) } model_file <- readLines(textConnection(model_file), n = -1) # Update parameters if ( any(grepl( 'cholesky_factor_corr[n_lv] L_Omega;', model_file, fixed = TRUE )) ) { model_file[grep( 'matrix[n_lv, n_lv] P_real;', model_file, fixed = TRUE )] <- paste0( 'matrix[n_lv, n_lv] P_real;\n', '// unconstrained MA partial autocorrelations\n', 'matrix[n_lv, n_lv] R_real;\n', '// initial joint stationary VARMA process\n', 'vector[2 * n_lv] init;\n', '// ma error parameters\n', 'vector[n_lv] error[n];' ) } else { 
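# --------------------------------------------------------------------------
# Illustrative sketch (editorial addition): the initial_joint_var() Stan
# function written above solves the discrete Lyapunov equation
#   Omega = C * Omega * C' + Sigma_tilde
# in vectorised form, vec(Omega) = (I - C (x) C)^{-1} vec(Sigma_tilde),
# which is what `tmp \ to_vector(companion_var)` computes for the companion
# matrix. A quick base-R check with a hypothetical stable 2 x 2 matrix:
local({
  m <- 2
  C <- matrix(c(0.5, 0.1, 0.2, 0.3), m, m) # eigenvalues inside unit circle
  Sigma_tilde <- diag(m)
  Omega <- matrix(
    solve(diag(m^2) - kronecker(C, C), as.vector(Sigma_tilde)),
    m, m
  )
  # residual of the Lyapunov identity should be numerically ~ 0
  max(abs(Omega - (C %*% Omega %*% t(C) + Sigma_tilde)))
})
# --------------------------------------------------------------------------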
model_file[grep( 'matrix[n_series, n_series] P_real;', model_file, fixed = TRUE )] <- paste0( 'matrix[n_series, n_series] P_real;\n', '// unconstrained MA partial autocorrelations\n', 'matrix[n_series, n_series] R_real;\n', '// initial joint stationary VARMA process\n', 'vector[2 * n_series] init;\n', '// ma error parameters\n', 'vector[n_series] error[n];' ) } # Update transformed parameters if ( any(grepl( 'cholesky_factor_corr[n_lv] L_Omega;', model_file, fixed = TRUE )) ) { model_file[grep( 'matrix[n_lv, n_lv] A;', model_file, fixed = TRUE )] <- paste0( 'matrix[n_lv, n_lv] A;\n', '// latent trend MA autoregressive terms\n', 'matrix[n_lv, n_lv] theta;\n', '// ma process\n', 'array[n] vector[n_lv] epsilon;\n' ) end <- grep('vector[n_lv] LV[n];', model_file, fixed = TRUE) start <- end - 1 model_file <- model_file[-c(start:end)] model_file[grep( 'cov_matrix[n_lv] Gamma;', model_file, fixed = TRUE )] <- paste0( 'cov_matrix[n_lv * 2] Omega;\n', "// latent states\n", "vector[n_lv] LV[n];" ) start <- grep('// derived latent states', model_file, fixed = TRUE) end <- grep('Gamma = phiGamma[2, 1];', model_file, fixed = TRUE) + 1 model_file <- model_file[-c(start:end)] model_file[start] <- paste0( model_file[start], '\n', '// stationary VARMA reparameterisation\n', 'L_Sigma = diag_pre_multiply(sigma, L_Omega);\n', 'Sigma = multiply_lower_tri_self_transpose(L_Sigma);\n', '{\n', '// constrained partial autocorrelations\n', 'matrix[n_lv, n_lv] P[1];\n', 'matrix[n_lv, n_lv] R[1];\n', '// stationary autoregressive coefficients\n', 'matrix[n_lv, n_lv] A_init[1];\n', 'matrix[n_lv, n_lv] theta_init[1];\n', 'P[1] = P_realtoP(P_real);\n', 'R[1] = P_realtoP(R_real);\n', '// stationary autoregressive and ma coef matrices\n', 'A_init = rev_mapping(P, Sigma);\n', 'theta_init = rev_mapping(R, Sigma);\n', 'theta_init[1] = -theta_init[1];\n', '// initial stationary covariance structure\n', 'Omega = initial_joint_var(Sigma, A_init, theta_init);\n', 'A = A_init[1];\n', 'theta = theta_init[1];\n', '}\n', '// computed VARMA trends\n', 'epsilon[1] = theta * init[(n_lv + 1) : (n_lv * 2)];\n', 'LV[1] = (A * init[1 : n_lv]) + trend_mus[ytimes_trend[1, 1 : n_lv]] + epsilon[1] + error[1];\n', 'for (i in 2 : n) {\n', '// lagged error ma process\n', 'epsilon[i] = theta * error[i - 1];\n', '// full VARMA process\n', 'LV[i] = trend_mus[ytimes_trend[i, 1 : n_lv]] + A * (LV[i - 1] - trend_mus[ytimes_trend[i - 1, 1 : n_lv]]) + epsilon[i] + error[i];\n', '}\n', '// derived latent states\n', 'lv_coefs = Z;\n', 'for (i in 1 : n) {\n', 'for (s in 1 : n_series) {\n', 'trend[i, s] = dot_product(lv_coefs[s, : ], LV[i]);\n', '}\n}' ) } else { model_file[grep( 'matrix[n_series, n_series] A;', model_file, fixed = TRUE )] <- paste0( 'matrix[n_series, n_series] A;\n', '// latent trend MA autoregressive terms\n', 'matrix[n_series, n_series] theta;\n', '// ma process\n', 'array[n] vector[n_series] epsilon;\n' ) start <- grep('// raw latent trends', model_file, fixed = TRUE) end <- start + 1 model_file <- model_file[-c(start:end)] start <- grep( '// trend estimates in matrix-form', model_file, fixed = TRUE ) end <- grep('Gamma = phiGamma[2, 1];', model_file, fixed = TRUE) + 1 model_file <- model_file[-c(start:end)] model_file[grep( 'cov_matrix[n_series] Gamma;', model_file, fixed = TRUE )] <- paste0( 'cov_matrix[n_series * 2] Omega;\n', '// raw latent trends\n', 'vector[n_series] trend_raw[n];\n', '// trend estimates in matrix-form\n', 'matrix[n, n_series] trend;' ) model_file[start] <- paste0( model_file[start], '\n', '// stationary VARMA 
reparameterisation\n', 'L_Sigma = diag_pre_multiply(sigma, L_Omega);\n', 'Sigma = multiply_lower_tri_self_transpose(L_Sigma);\n', '{\n', '// constrained partial autocorrelations\n', 'matrix[n_series, n_series] P[1];\n', 'matrix[n_series, n_series] R[1];\n', '// stationary autoregressive coefficients\n', 'matrix[n_series, n_series] A_init[1];\n', 'matrix[n_series, n_series] theta_init[1];\n', 'P[1] = P_realtoP(P_real);\n', 'R[1] = P_realtoP(R_real);\n', '// stationary autoregressive and ma coef matrices\n', 'A_init = rev_mapping(P, Sigma);\n', 'theta_init = rev_mapping(R, Sigma);\n', 'theta_init[1] = -theta_init[1];\n', '// initial stationary covariance structure\n', 'Omega = initial_joint_var(Sigma, A_init, theta_init);\n', 'A = A_init[1];\n', 'theta = theta_init[1];\n', '}\n', '// computed VARMA trends\n', 'epsilon[1] = theta * init[(n_series + 1) : (n_series * 2)];\n', 'trend_raw[1] = (A * init[1 : n_series]) + epsilon[1] + error[1];\n', 'for (i in 2 : n) {\n', '// lagged error ma process\n', 'epsilon[i] = theta * error[i - 1];\n', '// full VARMA process\n', 'trend_raw[i] = (A * trend_raw[i - 1]) + epsilon[i] + error[i];\n', '}\n', '// computed trends in matrix form\n', 'for (i in 1 : n) {\n', 'trend[i, 1 : n_series] = to_row_vector(trend_raw[i]);\n', '}' ) } model_file <- readLines(textConnection(model_file), n = -1) # Update model if ( any(grepl( 'cholesky_factor_corr[n_lv] L_Omega;', model_file, fixed = TRUE )) ) { start <- grep('// latent state mean parameters', model_file, fixed = TRUE) end <- start + 1 model_file <- model_file[-c(start:end)] model_file[grep('// latent state means', model_file, fixed = TRUE)] <- paste0( '// unconstrained ma inverse partial autocorrelations\n', 'diagonal(R_real) ~ std_normal();\n', 'for (i in 1 : n_lv) {\n', 'for (j in 1 : n_lv) {\n', 'if (i != j)\n', 'R_real[i, j] ~ std_normal();\n', '}\n', '}\n', '// initial joint stationary distribution\n', 'init ~ multi_normal(init_zeros, Omega);\n', '// correlated contemporaneous errors\n', 'for (i in 1 : n) {\n', 'error[i] ~ multi_normal_cholesky(trend_zeros, L_Sigma);\n', '}\n', '// latent state means' ) model_file <- readLines(textConnection(model_file), n = -1) end <- grep( '(LV[i - 1] - trend_mus[ytimes_trend[i - 1, 1:n_lv]]);', model_file, fixed = TRUE ) + 1 start <- grep('// latent state means', model_file, fixed = TRUE) model_file <- model_file[-c(start:end)] start <- grep( 'LV[1] ~ multi_normal(trend_mus[ytimes_trend[1, 1:n_lv]], Gamma);', model_file, fixed = TRUE ) end <- max(grep('L_Sigma);', model_file, fixed = TRUE)) + 1 model_file <- model_file[-c(start:end)] } else { start <- grep('// latent trend mean parameters', model_file, fixed = TRUE) end <- start + 1 model_file <- model_file[-c(start:end)] model_file[grep('// trend means', model_file, fixed = TRUE)] <- paste0( '// unconstrained ma inverse partial autocorrelations\n', 'diagonal(R_real) ~ std_normal();\n', 'for (i in 1 : n_series) {\n', 'for (j in 1 : n_series) {\n', 'if (i != j)\n', 'R_real[i, j] ~ std_normal();\n', '}\n', '}\n', '// initial joint stationary distribution\n', 'init ~ multi_normal(init_zeros, Omega);\n', '// correlated contemporaneous errors\n', 'for (i in 1 : n) {\n', 'error[i] ~ multi_normal_cholesky(trend_zeros, L_Sigma);\n', '}\n', '// trend means' ) model_file <- readLines(textConnection(model_file), n = -1) start <- grep('// trend means', model_file, fixed = TRUE) end <- max(grep( 'trend_raw[i] ~ multi_normal_cholesky(mu[i - 1], L_Sigma);', model_file, fixed = TRUE )) + 1 model_file <- model_file[-c(start:end)] } 
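# --------------------------------------------------------------------------
# Illustrative sketch (editorial addition): the hierarchical-correlation
# rearrangements handled next inject a Stan function, combine_cholesky(),
# that partially pools correlation matrices via the convex combination
#   alpha * R_global + (1 - alpha) * R_local
# (itself a valid correlation matrix) and returns its Cholesky factor.
# A base-R analogue with hypothetical 2 x 2 inputs; base R's chol() is
# upper-triangular, so t() recovers the Stan-style lower factor:
local({
  combine_cholesky <- function(L_global, L_local, alpha) {
    R_global <- tcrossprod(L_global) # multiply_lower_tri_self_transpose
    R_local <- tcrossprod(L_local)
    t(chol(alpha * R_global + (1 - alpha) * R_local))
  }
  L_g <- t(chol(matrix(c(1, 0.6, 0.6, 1), 2, 2)))
  L_l <- t(chol(matrix(c(1, -0.2, -0.2, 1), 2, 2)))
  combine_cholesky(L_g, L_l, alpha = 0.75)
})
# --------------------------------------------------------------------------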
model_file <- readLines(textConnection(model_file), n = -1) } # Now do any rearrangements needed for hierarchical correlations if (grepl('hiercor', validate_trend_model(trend_model))) { # Add the function to calculate a convex combination of correlation matrices if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', '/* Function to compute a partially pooled correlation matrix */\n', '/* https://discourse.mc-stan.org/t/hierarchical-prior-for-partial-pooling-on-correlation-matrices*/\n', 'matrix combine_cholesky(matrix global_chol_cor, matrix local_chol_cor, real alpha){', 'int dim = rows(local_chol_cor);\n', 'matrix[dim, dim] global_cor = multiply_lower_tri_self_transpose(global_chol_cor);\n', 'matrix[dim, dim] local_cor = multiply_lower_tri_self_transpose(local_chol_cor);\n', 'matrix[dim, dim] combined_chol_cor;\n', 'combined_chol_cor = cholesky_decompose(alpha * global_cor +\n', ' (1 - alpha) * local_cor);\n', 'return(combined_chol_cor);\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', '/* Function to compute a partially pooled correlation matrix */\n', '/* https://discourse.mc-stan.org/t/hierarchical-prior-for-partial-pooling-on-correlation-matrices*/\n', 'matrix combine_cholesky(matrix global_chol_cor, matrix local_chol_cor, real alpha){', 'int dim = rows(local_chol_cor);\n', 'matrix[dim, dim] global_cor = multiply_lower_tri_self_transpose(global_chol_cor);\n', 'matrix[dim, dim] local_cor = multiply_lower_tri_self_transpose(local_chol_cor);\n', 'matrix[dim, dim] combined_chol_cor;\n', 'combined_chol_cor = cholesky_decompose(alpha * global_cor +\n', ' (1 - alpha) * local_cor);\n', 'return(combined_chol_cor);\n', '}\n}\n' ) } model_file <- readLines(textConnection(model_file), n = -1) # Add group information to data block model_file[grep('int n_series;', model_file, fixed = TRUE)] <- paste0( "int n_groups; // number of groups (correlations apply within grouping levels)\n", "int n_subgroups; // number of subgroups (units whose errors will be correlated)\n", "int n_series; // total number of unique series (n_groups * n_subgroups)\n", "array[n_groups, n_subgroups] int group_inds; // indices of group membership" ) model_file <- readLines(textConnection(model_file), n = -1) #### Changes for VAR models #### if (grepl('VAR', trend_char)) { if ( any(grepl( "cholesky_factor_corr[n_lv] L_Omega;", model_file, fixed = TRUE )) ) { use_lv <- TRUE } else { use_lv <- FALSE } #### Parameters #### # Need arrays of cholesky factors and partial autocorrelation matrices if (use_lv) { # Changes for State-Space models model_file[grep( "cholesky_factor_corr[n_lv] L_Omega;", model_file, fixed = TRUE )] <- paste0( 'cholesky_factor_corr[n_subgroups] L_Omega_global;\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_deviation_group;\n', 'real alpha_cor;' ) model_file[grep( "matrix[n_lv, n_lv] P_real;", model_file, fixed = TRUE )] <- paste0( 'array[n_groups] matrix[n_subgroups, n_subgroups] P_real_group;' ) } else { # Changes for non State-Space models model_file[grep( "cholesky_factor_corr[n_series] L_Omega;", model_file, fixed = TRUE )] <- paste0( 'cholesky_factor_corr[n_subgroups] L_Omega_global;\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_deviation_group;\n', 'real alpha_cor;' ) model_file[grep( "matrix[n_series, n_series] P_real;", model_file, fixed = TRUE )] <- paste0( 'array[n_groups] matrix[n_subgroups, 
n_subgroups] P_real_group;' ) } model_file <- readLines(textConnection(model_file), n = -1) #### Transformed parameters #### # Need arrays of autocorrelation matrices, Gamma and Sigma matrices if (use_lv) { # Changes for State-Space models model_file[grep("matrix[n_lv, n_lv] A;", model_file, fixed = TRUE)] <- paste0( 'array[n_groups] matrix[n_subgroups, n_subgroups] A_group;\n', 'matrix[n_lv, n_lv] A;' ) model_file[grep("cov_matrix[n_lv] Sigma;", model_file, fixed = TRUE)] <- paste0( 'array[n_groups] cov_matrix[n_subgroups] Sigma_group;\n', "matrix[n_lv, n_lv] Sigma;" ) model_file[grep("cov_matrix[n_lv] Gamma;", model_file, fixed = TRUE)] <- paste0( 'array[n_groups] cov_matrix[n_subgroups] Gamma_group;\n', "matrix[n_lv, n_lv] Gamma;" ) model_file <- model_file[ -grep( 'Sigma = multiply_lower_tri_self_transpose(L_Sigma);', model_file, fixed = TRUE ) ] model_file[grep( "L_Sigma = diag_pre_multiply(sigma, L_Omega);", model_file, fixed = TRUE )] <- paste0( '// derived group-level VAR covariance matrices\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_Omega_group;\n', 'array[n_groups] matrix[n_subgroups, n_subgroups] L_Sigma_group;\n', 'for (g in 1 : n_groups){\n', 'L_Omega_group[g] = combine_cholesky(L_Omega_global, L_deviation_group[g], alpha_cor);\n', 'L_Sigma_group[g] = diag_pre_multiply(sigma[group_inds[g]], L_Omega_group[g]);\n', 'Sigma_group[g] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);\n', '}\n' ) starts <- grep( "// stationary VAR reparameterisation", model_file, fixed = TRUE ) + 1 ends <- grep( "// stationary VAR reparameterisation", model_file, fixed = TRUE ) + 8 model_file <- model_file[-(starts:ends)] model_file[grep( "// stationary VAR reparameterisation", model_file, fixed = TRUE )] <- paste0( '// stationary VAR reparameterisation\n', '{\n', "array[1] matrix[n_subgroups, n_subgroups] P;\n", "array[2, 1] matrix[n_subgroups, n_subgroups] phiGamma;\n", 'for (g in 1 : n_groups){\n', "P[1] = P_realtoP(P_real_group[g]);\n", "phiGamma = rev_mapping(P, Sigma_group[g]);\n", "A_group[g] = phiGamma[1, 1];\n", "Gamma_group[g] = phiGamma[2, 1];\n", "}\n\n", "// computed (full) VAR matrices\n", 'Sigma = rep_matrix(0, n_lv, n_lv);\n', 'Gamma = rep_matrix(0, n_lv, n_lv);\n', 'A = rep_matrix(0, n_lv, n_lv);\n', 'for (g in 1 : n_groups){\n', 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);\n', 'Gamma[group_inds[g], group_inds[g]] = Gamma_group[g];\n', 'A[group_inds[g], group_inds[g]] = A_group[g];\n', '}\n', 'L_Sigma = cholesky_decompose(Sigma);\n', "}\n\n" ) } else { # Changes for non State-Space models model_file[grep( "matrix[n_series, n_series] A;", model_file, fixed = TRUE )] <- paste0( 'array[n_groups] matrix[n_subgroups, n_subgroups] A_group;\n', 'matrix[n_series, n_series] A;' ) model_file[grep( "cov_matrix[n_series] Sigma;", model_file, fixed = TRUE )] <- paste0( 'array[n_groups] cov_matrix[n_subgroups] Sigma_group;\n', "matrix[n_series, n_series] Sigma;" ) model_file[grep( "cov_matrix[n_series] Gamma;", model_file, fixed = TRUE )] <- paste0( 'array[n_groups] cov_matrix[n_subgroups] Gamma_group;\n', 'matrix[n_series, n_series] Gamma;' ) model_file <- model_file[ -grep( 'Sigma = multiply_lower_tri_self_transpose(L_Sigma);', model_file, fixed = TRUE ) ] model_file[grep( "L_Sigma = diag_pre_multiply(sigma, L_Omega);", model_file, fixed = TRUE )] <- paste0( '// derived group-level VAR covariance matrices\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_Omega_group;\n', 'array[n_groups] matrix[n_subgroups, n_subgroups] 
L_Sigma_group;\n', 'for (g in 1 : n_groups){\n', 'L_Omega_group[g] = combine_cholesky(L_Omega_global, L_deviation_group[g], alpha_cor);\n', 'L_Sigma_group[g] = diag_pre_multiply(sigma[group_inds[g]], L_Omega_group[g]);\n', 'Sigma_group[g] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);\n', '}\n' ) starts <- grep( "// stationary VAR reparameterisation", model_file, fixed = TRUE ) + 1 ends <- grep( "// stationary VAR reparameterisation", model_file, fixed = TRUE ) + 8 model_file <- model_file[-(starts:ends)] model_file[grep( "// stationary VAR reparameterisation", model_file, fixed = TRUE )] <- paste0( '// stationary VAR reparameterisation\n', '{\n', "array[1] matrix[n_subgroups, n_subgroups] P;\n", "array[2, 1] matrix[n_subgroups, n_subgroups] phiGamma;\n", 'for (g in 1 : n_groups){\n', "P[1] = P_realtoP(P_real_group[g]);\n", "phiGamma = rev_mapping(P, Sigma_group[g]);\n", "A_group[g] = phiGamma[1, 1];\n", "Gamma_group[g] = phiGamma[2, 1];\n", "}\n\n", "// computed (full) VAR matrices\n", 'Sigma = rep_matrix(0, n_series, n_series);\n', 'Gamma = rep_matrix(0, n_series, n_series);\n', 'A = rep_matrix(0, n_series, n_series);\n', 'for (g in 1 : n_groups){\n', 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);\n', 'A[group_inds[g], group_inds[g]] = A_group[g];\n', 'Gamma[group_inds[g], group_inds[g]] = Gamma_group[g];\n', '}\n', 'L_Sigma = cholesky_decompose(Sigma);\n', "}\n\n" ) } model_file <- readLines(textConnection(model_file), n = -1) #### Model #### model_file[grep( "L_Omega ~ lkj_corr_cholesky(2);", model_file, fixed = TRUE )] <- paste0( 'alpha_cor ~ beta(3, 2);\n', 'L_Omega_global ~ lkj_corr_cholesky(1);\n', 'for (g in 1 : n_groups){\n', 'L_deviation_group[g] ~ lkj_corr_cholesky(6);\n', '}' ) starts <- grep( "// unconstrained partial autocorrelations", model_file, fixed = TRUE ) + 1 ends <- grep( "// unconstrained partial autocorrelations", model_file, fixed = TRUE ) + 6 model_file <- model_file[-(starts:ends)] model_file[grep( "// unconstrained partial autocorrelations", model_file, fixed = TRUE )] <- paste0( 'for (g in 1 : n_groups){\n', 'diagonal(P_real_group[g]) ~ normal(Pmu[1], 1 / sqrt(Pomega[1]));\n', 'for (i in 1:n_subgroups) {\n', 'for (j in 1:n_subgroups) {\n', 'if(i != j) P_real_group[g, i, j] ~ normal(Pmu[2], 1 / sqrt(Pomega[2]));\n', '}\n}\n}' ) model_file <- readLines(textConnection(model_file), n = -1) } else { if (grepl('ZMVN', trend_char)) { #### Zero-mean multinormals #### if ( any(grepl( "matrix[n_series, n_lv] lv_coefs;", model_file, fixed = TRUE )) ) { use_lv <- TRUE } else { use_lv <- FALSE } #### Transformed data #### if (use_lv) { if (any(grepl('transformed data {', model_file, fixed = TRUE))) { model_file[grep('transformed data {', model_file, fixed = TRUE)] <- paste0( 'transformed data {\n', 'vector[n_subgroups] trend_zeros = rep_vector(0.0, n_subgroups);' ) } else { model_file[grep('parameters {', model_file, fixed = TRUE)[1]] <- paste0( 'transformed data {\n', 'vector[n_subgroups] trend_zeros = rep_vector(0.0, n_subgroups);\n', '}\nparameters {' ) } } else { model_file[grep( "vector[n_series] trend_zeros = rep_vector(0.0, n_series);", model_file, fixed = TRUE )] <- paste0( 'vector[n_subgroups] trend_zeros = rep_vector(0.0, n_subgroups);' ) } model_file <- readLines(textConnection(model_file), n = -1) #### Parameters #### if (use_lv) { model_file <- model_file[ -grep( 'cholesky_factor_corr[n_lv] L_Omega;', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep("// correlated latent residuals", model_file, 
fixed = TRUE) ] model_file <- model_file[ -grep("array[n] vector[n_lv] LV_raw;", model_file, fixed = TRUE) ] model_file[grep("[n_lv] sigma;", model_file, fixed = TRUE)] <- paste0( model_file[grep("[n_lv] sigma;", model_file, fixed = TRUE)], '\n', '\n\n', '// correlation params and correlated errors per group\n', 'cholesky_factor_corr[n_subgroups] L_Omega_global;\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_deviation_group;\n', 'real alpha_cor;\n', 'array[n] matrix[n_groups, n_subgroups] sub_error;' ) } else { model_file <- model_file[ -grep( 'cholesky_factor_corr[n_series] L_Omega;', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep("// correlated latent residuals", model_file, fixed = TRUE) ] model_file <- model_file[ -grep( "array[n] vector[n_series] trend_raw;", model_file, fixed = TRUE ) ] model_file[grep( "vector[n_series] sigma;", model_file, fixed = TRUE )] <- paste0( 'vector[n_series] sigma;\n', '\n\n', '// correlation params and correlated errors per group\n', 'cholesky_factor_corr[n_subgroups] L_Omega_global;\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_deviation_group;\n', 'real alpha_cor;\n', 'array[n] matrix[n_groups, n_subgroups] sub_error;' ) } model_file <- readLines(textConnection(model_file), n = -1) #### Transformed parameters #### if (use_lv) { model_file <- model_file[ -grep('matrix[n_lv, n_lv] L_Sigma;', model_file, fixed = TRUE) ] model_file <- model_file[ -grep( 'L_Sigma = diag_pre_multiply(sigma, L_Omega);', model_file, fixed = TRUE ) ] model_file[grep( "// LKJ form of covariance matrix", model_file, fixed = TRUE )] <- paste0( '// reconstructed correlated errors\n', 'array[n] vector[n_lv] error;\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_Omega_group;\n', '\n', '// LKJ forms of covariance matrices\n', 'array[n_groups] matrix[n_subgroups, n_subgroups] L_Sigma_group;' ) model_file[grep( "// correlated residuals", model_file, fixed = TRUE )] <- paste0( '// derived error correlation and covariance matrices\n', 'for (g in 1 : n_groups){\n', 'L_Omega_group[g] = combine_cholesky(L_Omega_global, L_deviation_group[g], alpha_cor);\n', 'L_Sigma_group[g] = diag_pre_multiply(sigma[group_inds[g]], L_Omega_group[g]);\n', '}\n', '// derived correlated errors\n', 'for (i in 1 : n){\n', "error[i] = to_vector(sub_error[i]');\n", '}\n' ) model_file[grep( "LV[i, 1:n_lv] = to_row_vector(LV_raw[i]);", model_file, fixed = TRUE )] <- "LV[i, 1:n_lv] = to_row_vector(error[i]);" } else { model_file <- model_file[ -grep( 'matrix[n_series, n_series] L_Sigma;', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep( 'L_Sigma = diag_pre_multiply(sigma, L_Omega);', model_file, fixed = TRUE ) ] model_file[grep( "// LKJ form of covariance matrix", model_file, fixed = TRUE )] <- paste0( '// reconstructed correlated errors\n', 'array[n] vector[n_series] error;\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_Omega_group;\n', '\n', '// LKJ forms of covariance matrices\n', 'array[n_groups] matrix[n_subgroups, n_subgroups] L_Sigma_group;' ) model_file[grep( "// correlated residuals", model_file, fixed = TRUE )] <- paste0( '// derived error correlation and covariance matrices\n', 'for (g in 1 : n_groups){\n', 'L_Omega_group[g] = combine_cholesky(L_Omega_global, L_deviation_group[g], alpha_cor);\n', 'L_Sigma_group[g] = diag_pre_multiply(sigma[group_inds[g]], L_Omega_group[g]);\n', '}\n', '// derived correlated errors\n', 'for (i in 1 : n){\n', "error[i] = to_vector(sub_error[i]');\n", '}\n' ) model_file[grep( "trend[i, 1:n_series] = 
to_row_vector(trend_raw[i]);", model_file, fixed = TRUE )] <- "trend[i, 1:n_series] = to_row_vector(error[i]);" } model_file <- readLines(textConnection(model_file), n = -1) #### Model #### starts <- grep( "// residual error correlations", model_file, fixed = TRUE ) + 1 ends <- grep( "// residual error correlations", model_file, fixed = TRUE ) + 4 model_file <- model_file[-(starts:ends)] model_file[grep( "// residual error correlations", model_file, fixed = TRUE )] <- paste0( '// hierarchical latent error correlations\n', 'alpha_cor ~ beta(3, 2);\n', 'L_Omega_global ~ lkj_corr_cholesky(1);\n', 'for (g in 1 : n_groups){\n', 'L_deviation_group[g] ~ lkj_corr_cholesky(6);\n', '}\n', '\n', '// contemporaneous errors\n', 'for (i in 1 : n) {\n', 'for (g in 1 : n_groups){\n', 'to_vector(sub_error[i, g]) ~ multi_normal_cholesky(trend_zeros, L_Sigma_group[g]);\n', '}\n', '}' ) model_file <- readLines(textConnection(model_file), n = -1) #### Generated quantities #### if (use_lv) { model_file <- model_file[ -grep( "cov_matrix[n_lv] Sigma = multiply_lower_tri_self_transpose(L_Sigma);", model_file, fixed = TRUE ) ] model_file[grep( "// computed error covariance matrix", model_file, fixed = TRUE )] <- paste0( '// computed (full) error covariance matrix\n', 'matrix[n_lv, n_lv] Sigma;\n', 'Sigma = rep_matrix(0, n_lv, n_lv);\n', 'for (g in 1 : n_groups){\n', 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);\n', '}' ) } else { model_file <- model_file[ -grep( "cov_matrix[n_series] Sigma = multiply_lower_tri_self_transpose(L_Sigma);", model_file, fixed = TRUE ) ] model_file[grep( "// computed error covariance matrix", model_file, fixed = TRUE )] <- paste0( '// computed (full) error covariance matrix\n', 'matrix[n_series, n_series] Sigma;\n', 'Sigma = rep_matrix(0, n_series, n_series);\n', 'for (g in 1 : n_groups){\n', 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);\n', '}' ) } model_file <- readLines(textConnection(model_file), n = -1) } else { #### Random walk and AR models #### if (any(grepl("vector[n_lv] trend_zeros", model_file, fixed = TRUE))) { use_lv <- TRUE } else { use_lv <- FALSE } #### Transformed data #### if (use_lv) { model_file[grep( "vector[n_lv] trend_zeros = rep_vector(0.0, n_lv);", model_file, fixed = TRUE )] <- paste0( 'vector[n_subgroups] trend_zeros = rep_vector(0.0, n_subgroups);' ) } else { model_file[grep( "vector[n_series] trend_zeros = rep_vector(0.0, n_series);", model_file, fixed = TRUE )] <- paste0( 'vector[n_subgroups] trend_zeros = rep_vector(0.0, n_subgroups);' ) } model_file <- readLines(textConnection(model_file), n = -1) #### Parameters #### if (use_lv) { model_file <- model_file[ -grep( 'cholesky_factor_corr[n_lv] L_Omega;', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep('// dynamic error parameters', model_file, fixed = TRUE) ] model_file <- model_file[ -grep("vector[n_lv] error[n];", model_file, fixed = TRUE) ] model_file[grep("[n_lv] sigma;", model_file, fixed = TRUE)] <- paste0( model_file[grep("[n_lv] sigma;", model_file, fixed = TRUE)], '\n', '\n\n', '// correlation params and dynamic error parameters per group\n', 'cholesky_factor_corr[n_subgroups] L_Omega_global;\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_deviation_group;\n', 'real alpha_cor;\n', 'array[n] matrix[n_groups, n_subgroups] sub_error;' ) } else { model_file <- model_file[ -grep( 'cholesky_factor_corr[n_series] L_Omega;', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep('// 
dynamic error parameters', model_file, fixed = TRUE) ] model_file <- model_file[ -grep("vector[n_series] error[n];", model_file, fixed = TRUE) ] model_file[grep( "vector[n_series] sigma;", model_file, fixed = TRUE )] <- paste0( 'vector[n_series] sigma;\n', '\n\n', '// correlation params and dynamic error parameters per group\n', 'cholesky_factor_corr[n_subgroups] L_Omega_global;\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_deviation_group;\n', 'real alpha_cor;\n', 'array[n] matrix[n_groups, n_subgroups] sub_error;' ) } model_file <- readLines(textConnection(model_file), n = -1) #### Transformed parameters #### if (use_lv) { model_file <- model_file[ -grep( '// computed error covariance matrix', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep('cov_matrix[n_lv] Sigma;', model_file, fixed = TRUE) ] model_file <- model_file[ -grep('matrix[n_lv, n_lv] L_Sigma;', model_file, fixed = TRUE) ] model_file <- model_file[ -grep( 'L_Sigma = diag_pre_multiply(sigma, L_Omega);', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep( 'Sigma = multiply_lower_tri_self_transpose(L_Sigma);', model_file, fixed = TRUE ) ] model_file[grep( "// LKJ form of covariance matrix", model_file, fixed = TRUE )] <- paste0( '// reconstructed correlated errors\n', 'array[n] vector[n_lv] error;\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_Omega_group;\n', '\n', '// LKJ forms of covariance matrices\n', 'array[n_groups] matrix[n_subgroups, n_subgroups] L_Sigma_group;' ) model_file[grep( "// derived latent states", model_file, fixed = TRUE )] <- paste0( '// derived error correlation and covariance matrices\n', 'for (g in 1 : n_groups){\n', 'L_Omega_group[g] = combine_cholesky(L_Omega_global, L_deviation_group[g], alpha_cor);\n', 'L_Sigma_group[g] = diag_pre_multiply(sigma[group_inds[g]], L_Omega_group[g]);\n', '}\n', '// derived correlated errors\n', 'for (i in 1 : n){\n', "error[i] = to_vector(sub_error[i]');\n", '}\n', '// derived latent states' ) } else { model_file <- model_file[ -grep( '// computed error covariance matrix', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep('cov_matrix[n_series] Sigma;', model_file, fixed = TRUE) ] model_file <- model_file[ -grep( 'matrix[n_series, n_series] L_Sigma;', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep( 'L_Sigma = diag_pre_multiply(sigma, L_Omega);', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep( 'Sigma = multiply_lower_tri_self_transpose(L_Sigma);', model_file, fixed = TRUE ) ] model_file[grep( "// LKJ form of covariance matrix", model_file, fixed = TRUE )] <- paste0( '// reconstructed correlated errors\n', 'array[n] vector[n_series] error;\n', 'array[n_groups] cholesky_factor_corr[n_subgroups] L_Omega_group;\n', '\n', '// LKJ forms of covariance matrices\n', 'array[n_groups] matrix[n_subgroups, n_subgroups] L_Sigma_group;' ) model_file[grep( "// derived latent states", model_file, fixed = TRUE )] <- paste0( '// derived error correlation and covariance matrices\n', 'for (g in 1 : n_groups){\n', 'L_Omega_group[g] = combine_cholesky(L_Omega_global, L_deviation_group[g], alpha_cor);\n', 'L_Sigma_group[g] = diag_pre_multiply(sigma[group_inds[g]], L_Omega_group[g]);\n', '}\n', '// derived correlated errors\n', 'for (i in 1 : n){\n', "error[i] = to_vector(sub_error[i]');\n", '}\n', '// derived latent states' ) } model_file <- readLines(textConnection(model_file), n = -1) #### Model #### starts <- grep("// contemporaneous errors", model_file, fixed = TRUE) + 1 ends <- grep("// contemporaneous errors", 
model_file, fixed = TRUE) + 4 model_file <- model_file[-(starts:ends)] model_file[grep( "// contemporaneous errors", model_file, fixed = TRUE )] <- paste0( '// hierarchical process error correlations\n', 'alpha_cor ~ beta(3, 2);\n', 'L_Omega_global ~ lkj_corr_cholesky(1);\n', 'for (g in 1 : n_groups){\n', 'L_deviation_group[g] ~ lkj_corr_cholesky(6);\n', '}\n', '\n', '// contemporaneous errors\n', 'for (i in 1 : n) {\n', 'for (g in 1 : n_groups){\n', 'to_vector(sub_error[i, g]) ~ multi_normal_cholesky(trend_zeros, L_Sigma_group[g]);\n', '}\n', '}' ) model_file <- readLines(textConnection(model_file), n = -1) #### Generated quantities #### if (use_lv) { model_file[grep( "// posterior predictions", model_file, fixed = TRUE )] <- paste0( '// computed (full) error covariance matrix\n', 'matrix[n_lv, n_lv] Sigma;\n', 'Sigma = rep_matrix(0, n_lv, n_lv);\n', 'for (g in 1 : n_groups){\n', 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);\n', '}\n', '\n', '// posterior predictions' ) } else { model_file[grep( "// posterior predictions", model_file, fixed = TRUE )] <- paste0( '// computed (full) error covariance matrix\n', 'matrix[n_series, n_series] Sigma;\n', 'Sigma = rep_matrix(0, n_series, n_series);\n', 'for (g in 1 : n_groups){\n', 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);\n', '}\n', '\n', '// posterior predictions' ) } model_file <- readLines(textConnection(model_file), n = -1) } } #### Add grouping information to model_data #### model_data$group_inds <- matrix( 1:nlevels(data_train$series), nrow = nlevels(data_train[[trend_model$gr]]), ncol = nlevels(data_train[[trend_model$subgr]]), byrow = TRUE ) model_data$n_groups <- nlevels(data_train[[trend_model$gr]]) model_data$n_subgroups <- nlevels(data_train[[trend_model$subgr]]) } return(list(model_file = model_file, model_data = model_data)) } ================================================ FILE: R/add_base_dgam_lines.R ================================================ #' Dynamic GAM model file additions #' #' @noRd #' #' @param use_lv Logical (use latent variables or not?) #' #' @param stan Logical (convert existing model to a Stan model?) #' #' @param offset Logical (include an offset in the linear predictor?) 
#' #' @return A character string to add to the mgcv jagam model file add_base_dgam_lines <- function(use_lv, stan = FALSE, offset = FALSE) { if (stan) { if (use_lv) { add <- " ##insert data transformed data { // Number of non-zero lower triangular factor loadings // Ensures identifiability of the model - no rotation of factors int M; M = n_lv * (n_series - n_lv) + n_lv * (n_lv - 1) / 2 + n_lv; } parameters { // raw basis coefficients row_vector[num_basis] b_raw; // dynamic factors matrix[n, n_lv] LV_raw; // dynamic factor lower triangle loading coefficients vector[M] L; // smoothing parameters vector[n_sp] lambda; } transformed parameters { // GAM contribution to expectations (log scale) vector[total_obs] eta; // trends and dynamic factor loading matrix matrix[n, n_series] trend; matrix[n_series, n_lv] lv_coefs_raw; // basis coefficients row_vector[num_basis] b; // constraints allow identifiability of loadings for (i in 1:(n_lv - 1)) { for (j in (i + 1):(n_lv)){ lv_coefs_raw[i, j] = 0; } } { int index; index = 0; for (j in 1:n_lv) { for (i in j:n_series) { index = index + 1; lv_coefs_raw[i, j] = L[index]; } } } // derived latent trends for (i in 1:n){ for (s in 1:n_series){ trend[i, s] = dot_product(lv_coefs_raw[s,], LV_raw[i,]); } } eta = to_vector(b * X); } model { ##insert smooths // priors for smoothing parameters lambda ~ normal(5, 30); // priors for dynamic factor loading coefficients L ~ student_t(5, 0, 1); // dynamic factor estimates for (j in 1:n_lv) { LV_raw[1, j] ~ normal(0, 0.1); } for (j in 1:n_lv) { LV_raw[2:n, j] ~ normal(LV_raw[1:(n - 1), j], 0.1); } // likelihood functions for (i in 1:n) { for (s in 1:n_series) { if (y_observed[i, s]) y[i, s] ~ poisson_log(eta[ytimes[i, s]] + trend[i, s]); } } } generated quantities { matrix[n, n_lv] LV; matrix[n_series, n_lv] lv_coefs; vector[n_sp] rho; vector[n_lv] penalty; matrix[n, n_series] ypred; rho = log(lambda); penalty = rep_vector(100.0, n_lv); // Sign correct factor loadings and factors for(j in 1:n_lv){ if(lv_coefs_raw[j, j] < 0){ lv_coefs[,j] = -1 * lv_coefs_raw[,j]; LV[,j] = -1 * LV_raw[,j]; } else { lv_coefs[,j] = lv_coefs_raw[,j]; LV[,j] = LV_raw[,j]; } } // posterior predictions for(i in 1:n){ for(s in 1:n_series){ ypred[i, s] = poisson_log_rng(eta[ytimes[i, s]] + trend[i, s]); } } } " } else { add <- " ##insert data parameters { // raw basis coefficients row_vector[num_basis] b_raw; // latent trend variance parameters vector[n_series] sigma; // latent trends matrix[n, n_series] trend; // smoothing parameters vector[n_sp] lambda; } transformed parameters { // GAM contribution to expectations (log scale) vector[total_obs] eta; // basis coefficients row_vector[num_basis] b; eta = to_vector(b * X); } model { ##insert smooths // priors for smoothing parameters lambda ~ normal(5, 30); // priors for latent trend variance parameters sigma ~ exponential(2); // trend estimates for (s in 1:n_series) { trend[1, s] ~ normal(0, sigma[s]); } for (s in 1:n_series) { trend[2:n, s] ~ normal(trend[1:(n - 1), s], sigma[s]); } // likelihood functions for (i in 1:n) { for (s in 1:n_series) { if (y_observed[i, s]) y[i, s] ~ poisson_log(eta[ytimes[i, s]] + trend[i, s]); } } } generated quantities { vector[n_sp] rho; vector[n_series] tau; matrix[n, n_series] ypred; rho = log(lambda); for (s in 1:n_series) { tau[s] = pow(sigma[s], -2.0); } // posterior predictions for(i in 1:n){ for(s in 1:n_series){ ypred[i, s] = poisson_log_rng(eta[ytimes[i, s]] + trend[i, s]); } } } " } } else { if (use_lv) { add <- c( " #### Begin model #### model { ## GAM 
linear predictor
eta <- X %*% b

## mean expectations
for (i in 1:n) {
  for (s in 1:n_series) {
    mus[i, s] <- exp(eta[ytimes[i, s]] + trend[i, s])
  }
}

## latent factors evolve as time series with penalised precisions;
## the penalty terms force any un-needed factors to evolve as flat lines
for (j in 1:n_lv) {
  LV_raw[1, j] ~ dnorm(0, penalty[j])
}

for (j in 1:n_lv) {
  LV_raw[2, j] ~ dnorm(drift[j] + ar1[j]*LV_raw[1, j], penalty[j])
}

for (j in 1:n_lv) {
  LV_raw[3, j] ~ dnorm(drift[j]*2 + ar1[j]*LV_raw[2, j] + ar2[j]*LV_raw[1, j], penalty[j])
}

for (i in 4:n) {
  for (j in 1:n_lv) {
    LV_raw[i, j] ~ dnorm(drift[j]*(i - 1) + ar1[j]*LV_raw[i - 1, j] + ar2[j]*LV_raw[i - 2, j] + ar3[j]*LV_raw[i - 3, j], penalty[j])
  }
}

## AR components
for (s in 1:n_lv) {
  drift[s] ~ dnorm(0, 10)
  ar1[s] ~ dnorm(0, 10)
  ar2[s] ~ dnorm(0, 10)
  ar3[s] ~ dnorm(0, 10)
}

## shrinkage penalties for each factor's precision parameter act to squeeze
## the entire factor toward a flat white noise process if supported by
## the data. The prior for individual factor penalties allows each factor to possibly
## have a relatively large penalty, which shrinks the prior for that factor's variance
## substantially. Penalties increase exponentially with the number of factors following
## Welty, Leah J., et al. 'Bayesian distributed lag models: estimating effects of particulate
## matter air pollution on daily mortality.' Biometrics 65.1 (2009): 282-291.
pi ~ dunif(0, n_lv)
X2 ~ dnorm(0, 1)T(0, )

# eta1 controls the baseline penalty
eta1 ~ dunif(-1, 1)

# eta2 controls how quickly the penalties exponentially increase
eta2 ~ dunif(-1, 1)

for (t in 1:n_lv) {
  X1[t] ~ dnorm(0, 1)T(0, )
  l.dist[t] <- max(t, pi[])
  l.weight[t] <- exp(eta2[] * l.dist[t])
  l.var[t] <- exp(eta1[] * l.dist[t] / 2) * 1
  theta.prime[t] <- l.weight[t] * X1[t] + (1 - l.weight[t]) * X2[]
  penalty[t] <- max(0.0001, theta.prime[t] * l.var[t])
}

## latent factor loadings: standard normal with identifiability constraints
## upper triangle of loading matrix set to zero
for (j in 1:(n_lv - 1)) {
  for (j2 in (j + 1):n_lv) {
    lv_coefs_raw[j, j2] <- 0
  }
}

## positive constraints on loading diagonals
for (j in 1:n_lv) {
  lv_coefs_raw[j, j] ~ dnorm(0, 1)T(0, 1);
}

## lower diagonal free
for (j in 2:n_lv) {
  for (j2 in 1:(j - 1)) {
    lv_coefs_raw[j, j2] ~ dnorm(0, 1)T(-1, 1);
  }
}

## other elements also free
for (j in (n_lv + 1):n_series) {
  for (j2 in 1:n_lv) {
    lv_coefs_raw[j, j2] ~ dnorm(0, 1)T(-1, 1);
  }
}

## trend evolution depends on latent factors
for (i in 1:n) {
  for (s in 1:n_series) {
    trend[i, s] <- inprod(lv_coefs_raw[s,], LV_raw[i,])
  }
}

# sign-correct factor loadings and coefficients
# (test the raw diagonal, not the yet-to-be-defined corrected matrix)
for (j in 1:n_lv){
  if(lv_coefs_raw[j, j] < 0){
    lv_coefs[,j] <- -1 * lv_coefs_raw[,j]
    LV[,j] <- -1 * LV_raw[,j]
  } else {
    lv_coefs[,j] <- lv_coefs_raw[,j]
    LV[,j] <- LV_raw[,j]
  }
}

## likelihood functions
for (i in 1:n) {
  for (s in 1:n_series) {
    y[i, s] ~ dnegbin(rate[i, s], phi[s])T(, upper_bound[s]);
    rate[i, s] <- ifelse((phi[s] / (phi[s] + mus[i, s])) < min_eps,
                         min_eps,
                         (phi[s] / (phi[s] + mus[i, s])))
  }
}

## complexity penalising prior for the overdispersion parameter;
## where the likelihood reduces to a 'base' model (Poisson) unless
## the data support overdispersion
for (s in 1:n_series) {
  phi[s] <- 1 / phi_inv[s]
  phi_inv[s] ~ dexp(5)
}

## posterior predictions
for (i in 1:n) {
  for (s in 1:n_series) {
    ypred[i, s] ~ dnegbin(rate[i, s], phi[s])T(, upper_bound[s])
  }
}

## GAM-specific priors"
    )
  } else {
    add <- c(
      "
#### Begin model ####
model {
## GAM linear predictor
eta <- X %*% b

## mean expectations
for (i in 1:n) {
  for (s in 1:n_series) {
    mus[i, s] <- exp(eta[ytimes[i, s]] + trend[i, s])
  }
}

## trend estimates
for (s in 1:n_series) {
  trend[1, s] ~ dnorm(0, tau[s])
}

for (s in 1:n_series) {
  trend[2, s] ~ dnorm(drift[s] + ar1[s]*trend[1, s], tau[s])
}

for (s in 1:n_series) {
  trend[3, s] ~ dnorm(drift[s]*2 + ar1[s]*trend[2, s] + ar2[s]*trend[1, s], tau[s])
}

for (i in 4:n) {
  for (s in 1:n_series){
    trend[i, s] ~ dnorm(drift[s]*(i - 1) + ar1[s]*trend[i - 1, s] + ar2[s]*trend[i - 2, s] + ar3[s]*trend[i - 3, s], tau[s])
  }
}

## AR components
for (s in 1:n_series){
  drift[s] ~ dnorm(0, 10)
  ar1[s] ~ dnorm(0, 10)
  ar2[s] ~ dnorm(0, 10)
  ar3[s] ~ dnorm(0, 10)
  tau[s] <- pow(sigma[s], -2)
  sigma[s] ~ dexp(2)T(0.075, 5)
}

## likelihood functions
for (i in 1:n) {
  for (s in 1:n_series) {
    y[i, s] ~ dnegbin(rate[i, s], phi[s])T(, upper_bound[s]);
    rate[i, s] <- ifelse((phi[s] / (phi[s] + mus[i, s])) < min_eps,
                         min_eps,
                         (phi[s] / (phi[s] + mus[i, s])))
  }
}

## complexity penalising prior for the overdispersion parameter;
## where the likelihood reduces to a 'base' model (Poisson) unless
## the data support overdispersion
for (s in 1:n_series) {
  phi[s] <- 1 / phi_inv[s]
  phi_inv[s] ~ dexp(5)
}

## posterior predictions
for (i in 1:n) {
  for (s in 1:n_series) {
    ypred[i, s] ~ dnegbin(rate[i, s], phi[s])T(, upper_bound[s])
  }
}

## GAM-specific priors"
    )
  }
}
return(add)
}

================================================ FILE: R/add_binomial.R ================================================

#' @noRd
add_binomial = function(
  formula,
  model_file,
  model_data,
  data_train,
  data_test,
  family_char
) {
  # Add trial information if necessary
  if (family_char %in% c('binomial', 'beta_binomial')) {
    # Identify which variable in data represents the number of trials
    resp_terms <- as.character(terms(formula(formula))[[2]])
    resp_terms <- resp_terms[!grepl('cbind', resp_terms)]
    trial_name <- resp_terms[2]

    # Pull the trials variable from the data and validate
    train_trials <- data_train[[trial_name]]
    if (any(is.na(train_trials))) {
      stop(
        paste0('variable ', trial_name, ' contains missing values'),
        call. = FALSE
      )
    }
    if (any(is.infinite(train_trials))) {
      stop(
        paste0('variable ', trial_name, ' contains infinite values'),
        call. = FALSE
      )
    }

    # Matrix of trials per series
    all_trials <- data.frame(
      series = as.numeric(data_train$series),
      time = data_train$time,
      trials = data_train[[trial_name]]
    ) %>%
      dplyr::arrange(time, series)

    # Same for data_test
    if (!is.null(data_test)) {
      if (!(exists(trial_name, where = data_test))) {
        stop(
          'Number of trials must also be supplied in "newdata" for Binomial models',
          call. = FALSE
        )
      }
      all_trials <- rbind(
        all_trials,
        data.frame(
          series = as.numeric(data_test$series),
          time = data_test$time,
          trials = data_test[[trial_name]]
        )
      ) %>%
        dplyr::arrange(time, series)
      if (any(is.na(all_trials$trials)) | any(is.infinite(all_trials$trials))) {
        stop(
          paste0(
            'Missing or infinite values found in ',
            trial_name,
            ' variable'
          ),
          call.
= FALSE ) } } # Construct matrix of N-trials in the correct format so it can be # flattened into one long vector trials <- matrix( NA, nrow = length(unique(all_trials$time)), ncol = length(unique(all_trials$series)) ) for (i in 1:length(unique(all_trials$series))) { trials[, i] <- all_trials$trials[which(all_trials$series == i)] } # Add trial info to the model data model_data$flat_trials <- as.vector(trials) model_data$flat_trials_train <- as.vector(trials)[which( as.vector(model_data$y_observed) == 1 )] # Add trial vectors to model block model_file[grep( "int flat_ys[n_nonmissing]; // flattened nonmissing observations", model_file, fixed = TRUE )] <- paste0( "array[n_nonmissing] int flat_ys; // flattened nonmissing observations\n", "array[total_obs] int flat_trials; // flattened trial vector\n", "array[n_nonmissing] int flat_trials_train; // flattened nonmissing trial vector\n" ) model_file <- readLines(textConnection(model_file), n = -1) } else { trials <- NULL } # Update parameters block if (family_char == 'beta_binomial') { model_file[grep("vector[num_basis] b_raw;", model_file, fixed = TRUE)] <- paste0("vector[num_basis] b_raw;\n", "vector[n_series] phi;") model_file <- readLines(textConnection(model_file), n = -1) } # Update functions block if (family_char == 'beta_binomial') { if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( "functions {\n", "vector rep_each(vector x, int K) {\n", "int N = rows(x);\n", "vector[N * K] y;\n", "int pos = 1;\n", "for (n in 1 : N) {\n", "for (k in 1 : K) {\n", "y[pos] = x[n];\n", "pos += 1;\n", "}\n", "}\n", "return y;\n", "}" ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', "vector rep_each(vector x, int K) {\n", "int N = rows(x);\n", "vector[N * K] y;\n", "int pos = 1;\n", "for (n in 1 : N) {\n", "for (k in 1 : K) {\n", "y[pos] = x[n];\n", "pos += 1;\n", "}\n", "}\n", "return y;\n", "}\n}" ) } } # Update model block if (family_char == 'binomial') { if ( any(grepl("flat_ys ~ poisson_log_glm(flat_xs,", model_file, fixed = TRUE)) ) { linenum <- grep( "flat_ys ~ poisson_log_glm(flat_xs,", model_file, fixed = TRUE ) model_file[linenum] <- paste0("flat_ys ~ binomial(flat_trials_train, inv_logit(flat_xs * b));") model_file <- model_file[-(linenum + 1)] model_file <- readLines(textConnection(model_file), n = -1) } if ( any(grepl( "flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends),", model_file, fixed = TRUE )) ) { linenum <- grep( "flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends),", model_file, fixed = TRUE ) model_file[linenum] <- paste0( "flat_ys ~ binomial(flat_trials_train, inv_logit(append_col(flat_xs, flat_trends) * append_row(b, 1.0)));" ) model_file <- model_file[-(linenum + 1)] model_file <- readLines(textConnection(model_file), n = -1) } } if (family_char == 'beta_binomial') { model_file[grep("model {", model_file, fixed = TRUE)] <- paste0( "model {\n", "// priors for Beta dispersion parameters\n", "phi ~ gamma(0.01, 0.01);" ) if ( any(grepl("flat_ys ~ poisson_log_glm(flat_xs,", model_file, fixed = TRUE)) ) { linenum <- grep( "flat_ys ~ poisson_log_glm(flat_xs,", model_file, fixed = TRUE ) model_file[linenum] <- paste0( "vector[n_nonmissing] flat_phis;\n", "flat_phis = rep_each(phi, n)[obs_ind];\n", "flat_ys ~ beta_binomial(flat_trials_train, inv_logit(flat_xs * b) .* flat_phis, (1 - inv_logit(flat_xs * b)) .* flat_phis);" ) model_file <- model_file[-(linenum + 1)] 
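      # Note: the paste0() replacement above leaves an element holding several
      # Stan lines joined by embedded '\n'; running the vector through
      # textConnection() + readLines() re-splits it so each Stan line becomes
      # its own element again (e.g. readLines(textConnection('a\nb')) gives
      # c('a', 'b'))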
model_file <- readLines(textConnection(model_file), n = -1)
    }
    if (
      any(grepl(
        "flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends),",
        model_file,
        fixed = TRUE
      ))
    ) {
      linenum <- grep(
        "flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends),",
        model_file,
        fixed = TRUE
      )
      model_file[linenum] <- paste0(
        "vector[n_nonmissing] flat_phis;\n",
        "flat_phis = rep_each(phi, n)[obs_ind];\n",
        "flat_ys ~ beta_binomial(flat_trials_train, inv_logit(append_col(flat_xs, flat_trends) * append_row(b, 1.0)) .* flat_phis, (1 - inv_logit(append_col(flat_xs, flat_trends) * append_row(b, 1.0))) .* flat_phis);"
      )
      model_file <- model_file[-(linenum + 1)]
      model_file <- readLines(textConnection(model_file), n = -1)
    }
  }
  if (family_char == 'bernoulli') {
    if (
      any(grepl("flat_ys ~ poisson_log_glm(flat_xs,", model_file, fixed = TRUE))
    ) {
      model_file[grep(
        "flat_ys ~ poisson_log_glm(flat_xs,",
        model_file,
        fixed = TRUE
      )] <- "flat_ys ~ bernoulli_logit_glm(flat_xs,"
    }
    if (
      any(grepl(
        "flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends),",
        model_file,
        fixed = TRUE
      ))
    ) {
      model_file[grep(
        "flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends),",
        model_file,
        fixed = TRUE
      )] <- "flat_ys ~ bernoulli_logit_glm(append_col(flat_xs, flat_trends),"
    }
  }

  # Update the generated quantities block
  if (family_char == 'binomial') {
    model_file[grep(
      "ypred[1:n, s] = poisson_log_rng(mus[1:n, s]);",
      model_file,
      fixed = TRUE
    )] <- "ypred[1:n, s] = binomial_rng(flat_trials[ytimes[1:n, s]], inv_logit(mus[1:n, s]));"
  }
  if (family_char == 'beta_binomial') {
    model_file[grep("vector[total_obs] eta;", model_file, fixed = TRUE)] <-
      paste0("vector[total_obs] eta;\n", "matrix[n, n_series] phi_vec;")
    model_file[grep("eta = X * b;", model_file, fixed = TRUE)] <- paste0(
      "eta = X * b;\n",
      "for (s in 1 : n_series) {\n",
      "phi_vec[1 : n, s] = rep_vector(phi[s], n);\n",
      "}"
    )
    model_file[grep(
      "ypred[1:n, s] = poisson_log_rng(mus[1:n, s]);",
      model_file,
      fixed = TRUE
    )] <- "ypred[1:n, s] = beta_binomial_rng(flat_trials[ytimes[1:n, s]], inv_logit(mus[1:n, s]) .* phi_vec[1:n, s], (1 - inv_logit(mus[1:n, s])) .* phi_vec[1:n, s]);"
  }
  if (family_char == 'bernoulli') {
    model_file[grep(
      "ypred[1:n, s] = poisson_log_rng(mus[1:n, s]);",
      model_file,
      fixed = TRUE
    )] <- "ypred[1:n, s] = bernoulli_logit_rng(mus[1:n, s]);"
  }

  #### Return ####
  return(list(
    model_file = model_file,
    model_data = model_data,
    trials = trials
  ))
}

================================================ FILE: R/add_corcar.R ================================================

#' Updates for adding continuous-time AR data
#' @noRd
add_corcar = function(model_data, data_train, data_test = NULL) {
  # Calculate temporal separation among observed points
  if (!is.null(data_test)) {
    all_times <- rbind(
      data.frame(
        series = as.numeric(data_train$series),
        time = data_train$time,
        index..time..index = data_train$index..time..index
      ),
      (data.frame(
        series = as.numeric(data_test$series),
        time = data_test$time,
        index..time..index = data_test$index..time..index
      ))
    ) %>%
      dplyr::group_by(series) %>%
      dplyr::arrange(index..time..index) %>%
      dplyr::mutate(
        time_lag = dplyr::lag(time),
        dis_time = time - time_lag,
        dis_time = ifelse(is.na(dis_time), 1, dis_time),
        dis_time = pmax(1e-3, dis_time)
      ) %>%
      dplyr::arrange(index..time..index, series)
  } else {
    all_times <- data.frame(
      series = as.numeric(data_train$series),
      time = data_train$time,
      index..time..index = data_train$index..time..index
    ) %>%
      dplyr::group_by(series) %>%
      dplyr::arrange(index..time..index) %>%
      dplyr::mutate(
        time_lag = dplyr::lag(time),
        dis_time = time
- time_lag, dis_time = ifelse(is.na(dis_time), 1, dis_time), dis_time = pmax(1e-3, dis_time) ) %>% dplyr::ungroup() %>% dplyr::arrange(index..time..index, series) } time_dis <- matrix( NA, nrow = length(unique(all_times$index..time..index)), ncol = length(unique(all_times$series)) ) for (i in 1:length(unique(all_times$series))) { time_dis[, i] <- all_times$dis_time[which(all_times$series == i)] } model_data$time_dis <- time_dis return(model_data) } ================================================ FILE: R/add_nmixture.R ================================================ #' Updates for adding N-mixture processes #' @noRd add_nmixture = function( model_file, model_data, data_train, data_test = NULL, trend_map = NULL, nmix_trendmap = TRUE, orig_trend_model ) { insight::check_if_installed( "extraDistr", reason = 'to simulate from N-Mixture distributions' ) insight::check_if_installed( "wrswoR", reason = 'to simulate from N-Mixture distributions' ) if (inherits(orig_trend_model, 'mvgam_trend')) { orig_trend_model <- orig_trend_model$trend_model } # Update model data model_data <- add_nmix_data( model_data, data_train, data_test, trend_map, nmix_trendmap ) #### Update the model file appropriately #### # If orig_trend_model is 'None', this will be set up as a RW model so need # to remove sigma and change the process model lines if (orig_trend_model == 'None') { # Replace Random Walk trends with no dynamic trend start_replace <- grep( 'LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);', model_file, fixed = TRUE ) - 1 end_replace <- grep( 'LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]] + LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]], sigma[j]);', model_file, fixed = TRUE ) + 2 model_file <- model_file[-c(start_replace:end_replace)] model_file[grep( 'trend_mus = X_trend * b_trend;', model_file, fixed = TRUE )] <- paste0( 'trend_mus = X_trend * b_trend;', '\n', 'for(j in 1:n_lv){\n', 'LV[1:n, j] = trend_mus[ytimes_trend[1:n, j]];\n', '}\n' ) model_file <- readLines(textConnection(model_file), n = -1) # Remove sigma parameters start_replace <- grep('// latent state SD terms', model_file, fixed = TRUE) end_replace <- start_replace + 1 model_file <- model_file[-c(start_replace:end_replace)] # model_file <- model_file[-grep('vector[n_lv] penalty;', # model_file, fixed = TRUE)] # model_file <- model_file[-grep('penalty = 1.0 / (sigma .* sigma);', # model_file, fixed = TRUE)] model_file[grep( "penalty = 1.0 / (sigma .* sigma);", model_file, fixed = TRUE )] <- 'penalty = rep_vector(1e12, n_lv);' model_file <- model_file[ -c( grep( '// priors for latent state SD parameters', model_file, fixed = TRUE ), grep( '// priors for latent state SD parameters', model_file, fixed = TRUE ) + 1 ) ] # LV has to be declared in transformed params, not params model_file <- model_file[ -c( grep('matrix[n, n_lv] LV;', model_file, fixed = TRUE) - 1, grep('matrix[n, n_lv] LV;', model_file, fixed = TRUE) ) ] model_file[grep("transformed parameters {", model_file, fixed = TRUE)] <- paste0( "transformed parameters {\n", "// latent states\n", "matrix[n, n_lv] LV;\n" ) } # Update functions block model_file <- add_nmix_functions(model_file, trend_map, nmix_trendmap) # Update the data block model_file[grep( 'int n_nonmissing; // number of nonmissing observations', model_file, fixed = TRUE )] <- paste0( "int n_nonmissing; // number of nonmissing observations\n", "int cap[total_obs]; // upper limits of latent abundances\n", 'array[total_obs] int ytimes_array; // sorted ytimes\n' ) model_file <- 
readLines(textConnection(model_file), n = -1) if (nmix_trendmap) { model_file[grep( 'array[total_obs] int ytimes_array; // sorted ytimes', model_file, fixed = TRUE )] <- paste0( 'array[total_obs] int ytimes_array; // sorted ytimes\n', 'array[n, n_series] int ytimes_pred; // time-ordered matrix for prediction\n', 'int K_groups; // number of unique replicated observations\n', 'int K_reps; // maximum number of replicate observations\n', 'array[K_groups] int K_starts; // col of K_inds where each group starts\n', 'array[K_groups] int K_stops; // col of K_inds where each group ends\n', 'array[K_groups, K_reps] int K_inds; // indices of replicated observations' ) model_file <- readLines(textConnection(model_file), n = -1) model_file[grep( 'int flat_ys[n_nonmissing]; // flattened nonmissing observations', model_file, fixed = TRUE )] <- 'array[total_obs] int flat_ys; // flattened observations' model_file <- model_file[ -grep( 'matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep( 'int obs_ind[n_nonmissing]; // indices of nonmissing observations', model_file, fixed = TRUE ) ] } # Update transformed data block if (nmix_trendmap) { model_file[grep("transformed data {", model_file, fixed = TRUE)] <- paste0( "transformed data {\n", "matrix[total_obs, num_basis] X_ordered = X[ytimes_array, : ];\n", "array[K_groups] int Y_max;\n", "array[K_groups] int N_max;\n", "for ( k in 1 : K_groups ) {\n", "Y_max[k] = max(flat_ys[K_inds[k, K_starts[k] : K_stops[k]]]);\n", "N_max[k] = max(cap[K_inds[k, K_starts[k] : K_stops[k]]]);\n", "}" ) } model_file <- readLines(textConnection(model_file), n = -1) # Update the transformed parameters block model_file[grep("transformed parameters {", model_file, fixed = TRUE)] <- paste0( "transformed parameters {\n", "// detection probability\n", "vector[total_obs] p;\n" ) model_file[grep( '// latent process linear predictors', model_file, fixed = TRUE )] <- paste0( '// detection probability\n', 'p = X_ordered * b;\n\n', '// latent process linear predictors' ) model_file <- readLines(textConnection(model_file), n = -1) # Update the model block model_file <- add_nmix_model(model_file, trend_map, nmix_trendmap) # Update the generated quantities block model_file <- add_nmix_genquant(model_file, trend_map, nmix_trendmap) #### Return #### return(list(model_file = model_file, model_data = model_data)) } add_nmix_data = function( model_data, data_train, data_test, trend_map, nmix_trendmap = TRUE ) { model_data$ytimes_array <- as.vector(model_data$ytimes) #### Perform necessary checks on 'cap' (positive integers, no missing values) #### if (!(exists('cap', where = data_train))) { stop( 'Max abundances must be supplied as a variable named "cap" for N-mixture models', call. = FALSE ) } if (inherits(data_train, 'data.frame')) { cap = data_train %>% dplyr::arrange(series, time) %>% dplyr::pull(cap) } else { cap = data.frame( series = data_train$series, cap = data_train$cap, time = data_train$time ) %>% dplyr::arrange(series, time) %>% dplyr::pull(cap) } if (!is.null(data_test)) { if (!(exists('cap', where = data_test))) { stop( 'Max abundances must be supplied in test data as a variable named "cap" for N-mixture models', call. 
= FALSE ) } if (inherits(data_test, 'data.frame')) { captest = data_test %>% dplyr::arrange(series, time) %>% dplyr::pull(cap) } else { captest = data.frame( series = data_test$series, cap = data_test$cap, time = data_test$time ) %>% dplyr::arrange(series, time) %>% dplyr::pull(cap) } cap <- c(cap, captest) } validate_pos_integers(cap) if (any(is.na(cap)) | any(is.infinite(cap))) { stop( paste0('Missing or infinite values found for some "cap" terms'), call. = FALSE ) } model_data$cap <- as.vector(cap) if (any(model_data$cap[model_data$obs_ind] < model_data$flat_ys)) { stop( paste0('Some "cap" terms are < the observed counts. This is not allowed'), call. = FALSE ) } # Additional data objects for trend_map situations if (nmix_trendmap) { obs_ind <- model_data$obs_ind # Don't need to exclude non-missing obs anymore thanks to the grouping # indices model_data$flat_ys <- as.vector(model_data$y) model_data$flat_ys[model_data$flat_ys == -1] <- 0 ytimes <- model_data$ytimes Z <- model_data$Z # For all observations, which factor do they belong to? which_series <- matrix(NA, nrow = NROW(ytimes), ncol = NCOL(ytimes)) for (j in 1:NCOL(ytimes)) { which_series[, j] <- j } which_series <- as.vector(which_series) which_factor <- vector(length = length(ytimes)) for (i in 1:NCOL(Z)) { Z_obs <- which(which_series %in% which(Z[, i] == 1)) which_factor[Z_obs] <- i } # Replicate group sizes for each factor * time sample n_replicates <- colSums(Z) shift_nas = function(dat) { # Shift NAs to the right dat_new <- t(apply(dat, 1, function(x) { c(x[!is.na(x)], x[is.na(x)]) })) # Delete any rows that are all NA dat_new[rowSums(is.na(dat_new)) != ncol(dat_new), , drop = FALSE] } length_reps = function(dat) { apply(dat, 1, function(x) { length(x[!is.na(x)]) }) } K_inds <- dplyr::bind_rows(lapply(seq_len(NCOL(Z)), function(i) { factor_inds <- which(which_factor == i) group_mat <- matrix(NA, nrow = model_data$n, ncol = n_replicates[i]) for (j in 1:model_data$n) { group_mat[j, ] <- seq( factor_inds[j], max(factor_inds), by = model_data$n ) } group_mat[!group_mat %in% obs_ind] <- NA data.frame(shift_nas(group_mat)) })) # A second version of K_inds is needed for later generation # of properly-constrained latent N predictions; for this version, # all observations must be included (no NAs) K_inds_all <- dplyr::bind_rows(lapply(seq_len(NCOL(Z)), function(i) { factor_inds <- which(which_factor == i) group_mat <- matrix(NA, nrow = model_data$n, ncol = n_replicates[i]) for (j in 1:model_data$n) { group_mat[j, ] <- seq( factor_inds[j], max(factor_inds), by = model_data$n ) } data.frame(group_mat) })) # Add starting and ending indices for each group to model_data model_data$K_starts <- rep(1, NROW(K_inds)) model_data$K_stops <- length_reps(K_inds) # Change any remaining NAs to 1 so they are integers K_inds[is.na(K_inds)] <- 1 # Add remaining group information to the model_data model_data$K_reps <- NCOL(K_inds) model_data$K_groups <- NROW(K_inds) model_data$K_inds <- as.matrix(K_inds) model_data$K_inds_all <- as.matrix(K_inds_all) model_data$ytimes_pred <- matrix( 1:model_data$total_obs, nrow = model_data$n, byrow = FALSE ) } return(model_data) } add_nmix_genquant = function(model_file, trend_map, nmix_trendmap) { rho_included <- any(grepl('rho = log(lambda);', model_file, fixed = TRUE)) rho_trend_included <- any(grepl( 'rho_trend = log(lambda_trend);', model_file, fixed = TRUE )) if ( any(grepl("penalty = 1.0 / (sigma .* sigma);", model_file, fixed = TRUE)) ) { penalty_line <- "vector[n_lv] penalty = 1.0 / (sigma .* sigma);" } else 
{
    penalty_line <- "vector[n_lv] penalty = rep_vector(1e12, n_lv);"
  }

  # Delete most generated quantities so that they can be produced after model
  # fitting; this dramatically speeds up fitting time for N-mixture models
  starts <- grep('generated quantities {', model_file, fixed = TRUE) + 1
  ends <- max(grep('}', model_file, fixed = TRUE))
  model_file <- model_file[-c(starts:ends)]
  model_file[grep('generated quantities {', model_file, fixed = TRUE)] <- paste0(
    'generated quantities {\n',
    penalty_line,
    '\n',
    'vector[total_obs] detprob = inv_logit(p);\n',
    if (rho_included) {
      'vector[n_sp] rho = log(lambda);\n'
    } else {
      NULL
    },
    if (rho_trend_included) {
      'vector[n_sp_trend] rho_trend = log(lambda_trend);\n'
    } else {
      NULL
    },
    '}'
  )
  model_file <- readLines(textConnection(model_file), n = -1)
  return(model_file)
}

add_nmix_model = function(model_file, trend_map, nmix_trendmap) {
  if (nmix_trendmap) {
    model_file[grep(
      'vector[n_nonmissing] flat_trends;',
      model_file,
      fixed = TRUE
    )] <- 'array[total_obs] real flat_trends;\narray[total_obs] real flat_ps;'
    model_file[grep(
      'flat_trends = (to_vector(trend))[obs_ind];',
      model_file,
      fixed = TRUE
    )] <- 'flat_trends = (to_array_1d(trend));\nflat_ps = to_array_1d(p);'
    model_file <- readLines(textConnection(model_file), n = -1)
    model_file[grep(
      'flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends),',
      model_file,
      fixed = TRUE
    )] <- paste0(
      '// loop over replicate sampling window (each site*time*species combination)\n',
      'for (k in 1 : K_groups) {\n',
      '// all log_lambdas are identical because they represent site*time\n',
      '// covariates; so just use the first measurement\n',
      'real log_lambda = flat_trends[K_inds[k, 1]];\n',
      '// logit-scale detection probabilities for the replicate observations\n',
      'vector[size(K_inds[k, K_starts[k] : K_stops[k]])] logit_p = to_vector(flat_ps[K_inds[k, K_starts[k] : K_stops[k]]]);\n',
      '// K values and observed counts for these replicates\n',
      'int K_max = N_max[k];\n',
      'int K_min = Y_max[k];\n',
      'array[size(K_inds[k, K_starts[k] : K_stops[k]])] int N_obs = flat_ys[K_inds[k, K_starts[k] : K_stops[k]]];\n',
      'int possible_N = K_max - K_min;\n',
      '// marginalize over possible latent counts analytically\n',
      'real ff = exp(log_lambda) * prod(1 - inv_logit(logit_p));\n',
      'real prob_n = 1;\n',
      'for (i in 1 : possible_N){\n',
      'real N = K_max - i + 1;\n',
      'real k_obs = 1;\n',
      'for (j in 1 : size(N_obs)){\n',
      'k_obs *= N / (N - N_obs[j]);\n',
      '}\n',
      'prob_n = 1 + prob_n * ff * k_obs / N;\n',
      '}\n',
      '// add log(pr_n) to prob(K_min)\n',
      'target += poisson_log_lpmf(K_min | log_lambda) +\n',
      'binomial_logit_lpmf(N_obs | K_min, logit_p) +\n',
      'log(prob_n);\n',
      '}'
    )
    model_file <- model_file[
      -grep('0.0,append_row(b, 1.0));', model_file, fixed = TRUE)
    ]
    model_file <- readLines(textConnection(model_file), n = -1)
  }
  return(model_file)
}

add_nmix_functions = function(model_file, trend_map, nmix_trendmap) {
  model_file <- readLines(textConnection(model_file), n = -1)
  return(model_file)
}

#' Function to add generated quantities for N-mixture models, which
#' saves substantial computation time
#' @noRd
add_nmix_posterior = function(
  model_output,
  obs_data,
  test_data,
  mgcv_model,
  n_lv,
  Z,
  K_inds
) {
  # Function to add samples to the 'sim' slot of a stanfit object
  add_samples = function(
    model_output,
    names,
    samples,
    nsamples,
    nchains,
    parname
  ) {
    samp_starts <- seq(1, NROW(samples), by = nsamples)
    samp_ends <- seq(nsamples, NROW(samples), by = nsamples)
    for (i in 1:nchains) {
      samps_df <- data.frame(samples[samp_starts[i]:samp_ends[i], ])
      colnames(samps_df) <- names
      if
(is.list(model_output@sim$samples[[i]])) { old <- attributes(model_output@sim$samples[[i]]) oldnames <- attr(model_output@sim$samples[[i]], 'names') model_output@sim$samples[[i]] <- append(model_output@sim$samples[[i]], as.list(samps_df)) mostattributes(model_output@sim$samples[[i]]) <- old attr(model_output@sim$samples[[i]], 'names') <- c(oldnames, colnames(samps_df)) } else { model_output@sim$samples[[i]] <- dplyr::bind_cols(model_output@sim$samples[[i]], samps_df) } } model_output@sim$fnames_oi <- c(model_output@sim$fnames_oi, names) model_output@model_pars <- c(model_output@model_pars, parname) model_output@sim$pars_oi <- c(model_output@sim$pars_oi, parname) return(model_output) } # Number of chains nchains <- model_output@sim$chains # Trend samples (for getting dimnames needed for ypred, latent_ypred) trend <- mcmc_chains(model_output, 'trend') # Construct latent_ypred samples (arranged by time, then series) detprob <- mcmc_chains(model_output, 'detprob') ps <- qlogis(detprob) Xp <- matrix(as.vector(ps)) attr(Xp, 'model.offset') <- 0 if (!is.null(test_data)) { cap <- rbind( data.frame( time = obs_data$time, series = obs_data$series, cap = obs_data$cap ), data.frame( time = test_data$time, series = test_data$series, cap = test_data$cap ) ) %>% dplyr::arrange(series, time) %>% dplyr::pull(cap) } else { cap <- data.frame( time = obs_data$time, series = obs_data$series, cap = obs_data$cap ) %>% dplyr::arrange(series, time) %>% dplyr::pull(cap) } cap <- as.vector(t(replicate(NROW(ps), cap))) # Unconditional latent_N predictions if (!is.null(test_data)) { truth_df <- rbind( data.frame( time = obs_data$time, series = obs_data$series, y = obs_data$y ), data.frame( time = test_data$time, series = test_data$series, y = test_data$y ) ) } else { truth_df <- data.frame( time = obs_data$time, series = obs_data$series, y = obs_data$y ) } get_min_cap = function(truth, K_inds) { rowgroup = function(x) { which(K_inds == x, arr.ind = TRUE)[1] } data.frame(index = 1:length(truth), truth = truth) %>% dplyr::rowwise() %>% dplyr::mutate(group = rowgroup(index)) %>% dplyr::ungroup() %>% dplyr::group_by(group) %>% dplyr::mutate(min_cap = max(truth, na.rm = TRUE)) %>% dplyr::pull(min_cap) } # K_inds was originally supplied in series, time order # so the corresponding truth must be supplied that way truth_df %>% dplyr::arrange(series, time) %>% dplyr::pull(y) -> orig_y if (is.null(K_inds)) { K_inds <- matrix(1:length(orig_y), ncol = 1) } min_cap <- suppressWarnings(get_min_cap(orig_y, K_inds)) min_cap[!is.finite(min_cap)] <- 0 # min_cap is now in the wrong order, so we need to change it truth_df %>% dplyr::arrange(series, time) %>% dplyr::bind_cols(min_cap = min_cap) %>% dplyr::arrange(time, series) %>% dplyr::pull(min_cap) -> min_cap # truth now also needs to be in the correct time, series # order truth_df %>% dplyr::arrange(time, series) %>% dplyr::pull(y) -> mod_y truth <- as.vector(t(replicate(NROW(ps), mod_y))) min_cap <- as.vector(t(replicate(NROW(ps), min_cap))) latentypreds_vec <- mvgam_predict( Xp = Xp, family = 'nmix', betas = 1, latent_lambdas = exp(as.vector(trend)), cap = cap, min_cap = min_cap, type = 'latent_N' ) # Conditional latent_N predictions (when observations were not NA) whichobs <- which(!is.na(truth)) Xp <- Xp[whichobs, , drop = FALSE] attr(Xp, 'model.offset') <- 0 condpreds_vec <- mvgam_predict( Xp = Xp, family = 'nmix', betas = 1, latent_lambdas = exp(as.vector(trend)[whichobs]), cap = cap[whichobs], min_cap = min_cap[whichobs], truth = truth[whichobs], type = 'latent_N' ) # Fill in 
the unconditionals using the conditionals when there were actually # observations latentypreds_vec[whichobs] <- condpreds_vec latentypreds <- matrix(latentypreds_vec, nrow = NROW(ps)) # Update parameter names and samples to match expected order expand.grid( time = 1:model_output@sim$dims_oi$trend[1], series = 1:model_output@sim$dims_oi$trend[2] ) %>% dplyr::arrange(time, series) %>% dplyr::mutate(current = dplyr::row_number()) %>% dplyr::arrange(series, time) %>% dplyr::mutate(needed = dplyr::row_number()) %>% dplyr::mutate(name = paste0('trend[', time, ',', series, ']')) %>% dplyr::arrange(current) -> ordering_needed parnames <- ordering_needed %>% dplyr::arrange(needed) %>% dplyr::pull(name) indices <- ordering_needed %>% dplyr::arrange(needed) %>% dplyr::pull(current) # Add latent_ypreds to the posterior samples model_output <- add_samples( model_output = model_output, names = gsub('trend', 'latent_ypred', parnames), samples = latentypreds[, indices], nsamples = NROW(latentypreds) / nchains, nchains = nchains, parname = 'latent_ypred' ) model_output@sim$dims_oi$latent_ypred <- model_output@sim$dims_oi$trend # Now construct the detprob samples # model_output <- add_samples( # model_output = model_output, # names = gsub('p', 'detprob', dimnames(ps)[[2]]), # samples = detprob, # nsamples = NROW(detprob) / nchains, # nchains = nchains, # parname = 'detprob' # ) # model_output@sim$dims_oi$detprob <- # model_output@sim$dims_oi$p # Now construct ypred samples ypreds_vec <- rbinom( length(latentypreds_vec), size = latentypreds_vec, prob = as.vector(detprob) ) ypreds <- matrix(ypreds_vec, nrow = NROW(ps)) model_output <- add_samples( model_output = model_output, names = gsub('trend', 'ypred', parnames), samples = ypreds[, indices], nsamples = NROW(ypreds) / nchains, nchains = nchains, parname = 'ypred' ) model_output@sim$dims_oi$ypred <- model_output@sim$dims_oi$trend # Now construct mus (expectations) samples mus_vec <- as.vector(detprob) * latentypreds_vec mus <- matrix(mus_vec, nrow = NROW(ps)) model_output <- add_samples( model_output = model_output, names = gsub('trend', 'mus', parnames), samples = mus[, indices], nsamples = NROW(mus) / nchains, nchains = nchains, parname = 'mus' ) model_output@sim$dims_oi$mus <- model_output@sim$dims_oi$trend # Now the lv_coefs samples n_series <- length(unique(obs_data$series)) combinations <- expand.grid(1:n_series, 1:n_lv) %>% dplyr::arrange(Var2) lv_coef_names <- apply( combinations, 1, function(x) paste0('lv_coefs[', x[1], ',', x[2], ']') ) lv_coef_samps <- t(as.matrix(replicate(NROW(ps), as.vector(t(Z))))) model_output <- add_samples( model_output = model_output, names = lv_coef_names, samples = lv_coef_samps, nsamples = NROW(lv_coef_samps) / nchains, nchains = nchains, parname = 'lv_coefs' ) model_output@sim$dims_oi$lv_coefs <- c(n_series, n_lv) # Update number of total parameters model_output@sim$n_flatnames <- sum(unlist(lapply(model_output@sim$dims_oi, prod), use.names = FALSE)) return(model_output) } ================================================ FILE: R/add_poisson_lines.R ================================================ #' Poisson JAGS modifications #' #' #' @noRd #' @param model_file A template `JAGS` model file to be modified #' @param upper_bounds Optional upper bounds for the truncated observation likelihood #' @return A modified `JAGS` model file add_poisson_lines = function(model_file, upper_bounds) { odis_begin <- grep('phi\\[s\\] <- ', model_file) - 4 odis_end <- odis_begin + 7 model_file <- model_file[-c(odis_begin:odis_end)] 
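  # With the overdispersion block gone, the negative binomial 'rate'
  # calculations are also redundant under a Poisson likelihood, so the next
  # step drops the two lines that defined rate[i, s] before swapping dnegbin
  # for dpois in the likelihood and posterior prediction statements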
rate_begin <- grep('rate\\[i, s\\] <- ', model_file)
  rate_end <- rate_begin + 1
  model_file <- model_file[-c(rate_begin:rate_end)]
  if (missing(upper_bounds)) {
    model_file[grep(
      'y\\[i, s\\] ~',
      model_file
    )] <- ' y[i, s] ~ dpois(mus[i, s])'
    model_file[grep(
      'ypred\\[i, s\\] ~',
      model_file
    )] <- ' ypred[i, s] ~ dpois(mus[i, s])'
  } else {
    model_file[grep(
      'y\\[i, s\\] ~',
      model_file
    )] <- ' y[i, s] ~ dpois(mus[i, s])T(, upper_bound[s])'
    model_file[grep(
      'ypred\\[i, s\\] ~',
      model_file
    )] <- ' ypred[i, s] ~ dpois(mus[i, s])T(, upper_bound[s])'
  }
  model_file
}

================================================ FILE: R/add_residuals.R ================================================

#'@title Calculate randomized quantile residuals for \pkg{mvgam} objects
#'@name add_residuals.mvgam
#'@param object \code{list} object of class \code{mvgam}. See [mvgam()]
#'@param ... unused
#'@details For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics.
#'If the fitted model is appropriate then Dunn-Smyth residuals will be standard normal in distribution and no
#'autocorrelation will be evident. When a particular observation is missing, the residual is calculated by comparing independent
#'draws from the model's posterior distribution.
#'@return A list object of class `mvgam` with residuals included in the `'resids'` slot
#'@export
add_residuals <- function(object, ...) {
  UseMethod("add_residuals", object)
}

#'@rdname add_residuals.mvgam
#'@method add_residuals mvgam
#'@export
add_residuals.mvgam = function(object, ...) {
  resids <- dsresids_vec(object)
  object$resids <- resids
  return(object)
}

================================================ FILE: R/add_stan_data.R ================================================

#' Add remaining data, model and parameter blocks to a Stan model
#'
#'
#' @noRd
#' @param jags_file Prepared JAGS mvgam model file
#' @param stan_file Incomplete Stan model file to be edited
#' @param ss_gam The GAM setup object
#' @param use_lv logical
#' @param n_lv \code{integer} number of latent dynamic factors (if \code{use_lv = TRUE})
#' @param jags_data Prepared mvgam data for JAGS modelling
#' @param family \code{character}.
#' @param upper_bounds Optional \code{vector} of \code{integer} values specifying upper limits for each series. If supplied,
#' this generates a modified likelihood where values above the bound are given a likelihood of zero. Note this modification
#' is computationally expensive in \code{JAGS} but can lead to better estimates when true bounds exist. Default is to remove
#' truncation entirely (i.e.
there is no upper bound for each series) #' @return A `list` containing the updated Stan model and model data add_stan_data = function( jags_file, stan_file, ss_gam, use_lv = FALSE, n_lv, jags_data, family = 'poisson', upper_bounds ) { #### Modify the Stan file #### # Update lines associated with particular family if (family == 'poisson') { if (!is.null(upper_bounds)) { stan_file[grep('~ poisson_log', stan_file)] <- gsub(';', 'T[,U[s]];', stan_file[grep('~ poisson_log', stan_file)]) stan_file[grep('~ poisson_log', stan_file)] <- gsub( 'poisson_log(', 'poisson(exp(', stan_file[grep('~ poisson_log', stan_file)], fixed = TRUE ) stan_file[grep('~ poisson', stan_file)] <- gsub(')', '))', stan_file[grep('~ poisson', stan_file)], fixed = TRUE) } } if (family == 'negative binomial') { stan_file[grep('// raw basis', stan_file) + 2] <- '\n// negative binomial overdispersion\nvector[n_series] phi_inv;\n' stan_file[grep('// priors for smoothing', stan_file) + 2] <- paste0( '\n// priors for overdispersion parameters\n', 'phi_inv ~ student_t(3, 0, 0.1);\n' ) to_negbin <- gsub( 'poisson_log', 'neg_binomial_2', stan_file[grep('y[i, s] ~ poisson', stan_file, fixed = T)] ) stan_file[grep('y[i, s] ~ poisson', stan_file, fixed = T)] <- gsub(');', ', inv(phi_inv[s]));', to_negbin) add_exp_open <- gsub( '\\(eta', '(exp(eta', stan_file[grep('y[i, s] ~ neg_binomial', stan_file, fixed = T)] ) add_exp_cl <- gsub('],', ']),', add_exp_open) stan_file[grep('y[i, s] ~ neg_binomial', stan_file, fixed = T)] <- add_exp_cl stan_file[grep('matrix[n, n_series] ypred;', stan_file, fixed = T)] <- paste0( 'matrix[n, n_series] ypred;\n', 'matrix[n, n_series] phi_vec;\n', 'vector[n_series] phi;\n', 'phi = inv(phi_inv);\n', 'for (s in 1:n_series) {\n', 'phi_vec[1:n,s] = rep_vector(phi[s], n);\n}\n' ) to_negbin <- gsub( 'poisson_log_rng', 'neg_binomial_2_rng', stan_file[grep('ypred[i, s] = poisson_log_rng', stan_file, fixed = T)] ) stan_file[grep('ypred[i, s] = poisson_log_rng', stan_file, fixed = T)] <- gsub(');', ', phi_vec[i, s]);', to_negbin) add_exp_open <- gsub( '\\(eta', '(exp(eta', stan_file[grep('ypred[i, s] = neg_binomial', stan_file, fixed = T)] ) if (any(grepl('trend[i, s]', stan_file, fixed = T))) { add_exp_cl <- gsub('trend[i, s]', 'trend[i, s])', add_exp_open, fixed = T) } else { add_exp_cl <- gsub( 'eta[ytimes[i, s]]', 'eta[ytimes[i, s]])', add_exp_open, fixed = T ) } stan_file[grep('ypred[i, s] = neg_binomial', stan_file, fixed = T)] <- add_exp_cl if (!is.null(upper_bounds)) { stan_file[grep('~ neg_binomial_2', stan_file)] <- gsub(';', 'T[,U[s]];', stan_file[grep('~ neg_binomial_2', stan_file)]) } stan_file <- readLines(textConnection(stan_file), n = -1) } # Get dimensions and numbers of smooth terms snames <- names(jags_data)[grep('S.*', names(jags_data))] if (length(snames) == 0) { smooth_penalty_data <- NULL } else { smooth_dims <- matrix(NA, ncol = 2, nrow = length(snames)) for (i in 1:length(snames)) { smooth_dims[i, ] <- dim(jags_data[[snames[i]]]) } # Insert the data block for the model smooth_penalty_data <- vector() for (i in 1:length(snames)) { smooth_penalty_data[i] <- paste0( 'matrix[', smooth_dims[i, 1], ',', smooth_dims[i, 2], '] ', snames[i], '; // mgcv smooth penalty matrix ', snames[i] ) } } # Get parametric prior locations and precisions if necessary if ('p_taus' %in% names(jags_data)) { p_terms <- paste0( 'real p_taus[', length(jags_data$p_taus), ']; // prior precisions for parametric coefficients\n', 'real p_coefs[', length(jags_data$p_taus), ']; // prior locations for parametric coefficients\n' 
) } else { p_terms <- NULL } # Add lines for upper bounds if supplied if (!is.null(upper_bounds)) { bounds <- paste0('int U[', length(upper_bounds), ']; // upper bounds\n') } else { bounds <- NULL } # Remove smooth parameter info if no smooth terms are included if (any(grepl('## smoothing parameter priors...', jags_file))) { zero_data <- paste0( 'vector[num_basis] zero; // prior locations for basis coefficients\n' ) n_sp_data <- paste0( 'int n_sp; // number of smoothing parameters\n' ) } else { zero_data <- NULL n_sp_data <- NULL } # Occasionally there are smooths with no zero vector # (i.e. for bs = 'fs', they are often just normal(0, lambda)) if (is.null(jags_data$zero)) { zero_data <- NULL } # latent variable lines if (use_lv) { lv_data <- paste0('int n_lv; // number of dynamic factors\n') } else { lv_data <- NULL } # shared smoothing parameter lines if ('L' %in% names(jags_data)) { lambda_links <- paste0( 'matrix[', NROW(jags_data$L), ',', NCOL(jags_data$L), '] lambda_links; // smooth parameter linking matrix\n', 'int n_raw_sp; // number of raw smoothing parameters to estimate\n' ) } else { lambda_links <- NULL } # Offset information if (any(grepl('eta <- X %*% b + offset', jags_file, fixed = TRUE))) { offset_line <- paste0('vector[total_obs] offset; // offset vector\n') } else { offset_line <- NULL } if (any(grepl('eta <- X * b + offset', jags_file, fixed = TRUE))) { offset_line <- paste0('vector[total_obs] offset; // offset vector\n') } else { offset_line <- NULL } if ( any(grepl( 'offset; offset vector of length (n x n_series)', jags_file, fixed = TRUE )) ) { offset_line <- paste0('vector[total_obs] offset; // offset vector\n') } else { offset_line <- NULL } # Search for any non-contiguous indices that sometimes are used by mgcv if (any(grep('in c\\(', jags_file))) { add_idxs <- TRUE seq_character = function(x) { all_nums <- as.numeric(unlist(strsplit(x, ':'))) if (length(all_nums) > 1) { out <- seq(all_nums[1], all_nums[2]) } else { out <- all_nums } out } idx_locations <- grep('in c\\(', jags_file) idx_vals <- list() idx_data <- vector() for (i in 1:length(idx_locations)) { list_vals <- unlist(strsplit( gsub('^.*c\\(*|\\s*).*$', '', jags_file[idx_locations[i]]), ',' )) idx_vals[[i]] <- array( unlist(lapply(list_vals, seq_character)), dim = length(unlist(lapply(list_vals, seq_character))) ) idx_data[i] <- paste0( 'int idx', i, '[', length(idx_vals[[i]]), ']; // discontiguous index values' ) jags_file[idx_locations][i] <- sub( "in.*\\)\\)", paste0("in idx", i, ')'), jags_file[idx_locations][i] ) } # Update the Stan data block stan_file[grep('##insert data', stan_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'data {', '\n', bounds, paste0(idx_data, collapse = '\n'), '\n', 'int total_obs; // total number of observations\n', 'int n; // number of timepoints per series\n', lv_data, n_sp_data, lambda_links, 'int n_series; // number of series\n', 'int num_basis; // total number of basis coefficients\n', #p_terms, zero_data, offset_line, 'matrix[num_basis, total_obs] X; // transposed mgcv GAM design matrix\n', 'int ytimes[n, n_series]; // time-ordered matrix (which col in X belongs to each [time, series] observation?)\n', paste0(smooth_penalty_data, collapse = '\n'), '\n', 'int y_observed[n, n_series]; // indices of missing vs observed\n', 'int y[n, n_series]; // time-ordered observations, with -1 indicating missing\n', '}\n' ) } else { add_idxs <- FALSE stan_file[grep('##insert data', stan_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'data {', 
'\n', bounds, 'int total_obs; // total number of observations\n', 'int n; // number of timepoints per series\n', lv_data, n_sp_data, lambda_links, 'int n_series; // number of series\n', 'int num_basis; // total number of basis coefficients\n', zero_data, offset_line, #p_terms, 'matrix[num_basis, total_obs] X; // transposed mgcv GAM design matrix\n', 'int ytimes[n, n_series]; // time-ordered matrix (which col in X belongs to each [time, series] observation?)\n', paste0(smooth_penalty_data, collapse = '\n'), '\n', 'int y_observed[n, n_series]; // indices of missing vs observed\n', 'int y[n, n_series]; // time-ordered observations, with -1 indicating missing\n', '}\n' ) } stan_file <- readLines(textConnection(stan_file), n = -1) # Modify the model block to include each smooth term if (any(grepl('## smoothing parameter priors...', jags_file))) { smooths_start <- grep('## GAM-specific priors', jags_file) + 1 smooths_end <- grep('## smoothing parameter priors...', jags_file) - 1 jags_smooth_text <- jags_file[smooths_start:smooths_end] jags_smooth_text <- gsub('##', '//', jags_smooth_text) jags_smooth_text <- gsub('dexp', 'exponential', jags_smooth_text) smooth_labs <- do.call( rbind, lapply(seq_along(ss_gam$smooth), function(x) { data.frame( label = ss_gam$smooth[[x]]$label, term = paste(ss_gam$smooth[[x]]$term, collapse = ','), class = class(ss_gam$smooth[[x]])[1] ) }) ) if (length(ss_gam$sp) > 0 & !all(smooth_labs$class == 'random.effect')) { any_ks <- TRUE } else { any_ks <- FALSE } # any_ks <- any(grep('K.* <- ', jags_smooth_text)) any_timevarying <- any(grep( '// prior for s(time):', jags_smooth_text, fixed = TRUE )) if ( any_ks || any_timevarying ) { if (any(grep('K.* <- ', jags_smooth_text))) { K_starts <- grep('K.* <- ', jags_smooth_text) for (i in 1:length(K_starts)) { jags_smooth_text[K_starts[i] + 1] <- gsub( '\\bb\\b', 'b_raw', gsub( 'dmnorm', 'multi_normal_prec', paste0( gsub( 'K.*', trimws(gsub('K.* <- ', '', jags_smooth_text[K_starts[i]])), jags_smooth_text[K_starts[i] + 1] ), ')' ) ) ) } jags_smooth_text <- jags_smooth_text[-K_starts] } } else { # If no K terms or time-varying terms, then there are no smoothing parameters in the model # (probably the only smooth terms included are random effect bases, which don't need # smoothing parameters when we use the non-centred parameterisation) stan_file <- stan_file[ -grep('// priors for smoothing parameters', stan_file, fixed = TRUE) ] stan_file <- stan_file[-grep('lambda ~ ', stan_file, fixed = TRUE)] stan_file <- stan_file[-grep('vector[n_sp] rho', stan_file, fixed = TRUE)] stan_file <- stan_file[-grep('rho = log', stan_file, fixed = TRUE)] stan_file <- stan_file[ -grep('// smoothing parameters', stan_file, fixed = TRUE) ] stan_file <- stan_file[-grep('[n_sp] lambda', stan_file, fixed = TRUE)] # stan_file <- stan_file[-grep('vector[num_basis] zero; //', stan_file, # fixed = TRUE)] stan_file <- stan_file[ -grep('int n_sp; //', stan_file, fixed = TRUE) ] } # If there are no K terms but there are time-varying, we don't need # the zero vector if (any_timevarying & !any_ks) { stan_file <- stan_file[ -grep('vector[num_basis] zero; //', stan_file, fixed = TRUE) ] } # Create a new smooths_included check after working through the # random effects if (!any_timevarying & !any_ks) { smooths_included <- FALSE } else { smooths_included <- TRUE } if (any(grep('b\\[i\\] = b_raw', jags_smooth_text))) { jags_smooth_text <- jags_smooth_text[ -grep('b\\[i\\] = b_raw', jags_smooth_text) ] } jags_smooth_text <- gsub('dnorm', 'normal', jags_smooth_text) 
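    # The substitutions above and below finish translating the JAGS snippet to
    # Stan syntax: density names are swapped (dexp -> exponential,
    # dnorm -> normal), whitespace is collapsed and statement-terminating
    # semicolons are appended, so that e.g. a line like 'lambda[i] ~ dexp(0.05)'
    # would end up as 'lambda[i] ~ exponential(0.05);' (illustrative values only)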
jags_smooth_text <- gsub(' ', ' ', jags_smooth_text) jags_smooth_text[-grep('//|\\}|\\{', jags_smooth_text)] <- paste0(jags_smooth_text[-grep('//|\\}|\\{', jags_smooth_text)], ';') jags_smooth_text <- gsub(') }', '); }', jags_smooth_text) jags_smooth_text <- gsub('}', '}\n', jags_smooth_text) jags_smooth_text[(grep('//', jags_smooth_text) - 1)[-1]] <- paste0(jags_smooth_text[(grep('//', jags_smooth_text) - 1)[-1]], '\n') stan_file[grep('##insert smooths', stan_file)] <- paste0( jags_smooth_text, collapse = '\n' ) stan_file <- readLines(textConnection(stan_file), n = -1) # Deal with any random effect priors if (any(grep('b_raw\\[i\\] ~', stan_file))) { b_raw_string <- paste0( stan_file[grep('b_raw\\[i\\] ~', stan_file) - 1], collapse = ',' ) n_b_raw <- max(as.numeric(unlist(regmatches( b_raw_string, gregexpr("[[:digit:]]+", b_raw_string) )))) min_b_raw <- min(as.numeric(unlist(regmatches( b_raw_string, gregexpr("[[:digit:]]+", b_raw_string) )))) n_sigma_raw <- max(as.numeric(unlist(regmatches( unique(sub( ".*(sigma_raw?\\d+).*", "\\1", grep('sigma_raw', stan_file, value = T) )), gregexpr( "[[:digit:]]+", unique(sub( ".*(sigma_raw?\\d+).*", "\\1", grep('sigma_raw', stan_file, value = T) )) ) )))) stan_file <- stan_file[-grep('mu_raw.* ~ ', stan_file)] stan_file <- stan_file[-grep('<- mu_raw', stan_file)] stan_file <- stan_file[-grep('sigma_raw.* ~ ', stan_file)] stan_file[grep('model \\{', stan_file)] <- paste0( 'model {\n// prior for random effect population variances\nsigma_raw ~ exponential(0.5);\n\n', '// prior for random effect population means\nmu_raw ~ std_normal();\n' ) stan_file[grep('parameters \\{', stan_file)[1] + 2] <- paste0( stan_file[grep('parameters \\{', stan_file)[1] + 2], '\n', '\n// random effect variances\n', paste0( 'vector[', n_sigma_raw, '] sigma_raw', ';\n', collapse = '' ), '\n', '\n// random effect means\n', paste0('vector[', n_sigma_raw, '] mu_raw', ';\n', collapse = '') ) b_raw_text <- vector() min_beta <- vector() b_raw_indices <- grep('b_raw\\[i\\] ~', stan_file) for (i in 1:length(b_raw_indices)) { b_raw_text[i] <- paste0( 'for (i in ', as.numeric(sub( "for \\(i in ", "", sub("\\:.*", "", stan_file[b_raw_indices[i] - 1]) )), ':', as.numeric(sub( " ", "", sub( "\\{", "", sub("\\)", "", sub(".*\\:", "", stan_file[b_raw_indices[i] - 1])) ) )), ') {\nb[i] = mu_raw[', i, '] + b_raw[i] * sigma_raw[', i, '];\n}\n' ) min_beta[i] <- as.numeric(sub( "for \\(i in ", "", sub("\\:.*", "", stan_file[b_raw_indices[i] - 1]) )) } # If parametric coefficients are included, they'll come before random effects min_re_betas <- min(min_beta) if (min_re_betas > 1) { b_raw_text <- c( paste0( '\nfor (i in 1:', min_re_betas - 1, ') {\nb[i] = b_raw[i];\n}' ), b_raw_text ) } else { b_raw_text <- b_raw_text } if (n_b_raw < dim(jags_data$X)[2]) { b_raw_text <- c( b_raw_text, paste0( '\nfor (i in ', n_b_raw + 1, ':num_basis) {\nb[i] = b_raw[i];\n}\n' ) ) } stan_file[grep('// basis coefficients', stan_file) + 2] <- paste0( b_raw_text, collapse = '\n' ) stan_file <- readLines(textConnection(stan_file), n = -1) # If no random effects, betas are equal to beta_raws } else { stan_file[grep('// basis coefficients', stan_file) + 2] <- paste0('\nfor (i in ', '1:num_basis) {\nb[i] = b_raw[i];\n}') stan_file <- readLines(textConnection(stan_file), n = -1) } # Update parametric effect priors if (any(grep('// parametric effect', stan_file))) { # Get indices of parametric effects smooth_labs <- do.call( rbind, lapply(seq_along(ss_gam$smooth), function(x) { data.frame( label = ss_gam$smooth[[x]]$label, 
term = paste(ss_gam$smooth[[x]]$term, collapse = ','), class = class(ss_gam$smooth[[x]])[1] ) }) ) lpmat <- predict(ss_gam, type = 'lpmatrix', exclude = smooth_labs$label) para_indices <- which(apply(lpmat, 2, function(x) !all(x == 0)) == TRUE) # min_paras <- as.numeric(sub('.*(?=.$)', '', # sub("\\:.*", "", # stan_file[grep('// parametric effect', stan_file) + 1]), # perl=T)) # max_paras <- as.numeric(substr(sub(".*\\:", "", # stan_file[grep('// parametric effect', stan_file) + 1]), # 1, 1)) # para_indices <- seq(min_paras, max_paras) # Get names of parametric terms # int_included <- attr(ss_gam$pterms, 'intercept') == 1L # other_pterms <- attr(ss_gam$pterms, 'term.labels') # all_paras <- other_pterms # if(int_included){ # all_paras <- c('(Intercept)', all_paras) # } all_paras <- names(para_indices) # Create prior lines for parametric terms para_lines <- vector() for (i in seq_along(all_paras)) { para_lines[i] <- paste0( '// prior for ', all_paras[i], '...\n', 'b_raw[', para_indices[i], '] ~ student_t(3, 0, 2);\n' ) } stan_file <- stan_file[-(grep('// parametric effect', stan_file) + 1)] stan_file[grep('// parametric effect', stan_file)] <- paste0(paste(para_lines, collapse = '\n')) stan_file <- readLines(textConnection(stan_file), n = -1) } # Check for shared smoothing parameters and link them accordingly if ('L' %in% names(jags_data)) { stan_file[grep( 'lambda ~ normal', stan_file, fixed = TRUE )] <- "lambda_raw ~ normal(30, 25);" stan_file[grep( "vector[n_sp] lambda;", stan_file, fixed = TRUE )] <- "vector[n_raw_sp] lambda_raw;" stan_file[grep( '// GAM contribution to expectations', stan_file, fixed = TRUE )] <- "// GAM contribution to expectations (log scale)\n// linked smoothing parameters\nvector[n_sp] lambda;\n" stan_file[grep('model {', stan_file, fixed = TRUE) - 2] <- 'lambda = to_vector(lambda_links * lambda_raw);\n}\n' stan_file <- readLines(textConnection(stan_file), n = -1) } } else { ## No smooths included smooths_included <- FALSE stan_file <- stan_file[ -grep('// priors for smoothing parameters', stan_file, fixed = TRUE) ] stan_file <- stan_file[-grep('lambda ~ normal', stan_file, fixed = TRUE)] stan_file <- stan_file[-grep('vector[n_sp] rho', stan_file, fixed = TRUE)] stan_file <- stan_file[-grep('rho = log', stan_file, fixed = TRUE)] stan_file <- stan_file[ -grep('// smoothing parameters', stan_file, fixed = TRUE) ] stan_file <- stan_file[-grep('[n_sp] lambda', stan_file, fixed = TRUE)] stan_file[grep('// basis coefficients', stan_file) + 2] <- paste0('\nfor (i in ', '1:num_basis) {\nb[i] = b_raw[i];\n}\n') if (any(grep('## parametric effect priors', jags_file))) { # Get indices of parametric effects smooth_labs <- do.call( rbind, lapply(seq_along(ss_gam$smooth), function(x) { data.frame( label = ss_gam$smooth[[x]]$label, term = paste(ss_gam$smooth[[x]]$term, collapse = ','), class = class(ss_gam$smooth[[x]])[1] ) }) ) lpmat <- predict(ss_gam, type = 'lpmatrix', exclude = smooth_labs$label) para_indices <- which(apply(lpmat, 2, function(x) !all(x == 0)) == TRUE) all_paras <- names(para_indices) # min_paras <- as.numeric(sub('.*(?=.$)', '', # sub("\\:.*", "", # jags_file[grep('## parametric effect', jags_file) + 1]), perl=T)) # max_paras <- as.numeric(substr(sub(".*\\:", "", # jags_file[grep('## parametric effect', jags_file) + 1]), # 1, 1)) # para_indices <- seq(min_paras, max_paras) # # # Get names of parametric terms # int_included <- attr(ss_gam$pterms, 'intercept') == 1L # other_pterms <- attr(ss_gam$pterms, 'term.labels') # all_paras <- other_pterms # 
if(int_included){ # all_paras <- c('(Intercept)', all_paras) # }

      # Create prior lines for parametric terms
      para_lines <- vector()
      for (i in seq_along(all_paras)) {
        para_lines[i] <- paste0(
          '// prior for ',
          all_paras[i],
          '...\n',
          'b_raw[',
          para_indices[i],
          '] ~ student_t(3, 0, 2);\n'
        )
      }
      stan_file[grep('##insert smooths', stan_file)] <-
        paste0(paste(para_lines, collapse = '\n'))
      stan_file <- readLines(textConnection(stan_file), n = -1)
    }
  }

  #### Minor text changes to improve efficiency of Stan code ####
  #stan_file <- gsub('...', '', stan_file)
  clean_up <- vector()
  for (x in 1:length(stan_file)) {
    clean_up[x] <- stan_file[x - 1] == "" & stan_file[x] == ""
  }
  clean_up[is.na(clean_up)] <- FALSE
  stan_file <- stan_file[!clean_up]

  # Use as much vectorization as possible for computing predictions
  stan_file[grep('vector[total_obs] eta;', stan_file, fixed = TRUE)] <- paste0(
    'vector[total_obs] eta;\n\n',
    '// expectations\n',
    'matrix[n, n_series] mus;'
  )
  if (any(grepl('trend[i, s]', stan_file))) {
    stan_file <- sub(
      'eta[ytimes[i, s]] + trend[i, s]',
      'mus[i, s]',
      stan_file,
      fixed = TRUE
    )
    stan_file[grep('model {', stan_file, fixed = TRUE) - 2] <- paste0(
      '\nfor(s in 1:n_series){\n',
      'mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];\n',
      '}\n',
      '}'
    )
  } else {
    stan_file <- sub('eta[ytimes[i, s]]', 'mus[i, s]', stan_file, fixed = TRUE)
    stan_file[grep('model {', stan_file, fixed = TRUE) - 2] <- paste0(
      '\nfor(s in 1:n_series){\n',
      'mus[1:n, s] = eta[ytimes[1:n, s]];\n',
      '}\n',
      '}'
    )
  }
  stan_file <- stan_file[-(grep('// posterior predictions', stan_file, fixed = TRUE) + 1)]
  stan_file <- stan_file[-(grep('// posterior predictions', stan_file, fixed = TRUE) + 4)]
  stan_file[grep('ypred[i, s] =', stan_file, fixed = TRUE)] <- gsub(
    'i, s',
    '1:n, s',
    stan_file[grep('ypred[i, s] =', stan_file, fixed = TRUE)]
  )
  stan_file[grep('matrix[n, n_series] ypred;', stan_file, fixed = TRUE)] <-
    'array[n, n_series] int ypred;'

  # Remove un-needed loops for transformed beta parameters
  b_i_indices <- grep('b[i] = ', stan_file, fixed = TRUE)
  if (length(b_i_indices) > 0) {
    for (x in b_i_indices) {
      i_text <- paste0(
        as.numeric(sub("for \\(i in ", "", sub("\\:.*", "", stan_file[x - 1]))),
        ':',
        sub(
          " ",
          "",
          sub("\\{", "", sub("\\)", "", sub(".*\\:", "", stan_file[x - 1])))
        )
      )
      stan_file[x] <- paste0(gsub(
        '[i]',
        paste0('[', i_text, ']'),
        stan_file[x],
        fixed = TRUE
      ))
    }
    stan_file <- stan_file[-c(b_i_indices - 1, b_i_indices + 1)]
  }

  # Remove un-needed loop for random effect priors
  b_i_indices <- grep('// prior (non-centred) for', stan_file, fixed = TRUE)
  if (length(b_i_indices) > 0) {
    for (x in b_i_indices) {
      x = x + 2
      i_text <- paste0(
        as.numeric(sub("for \\(i in ", "", sub("\\:.*", "", stan_file[x - 1]))),
        ':',
        sub(
          " ",
          "",
          sub("\\{", "", sub("\\)", "", sub(".*\\:", "", stan_file[x - 1])))
        )
      )
      stan_file[x] <- paste0(gsub(
        '[i]',
        paste0('[', i_text, ']'),
        stan_file[x],
        fixed = TRUE
      ))
    }
    stan_file <- stan_file[-c(b_i_indices + 1, b_i_indices + 3)]
  }

  # Replace any normal(0, 1) with std_normal() for faster computation
  stan_file <- readLines(textConnection(stan_file), n = -1)
  stan_file <- gsub('normal(0, 1)', 'std_normal()', stan_file, fixed = TRUE)

  # Change b to b_raw for any idx normals
  if (any(grep('for (i in idx', stan_file, fixed = TRUE))) {
    lines_matching <- grep('for (i in idx', stan_file, fixed = TRUE)
    for (i in lines_matching) {
      stan_file[i] <- gsub('\\bb\\b', 'b_raw', stan_file[i])
    }
  }

  # Final tidying of the Stan model for readability
  unlink('base_gam_stan.txt')
  stan_file <- readLines(textConnection(stan_file), n = -1)
  clean_up
<- vector() for (x in 1:length(stan_file)) { clean_up[x] <- stan_file[x - 1] == "" & stan_file[x] == "" } clean_up[is.na(clean_up)] <- FALSE stan_file <- stan_file[!clean_up] #### Modify the Stan data list #### # Create matrix representing whether an observation was missing or not y_observed <- matrix(NA, ncol = NCOL(jags_data$y), nrow = NROW(jags_data$y)) for (i in 1:dim(jags_data$y)[1]) { for (s in 1:dim(jags_data$y)[2]) { if (is.na(jags_data$y[i, s])) { y_observed[i, s] = 0 } else { y_observed[i, s] = 1 } } } # Use -1 for any missing observations so Stan doesn't throw errors due to NAs y <- jags_data$y y[is.na(y)] <- -1 # The data list for Stan stan_data <- jags_data stan_data$y <- y stan_data$y_observed <- y_observed stan_data$X <- t(stan_data$X) stan_data$total_obs <- NCOL(stan_data$X) stan_data$num_basis <- NROW(stan_data$X) if ( any(grepl('// priors for smoothing parameters', stan_file, fixed = TRUE)) ) { if ('L' %in% names(jags_data)) { stan_data$lambda_links <- jags_data$L stan_data$L <- NULL stan_data$n_raw_sp <- NCOL(stan_data$lambda_links) stan_data$n_sp <- NROW(stan_data$lambda_links) } else { stan_data$n_sp <- as.numeric(sub( '\\) \\{', '', sub( 'for \\(i in 1\\:', '', jags_file[grep('lambda\\[i\\] ~ ', trimws(jags_file)) - 1] ) )) } } # Add discontiguous index values if required if (add_idxs) { names(idx_vals) <- paste0('idx', seq_len(length(idx_vals))) stan_data <- append(stan_data, idx_vals) } # Add parametric prior means and precisions if required # if('p_taus' %in% names(jags_data)){ # stan_data$p_coefs <- array(jags_data$p_coefs, dim = length(jags_data$p_taus)) # stan_data$p_taus <- array(jags_data$p_taus, dim = length(jags_data$p_taus)) # } # Add bounds if required if (!is.null(upper_bounds)) { stan_data$U <- upper_bounds } return(list( stan_file = stan_file, model_data = stan_data, smooths_included = smooths_included )) } ================================================ FILE: R/add_trend_lines.R ================================================ #' Latent trend model file modifications #' #' #' @noRd #' @param model_file A template `JAGS` or `Stan` model file to be modified #' @param stan Logical (convert existing `JAGS` model to a `Stan` model?) 
#' @param use_lv Logical (use latent variable trends or not) #' @param trend_model The type of trend model to be added to the model file #' @param drift Logical (add drift or not) #' @return A modified `JAGS` or `Stan` model file add_trend_lines = function( model_file, stan = FALSE, use_lv, trend_model, drift ) { if (use_lv & trend_model == 'None') { trend_model <- 'RW' } # Add in necessary trend structure if (stan) { if (trend_model == 'None') { model_file <- model_file[ -c( (grep('// raw basis', model_file) + 3):(grep( '// raw basis', model_file ) + 7) ) ] model_file <- model_file[ -c( (grep('// priors for latent trend', model_file)):(grep( '// priors for latent trend', model_file ) + 2) ) ] model_file <- model_file[ -c( (grep('// trend estimates', model_file)):(grep( '// trend estimates', model_file ) + 3) ) ] model_file <- model_file[ -c( (grepws('trend[2:n', model_file, fixed = T) - 1):(grepws( 'trend[2:n', model_file, fixed = T ) + 1) ) ] model_file[grepws('y[i, s] ~', model_file, fixed = T)] <- 'y[i, s] ~ poisson_log(eta[ytimes[i, s]]);' model_file[grepws('ypred[i, s] =', model_file, fixed = T)] <- "ypred[i, s] = poisson_log_rng(eta[ytimes[i, s]]);" model_file <- model_file[ -c( (grepws('tau[s] = pow(sigma[s], -2.0);', model_file, fixed = TRUE) - 1):(grepws( 'tau[s] = pow(sigma[s], -2.0);', model_file, fixed = TRUE ) + 1) ) ] model_file <- model_file[ -grepws('vector[n_series] tau', model_file, fixed = T) ] model_file <- readLines(textConnection(model_file), n = -1) } if (trend_model == 'GP') { hilbert_approx = T if (hilbert_approx) { if (use_lv) { model_file <- model_file[ -c( (grep('// raw basis', model_file) + 3):(grep( '// raw basis', model_file ) + 5) ) ] model_file <- model_file[ -c( (grep('// dynamic factor estimates', model_file)):(grep( '// dynamic factor estimates', model_file ) + 8) ) ] model_file[grep('transformed data {', model_file, fixed = TRUE)] <- paste0( 'transformed data {\n', 'vector[n] times;\n', 'vector[n] times_cent;\n', 'real mean_times;\n', 'real boundary;\n', 'int num_gp_basis;\n', 'num_gp_basis = min(20, n);\n', 'matrix[n, num_gp_basis] gp_phi;\n\n', 'for (t in 1:n){\n', 'times[t] = t;\n', '}\n\n', 'mean_times = mean(times);\n', 'times_cent = times - mean_times;\n', 'boundary = (5.0/4) * (max(times_cent) - min(times_cent));\n', 'for (m in 1:num_gp_basis){\n', 'gp_phi[,m] = phi_SE(boundary, m, times_cent);\n', '}\n' ) model_file[grep('##insert data', model_file) - 1] <- paste0( 'functions {\n', '/* Spectral density GP eigenvalues*/\n', '/* see Riutort-Mayol et al 2023 for details (https://doi.org/10.1007/s11222-022-10167-2)*/\n', 'real lambda_gp(real L, int m) {\n', 'real lam;\n', 'lam = ((m*pi())/(2*L))^2;\n', 'return lam;\n', '}\n\n', '/* Spectral density GP eigenfunctions*/\n', '/* see Riutort-Mayol et al 2023 for details (https://doi.org/10.1007/s11222-022-10167-2)*/\n', 'vector phi_SE(real L, int m, vector x) {\n', 'vector[rows(x)] fi;\n', 'fi = 1/sqrt(L) * sin(m*pi()/(2*L) * (x+L));\n', 'return fi;\n', '}\n\n', '/* Spectral density squared exponential Gaussian Process*/\n', '/* see Riutort-Mayol et al 2023 for details (https://doi.org/10.1007/s11222-022-10167-2)*/\n', 'real spd_SE(real alpha, real rho, real w) {\n', 'real S;\n', 'S = (alpha^2) * sqrt(2*pi()) * rho * exp(-0.5*(rho^2)*(w^2));\n', 'return S;\n', '}\n}\n' ) model_file <- readLines(textConnection(model_file), n = -1) model_file[ grep('// dynamic factor lower triangle', model_file) + 3 ] <- paste0( '// gp parameters\n', 'vector[n_lv] rho_gp;\n\n', '// gp coefficient weights\n', 
'matrix[num_gp_basis, n_lv] b_gp;\n', '// smoothing parameters\n' ) model_file[ grep('vector[total_obs] eta;', model_file, fixed = TRUE) + 1 ] <- paste0( '\n// gp spectral densities\n', 'matrix[n, n_lv] LV_raw;\n', 'matrix[num_gp_basis, n_lv] diag_SPD;\n', 'matrix[num_gp_basis, n_lv] SPD_beta;\n' ) model_file[grep('eta = to_vector', model_file) + 1] <- paste0( '\n// gp LV estimates', '\nfor (m in 1:num_gp_basis){\n', 'for (s in 1:n_lv){\n', 'diag_SPD[m, s] = sqrt(spd_SE(0.25, rho_gp[s], sqrt(lambda_gp(boundary, m))));\n', '}\n}\n', 'SPD_beta = diag_SPD .* b_gp;\n', 'LV_raw = gp_phi * SPD_beta;\n}\n' ) rho_line <- 'rho_gp ~ inv_gamma(1.499007, 5.670433);\n' model_file[ grep('// priors for dynamic factor loading', model_file) - 1 ] <- paste0( '\n// priors for gp parameters\n', 'for (s in 1:n_lv){\n', 'b_gp[1:num_gp_basis, s] ~ normal(0, 1);\n', '}\n', rho_line ) model_file <- readLines(textConnection(model_file), n = -1) model_file[grep('vector[n_lv] penalty;', model_file, fixed = TRUE)] <- 'vector[n_lv] alpha_gp;' model_file[grep( 'penalty = rep_vector(100.0, n_lv);', model_file, fixed = TRUE )] <- 'alpha_gp = rep_vector(0.25, n_lv);' model_file <- model_file[ -c( (grep('// derived latent trends', model_file)):(grep( '// derived latent trends', model_file ) + 5) ) ] model_file[grep( 'LV_raw = gp_phi * SPD_beta;', model_file, fixed = TRUE )] <- paste0( 'LV_raw = gp_phi * SPD_beta;\n', '// derived latent trends\n', 'for (i in 1:n){\n', 'for (s in 1:n_series){\n', 'trend[i, s] = dot_product(lv_coefs_raw[s,], LV_raw[i,1:n_lv]);\n', '}\n}\n' ) model_file <- readLines(textConnection(model_file), n = -1) } else { model_file <- model_file[ -c( (grep('// raw basis', model_file) + 3):(grep( '// raw basis', model_file ) + 5) ) ] model_file <- model_file[ -c( (grep('// priors for latent trend', model_file)):(grep( '// priors for latent trend', model_file ) + 2) ) ] model_file <- model_file[ -c( (grep('// trend estimates', model_file)):(grep( '// trend estimates', model_file ) + 3) ) ] model_file <- model_file[ -c( (grep('trend[2:n', model_file, fixed = T) - 1):(grep( 'trend[2:n', model_file, fixed = T ) + 1) ) ] model_file <- model_file[ -c( (grep('tau[s] = pow(sigma[s], -2.0);', model_file, fixed = TRUE) - 1):(grep( 'tau[s] = pow(sigma[s], -2.0);', model_file, fixed = TRUE ) + 1) ) ] model_file <- model_file[ -grep('vector[n_series] tau', model_file, fixed = T) ] model_file[grep('##insert data', model_file) + 1] <- paste0( 'transformed data {\n', 'vector[n] times;\n', 'vector[n] times_cent;\n', 'real mean_times;\n', 'real boundary;\n', 'int num_gp_basis;\n', 'num_gp_basis = min(20, n);\n', 'matrix[n, num_gp_basis] gp_phi;\n\n', 'for (t in 1:n){\n', 'times[t] = t;\n', '}\n\n', 'mean_times = mean(times);\n', 'times_cent = times - mean_times;\n', 'boundary = (5.0/4) * (max(times_cent) - min(times_cent));\n', 'for (m in 1:num_gp_basis){\n', 'gp_phi[,m] = phi_SE(boundary, m, times_cent);\n', '}\n}\n\n', 'parameters {' ) model_file[grep('##insert data', model_file) - 1] <- paste0( 'functions {\n', '/* Spectral density GP eigenvalues*/\n', '/* see Riutort-Mayol et al 2023 for details (https://doi.org/10.1007/s11222-022-10167-2)*/\n', 'real lambda_gp(real L, int m) {\n', 'real lam;\n', 'lam = ((m*pi())/(2*L))^2;\n', 'return lam;\n', '}\n\n', '/* Spectral density GP eigenfunctions*/\n', '/* see Riutort-Mayol et al 2023 for details (https://doi.org/10.1007/s11222-022-10167-2)*/\n', 'vector phi_SE(real L, int m, vector x) {\n', 'vector[rows(x)] fi;\n', 'fi = 1/sqrt(L) * sin(m*pi()/(2*L) * (x+L));\n', 'return 
fi;\n', '}\n\n', '/* Spectral density squared exponential Gaussian Process*/\n', '/* see Riutort-Mayol et al 2023 for details (https://doi.org/10.1007/s11222-022-10167-2)*/\n', 'real spd_SE(real alpha, real rho, real w) {\n', 'real S;\n', 'S = (alpha^2) * sqrt(2*pi()) * rho * exp(-0.5*(rho^2)*(w^2));\n', 'return S;\n', '}\n}\n' ) model_file[grep('// latent trends', model_file) + 2] <- paste0( '// gp parameters\n', 'vector[n_series] alpha_gp;\n', 'vector[n_series] rho_gp;\n\n', '// gp coefficient weights\n', 'matrix[num_gp_basis, n_series] b_gp;\n' ) model_file <- model_file[ -c( grep('// latent trends', model_file):(grep( '// latent trends', model_file ) + 1) ) ] model_file[ grep('vector[total_obs] eta;', model_file, fixed = TRUE) + 1 ] <- paste0( '\n// gp spectral densities\n', 'matrix[n, n_series] trend;\n', 'matrix[num_gp_basis, n_series] diag_SPD;\n', 'matrix[num_gp_basis, n_series] SPD_beta;\n' ) model_file[grep('eta = to_vector', model_file) + 1] <- paste0( '\n// gp trend estimates', '\nfor (m in 1:num_gp_basis){\n', 'for (s in 1:n_series){\n', 'diag_SPD[m, s] = sqrt(spd_SE(alpha_gp[s], rho_gp[s], sqrt(lambda_gp(boundary, m))));\n', '}\n}\n', 'SPD_beta = diag_SPD .* b_gp;\n', 'trend = gp_phi * SPD_beta;\n}\n' ) rho_line <- 'rho_gp ~ inv_gamma(1.499007, 5.670433);\n' alpha_line <- 'alpha_gp ~ normal(0, 0.5);\n' model_file[grep('// likelihood functions', model_file) - 1] <- paste0( '\n// priors for gp parameters\n', 'for (s in 1:n_series){\n', 'b_gp[1:num_gp_basis, s] ~ normal(0, 1);\n', '}\n', alpha_line, rho_line ) model_file <- readLines(textConnection(model_file), n = -1) } # If not Hilbert approx } else { model_file <- model_file[ -c( (grep('// raw basis', model_file) + 3):(grep( '// raw basis', model_file ) + 5) ) ] model_file <- model_file[ -c( (grep('// priors for latent trend', model_file)):(grep( '// priors for latent trend', model_file ) + 2) ) ] model_file <- model_file[ -c( (grep('// trend estimates', model_file)):(grep( '// trend estimates', model_file ) + 3) ) ] model_file <- model_file[ -c( (grep('trend[2:n', model_file, fixed = T) - 1):(grep( 'trend[2:n', model_file, fixed = T ) + 1) ) ] model_file <- model_file[ -c( (grep('tau[s] = pow(sigma[s], -2.0);', model_file, fixed = TRUE) - 1):(grep( 'tau[s] = pow(sigma[s], -2.0);', model_file, fixed = TRUE ) + 1) ) ] model_file <- model_file[ -grep('vector[n_series] tau', model_file, fixed = T) ] model_file[grep('##insert data', model_file) + 1] <- paste0( 'transformed data {\n', 'real times[n];\n', 'for (t in 1:n)\n', 'times[t] = t;\n', '}\n\n', 'parameters {' ) model_file[grep('// latent trends', model_file) + 1] <- paste0( 'vector[n_series] alpha_gp;\n', 'vector[n_series] rho_gp;\n', 'vector[n] gp_std;\n' ) model_file[grep('// basis coefficients', model_file) + 2] <- paste0( '\n\n// gp estimates\n', 'matrix[n, n_series] trend;\n', 'for (s in 1:n_series) {\n', '// gp covariance matrices\n', 'matrix[n, n] cov;\n', 'matrix[n, n] L_cov;\n', 'cov = cov_exp_quad(times, alpha_gp[s], rho_gp[s]) + diag_matrix(rep_vector(1e-10, n));\n', 'L_cov = cholesky_decompose(cov);\n', '// non-centred parameterisation\n', 'trend[1:n, s] = to_vector(L_cov * gp_std);\n', '}\n' ) model_file[ grep('// priors for smoothing parameters', model_file) + 2 ] <- paste0( '\n// priors for gp parameters\n', 'to_vector(gp_std) ~ normal(0, 1);\n', 'alpha_gp ~ normal(0, 0.5);\n', 'rho_gp ~ inv_gamma(1.499007, 5.670433);\n' ) model_file <- readLines(textConnection(model_file), n = -1) } } if (trend_model == 'RW') { if (drift) { if (use_lv) { model_file[grep('// 
raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent factor drift terms\nvector[n_lv] drift;\n' ) model_file[grep('LV_raw[1, j] ~ ', model_file, fixed = T)] <- "LV_raw[1, j] ~ normal(0, 0.1);" model_file[grep('// dynamic factor estimates', model_file) + 6] <- paste0( 'LV_raw[2:n, j] ~ normal(drift[j]*(n - 1) + LV_raw[1:(n - 1), j], 0.1);' ) } else { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent trend drift terms\nvector[n_series] drift;\n' ) model_file[grep('trend[1, s] ~ ', model_file, fixed = T)] <- "trend[1, s] ~ normal(0, sigma[s]);" model_file[grep('// trend estimates', model_file) + 6] <- paste0( 'trend[2:n, s] ~ normal(drift[s]*(n - 1) + trend[1:(n - 1), s], sigma[s]);' ) } model_file[grep('model \\{', model_file) + 2] <- paste0('\n// priors for trend parameters\ndrift ~ normal(0, 0.1);\n') model_file <- readLines(textConnection(model_file), n = -1) } } if (trend_model == 'CAR1') { if (use_lv) { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent factor AR1 terms\nvector[n_lv] ar1;' ) model_file[grep('// dynamic factor estimates', model_file) + 6] <- paste0('LV_raw[2:n, j] ~ normal(ar1[j] * LV_raw[1:(n - 1), j], 0.1);') model_file[grep('model \\{', model_file) + 2] <- paste0('\n// priors for AR parameters\nar1 ~ std_normal();\n') } else { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent trend AR1 terms\nvector[n_series] ar1;' ) model_file[grep('// trend estimates', model_file) + 6] <- paste0( 'trend[2:n, s] ~ normal(ar1[s] * trend[1:(n - 1), s], sigma[s]);' ) model_file[grep('model \\{', model_file) + 2] <- paste0('\n// priors for AR parameters\nar1 ~ std_normal();\n') } model_file <- readLines(textConnection(model_file), n = -1) } if (trend_model == 'AR1') { if (drift) { if (use_lv) { model_file[grepws('// raw basis', model_file) + 1] <- paste0( c( 'row_vector[num_basis] b_raw;\n\n// latent factor AR1 terms\nvector[n_lv] ar1;\n\n' ), '// latent factor drift terms\nvector[n_lv] drift;' ) model_file[grepws('LV_raw[1, j] ~ ', model_file, fixed = T)] <- "LV_raw[1, j] ~ normal(0, 0.1);" model_file[grepws('// dynamic factor estimates', model_file) + 6] <- paste0( 'LV_raw[2:n, j] ~ normal(drift[j]*(n - 1) + ar1[j] * LV_raw[1:(n - 1), j], 0.1);' ) } else { model_file[grepws('// raw basis', model_file) + 1] <- paste0( c( 'row_vector[num_basis] b_raw;\n\n// latent trend AR1 terms\nvector[n_series] ar1;\n\n' ), '// latent trend drift terms\nvector[n_series] drift;' ) model_file[grepws('trend[1, s] ~ ', model_file, fixed = T)] <- "trend[1, s] ~ normal(0, sigma[s]);" model_file[grep('// trend estimates', model_file) + 6] <- paste0( 'trend[2:n, s] ~ normal(drift[s]*(n - 1) + ar1[s] * trend[1:(n - 1), s], sigma[s]);' ) } model_file[grep('model \\{', model_file) + 2] <- paste0( '\n// priors for AR parameters\nar1 ~ std_normal();\ndrift ~ std_normal();\n' ) } else { if (use_lv) { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent factor AR1 terms\nvector[n_lv] ar1;' ) model_file[grep('// dynamic factor estimates', model_file) + 6] <- paste0( 'LV_raw[2:n, j] ~ normal(ar1[j] * LV_raw[1:(n - 1), j], 0.1);' ) model_file[grep('model \\{', model_file) + 2] <- paste0('\n// priors for AR parameters\nar1 ~ std_normal();\n') } else { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent trend AR1 terms\nvector[n_series] 
ar1;' ) model_file[grep('// trend estimates', model_file) + 6] <- paste0( 'trend[2:n, s] ~ normal(ar1[s] * trend[1:(n - 1), s], sigma[s]);' ) model_file[grep('model \\{', model_file) + 2] <- paste0('\n// priors for AR parameters\nar1 ~ std_normal();\n') } } model_file <- readLines(textConnection(model_file), n = -1) } if (trend_model == 'AR2') { if (drift) { if (use_lv) { model_file[grepws('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent factor AR1 terms\nvector[n_lv] ar1;\n\n', '// latent factor AR2 terms\nvector[n_lv] ar2;\n\n', '// latent factor drift terms\nvector[n_lv] drift;' ) model_file[grepws('LV_raw[1, j] ~ ', model_file, fixed = T)] <- "LV_raw[1, j] ~ normal(0, 0.1);" model_file <- model_file[ -(grep('// dynamic factor estimates', model_file) + 5:7) ] model_file[grep('// dynamic factor estimates', model_file) + 5] <- paste0( 'for (j in 1:n_lv) {\n', 'LV_raw[2, j] ~ normal(drift[j] + LV_raw[1, j] * ar1[j], 0.1);\n', '}\n\n', 'for (i in 3:n) {\n', 'for (j in 1:n_lv) {\n', 'LV_raw[i, j] ~ normal(drift[j]*(i - 1) + ar1[j] * LV_raw[i - 1, j] + ar2[j] * LV_raw[i - 2, j], 0.1);\n', '}\n}\n' ) } else { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent trend AR1 terms\nvector[n_series] ar1;\n\n', '// latent trend AR2 terms\nvector[n_series] ar2;\n\n', '// latent trend drift terms\nvector[n_series] drift;' ) model_file[grepws('trend[1, s] ~ ', model_file, fixed = T)] <- "trend[1, s] ~ normal(0, sigma[s]);" model_file <- model_file[ -(grep('// trend estimates', model_file) + 5:7) ] model_file[grep('// trend estimates', model_file) + 5] <- paste0( 'for (s in 1:n_series) {\n', 'trend[2, s] ~ normal(drift[s] + trend[1, s] * ar1[s], sigma[s]);\n', '}\n\n', 'for (i in 3:n) {\n', 'for (s in 1:n_series) {\n', 'trend[i, s] ~ normal(drift[s]*(i - 1) + ar1[s] * trend[i - 1, s] + ar2[s] * trend[i - 2, s], sigma[s]);\n', '}\n}\n' ) } model_file[grep('model \\{', model_file) + 2] <- paste0( '\n// priors for AR parameters\nar1 ~ std_normal();\nar2 ~ std_normal();\ndrift ~ std_normal();\n' ) } else { if (use_lv) { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent factor AR1 terms\nvector[n_lv] ar1;\n\n', '// latent factor AR2 terms\nvector[n_lv] ar2;' ) model_file[grep('// dynamic factor estimates', model_file) + 2] <- paste0('LV_raw[1, j] ~ normal(0, 0.1);') model_file <- model_file[ -(grep('// dynamic factor estimates', model_file) + 5:7) ] model_file[grep('// dynamic factor estimates', model_file) + 5] <- paste0( 'for (j in 1:n_lv) {\n', 'LV_raw[2, j] ~ normal(LV_raw[1, j] * ar1[j], 0.1);\n', '}\n\n', 'for (i in 3:n) {\n', 'for (j in 1:n_lv) {\n', 'LV_raw[i, j] ~ normal(ar1[j] * LV_raw[i - 1, j] + ar2[j] * LV_raw[i - 2, j], 0.1);\n', '}\n}\n' ) } else { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent trend AR1 terms\nvector[n_series] ar1;\n\n', '// latent trend AR2 terms\nvector[n_series] ar2;' ) model_file[grep('// trend estimates', model_file) + 2] <- paste0('trend[1, s] ~ normal(0, sigma[s]);') model_file <- model_file[ -(grep('// trend estimates', model_file) + 5:7) ] model_file[grep('// trend estimates', model_file) + 5] <- paste0( 'for (s in 1:n_series) {\n', 'trend[2, s] ~ normal(trend[1, s] * ar1[s], sigma[s]);\n', '}\n\n', 'for (i in 3:n) {\n', 'for (s in 1:n_series) {\n', 'trend[i, s] ~ normal(ar1[s] * trend[i - 1, s] + ar2[s] * trend[i - 2, s], sigma[s]);\n', '}\n}\n' ) } model_file[grep('model 
\\{', model_file) + 2] <- paste0( '\n// priors for AR parameters\nar1 ~ std_normal();\nar2 ~ std_normal();\n' ) } model_file <- readLines(textConnection(model_file), n = -1) } if (trend_model == 'AR3') { if (drift) { if (use_lv) { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent factor AR1 terms\nvector[n_lv] ar1;\n\n', '// latent factor AR2 terms\nvector[n_lv] ar2;\n\n', '// latent factor AR3 terms\nvector[n_lv] ar3;\n\n', '// latent factor drift terms\nvector[n_lv] drift;' ) model_file[grep('LV_raw[1, s] ~ ', model_file, fixed = T)] <- "LV_raw[1, s] ~ normal(0, 0.1);" model_file <- model_file[ -(grep('// dynamic factor estimates', model_file) + 5:7) ] model_file[grep('// dynamic factor estimates', model_file) + 5] <- paste0( 'for (j in 1:n_lv) {\n', 'LV_raw[2, j] ~ normal(drift[j] + LV_raw[1, j] * ar1[j], 0.1);\n', '}\n\n', 'for (j in 1:n_lv) {\n', 'LV_raw[3, j] ~ normal(drift[j]*2 + LV_raw[2, j] * ar1[j] + LV_raw[1, j] * ar2[j], 0.1);\n', '}\n\n', 'for (i in 4:n) {\n', 'for (j in 1:n_lv) {\n', 'LV_raw[i, j] ~ normal(drift[j]*(i - 1) + ar1[j] * LV_raw[i - 1, j] + ar2[j] * LV_raw[i - 2, j] + ar3[j] * LV_raw[i - 3, j], 0.1);\n', '}\n}\n' ) } else { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent trend AR1 terms\nvector[n_series] ar1;\n\n', '// latent trend AR2 terms\nvector[n_series] ar2;\n\n', '// latent trend AR3 terms\nvector[n_series] ar3;\n\n', '// latent trend drift terms\nvector[n_series] drift;' ) model_file[grep('trend[1, s] ~ ', model_file, fixed = T)] <- "trend[1, s] ~ normal(0, sigma[s]);" model_file <- model_file[ -(grep('// trend estimates', model_file) + 5:7) ] model_file[grep('// trend estimates', model_file) + 5] <- paste0( 'for (s in 1:n_series) {\n', 'trend[2, s] ~ normal(drift[s] + trend[1, s] * ar1[s], sigma[s]);\n', '}\n\n', 'for (s in 1:n_series) {\n', 'trend[3, s] ~ normal(drift[s]*2 + trend[2, s] * ar1[s] + trend[1, s] * ar2[s], sigma[s]);\n', '}\n\n', 'for (i in 4:n) {\n', 'for (s in 1:n_series) {\n', 'trend[i, s] ~ normal(drift[s]*(i - 1) + ar1[s] * trend[i - 1, s] + ar2[s] * trend[i - 2, s] + ar3[s] * trend[i - 3, s], sigma[s]);\n', '}\n}\n' ) } model_file[grep('model \\{', model_file) + 2] <- paste0( '\n// priors for AR parameters\nar1 ~ std_normal();\nar2 ~ std_normal();\nar3 ~ std_normal();\n', 'drift ~ std_normal();\n' ) } else { if (use_lv) { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent factor AR1 terms\nvector[n_lv] ar1;\n\n', '// latent factor AR2 terms\nvector[n_lv] ar2;\n\n', '// latent factor AR3 terms\nvector[n_lv] ar3;' ) model_file[grep('// dynamic factor estimates', model_file) + 2] <- paste0('LV_raw[1, j] ~ normal(0, 0.1);') model_file <- model_file[ -(grep('// dynamic factor estimates', model_file) + 5:7) ] model_file[grep('// dynamic factor estimates', model_file) + 5] <- paste0( 'for (j in 1:n_lv) {\n', 'LV_raw[2, j] ~ normal(LV_raw[1, j] * ar1[j], 0.1);\n', '}\n\n', 'for (j in 1:n_lv) {\n', 'LV_raw[3, j] ~ normal(LV_raw[2, j] * ar1[j] + LV_raw[1, j] * ar2[j], 0.1);\n', '}\n\n', 'for (i in 4:n) {\n', 'for (j in 1:n_lv) {\n', 'LV_raw[i, j] ~ normal(ar1[j] * LV_raw[i - 1, j] + ar2[j] * LV_raw[i - 2, j] + ar3[j] * LV_raw[i - 3, j], 0.1);\n', '}\n}\n' ) } else { model_file[grep('// raw basis', model_file) + 1] <- paste0( 'row_vector[num_basis] b_raw;\n\n// latent trend AR1 terms\nvector[n_series] ar1;\n\n', '// latent trend AR2 terms\nvector[n_series] ar2;\n\n', '// latent trend AR3 
terms\nvector[n_series] ar3;' ) model_file[grep('// trend estimates', model_file) + 2] <- paste0('trend[1, s] ~ normal(0, sigma[s]);') model_file <- model_file[ -(grep('// trend estimates', model_file) + 5:7) ] model_file[grep('// trend estimates', model_file) + 5] <- paste0( 'for (s in 1:n_series) {\n', 'trend[2, s] ~ normal(trend[1, s] * ar1[s], sigma[s]);\n', '}\n\n', 'for (s in 1:n_series) {\n', 'trend[3, s] ~ normal(trend[2, s] * ar1[s] + trend[1, s] * ar2[s], sigma[s]);\n', '}\n\n', 'for (i in 4:n) {\n', 'for (s in 1:n_series) {\n', 'trend[i, s] ~ normal(ar1[s] * trend[i - 1, s] + ar2[s] * trend[i - 2, s] + ar3[s] * trend[i - 3, s], sigma[s]);\n', '}\n}\n' ) } model_file[grep('model \\{', model_file) + 2] <- paste0( '\n// priors for AR parameters\nar1 ~ std_normal();\nar2 ~ std_normal();\nar3 ~ std_normal();\n' ) } model_file <- readLines(textConnection(model_file), n = -1) } } else { # Modify the JAGS model if (trend_model == 'None') { model_file[grep( 'mus\\[i, s\\] <- exp', model_file )] <- 'mus[i, s] <- exp(eta[ytimes[i, s]])' model_file <- model_file[ -c( grep('## trend estimates', model_file):(grep( '## trend estimates', model_file ) + 27) ) ] } if (use_lv) { if (trend_model == 'RW') { model_file <- model_file[ -c( (grep('## latent factors evolve', model_file) + 6):(grep( '## latent factors evolve', model_file ) + 19) ) ] if (drift) { model_file[grep('## latent factors evolve', model_file) + 5] <- '\nfor (i in 2:n) {\nfor (j in 1:n_lv){\nLV_raw[i, j] ~ dnorm(drift[j]*(n - 1) + LV_raw[i - 1, j], penalty[j])\n}\n}\n' } else { model_file[grep('## latent factors evolve', model_file) + 5] <- '\nfor (i in 2:n) {\nfor (j in 1:n_lv){\nLV_raw[i, j] ~ dnorm(LV_raw[i - 1, j], penalty[j])\n}\n}\n' model_file <- model_file[-grep('drift\\[s\\] ~', model_file)] } model_file <- readLines(textConnection(model_file), n = -1) model_file <- model_file[ -c( (grep('## AR components', model_file)):(grep( '## AR components', model_file ) + 5) ) ] } if (trend_model == 'AR1') { model_file <- model_file[ -c( (grep('## latent factors evolve', model_file) + 6):(grep( '## latent factors evolve', model_file ) + 12) ) ] model_file[grep('## latent factors evolve', model_file) + 7] <- 'for (i in 2:n) {' model_file <- model_file[ -c( (grep('## latent factors evolve', model_file) + 9):(grep( '## latent factors evolve', model_file ) + 10) ) ] if (drift) {} else { model_file[grep('## latent factors evolve', model_file) + 9] <- 'LV_raw[i, j] ~ dnorm(ar1[j]*LV_raw[i - 1, j], penalty[j])\n}' model_file <- model_file[-grep('drift\\[s\\] ~', model_file)] } model_file <- readLines(textConnection(model_file), n = -1) model_file <- model_file[-grep('ar2\\[s\\] ~', model_file)] model_file <- model_file[-grep('ar3\\[s\\] ~', model_file)] } if (trend_model == 'AR2') { model_file <- model_file[ -c( (grep('## latent factors evolve', model_file) + 10):(grep( '## latent factors evolve', model_file ) + 12) ) ] model_file[grep('## latent factors evolve', model_file) + 11] <- 'for (i in 3:n) {' if (drift) { model_file[grep('## latent factors evolve', model_file) + 14] <- 'ar2[j]*LV_raw[i - 2, j], penalty[j])' } else { model_file[grep('## latent factors evolve', model_file) + 7] <- 'LV_raw[2, j] ~ dnorm(ar1[j]*LV_raw[1, j], penalty[j])' model_file[grep('## latent factors evolve', model_file) + 13] <- 'LV_raw[i, j] ~ dnorm(ar1[j]*LV_raw[i - 1, j] +' model_file[grep('## latent factors evolve', model_file) + 14] <- 'ar2[j]*LV_raw[i - 2, j], penalty[j])' model_file <- model_file[-grep('drift\\[s\\] ~', model_file)] } model_file <- 
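  # Note: readLines(textConnection(...)), used repeatedly in this file,
  # re-splits any elements holding embedded '\n' characters into one string
  # per line, so that subsequent grep() calls see a clean line-by-line model
  # file. One-line illustration (not run):
  #   readLines(textConnection('a\nb'))  # returns c("a", "b")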
readLines(textConnection(model_file), n = -1) model_file <- model_file[-grep('ar3\\[s\\] ~', model_file)] } if (trend_model == 'AR3') { if (drift) {} else { model_file[grep('## latent factors evolve', model_file) + 7] <- 'LV_raw[2, j] ~ dnorm(ar1[j]*LV_raw[1, j], penalty[j])' model_file[grep('## latent factors evolve', model_file) + 11] <- 'LV_raw[3, j] ~ dnorm(ar1[j]*LV_raw[2, j] + ar2[j]*LV_raw[1, j], penalty[j])' model_file[grep('## latent factors evolve', model_file) + 16] <- 'LV_raw[i, j] ~ dnorm(ar1[j]*LV_raw[i - 1, j] +' model_file <- model_file[-grep('drift\\[s\\] ~', model_file)] } model_file <- readLines(textConnection(model_file), n = -1) } } if (!use_lv) { if (trend_model == 'RW') { model_file <- model_file[ -c( (grep('## trend estimates', model_file) + 4):(grep( '## trend estimates', model_file ) + 17) ) ] if (drift) { model_file[grep('## trend estimates', model_file) + 2] <- "trend[1, s] ~ dnorm(drift[s], tau[s])" model_file[grep('## trend estimates', model_file) + 4] <- '\nfor (i in 2:n) {\nfor (s in 1:n_series){\ntrend[i, s] ~ dnorm(drift[s]*(i - 1) + trend[i - 1, s], tau[s])\n}\n}\n' } else { model_file[grep('## trend estimates', model_file) + 4] <- '\nfor (i in 2:n) {\nfor (s in 1:n_series){\ntrend[i, s] ~ dnorm(trend[i - 1, s], tau[s])\n}\n}\n' model_file <- model_file[-grep('drift\\[s\\] ~', model_file)] } model_file <- readLines(textConnection(model_file), n = -1) model_file <- model_file[-grep('ar1\\[s\\] ~', model_file)] model_file <- model_file[-grep('ar2\\[s\\] ~', model_file)] model_file <- model_file[-grep('ar3\\[s\\] ~', model_file)] } if (trend_model == 'AR1') { model_file <- model_file[ -c( (grep('## trend estimates', model_file) + 4):(grep( '## trend estimates', model_file ) + 17) ) ] if (drift) { model_file[grep('## trend estimates', model_file) + 2] <- "trend[1, s] ~ dnorm(drift[s], tau[s])" model_file[grep('## trend estimates', model_file) + 4] <- '\nfor (i in 2:n) {\nfor (s in 1:n_series){\ntrend[i, s] ~ dnorm(drift[s]*(i - 1) + ar1[s]*trend[i - 1, s], tau[s])\n}\n}\n' } else { model_file[grep('## trend estimates', model_file) + 4] <- '\nfor (i in 2:n) {\nfor (s in 1:n_series){\ntrend[i, s] ~ dnorm(ar1[s]*trend[i - 1, s], tau[s])\n}\n}\n' model_file <- model_file[-grep('drift\\[s\\] ~', model_file)] } model_file <- readLines(textConnection(model_file), n = -1) model_file <- model_file[-grep('ar2\\[s\\] ~', model_file)] model_file <- model_file[-grep('ar3\\[s\\] ~', model_file)] } if (trend_model == 'AR2') { model_file <- model_file[ -c( (grep('## trend estimates', model_file) + 9):(grep( '## trend estimates', model_file ) + 17) ) ] if (drift) { model_file[grep('## trend estimates', model_file) + 2] <- "trend[1, s] ~ dnorm(drift[s], tau[s])" model_file[grep('## trend estimates', model_file) + 9] <- '\nfor (i in 3:n) {\nfor (s in 1:n_series){\ntrend[i, s] ~ dnorm(drift[s]*(i - 1) + ar1[s]*trend[i - 1, s] + ar2[s]*trend[i - 2, s], tau[s])\n}\n}\n' } else { model_file[grep('## trend estimates', model_file) + 6] <- 'trend[2, s] ~ dnorm(ar1[s]*trend[1, s], tau[s])' model_file[grep('## trend estimates', model_file) + 9] <- '\nfor (i in 3:n) {\nfor (s in 1:n_series){\ntrend[i, s] ~ dnorm(ar1[s]*trend[i - 1, s] + ar2[s]*trend[i - 2, s], tau[s])\n}\n}\n' model_file <- model_file[-grep('drift\\[s\\] ~', model_file)] } model_file <- readLines(textConnection(model_file), n = -1) model_file <- model_file[-grep('ar3\\[s\\] ~', model_file)] } if (trend_model == 'AR3') { if (drift) { model_file[grep('## trend estimates', model_file) + 2] <- "trend[1, s] ~ dnorm(drift[s], 
tau[s])" } else { model_file[grep('## trend estimates', model_file) + 6] <- 'trend[2, s] ~ dnorm(ar1[s]*trend[1, s], tau[s])' model_file[grep('## trend estimates', model_file) + 10] <- 'trend[3, s] ~ dnorm(ar1[s]*trend[2, s] + ar2[s]*trend[1, s], tau[s])' model_file[grep('## trend estimates', model_file) + 15] <- 'trend[i, s] ~ dnorm(ar1[s]*trend[i - 1, s] + ar2[s]*trend[i - 2, s] + ar3[s]*trend[i - 3, s], tau[s])' model_file <- model_file[-grep('drift\\[s\\] ~', model_file)] } model_file <- readLines(textConnection(model_file), n = -1) } } } return(model_file) } ================================================ FILE: R/add_tweedie_lines.R ================================================ #' Tweedie JAGS modifications #' #' #' @param model_file A template `JAGS` model file to be modified #' @param upper_bounds Optional upper bounds for the truncated observation likelihood #' @return A modified `JAGS` model file #' @noRd add_tweedie_lines = function(model_file, upper_bounds) { rate_begin <- grep('rate\\[i, s\\] <- ', model_file) rate_end <- rate_begin + 1 model_file <- model_file[-c(rate_begin:rate_end)] odis_begin <- grep('phi\\[s\\] <- ', model_file) - 4 odis_end <- odis_begin + 7 model_file <- model_file[-c(odis_begin:odis_end)] if (missing(upper_bounds)) { model_file[grep( 'y\\[i, s\\] ~', model_file )] <- ' y[i, s] ~ dpois(linpred[i, s])\n linpred[i, s] ~' model_file[grep( 'ypred\\[i, s\\] ~', model_file )] <- ' ypred[i, s] ~ dpois(linpred[i, s])' } else { model_file[grep( 'y\\[i, s\\] ~', model_file )] <- ' y[i, s] ~ dpois(linpred[i, s])T(, upper_bound[s])\n linpred[i, s] ~' model_file[grep( 'ypred\\[i, s\\] ~', model_file )] <- ' ypred[i, s] ~ dpois(linpred[i, s])T(, upper_bound[s])' } model_file <- readLines(textConnection(model_file), n = -1) model_file[grep( 'linpred\\[i, s\\] ~', model_file )] <- ' linpred[i, s] ~ dgamma(shape[i, s, y_ind[i, s]], rate[i, s])\n twlambda[i, s] <-' model_file <- readLines(textConnection(model_file), n = -1) model_file[grep( 'twlambda\\[i, s\\] <-', model_file )] <- ' twlambda[i, s] <- pow(mus[i, s], 2 - p) / (phi[s] * (2 - p))\n N_pois[i, s] ~' model_file <- readLines(textConnection(model_file), n = -1) model_file[grep( 'N_pois\\[i, s\\] ~', model_file )] <- ' N_pois[i, s] ~ dpois(twlambda[i, s])T(1,)\n shape[i, s, 1] <-' model_file <- readLines(textConnection(model_file), n = -1) model_file[grep( 'shape\\[i, s, 1\\] <-', model_file )] <- ' shape[i, s, 1] <- N_pois[i, s] * ((2 - p) / (p - 1))\n shape[i, s, 2] <-' model_file <- readLines(textConnection(model_file), n = -1) model_file[grep( 'shape\\[i, s, 2\\] <-', model_file )] <- ' shape[i, s, 2] <- 1\n rate[i, s] <-' model_file <- readLines(textConnection(model_file), n = -1) model_file[grep( 'rate\\[i, s\\] <-', model_file )] <- ' rate[i, s] <- 1 / (phi[s] * (p - 1) * pow(mus[i, s], p - 1))\n pois_draw[i, s] ~' model_file <- readLines(textConnection(model_file), n = -1) model_file[grep( 'pois_draw\\[i, s\\] ~', model_file )] <- ' pois_draw[i, s] ~ dpois(mus[i, s])\n is_zero[i, s] <-' model_file <- readLines(textConnection(model_file), n = -1) model_file[grep( 'is_zero\\[i, s\\] <-', model_file )] <- ' is_zero[i, s] <- equals(pois_draw[i, s], 0)\n y_ind[i, s] <-' model_file <- readLines(textConnection(model_file), n = -1) model_file[grep( 'y_ind\\[i, s\\] <-', model_file )] <- ' y_ind[i, s] <- is_zero[i, s] + 1' model_file <- readLines(textConnection(model_file), n = -1) yind_begin <- grep('y_ind\\[i, s\\] <-', model_file) prior_line <- yind_begin + 2 model_file[ prior_line ] <- '}\n\n## Tweedie 
power and overdispersion parameters\np <- 1.5\nfor (s in 1:n_series) {\n phi_raw[s] ~ dnorm(0, 2)T(-3.5, 3.5);\n phi[s] <- exp(phi_raw[s])\n}' model_file <- readLines(textConnection(model_file), n = -1) return(model_file) } ================================================ FILE: R/all_neon_tick_data.R ================================================ #' NEON Amblyomma and Ixodes tick abundance survey data #' #' A dataset containing timeseries of Amblyomma americanum and Ixodes scapularis nymph abundances at NEON sites #' #' @format A tibble/dataframe containing covariate information alongside the main fields of: #' \describe{ #' \item{Year}{Year of sampling} #' \item{epiWeek}{Epidemiological week of sampling} #' \item{plot_ID}{NEON plot ID for survey location} #' \item{siteID}{NEON site ID for survey location} #' \item{amblyomma_americanum}{Counts of A. americanum nymphs} #' \item{ixodes_scapularis}{Counts of I. scapularis nymphs} #' } #' @source \url{https://www.neonscience.org/data} "all_neon_tick_data" ================================================ FILE: R/as.data.frame.mvgam.R ================================================ #' @title Extract posterior draws from fitted \pkg{mvgam} objects #' #' @name mvgam_draws #' #' @description #' Extract posterior draws in conventional formats as data.frames, matrices, #' or arrays. #' #' @param x \code{list} object of class `mvgam` #' #' @param variable A character specifying which parameters to extract. Can #' either be one of the following options: #' \itemize{ #' \item `obs_params` (other parameters specific to the observation model, #' such as overdispersions for negative binomial models or observation error #' SD for gaussian / student-t models) #' \item `betas` (beta coefficients from the GAM observation model linear #' predictor; default) #' \item `smooth_params` (smoothing parameters from the GAM observation model) #' \item `linpreds` (estimated linear predictors on whatever link scale was #' used in the model) #' \item `trend_params` (parameters governing the trend dynamics, such as AR #' parameters, trend SD parameters or Gaussian Process parameters) #' \item `trend_betas` (beta coefficients from the GAM latent process model #' linear predictor; only available if a `trend_formula` was supplied in the #' original model) #' \item `trend_smooth_params` (process model GAM smoothing parameters; only #' available if a `trend_formula` was supplied in the original model) #' \item `trend_linpreds` (process model linear predictors on the identity #' scale; only available if a `trend_formula` was supplied in the original #' model) #' } #' OR can be a character vector providing the variables to extract. #' #' @param regex Logical. If not using one of the prespecified options for #' extractions, should `variable` be treated as a (vector of) regular #' expressions? Any variable in `x` matching at least one of the regular #' expressions will be selected. Defaults to `FALSE`. #' #' @param use_alias Logical. If more informative names for parameters are #' available (i.e. for beta coefficients `b` or for smoothing parameters `rho`), #' replace the uninformative names with the more informative alias. Defaults to #' `TRUE`. #' #' @param inc_warmup Should warmup draws be included? Defaults to \code{FALSE}. #' #' @param row.names Ignored #' #' @param optional Ignored #' #' @param ... Ignored #' #' @return A `data.frame`, `matrix`, or `array` containing the posterior draws. 
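#' @details The `variable` argument accepts either one of the prespecified
#'   subsets listed above or raw parameter names. A short illustration
#'   (assumes a fitted object like `mod1` from the Examples below, which
#'   uses an AR() trend and so contains `ar1` parameters):
#' \preformatted{
#' # aliased GAM coefficient draws as a draws_df
#' as_draws_df(mod1, variable = 'betas')
#'
#' # any parameter whose name contains 'ar1', matched by regex
#' as.matrix(mod1, variable = 'ar1', regex = TRUE)
#' }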
#' #' @author Nicholas J Clark #' #' @examples #' \dontrun{ #' sim <- sim_mvgam(family = Gamma()) #' #' mod1 <- mvgam( #' y ~ s(season, bs = 'cc'), #' trend_model = AR(), #' data = sim$data_train, #' family = Gamma(), #' chains = 2, #' silent = 2 #' ) #' #' beta_draws_df <- as.data.frame(mod1, variable = 'betas') #' head(beta_draws_df) #' str(beta_draws_df) #' #' beta_draws_mat <- as.matrix(mod1, variable = 'betas') #' head(beta_draws_mat) #' str(beta_draws_mat) #' #' shape_pars <- as.matrix(mod1, variable = 'shape', regex = TRUE) #' head(shape_pars) #' } NULL #'@rdname mvgam_draws #'@export as.data.frame.mvgam = function( x, row.names = NULL, optional = TRUE, variable = 'betas', use_alias = TRUE, regex = FALSE, ... ) { # Check variable and get more informative names if applicable extract_pars <- validate_variables(x, variable = variable, regex = regex) # Create a slim brmsfit object and use brms machinery to do extraction dummy <- structure(list(fit = x$model_output), class = 'brmsfit') post <- as.data.frame(dummy, variable = extract_pars$to_extract) # Rename if needed if (use_alias) { if (!is.null(extract_pars$newnames)) { colnames(post) <- extract_pars$newnames } } return(post) } #'@rdname mvgam_draws #'@export as.matrix.mvgam = function( x, variable = 'betas', regex = FALSE, use_alias = TRUE, ... ) { # Check variable and get more informative names if applicable extract_pars <- validate_variables(x, variable = variable, regex = regex) # Create a slim brmsfit object and use brms machinery to do extraction dummy <- structure(list(fit = x$model_output), class = 'brmsfit') post <- as.matrix(dummy, variable = extract_pars$to_extract) # Rename if needed if (use_alias) { if (!is.null(extract_pars$newnames)) { colnames(post) <- extract_pars$newnames } } return(post) } #'@rdname mvgam_draws #'@export as.array.mvgam = function( x, variable = 'betas', regex = FALSE, use_alias = TRUE, ... ) { # Check variable and get more informative names if applicable extract_pars <- validate_variables(x, variable = variable, regex = regex) # Create a slim brmsfit object and use brms machinery to do extraction dummy <- structure(list(fit = x$model_output), class = 'brmsfit') post <- as.array(dummy, variable = extract_pars$to_extract) # Rename if needed if (use_alias) { if (!is.null(extract_pars$newnames)) { dimnames(post)$variable <- extract_pars$newnames } } return(post) } #' @rdname mvgam_draws #' @method as_draws mvgam #' @export as_draws.mvgam <- function( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) { # Check variable and get more informative names if applicable extract_pars <- validate_variables(x, variable = variable, regex = regex) # Create a slim brmsfit object and use brms machinery to do extraction dummy <- structure(list(fit = x$model_output), class = 'brmsfit') # Extract post <- as_draws_list( dummy, variable = extract_pars$to_extract, regex = FALSE, inc_warmup = inc_warmup, ... ) # Rename if needed if (use_alias) { if (!is.null(extract_pars$newnames)) { for (chain in seq_along(post)) { names(post[[chain]]) <- extract_pars$newnames } } } return(post) } #' @export #' @importFrom posterior as_draws posterior::as_draws #' @rdname mvgam_draws #' @method as_draws_matrix mvgam #' @export as_draws_matrix.mvgam <- function( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... 
) { # Check variable and get more informative names if applicable extract_pars <- validate_variables(x, variable = variable, regex = regex) # Create a slim brmsfit object and use brms machinery to do extraction dummy <- structure(list(fit = x$model_output), class = 'brmsfit') # Extract post <- as_draws_matrix( dummy, variable = extract_pars$to_extract, regex = FALSE, inc_warmup = inc_warmup, ... ) # Rename if needed if (use_alias) { if (!is.null(extract_pars$newnames)) { colnames(post) <- extract_pars$newnames } } return(post) } #' @export #' @importFrom posterior as_draws_matrix posterior::as_draws_matrix #' @rdname mvgam_draws #' @method as_draws_df mvgam #' @export as_draws_df.mvgam <- function( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) { # Check variable and get more informative names if applicable extract_pars <- validate_variables(x, variable = variable, regex = regex) # Create a slim brmsfit object and use brms machinery to do extraction dummy <- structure(list(fit = x$model_output), class = 'brmsfit') # Extract post <- as_draws_df( dummy, variable = extract_pars$to_extract, regex = FALSE, inc_warmup = inc_warmup, ... ) # Rename if needed if (use_alias) { if (!is.null(extract_pars$newnames)) { colnames(post)[1:length(extract_pars$newnames)] <- extract_pars$newnames } } return(post) } #' @export #' @importFrom posterior as_draws_df posterior::as_draws_df #' @rdname mvgam_draws #' @method as_draws_array mvgam #' @export as_draws_array.mvgam <- function( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) { # Check variable and get more informative names if applicable extract_pars <- validate_variables(x, variable = variable, regex = regex) # Create a slim brmsfit object and use brms machinery to do extraction dummy <- structure(list(fit = x$model_output), class = 'brmsfit') # Extract post <- as_draws_array( dummy, variable = extract_pars$to_extract, regex = FALSE, inc_warmup = inc_warmup, ... ) # Rename if needed if (use_alias) { if (!is.null(extract_pars$newnames)) { dimnames(post)$variable <- extract_pars$newnames } } return(post) } #' @export #' @importFrom posterior as_draws_array posterior::as_draws_array #' @rdname mvgam_draws #' @method as_draws_list mvgam #' @export as_draws_list.mvgam <- function( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) { # Check variable and get more informative names if applicable extract_pars <- validate_variables(x, variable = variable, regex = regex) # Create a slim brmsfit object and use brms machinery to do extraction dummy <- structure(list(fit = x$model_output), class = 'brmsfit') # Extract post <- as_draws_list( dummy, variable = extract_pars$to_extract, regex = FALSE, inc_warmup = inc_warmup, ... ) # Rename if needed if (use_alias) { if (!is.null(extract_pars$newnames)) { for (chain in seq_along(post)) { names(post[[chain]]) <- extract_pars$newnames } } } return(post) } #' @export #' @importFrom posterior as_draws_list posterior::as_draws_list #' @rdname mvgam_draws #' @method as_draws_rvars mvgam #' @export as_draws_rvars.mvgam <- function( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, ... 
) { # Check variable and get more informative names if applicable extract_pars <- validate_variables(x, variable = variable, regex = regex) # Create a slim brmsfit object and use brms machinery to do extraction dummy <- structure(list(fit = x$model_output), class = 'brmsfit') # Extract (can't rename rvars due to the way it is structured) post <- as_draws_rvars( dummy, variable = extract_pars$to_extract, regex = FALSE, inc_warmup = inc_warmup, ... ) return(post) } #' @export #' @importFrom posterior as_draws_rvars posterior::as_draws_rvars #'@noRd validate_variables = function(x, variable, regex = FALSE) { # Get a string of all possible variables to extract all_vars <- variables(x) all_orig_vars <- unlist(purrr::map(all_vars, 'orig_name')) all_alias_vars <- unlist(purrr::map(all_vars, 'alias')) all_orig_walias <- all_orig_vars[!is.na(all_alias_vars)] all_alias_vars <- all_alias_vars[!is.na(all_alias_vars)] # All possible var sets to extract extract_choices = c( "obs_params", "betas", "smooth_params", "linpreds", "trend_params", "trend_betas", "trend_smooth_params", "trend_linpreds", all_orig_vars, all_alias_vars ) if (variable[1] == 'obs_params') { to_extract <- family_param_info(x$family)$param_names newnames <- NULL } if (variable[1] == 'betas') { to_extract <- 'b' newnames <- names(coef(x$mgcv_model)) } if (variable[1] == 'smooth_params') { if (is.null(all_vars$observation_smoothpars)) { stop( 'No observation-level smooth parameters in model; no smooth_params to extract', call. = FALSE ) } to_extract <- 'rho' newnames <- paste0(x$sp_names, '_rho') } if (variable[1] == 'linpreds') { to_extract <- 'mus' newnames <- NULL } if (variable[1] == 'trend_params') { to_extract <- trend_par_names( attr(x$model_data, 'trend_model'), x$use_lv, x$drift ) to_extract <- to_extract[ !to_extract %in% c('tau', 'trend', 'LV', 'penalty', 'lv_coefs') ] # Determine which other trend params to include included <- vector(length = length(to_extract)) for (i in 1:length(to_extract)) { # Check if it can be extracted suppressWarnings( estimates <- try( mcmc_chains(x$model_output, params = to_extract[i]), silent = TRUE ) ) if (inherits(estimates, 'try-error')) { included[i] <- FALSE } else { included[i] <- TRUE } } to_extract <- to_extract[included] newnames <- NULL } if (variable[1] == 'trend_betas') { if (is.null(x$trend_call)) { stop( 'No trend_formula supplied to model; no trend_betas to extract', call. = FALSE ) } to_extract <- 'b_trend' newnames <- paste0(names(coef(x$trend_mgcv_model)), '_trend') } if (variable[1] == "trend_smooth_params") { if (is.null(all_vars$trend_smoothpars)) { stop( 'No smoothing parameters included in trend-level model', call. = FALSE ) } to_extract <- 'rho_trend' newnames <- paste0( unlist(purrr::map(x$trend_mgcv_model$smooth, 'label')), '_rho_trend' ) } if (variable[1] == 'trend_linpreds') { if (is.null(x$trend_call)) { stop( 'No trend_formula supplied to model; no trend_linpreds to extract', call. 
= FALSE ) } to_extract <- 'trend_mus' newnames <- NULL } # If not one of the standard subsets, get aliases for the chosen variable(s) if ( !variable[1] %in% c( "obs_params", "betas", "smooth_params", "linpreds", "trend_params", "trend_betas", "trend_smooth_params", "trend_linpreds" ) ) { if (regex) { vars_to_extract <- vector(mode = 'list') names_to_use <- vector(mode = 'list') for (i in 1:length(variable)) { if (!any(grepl(variable[i], extract_choices))) { vars_to_extract[[i]] <- NA names_to_use[[i]] <- NA } else { if (any(grepl(variable[i], all_alias_vars))) { vars_to_extract[[i]] <- unname(unlist(purrr::map( all_vars, 'orig_name' ))[ grepl(variable[i], unlist(purrr::map(all_vars, 'alias'))) ]) names_to_use[[i]] <- unname(unlist(purrr::map(all_vars, 'alias'))[ grepl(variable[i], unlist(purrr::map(all_vars, 'alias'))) ]) } else { vars_to_extract[[i]] <- unname(unlist(purrr::map( all_vars, 'orig_name' ))[ grepl(variable[i], unlist(purrr::map(all_vars, 'orig_name'))) ]) names_to_use[[i]] <- unname(unlist(purrr::map(all_vars, 'alias'))[ grepl(variable[i], unlist(purrr::map(all_vars, 'orig_name'))) ]) } } } } else { vars_to_extract <- vector(mode = 'list') names_to_use <- vector(mode = 'list') for (i in 1:length(variable)) { if (!any(extract_choices == variable[i])) { vars_to_extract[[i]] <- NA names_to_use[[i]] <- NA } else { if (any(all_alias_vars == variable[i])) { vars_to_extract[[i]] <- unname(all_orig_walias[ which(all_alias_vars == variable[i]) ]) names_to_use[[i]] <- variable[i] } else { vars_to_extract[[i]] <- unname(unlist(purrr::map( all_vars, 'orig_name' ))[ which(all_orig_vars == variable[i]) ]) names_to_use[[i]] <- unname(unlist(purrr::map(all_vars, 'alias'))[ which(all_orig_vars == variable[i]) ]) } } } } vars_to_extract <- unlist(vars_to_extract) names_to_use <- unlist(names_to_use) names_to_use[is.na(names_to_use)] <- vars_to_extract[is.na(names_to_use)] if (all(is.na(vars_to_extract))) { stop( 'could not find any variables matching the supplied patterns', call. = FALSE ) } to_extract <- vars_to_extract[!is.na(vars_to_extract)] newnames <- names_to_use[!is.na(names_to_use)] } return(list(to_extract = to_extract, newnames = newnames)) } ================================================ FILE: R/backends.R ================================================ #### Helper functions for preparing and manipulating Stan models #### # All functions were modified from `brms` source code and so all credit must # go to the `brms` development team #' parse Stan model code with cmdstanr #' @param model Stan model code #' @return validated Stan model code #' @noRd .model_cmdstanr <- function(model_file, threads = 1, silent = 1, ...) { if (silent < 2) { message('Compiling Stan program using cmdstanr') message() } if (cmdstanr::cmdstan_version() < "2.26.0") { warning( 'Your version of Cmdstan is < 2.26.0; some mvgam models may not work properly!' ) } temp_file <- cmdstanr::write_stan_file(model_file) if (cmdstanr::cmdstan_version() >= "2.29.0") { if (threads > 1) { out <- eval_silent( cmdstanr::cmdstan_model( temp_file, stanc_options = list('O1'), cpp_options = list(stan_threads = TRUE), ... ), type = "message", try = TRUE, silent = silent > 0L ) } else { out <- eval_silent( cmdstanr::cmdstan_model(temp_file, stanc_options = list('O1'), ...), type = "message", try = TRUE, silent = silent > 0L ) } } else { if (threads > 1) { out <- eval_silent( cmdstanr::cmdstan_model( temp_file, cpp_options = list(stan_threads = TRUE), ... 
), type = "message", try = TRUE, silent = silent ) } else { out <- eval_silent( cmdstanr::cmdstan_model(temp_file, ...), type = "message", try = TRUE, silent = silent ) } } return(out) } #' fit Stan model with cmdstanr using HMC sampling or variational inference #' @importFrom brms read_csv_as_stanfit #' @param model a compiled Stan model #' @param data named list to be passed to Stan as data #' @return a fitted Stan model #' @noRd .sample_model_cmdstanr <- function( model, algorithm = 'sampling', prior_simulation = FALSE, data, chains = 4, parallel = TRUE, silent = 1L, max_treedepth, adapt_delta, threads = 1, burnin, samples, param = param, save_all_pars = FALSE, ... ) { if (algorithm == 'pathfinder') { if (cmdstanr::cmdstan_version() < "2.33") { stop( 'Your version of Cmdstan is < 2.33; the "pathfinder" algorithm is not available', call. = FALSE ) } if (utils::packageVersion('cmdstanr') < '0.6.1.9000') { stop( 'Your version of cmdstanr is < 0.6.1.9000; the "pathfinder" algorithm is not available', call. = FALSE ) } } # Construct cmdstanr sampling arguments args <- nlist(data = data) dots <- list(...) args[names(dots)] <- dots if (prior_simulation) { burnin <- 200 } # do the actual sampling if (silent < 2) { message("Start sampling") } if (algorithm == 'sampling') { c(args) <- nlist( chains = chains, refresh = 100, max_treedepth, adapt_delta, diagnostics = NULL, iter_sampling = samples, iter_warmup = burnin, show_messages = silent < 2 ) if (utils::packageVersion('cmdstanr') >= '0.7.0') { c(args) <- nlist(show_exceptions = silent == 0) } if (parallel) { c(args) <- nlist( parallel_chains = min(c(chains, parallel::detectCores() - 1)) ) } if (threads > 1) { c(args) <- nlist(threads_per_chain = threads) } out <- do_call(model$sample, args) } else if (algorithm %in% c("fullrank", "meanfield")) { c(args) <- nlist( algorithm = algorithm, refresh = 500, output_samples = samples ) if (threads > 1) { c(args) <- nlist(threads = threads) } out <- do_call(model$variational, args) } else if (algorithm %in% c("laplace")) { c(args) <- nlist(refresh = 500, draws = samples) if (threads > 1) { c(args) <- nlist(threads = threads) } out <- do_call(model$laplace, args) } else if (algorithm %in% c("pathfinder")) { c(args) <- nlist(refresh = 500, draws = samples) if (threads > 1) { c(args) <- nlist(num_threads = threads) } out <- do_call(model$pathfinder, args) } else { stop("Algorithm '", algorithm, "' is not supported.", call. 
= FALSE) } if (algorithm %in% c('meanfield', 'fullrank', 'laplace', 'pathfinder')) { param <- param[!param %in% 'lp__'] } # Convert model files to stan_fit class for consistency repair_names <- function(x) { x <- sub("\\.", "[", x) x <- gsub("\\.", ",", x) x[grep("\\[", x)] <- paste0(x[grep("\\[", x)], "]") x } if (save_all_pars) { out_gam_mod <- brms::read_csv_as_stanfit( out$output_files(), algorithm = algorithm ) } else { # Exclude certain pars and transformed_pars that are never needed # for mvgam post-processing metadata <- cmdstanr::read_cmdstan_csv( files = out$output_files(), variables = "", sampler_diagnostics = "" ) all_vars <- metadata$metadata$variables out_gam_mod <- brms::read_csv_as_stanfit( out$output_files(), variables = all_vars, exclude = c( 'trend_raw', 'b_raw', 'b_raw_trend', 'p', 'eta', 'phi_vec', 'nu_vec', 'sigma_obs_vec', 'shape_vec', 'phi_inv', 'lv_coefs_raw', 'L_Sigma', 'L_Omega', 'L_Sigma_group', 'Sigma_group', 'Gamma', 'Gamma_group', 'A_group', 'L_deviation_group', 'L_Omega_group', 'sub_error' ), algorithm = algorithm ) } out_gam_mod <- repair_stanfit(out_gam_mod) if (algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')) { out_gam_mod@sim$iter <- samples out_gam_mod@sim$thin <- 1 out_gam_mod@stan_args[[1]]$method <- 'sampling' } return(out_gam_mod) } #' fit Stan model with rstan #' @param model a compiled Stan model #' @param sdata named list to be passed to Stan as data #' @return a fitted Stan model #' @noRd .sample_model_rstan <- function( model, algorithm = 'sampling', prior_simulation = FALSE, data, chains = 4, parallel = TRUE, silent = 1L, max_treedepth, adapt_delta, threads = 1, burnin, samples, thin, ... ) { if (rstan::stan_version() < "2.26.0") { warning( 'Your version of Stan is < 2.26.0; some mvgam models may not work properly!' ) } if (algorithm == 'pathfinder') { stop( 'The "pathfinder" algorithm is not yet available in rstan', call. = FALSE ) } if (algorithm == 'laplace') { stop('The "laplace" algorithm is not yet available in rstan', call. = FALSE) } # Set up parallel cores mc_cores_def <- getOption('mc.cores') options(mc.cores = parallel::detectCores()) on.exit(options(mc.cores = mc_cores_def)) # Fit the model in rstan using custom control parameters if (threads > 1) { if (utils::packageVersion("rstan") >= "2.26") { threads_per_chain_def <- rstan::rstan_options("threads_per_chain") on.exit(rstan::rstan_options(threads_per_chain = threads_per_chain_def)) rstan::rstan_options(threads_per_chain = threads) } else { stop( "Threading is not supported by backend 'rstan' version ", utils::packageVersion("rstan"), ".", call. = FALSE ) } } # Compile the model if (silent < 2L) { message('Compiling Stan program using rstan') message() } stan_mod <- eval_silent( rstan::stan_model(model_code = model, verbose = silent < 1L), type = "message", try = TRUE, silent = silent >= 1L ) # Construct rstan sampling arguments args <- nlist(object = stan_mod, data = data) dots <- list(...) 
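  # User-supplied `...` arguments override or extend the defaults gathered
  # above, mirroring the cmdstanr backend. `samples` is then padded where
  # needed because rstan's `iter` counts warmup draws, unlike cmdstanr's
  # separate iter_warmup/iter_sampling arguments. Sketch of the override
  # pattern (illustrative values, not run):
  #   args <- list(object = stan_mod, data = data)
  #   dots <- list(refresh = 0)   # hypothetical user-supplied setting
  #   args[names(dots)] <- dots   # args$refresh is now 0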
args[names(dots)] <- dots if (samples <= burnin) { samples <- burnin + samples } # do the actual sampling if (silent < 2) { message("Start sampling") } if (algorithm %in% c("sampling", "fixed_param")) { stan_control <- list( max_treedepth = max_treedepth, adapt_delta = adapt_delta ) if (prior_simulation) { burnin = 200 samples = 700 } if (parallel) { c(args) <- nlist(cores = min(c(chains, parallel::detectCores() - 1))) } else { # Explicitly set cores = 1 if parallel = FALSE (#113) c(args) <- nlist(cores = 1) } c(args) <- nlist( warmup = burnin, iter = samples, chains = chains, control = stan_control, show_messages = silent < 1L, verbose = FALSE, thin = thin, pars = NA, refresh = 100, save_warmup = FALSE ) out <- do_call(rstan::sampling, args) } else if (algorithm %in% c("fullrank", "meanfield")) { c(args) <- nlist(algorithm, output_samples = samples, pars = NA) out <- do_call(rstan::vb, args) } else { stop("Algorithm '", algorithm, "' is not supported.", call. = FALSE) } out <- repair_stanfit(out) return(out) } #' @noRd .autoformat <- function( stan_file, overwrite_file = TRUE, backend = 'cmdstanr', silent = TRUE ) { # Can make LV models slightly more efficient by not filling in zeros in a loop if (any(grepl('lv_coefs_raw[i, j] = 0;', stan_file, fixed = TRUE))) { starts <- grepws('lv_coefs_raw[i, j] = 0;', stan_file) - 2 ends <- grepws('lv_coefs_raw[i, j] = 0;', stan_file) + 2 stan_file <- stan_file[-c(starts:ends)] stan_file[grepws('matrix[n_series, n_lv] lv_coefs_raw;', stan_file)] <- 'matrix[n_series, n_lv] lv_coefs_raw = rep_matrix(0, n_series, n_lv);' } # No need to fill lv_coefs in each iteration if this is a # trend_formula model if ( any(grepl('lv_coefs = Z;', stan_file, fixed = TRUE)) & !any(grepl('vector[n_lv] LV[n];', stan_file, fixed = TRUE)) ) { stan_file <- stan_file[-grep('lv_coefs = Z;', stan_file, fixed = TRUE)] stan_file <- stan_file[ -grep('matrix[n_series, n_lv] lv_coefs;', stan_file, fixed = TRUE) ] stan_file[grep( 'trend[i, s] = dot_product(lv_coefs[s,], LV[i,]);', stan_file, fixed = TRUE )] <- 'trend[i, s] = dot_product(Z[s,], LV[i,]);' stan_file[grep('// posterior predictions', stan_file, fixed = TRUE) - 1] <- paste0( stan_file[ grep('// posterior predictions', stan_file, fixed = TRUE) - 1 ], '\n', 'matrix[n_series, n_lv] lv_coefs = Z;' ) stan_file <- readLines(textConnection(stan_file), n = -1) } if (backend == 'rstan' & rstan::stan_version() < '2.29.0') { # normal_id_glm became available in 2.29.0; this needs to be replaced # with the older non-glm version if (any(grepl('normal_id_glm', stan_file, fixed = TRUE))) { if ( any(grepl("flat_ys ~ normal_id_glm(flat_xs,", stan_file, fixed = TRUE)) ) { start <- grep( "flat_ys ~ normal_id_glm(flat_xs,", stan_file, fixed = TRUE ) end <- start + 2 stan_file <- stan_file[-c((start + 1):(start + 2))] stan_file[start] <- 'flat_ys ~ normal(flat_xs * b, flat_sigma_obs);' } } } # Old ways of specifying arrays have been converted to errors in # the latest version of Cmdstan (2.32.0); this coincides with # a decision to stop automatically replacing these deprecations with # the canonicalizer, so we have no choice but to replace the old # syntax with this ugly bit of code # rstan dependency in Description should mean that updates should # always happen (mvgam depends on rstan >= 2.29.0) update_code <- TRUE # Tougher if using cmdstanr if (backend == 'cmdstanr') { if (cmdstanr::cmdstan_version() < "2.32.0") { # If the autoformat options from cmdstanr are available, # make use of them to update any deprecated array syntax update_code <- 
FALSE } } if (update_code) { # Data modifications stan_file[grep( "int ytimes[n, n_series]; // time-ordered matrix (which col in X belongs to each [time, series] observation?)", stan_file, fixed = TRUE )] <- 'array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)' stan_file[grep( "int flat_ys[n_nonmissing]; // flattened nonmissing observations", stan_file, fixed = TRUE )] <- 'array[n_nonmissing] int flat_ys; // flattened nonmissing observations' stan_file[grep( "int obs_ind[n_nonmissing]; // indices of nonmissing observations", stan_file, fixed = TRUE )] <- "array[n_nonmissing] int obs_ind; // indices of nonmissing observations" if ( any(grepl( 'int ytimes_trend[n, n_lv]; // time-ordered matrix for latent states', stan_file, fixed = TRUE )) ) { stan_file[grep( "int ytimes_trend[n, n_lv]; // time-ordered matrix for latent states", stan_file, fixed = TRUE )] <- "array[n, n_lv] int ytimes_trend;" } if ( any( grepl('int idx', stan_file) & grepl('// discontiguous index values', stan_file, fixed = TRUE) ) ) { lines_replace <- which( grepl('int idx', stan_file) & grepl('// discontiguous index values', stan_file, fixed = TRUE) ) for (i in lines_replace) { split_line <- strsplit(stan_file[i], ' ')[[1]] idxnum <- gsub( ';', '', gsub("\\s*\\[[^\\]+\\]", "", as.character(split_line[2])) ) idx_length <- gsub( "\\]", "", gsub( "\\[", "", regmatches(split_line[2], gregexpr("\\[.*?\\]", split_line[2]))[[1]] ) ) stan_file[i] <- paste0( 'array[', idx_length, '] int ', idxnum, '; // discontiguous index values' ) } } if ( any(grepl( 'int cap[total_obs]; // upper limits of latent abundances', stan_file, fixed = TRUE )) ) { stan_file[grep( 'int cap[total_obs]; // upper limits of latent abundances', stan_file, fixed = TRUE )] <- 'array[total_obs] int cap; // upper limits of latent abundances' stan_file[grep( 'int flat_caps[n_nonmissing];', stan_file, fixed = TRUE )] <- 'array[n_nonmissing] int flat_caps;' } # Model modifications if (any(grepl('real flat_phis[n_nonmissing];', stan_file, fixed = TRUE))) { stan_file[grep( "real flat_phis[n_nonmissing];", stan_file, fixed = TRUE )] <- "array[n_nonmissing] real flat_phis;" } # n-mixture modifications if ( any(grepl( 'real p_ub = poisson_cdf(max_k, lambda);', stan_file, fixed = TRUE )) ) { stan_file[grep( 'real p_ub = poisson_cdf(max_k, lambda);', stan_file, fixed = TRUE )] <- 'real p_ub = poisson_cdf(max_k | lambda);' } # trend_formula modifications if ( any( grepl('int trend_rand_idx', stan_file) & grepl('// trend random effect indices', stan_file, fixed = TRUE) ) ) { lines_replace <- which( grepl('int trend_rand_idx', stan_file) & grepl('// trend random effect indices', stan_file, fixed = TRUE) ) for (i in lines_replace) { split_line <- strsplit(stan_file[i], ' ')[[1]] trend_idxnum <- gsub( ';', '', gsub("\\s*\\[[^\\]+\\]", "", as.character(split_line[2])) ) idx_length <- gsub( "\\]", "", gsub( "\\[", "", regmatches(split_line[2], gregexpr("\\[.*?\\]", split_line[2]))[[1]] ) ) stan_file[i] <- paste0( 'array[', idx_length, '] int ', trend_idxnum, '; // trend random effect indices' ) } } if ( any( grepl('int trend_idx', stan_file) & grepl('// discontiguous index values', stan_file, fixed = TRUE) ) ) { lines_replace <- which( grepl('int trend_idx', stan_file) & grepl('// discontiguous index values', stan_file, fixed = TRUE) ) for (i in lines_replace) { split_line <- strsplit(stan_file[i], ' ')[[1]] trend_idxnum <- gsub( ';', '', gsub("\\s*\\[[^\\]+\\]", "", as.character(split_line[2])) ) idx_length <- gsub( 
"\\]", "", gsub( "\\[", "", regmatches(split_line[2], gregexpr("\\[.*?\\]", split_line[2]))[[1]] ) ) stan_file[i] <- paste0( 'array[', idx_length, '] int ', trend_idxnum, '; // discontiguous index values' ) } } if (any(grepl('vector[n_series] trend_raw[n];', stan_file, fixed = TRUE))) { stan_file[grep( "vector[n_series] trend_raw[n];", stan_file, fixed = TRUE )] <- "array[n] vector[n_series] trend_raw;" } if (any(grepl('vector[n_lv] error[n];', stan_file, fixed = TRUE))) { stan_file[grep("vector[n_lv] error[n];", stan_file, fixed = TRUE)] <- "array[n] vector[n_lv] error;" } if (any(grepl('vector[n_series] error[n];', stan_file, fixed = TRUE))) { stan_file[grep("vector[n_series] error[n];", stan_file, fixed = TRUE)] <- "array[n] vector[n_series] error;" } if (any(grepl('vector[n_lv] LV[n];', stan_file, fixed = TRUE))) { stan_file[grep("vector[n_lv] LV[n];", stan_file, fixed = TRUE)] <- "array[n] vector[n_lv] LV;" } if (any(grepl('vector[n_series] mu[n - 1];', stan_file, fixed = TRUE))) { stan_file[grep("vector[n_series] mu[n - 1];", stan_file, fixed = TRUE)] <- "array[n - 1] vector[n_series] mu;" } if (any(grepl('vector[n_lv] mu[n - 1];', stan_file, fixed = TRUE))) { stan_file[grep("vector[n_lv] mu[n - 1];", stan_file, fixed = TRUE)] <- "array[n - 1] vector[n_lv] mu;" } if (any(grepl('vector[n_series] mu[n];', stan_file, fixed = TRUE))) { stan_file[grep("vector[n_series] mu[n];", stan_file, fixed = TRUE)] <- "array[n] vector[n_series] mu;" } if (any(grepl('vector[n_lv] mu[n];', stan_file, fixed = TRUE))) { stan_file[grep("vector[n_lv] mu[n];", stan_file, fixed = TRUE)] <- "array[n] vector[n_lv] mu;" } # Generated quantity modifications if ( any(grepl( 'real ypred[n, n_series];', stan_file, fixed = TRUE )) ) { stan_file[grep( "real ypred[n, n_series];", stan_file, fixed = TRUE )] <- "array[n, n_series] real ypred;" } if ( any(grepl('real ypred[n, n_series];', stan_file, fixed = TRUE)) ) { stan_file[grep( "real ypred[n, n_series];", stan_file, fixed = TRUE )] <- "array[n, n_series] real ypred;" } # ARMA model modifications if (any(grepl('vector[n_series] epsilon[n];', stan_file, fixed = TRUE))) { stan_file[grep( "vector[n_series] epsilon[n];", stan_file, fixed = TRUE )] <- "array[n] vector[n_series] epsilon;" } if (any(grepl('vector[n_lv] epsilon[n];', stan_file, fixed = TRUE))) { stan_file[grep("vector[n_lv] epsilon[n];", stan_file, fixed = TRUE)] <- "array[n] vector[n_lv] epsilon;" } # VARMA model modifications if ( any(grepl('matrix[n_series, n_series] P[1];', stan_file, fixed = TRUE)) ) { stan_file[grep( "matrix[n_series, n_series] P[1];", stan_file, fixed = TRUE )] <- "array[1] matrix[n_series, n_series] P;" stan_file[grep( "matrix[n_series, n_series] phiGamma[2, 1];", stan_file, fixed = TRUE )] <- "array[2, 1] matrix[n_series, n_series] phiGamma;" } if ( any(grepl( 'matrix initial_joint_var(matrix Sigma, matrix[] phi, matrix[] theta) {', stan_file, fixed = TRUE )) ) { stan_file[grep( "matrix initial_joint_var(matrix Sigma, matrix[] phi, matrix[] theta) {", stan_file, fixed = TRUE )] <- "matrix initial_joint_var(matrix Sigma, array[] matrix phi, array[] matrix theta) {" } if (any(grepl('matrix[n_lv, n_lv] P[1];', stan_file, fixed = TRUE))) { stan_file[grep("matrix[n_lv, n_lv] P[1];", stan_file, fixed = TRUE)] <- "array[1] matrix[n_lv, n_lv] P;" stan_file[grep("matrix[n_lv, n_lv] R[1];", stan_file, fixed = TRUE)] <- "array[1] matrix[n_lv, n_lv] R;" stan_file[grep( "matrix[n_lv, n_lv] A_init[1];", stan_file, fixed = TRUE )] <- "array[1] matrix[n_lv, n_lv] A_init;" stan_file[grep( 
"matrix[n_lv, n_lv] theta_init[1];", stan_file, fixed = TRUE )] <- "array[1] matrix[n_lv, n_lv] theta_init;" } if ( any(grepl('matrix[n_series, n_series] R[1];', stan_file, fixed = TRUE)) ) { stan_file[grep( "matrix[n_series, n_series] R[1];", stan_file, fixed = TRUE )] <- "array[1] matrix[n_series, n_series] R;" stan_file[grep( "matrix[n_series, n_series] A_init[1];", stan_file, fixed = TRUE )] <- "array[1] matrix[n_series, n_series] A_init;" stan_file[grep( "matrix[n_series, n_series] theta_init[1];", stan_file, fixed = TRUE )] <- "array[1] matrix[n_series, n_series] theta_init;" } if ( any(grepl( 'matrix[] rev_mapping(matrix[] P, matrix Sigma) {', stan_file, fixed = TRUE )) ) { stan_file[grep( "matrix[] rev_mapping(matrix[] P, matrix Sigma) {", stan_file, fixed = TRUE )] <- "array[] matrix rev_mapping(array[] matrix P, matrix Sigma) {" stan_file[grep( "matrix[m, m] phi_for[p, p]; matrix[m, m] phi_rev[p, p];", stan_file, fixed = TRUE )] <- 'array[p, p] matrix[m, m] phi_for; array[p, p] matrix[m, m] phi_rev;' stan_file[grep( "matrix[m, m] Sigma_for[p+1]; matrix[m, m] Sigma_rev[p+1];", stan_file, fixed = TRUE )] <- 'array[p+1] matrix[m, m] Sigma_for; array[p+1] matrix[m, m] Sigma_rev;' stan_file[grep( "matrix[m, m] S_for_list[p+1];", stan_file, fixed = TRUE )] <- 'array[p+1] matrix[m, m] S_for_list;' } # VAR model modifications if ( any(grepl('matrix[n_lv, n_lv] phiGamma[2, 1];', stan_file, fixed = TRUE)) ) { stan_file[grep( 'matrix[n_lv, n_lv] phiGamma[2, 1];', stan_file, fixed = TRUE )] <- 'array[2, 1] matrix[n_lv, n_lv] phiGamma;' } if ( any(grepl( 'matrix[,] rev_mapping(matrix[] P, matrix Sigma) {', stan_file, fixed = TRUE )) ) { stan_file[grep( "matrix[,] rev_mapping(matrix[] P, matrix Sigma) {", stan_file, fixed = TRUE )] <- "array[,] matrix rev_mapping(array[] matrix P, matrix Sigma) {" stan_file[grep( "matrix[m, m] phi_for[p, p]; matrix[m, m] phi_rev[p, p];", stan_file, fixed = TRUE )] <- 'array[p, p] matrix[m, m] phi_for; array[p, p] matrix[m, m] phi_rev;' stan_file[grep( "matrix[m, m] Sigma_for[p+1]; matrix[m, m] Sigma_rev[p+1];", stan_file, fixed = TRUE )] <- 'array[p+1] matrix[m, m] Sigma_for; array[p+1] matrix[m, m] Sigma_rev;' stan_file[grep( "matrix[m, m] S_for_list[p+1];", stan_file, fixed = TRUE )] <- 'array[p+1] matrix[m, m] S_for_list;' stan_file[grep( "matrix[m, m] Gamma_trans[p+1];", stan_file, fixed = TRUE )] <- 'array[p+1] matrix[m, m] Gamma_trans;' stan_file[grep( "matrix[m, m] phiGamma[2, p];", stan_file, fixed = TRUE )] <- 'array[2, p] matrix[m, m] phiGamma;' } if ( any(grepl( "real partial_log_lik(int[] seq, int start, int end,", stan_file, fixed = TRUE )) ) { stan_file[grepl( "real partial_log_lik(int[] seq, int start, int end,", stan_file, fixed = TRUE )] <- "real partial_log_lik(array[] int seq, int start, int end," } if ( any(grepl( "data vector Y, vector mu, real[] shape) {", stan_file, fixed = TRUE )) ) { stan_file[grepl( "data vector Y, vector mu, real[] shape) {", stan_file, fixed = TRUE )] <- "data vector Y, vector mu, array[] real shape) {" } if ( any(grepl( "int seq[n_nonmissing]; // an integer sequence for reduce_sum slicing", stan_file, fixed = TRUE )) ) { stan_file[grepl( "int seq[n_nonmissing]; // an integer sequence for reduce_sum slicing", stan_file, fixed = TRUE )] <- "array[n_nonmissing] int seq; // an integer sequence for reduce_sum slicing" } } if (backend == 'rstan') { options(stanc.allow_optimizations = TRUE, stanc.auto_format = TRUE) out <- eval_silent( rstan::stanc(model_code = stan_file), type = "message", try = TRUE, silent = silent ) 
out <- out$model_code } else { stan_file <- cmdstanr::write_stan_file(stan_file) cmdstan_mod <- eval_silent( cmdstanr::cmdstan_model(stan_file, compile = FALSE), type = "message", try = TRUE, silent = silent ) out <- utils::capture.output( cmdstan_mod$format( max_line_length = 80, canonicalize = TRUE, overwrite_file = overwrite_file, backup = FALSE ) ) out <- paste0(out, collapse = "\n") } return(out) } #' @noRd repair_stanfit <- function(x) { if (!length(x@sim$fnames_oi)) { # nothing to rename return(x) } # the posterior package cannot deal with non-unique parameter names # this case happens rarely but might happen when sample_prior = "yes" x@sim$fnames_oi <- make.unique(as.character(x@sim$fnames_oi), "__") for (i in seq_along(x@sim$samples)) { # stanfit may have renamed dimension suffixes (#1218) if (length(x@sim$samples[[i]]) == length(x@sim$fnames_oi)) { names(x@sim$samples[[i]]) <- x@sim$fnames_oi } } x } #' @noRd repair_variable_names <- function(x) { x <- sub("\\.", "[", x) x <- gsub("\\.", ",", x) x[grep("\\[", x)] <- paste0(x[grep("\\[", x)], "]") x } #' @noRd seq_rows = function(x) { seq_len(NROW(x)) } #' @noRd is_equal <- function(x, y, check.attributes = FALSE, ...) { isTRUE(all.equal(x, y, check.attributes = check.attributes, ...)) } #' @noRd ulapply <- function(X, FUN, ..., recursive = TRUE, use.names = TRUE) { unlist(lapply(X, FUN, ...), recursive, use.names) } ================================================ FILE: R/compute_edf.R ================================================ #' Compute approximate EDFs of smooths #' @importFrom stats fitted #'@noRd compute_edf = function( mgcv_model, object, rho_names, sigma_raw_names, conservative = FALSE ) { if (length(mgcv_model$smooth) > 0) { smooth_labs <- do.call( rbind, lapply(seq_along(mgcv_model$smooth), function(x) { data.frame( label = mgcv_model$smooth[[x]]$label, term = paste(mgcv_model$smooth[[x]]$term, collapse = ','), class = class(mgcv_model$smooth[[x]])[1] ) }) ) # Find the best overall posterior draw if (object$family == 'nmix') { best_draw <- 1 } else { liks <- logLik(object, include_forecast = FALSE) best_draw <- which.max(rowMeans(liks, na.rm = TRUE)) } # Extract smoothing parameters sp_names <- names(mgcv_model$sp) rho_sp <- vector() random_sp <- vector() if (!all(smooth_labs$class == 'random.effect')) { rho_estimates <- mcmc_chains(object$model_output, rho_names)[best_draw, ] rho_sp <- rho_estimates names(rho_sp) <- paste0(sp_names[1:length(rho_estimates)], '_', rho_names) } if (any(smooth_labs$class == 'random.effect')) { if (length(rho_sp) > 0) { rho_sp <- rho_sp[-which(smooth_labs$class == 'random.effect')] } pop_sd_estimates <- mcmc_chains(object$model_output, sigma_raw_names)[ best_draw, ] random_sp <- pop_sd_estimates if (rho_names == 'rho_trend') { names(random_sp) <- paste0( 'sd(', sp_names[which(smooth_labs$class == 'random.effect')], ')_trend' ) } else { names(random_sp) <- paste0( 'sd(', sp_names[which(smooth_labs$class == 'random.effect')], ')' ) } names(random_sp) <- gsub('s\\(', '', names(random_sp)) names(random_sp) <- gsub('\\))', ')', names(random_sp)) } mgcv_model$sp <- exp(c(rho_sp, random_sp)) # Compute estimated degrees of freedom based on edf.type = 1 from # https://github.com/cran/mgcv/blob/master/R/jagam.r # using Simon Wood's example calculation X <- predict(mgcv_model, type = 'lpmatrix') if (rho_names == 'rho_trend') { bs <- mcmc_chains(object$model_output, 'b_trend')[best_draw, ] } else { bs <- mcmc_chains(object$model_output, 'b')[best_draw, ] } eta <- X %*% bs if (rho_names == 
'rho_trend') {
      # trend models use Gaussian family; expectations are simply the trend_mus
      mu <- mcmc_chains(object$model_output, 'trend_mus')[
        best_draw,
        1:length(eta)
      ]
    } else {
      # observation models may vary in their response family; need to compute
      # the expectations
      mu <- fitted(object, summary = FALSE)[best_draw, 1:length(eta)]
    }

    # Calculate variance using family's mean-variance relationship
    mu_variance <- predict(
      object,
      process_error = FALSE,
      type = 'variance',
      summary = FALSE
    )[best_draw, ]
    if (length(mu_variance) > 1) {
      mu_variance <- mu_variance[1:length(eta)]
    }
    if (any(mu_variance == 0)) {
      mu_variance[which(mu_variance == 0)] <- mu[which(mu_variance == 0)]
    }
    if (!conservative) {
      w <- as.numeric(mgcv_model$family$mu.eta(as.vector(eta))^2 / mu_variance)
      XWX <- t(X) %*% (w * X)
    } else {
      XWX <- t(X) %*% X
    }
    lambda <- mgcv_model$sp
    XWXS <- XWX
    for (i in 1:length(lambda)) {
      ind <- mgcv_model$off[i]:(mgcv_model$off[i] + ncol(mgcv_model$S[[i]]) - 1)
      XWXS[ind, ind] <- XWXS[ind, ind] + mgcv_model$S[[i]] * lambda[i]
    }
    suppressWarnings(edf <- try(diag(solve(XWXS, XWX)), silent = TRUE))
    if (inherits(edf, 'try-error')) {
      edf <- mgcv_model$edf
      names(edf) <- names(coef(mgcv_model))
    }
    mgcv_model$edf <- edf
    mgcv_model$edf1 <- edf
    mgcv_model$edf2 <- edf
  }

  # Add frequentist version of parameter covariance estimators
  # for calculation of smooth term p-values;
  # rV <- mroot(mgcv_model$Vp)
  # V <- tcrossprod(rV)
  # XWX <- crossprod(mgcv_model$R)
  # Ve_hat <- V %*% XWX
  # mgcv_model$Ve <- Ve_hat %*% V
  mgcv_model$Ve <- mgcv_model$Vp
  return(mgcv_model)
}


================================================
FILE: R/conditional_effects.R
================================================
#' Display conditional effects of predictors for \pkg{mvgam} models
#'
#' Display conditional effects of one or more numeric and/or categorical
#' predictors in models of class `mvgam` and `jsdgam`, including two-way
#' interaction effects.
#'
#' @importFrom ggplot2 scale_colour_discrete scale_fill_discrete theme_classic
#' @importFrom graphics plot
#' @importFrom grDevices devAskNewPage
#'
#' @inheritParams brms::conditional_effects.brmsfit
#' @inheritParams brms::plot.brms_conditional_effects
#'
#' @param x Object of class `mvgam`, `jsdgam` or `mvgam_conditional_effects`
#'
#' @param points `Logical`. Indicates if the original data points should be
#' added, but only if `type == 'response'`. Default is `FALSE`.
#'
#' @param rug `Logical`. Indicates if tick marks should be displayed on the
#' axes to mark the distribution of the raw data, but only if
#' `type == 'response'`. Default is `FALSE`.
#'
#' @param ask `Logical`. Indicates if the user is prompted before a new page is
#' plotted. Only used if plot is `TRUE`. Default is `FALSE`.
#'
#' @param type `character` specifying the scale of predictions. When this has
#' the value \code{link} the linear predictor is calculated on the link
#' scale. If \code{expected} is used (the default), predictions reflect the
#' expectation of the response (the mean) but ignore uncertainty in the
#' observation process. When \code{response} is used, the predictions take
#' uncertainty in the observation process into account to return predictions
#' on the outcome scale. Two special cases are also allowed: type `latent_N`
#' will return the estimated latent abundances from an N-mixture distribution,
#' while type `detection` will return the estimated detection probability from
#' an N-mixture distribution.
#'
#' @param ...
other arguments to pass to \code{\link[marginaleffects]{plot_predictions}} #' #' @return `conditional_effects` returns an object of class #' \code{mvgam_conditional_effects} which is a named list with one slot per #' effect containing a \code{\link[ggplot2]{ggplot}} object, which can be #' further customized using the \pkg{ggplot2} package. The corresponding #' `plot` method will draw these plots in the active graphic device. #' #' @details This function acts as a wrapper to the more flexible #' \code{\link[marginaleffects]{plot_predictions}}. When creating #' \code{conditional_effects} for a particular predictor (or interaction of #' two predictors), one has to choose the values of all other predictors to #' condition on. By default, the mean is used for continuous variables and the #' reference category is used for factors. Use #' \code{\link[marginaleffects]{plot_predictions}} to change these and create #' more bespoke conditional effects plots. #' #' @name conditional_effects.mvgam #' #' @author Nicholas J Clark #' #' @seealso \code{\link[marginaleffects]{plot_predictions}}, #' \code{\link[marginaleffects]{plot_slopes}} #' #' @examples #' \dontrun{ #' # Simulate some data #' simdat <- sim_mvgam( #' family = poisson(), #' seasonality = 'hierarchical' #' ) #' #' # Fit a model #' mod <- mvgam( #' y ~ s(season, by = series, k = 5) + year:series, #' family = poisson(), #' data = simdat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' # Plot all main effects on the response scale #' conditional_effects(mod) #' #' # Change the prediction interval to 70% using plot_predictions() argument #' # 'conf_level' #' conditional_effects(mod, conf_level = 0.7) #' #' # Plot all main effects on the link scale #' conditional_effects(mod, type = 'link') #' #' # Works the same for smooth terms, including smooth interactions #' set.seed(0) #' dat <- mgcv::gamSim(1, n = 200, scale = 2) #' mod <- mvgam( #' y ~ te(x0, x1, k = 5) + s(x2, k = 6) + s(x3, k = 6), #' data = dat, #' family = gaussian(), #' chains = 2, #' silent = 2 #' ) #' conditional_effects(mod) #' conditional_effects(mod, conf_level = 0.5, type = 'link') #' #' # ggplot objects can be modified and combined with the help of many #' # additional packages. Here is an example using the patchwork package #' #' # Simulate some nonlinear data #' dat <- mgcv::gamSim(1, n = 200, scale = 2) #' mod <- mvgam( #' y ~ s(x1, bs = 'moi') + te(x0, x2), #' data = dat, #' family = gaussian(), #' chains = 2, #' silent = 2 #' ) #' #' # Extract the list of ggplot conditional_effect plots #' m <- plot(conditional_effects(mod), plot = FALSE) #' #' # Add custom labels and arrange plots together using patchwork::wrap_plots() #' library(patchwork) #' library(ggplot2) #' wrap_plots( #' m[[1]] + labs(title = 's(x1, bs = "moi")'), #' m[[2]] + labs(title = 'te(x0, x2)') #' ) #' } #' #' @export conditional_effects.mvgam = function( x, effects = NULL, type = 'expected', points = FALSE, rug = FALSE, ... 
) { use_def_effects <- is.null(effects) type <- match.arg( type, c('response', 'link', 'detection', 'latent_N', 'expected') ) if (type == 'response') { if (points) { points <- 0.5 } else { points <- 0 } } else { points <- 0 rug <- FALSE } # Can't plot points or rugs with binomial models due to the # cbind syntax if (rug) { if (x$family %in% c('binomial', 'beta_binomial')) { rug <- FALSE message('Cannot show observation rug for binomial models') } } if (points) { if (x$family %in% c('binomial', 'beta_binomial')) { points <- 0 message('Cannot show observation points for binomial models') } } if (use_def_effects) { # Get all term labels in the model termlabs <- attr(terms(formula(x), keep.order = TRUE), 'term.labels') #termlabs <- unlist(find_predictors(x), use.names = FALSE) if (!is.null(x$trend_call)) { termlabs <- c( termlabs, gsub( 'trend', 'series', attr( terms(formula(x, trend_effects = TRUE), keep.order = TRUE), 'term.labels' ) ) ) } # Find all possible (up to 2-way) plot conditions cond_labs <- purrr::flatten(lapply(termlabs, function(i) { split_termlabs(i) })) } else { cond_labs <- strsplit(as.character(effects), split = ":") } if (any(lengths(cond_labs) > 3L)) { stop( "To display interactions of order higher than 3 ", "please use plot_predictions()", call. = FALSE ) } if (length(cond_labs) > 0) { # Make the plot data with plot_predictions out <- list() for (i in seq_along(cond_labs)) { if (length(cond_labs[[i]]) == 1) { out[[i]] <- plot_predictions( x, condition = cond_labs[[i]], draw = TRUE, type = type, points = points, rug = rug, ... ) + scale_fill_discrete(label = roundlabs) + scale_colour_discrete(label = roundlabs) + theme_classic() } if (length(cond_labs[[i]]) == 2) { out[[i]] <- plot_predictions( x, condition = c(cond_labs[[i]][1], cond_labs[[i]][2]), draw = TRUE, type = type, points = points, rug = rug, ... ) + scale_fill_discrete(label = roundlabs) + scale_colour_discrete(label = roundlabs) + theme_classic() } if (length(cond_labs[[i]]) == 3) { out[[i]] <- plot_predictions( x, condition = c( cond_labs[[i]][1], cond_labs[[i]][2], cond_labs[[i]][3] ), draw = TRUE, type = type, points = points, rug = rug, ... ) + scale_fill_discrete(label = roundlabs) + scale_colour_discrete(label = roundlabs) + theme_classic() } } } else { out <- NULL } class(out) <- 'mvgam_conditional_effects' return(out) } #' @export #' @importFrom brms conditional_effects brms::conditional_effects #' @rdname conditional_effects.mvgam #' @export plot.mvgam_conditional_effects = function(x, plot = TRUE, ask = FALSE, ...) { out <- x for (i in seq_along(out)) { if (plot) { plot(out[[i]]) if (i == 1) { devAskNewPage(ask = ask) } } } invisible(out) } #' A helper function so ggplot2 labels in the legend don't have #' ridiculous numbers of digits for numeric bins #' @noRd decimalplaces <- function(x) { x <- as.numeric(x) if (abs(x - round(x)) > .Machine$double.eps^0.5) { nchar(strsplit(sub('0+$', '', as.character(x)), ".", fixed = TRUE)[[1]][[ 2 ]]) } else { return(0) } } #' A helper function so ggplot2 labels in the legend don't have #' ridiculous numbers of digits for numeric bins #' @noRd roundlabs = function(x) { if (all(suppressWarnings(is.na(as.numeric(x))))) { out <- x } else if (all(sapply(x, decimalplaces) == 0)) { out <- x } else if (all(sapply(x, decimalplaces) <= 1)) { out <- sprintf("%.1f", as.numeric(x)) } else { out <- sprintf("%.4f", as.numeric(x)) } out } #' @rdname conditional_effects.mvgam #' @export print.mvgam_conditional_effects <- function(x, ...) { plot(x, ...) 
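# Example usage of the two label helpers above (not run; outputs indicative):
#   decimalplaces("2.50")        # -> 1 (trailing zeros are stripped first)
#   roundlabs(c("1.25", "2.5"))  # -> "1.2500" "2.5000" (common %.4f format)
#   roundlabs(c("1", "2"))       # -> "1" "2" (integer labels left unchanged)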
} #' @noRd split_termlabs = function(lab) { out <- list() if (grepl(':', lab, fixed = TRUE)) { out[[1]] <- strsplit(lab, ':')[[1]] } else if (grepl('*', lab, fixed = TRUE)) { out[[1]] <- strsplit(lab, '\\*')[[1]] } else if ( grepl('s(', lab, fixed = TRUE) | grepl('gp(', lab, fixed = TRUE) | grepl('te(', lab, fixed = TRUE) | grepl('t2(', lab, fixed = TRUE) | grepl('ti(', lab, fixed = TRUE) ) { term_struc <- eval(rlang::parse_expr(lab)) term_struc$by <- if (term_struc$by == 'NA') { NULL } else { term_struc$by } if (length(term_struc$term) <= 2) { out[[1]] <- c(all.vars(parse(text = term_struc$term)), term_struc$by) } if (length(term_struc$term) == 3) { out[[1]] <- c(all.vars(parse(text = term_struc$term[1:2])), term_struc$by) out[[2]] <- c( all.vars(parse(text = term_struc$term[c(1, 3)])), term_struc$by ) out[[3]] <- c( all.vars(parse(text = term_struc$term[c(2, 3)])), term_struc$by ) } if (length(term_struc$term) == 4) { out[[1]] <- c(all.vars(parse(text = term_struc$term[1:2])), term_struc$by) out[[2]] <- c( all.vars(parse(text = term_struc$term[c(1, 3)])), term_struc$by ) out[[3]] <- c( all.vars(parse(text = term_struc$term[c(1, 4)])), term_struc$by ) out[[4]] <- c( all.vars(parse(text = term_struc$term[c(2, 3)])), term_struc$by ) out[[5]] <- c( all.vars(parse(text = term_struc$term[c(2, 4)])), term_struc$by ) } } else if (grepl('dynamic(', lab, fixed = TRUE)) { term_struc <- eval(rlang::parse_expr(lab)) out[[1]] <- c('time', all.vars(parse(text = term_struc$term[c(2, 4)]))) } else { out[[1]] <- lab } return(out) } ================================================ FILE: R/cpp_funs.R ================================================ #' @useDynLib mvgam, .registration = TRUE #' @importFrom Rcpp evalCpp NULL ================================================ FILE: R/data_grids.R ================================================ #' Get data objects into correct order in case it is not already #'@noRd sort_data = function(data, series_time = FALSE) { if (inherits(data, 'list')) { data_arranged <- data if (series_time) { temp_dat = data.frame( time = data$index..time..index, series = data$series ) %>% dplyr::mutate(index = dplyr::row_number()) %>% dplyr::arrange(series, time) } else { temp_dat = data.frame( time = data$index..time..index, series = data$series ) %>% dplyr::mutate(index = dplyr::row_number()) %>% dplyr::arrange(time, series) } data_arranged <- lapply(data, function(x) { if (is.matrix(x)) { matrix(x[temp_dat$index, ], ncol = NCOL(x)) } else { x[temp_dat$index] } }) names(data_arranged) <- names(data) } else { if (series_time) { data_arranged <- data %>% dplyr::arrange(series, index..time..index) } else { data_arranged <- data %>% dplyr::arrange(index..time..index, series) } } return(data_arranged) } #' Create prediction grids, mostly for simple plotting functions #'@noRd data_grid = function(..., newdata) { dots <- list(...) vars <- names(dots) # Validate that vars exist in supplied data for (i in seq_along(vars)) { if (!exists(vars[i], newdata)) { stop(paste0('Variable ', vars[i], ' not found in newdata'), call. 
= FALSE) } } # Create sample dummy dataframe to get the prediction grid, ensuring # factors are preserved newdat_grid <- data.frame(do.call( cbind.data.frame, lapply(vars, function(x) { newdata[[x]] }) )) colnames(newdat_grid) <- vars # Use the supplied conditions for making the datagrid newdat_grid <- marginaleffects::datagrid(..., newdata = newdat_grid) # Now replicate the first observation for all other variables if (inherits(newdata, 'list')) { newdat_full <- lapply(seq_along(newdata), function(x) { if (names(newdata)[x] %in% vars) { newdat_grid[[names(newdata)[x]]] } else { if (is.matrix(newdata[[x]])) { t(replicate(NROW(newdat_grid), newdata[[x]][1, ])) } else { if (is.factor(newdata[[x]])) { factor( rep(newdata[[x]][1], NROW(newdat_grid)), levels = levels(newdata[[x]]) ) } else { rep(newdata[[x]][1], NROW(newdat_grid)) } } } }) names(newdat_full) <- names(newdata) } else { newdat_full <- dplyr::bind_cols( newdat_grid, data.frame( newdata %>% dplyr::select(!vars) %>% dplyr::slice_head(n = 1) ) ) } return(newdat_full) } ================================================ FILE: R/dynamic.R ================================================ #' Defining dynamic coefficients in \pkg{mvgam} formulae #' #' Set up time-varying (dynamic) coefficients for use in \pkg{mvgam} models. #' Currently, only low-rank Gaussian Process smooths are available for #' estimating the dynamics of the time-varying coefficient. #' #' @importFrom stats terms formula reformulate #' #' @param variable The variable that the dynamic smooth will be a function of #' #' @param k Optional number of basis functions for computing approximate GPs. #' If missing, `k` will be set as large as possible to accurately estimate #' the nonlinear function. #' #' @param stationary Logical. If \code{TRUE} (the default) and `rho` is #' supplied, the latent Gaussian Process smooth will not have a linear trend #' component. If \code{FALSE}, a linear trend in the covariate is added to #' the Gaussian Process smooth. Leave at \code{TRUE} if you do not believe #' the coefficient is evolving with much trend, as the linear component of #' the basis functions can be hard to penalize to zero. This sometimes causes #' divergence issues in `Stan`. See \code{\link[mgcv]{gp.smooth}} for #' details. Ignored if `rho` is missing (in which case a Hilbert space #' approximate GP is used). #' #' @param rho Either a positive numeric stating the length scale to be used for #' approximating the squared exponential Gaussian Process smooth (see #' \code{\link[mgcv]{gp.smooth}} for details) or missing, in which case the #' length scale will be estimated by setting up a Hilbert space approximate GP. #' #' @param scale Logical; If `TRUE` (the default) and `rho` is missing, #' predictors are scaled so that the maximum Euclidean distance between two #' points is `1`. This often improves sampling speed and convergence. Scaling #' also affects the estimated length-scale parameters in that they resemble #' those of scaled predictors (not of the original predictors) if scale is #' `TRUE`. #' #' @details \code{mvgam} currently sets up dynamic coefficients as low-rank #' squared exponential Gaussian Process smooths via the call #' \code{s(time, by = variable, bs = "gp", m = c(2, rho, 2))}. These smooths, #' if specified with reasonable values for the length scale parameter, will #' give more realistic out of sample forecasts than standard splines such as #' thin plate or cubic. 
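#' For example (with a hypothetical covariate named \code{x}),
#' \code{dynamic(x, rho = 10)} is expanded internally to
#' \code{s(time, by = x, bs = "gp", m = c(2, 10, 2))}.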
But the user must set the value for `rho`, as there #' is currently no support for estimating this value in \code{mgcv}. This may #' not be too big of a problem, as estimating latent length scales is often #' difficult anyway. The \code{rho} parameter should be thought of as a prior #' on the smoothness of the latent dynamic coefficient function (where higher #' values of \code{rho} lead to smoother functions with more temporal #' covariance structure). Values of \code{k} are set automatically to ensure #' enough basis functions are used to approximate the expected wiggliness of #' the underlying dynamic function (\code{k} will increase as \code{rho} #' decreases). #' #' @rdname dynamic #' #' @return a `list` object for internal usage in 'mvgam' #' #' @examples #' \dontrun{ #' # Simulate a time-varying coefficient #' # (as a Gaussian Process with length scale = 10) #' set.seed(1111) #' N <- 200 #' #' # A function to simulate from a squared exponential Gaussian Process #' sim_gp <- function(N, c, alpha, rho) { #' Sigma <- alpha ^ 2 * #' exp(-0.5 * ((outer(1:N, 1:N, "-") / rho) ^ 2)) + #' diag(1e-9, N) #' c + mgcv::rmvn(1, mu = rep(0, N), V = Sigma) #' } #' #' beta <- sim_gp(alpha = 0.75, rho = 10, c = 0.5, N = N) #' plot( #' beta, type = 'l', lwd = 3, bty = 'l', #' xlab = 'Time', ylab = 'Coefficient', col = 'darkred' #' ) #' #' # Simulate the predictor as a standard normal #' predictor <- rnorm(N, sd = 1) #' #' # Simulate a Gaussian outcome variable #' out <- rnorm(N, mean = 4 + beta * predictor, sd = 0.25) #' time <- seq_along(predictor) #' plot( #' out, type = 'l', lwd = 3, bty = 'l', #' xlab = 'Time', ylab = 'Outcome', col = 'darkred' #' ) #' #' # Gather into a data.frame and fit a dynamic coefficient model #' data <- data.frame(out, predictor, time) #' #' # Split into training and testing #' data_train <- data[1:190, ] #' data_test <- data[191:200, ] #' #' # Fit a model using the dynamic function #' mod <- mvgam( #' out ~ #' # mis-specify the length scale slightly as this #' # won't be known in practice #' dynamic(predictor, rho = 8, stationary = TRUE), #' family = gaussian(), #' data = data_train, #' chains = 2, #' silent = 2 #' ) #' #' # Inspect the summary #' summary(mod) #' #' # Plot the time-varying coefficient estimates #' plot(mod, type = 'smooths') #' #' # Extrapolate the coefficient forward in time #' plot_mvgam_smooth(mod, smooth = 1, newdata = data) #' abline(v = 190, lty = 'dashed', lwd = 2) #' #' # Overlay the true simulated time-varying coefficient #' lines(beta, lwd = 2.5, col = 'white') #' lines(beta, lwd = 2) #' } #' #' @author Nicholas J Clark #' #' @export dynamic = function(variable, k, rho = 5, stationary = TRUE, scale = TRUE) { # Check that only one variable is supplied vars <- as.list(substitute(list(variable)))[-1] if (length(vars) > 1) { stop("dynamic() can only handle one term at a time.") } term <- deparse(vars[[1]]) if (term[1] == ".") { stop("dynamic(.) not supported.") } # Check rho if (missing(rho)) { rho <- NULL } else { if (rho <= 0) { stop( 'Argument "rho" in dynamic() must be a positive value', call. 
= FALSE ) } } # Check k if (missing(k)) { k <- NULL } else { validate_pos_integer(k) } # Gather into a structured list and return term <- attr(terms(reformulate(term)), "term.labels") out <- list( term = term, rho = rho, k = k, stationary = stationary, scale = scale ) class(out) <- "dyncoef.spec" return(out) } ================================================ FILE: R/ensemble.R ================================================ #' Combine forecasts from \pkg{mvgam} models into evenly weighted ensembles #' #' Generate evenly weighted ensemble forecast distributions from #' \code{mvgam_forecast} objects. #' #' @name ensemble.mvgam_forecast #' #' @param object \code{list} object of class \code{mvgam_forecast}. #' See [forecast.mvgam()] #' #' @param ... More \code{mvgam_forecast} objects. #' #' @details It is widely recognised in the forecasting literature that #' combining forecasts from different models often results in improved #' forecast accuracy. The simplest way to create an ensemble is to use #' evenly weighted combinations of forecasts from the different models. #' This is straightforward to do in a Bayesian setting with \pkg{mvgam} as #' the posterior MCMC draws contained in each \code{mvgam_forecast} object #' will already implicitly capture correlations among the temporal posterior #' predictions. #' #' @return An object of class \code{mvgam_forecast} containing the ensemble #' predictions. This object can be readily used with the supplied S3 #' functions \code{plot} and \code{score}. #' #' @author Nicholas J Clark #' #' @seealso \code{\link{plot.mvgam_forecast}}, #' \code{\link{score.mvgam_forecast}} #' #' @examples #' \dontrun{ #' # Simulate some series and fit a few competing dynamic models #' set.seed(1) #' simdat <- sim_mvgam( #' n_series = 1, #' prop_trend = 0.6, #' mu = 1 #' ) #' #' plot_mvgam_series( #' data = simdat$data_train, #' newdata = simdat$data_test #' ) #' #' m1 <- mvgam( #' y ~ 1, #' trend_formula = ~ time + #' s(season, bs = 'cc', k = 9), #' trend_model = AR(p = 1), #' noncentred = TRUE, #' data = simdat$data_train, #' newdata = simdat$data_test, #' chains = 2, #' silent = 2 #' ) #' #' m2 <- mvgam( #' y ~ time, #' trend_model = RW(), #' noncentred = TRUE, #' data = simdat$data_train, #' newdata = simdat$data_test, #' chains = 2, #' silent = 2 #' ) #' #' # Calculate forecast distributions for each model #' fc1 <- forecast(m1) #' fc2 <- forecast(m2) #' #' # Generate the ensemble forecast #' ensemble_fc <- ensemble(fc1, fc2) #' #' # Plot forecasts #' plot(fc1) #' plot(fc2) #' plot(ensemble_fc) #' #' # Score forecasts #' score(fc1) #' score(fc2) #' score(ensemble_fc) #' } #' #' @export ensemble <- function(object, ...) { UseMethod("ensemble", object) } #'@rdname ensemble.mvgam_forecast #' #'@method ensemble mvgam_forecast #' #'@param ndraws Positive integer specifying the number of draws to use from each #'forecast distribution for creating the ensemble. If some of the ensemble members have #'fewer draws than `ndraws`, their forecast distributions will be resampled with replacement #'to achieve the correct number of draws #' #'@export ensemble.mvgam_forecast <- function(object, ..., ndraws = 5000) { models <- split_fc_dots(object, ..., model_names = NULL) n_models <- length(models) # Check that series names and key dimensions match for all forecasts allsame <- function(x) length(unique(x)) == 1 if (!allsame(purrr::map(models, 'series_names'))) { stop('Names of series must match for all forecast objects.', call. 
= FALSE) } if (!allsame(lapply(models, function(x) length(x$forecasts)))) { stop( 'The number of forecast distributions must match for all forecast objects.', call. = FALSE ) } if (!allsame(lapply(models, function(x) length(x$test_observations)))) { stop('Validation data must match for all forecast objects.', call. = FALSE) } if ( !allsame(lapply(models, function(x) { unlist(lapply(x$forecasts, function(y) dim(y)[2]), use.names = FALSE) })) ) { stop( 'Forecast horizons must match for all forecast objects.', call. = FALSE ) } validate_pos_integer(ndraws) # End of checks; now proceed with ensembling n_series <- length(models[[1]]$series_names) # Calculate total number of forecast draws to sample from for each model n_mod_draws <- lapply(seq_len(n_models), function(x) { NROW(models[[x]]$forecasts[[1]]) }) # Calculate model weights (only option at the moment is even weighting, # but this may be relaxed in future) mod_weights <- data.frame( mod = paste0('mod', 1:n_models), orig_weight = 1 / n_models, ndraws = unlist(n_mod_draws, use.names = FALSE) ) %>% # Adjust weights by the number of draws available per # forecast, ensuring that models with fewer draws aren't # under-represented in the final weighted ensemble dplyr::mutate(weight = (orig_weight / ndraws) * 100) %>% dplyr::mutate(mod = as.factor(mod)) %>% dplyr::select(mod, weight) # Create draw indices mod_inds <- as.factor(unlist( lapply(seq_len(n_models), function(x) { rep(paste0('mod', x), NROW(models[[x]]$forecasts[[1]])) }), use.names = FALSE )) all_draw_inds <- 1:sum(unlist(n_mod_draws, use.names = FALSE)) mod_inds_draws <- split(all_draw_inds, mod_inds) # Add model-specific weights to the draw indices draw_weights <- data.frame(draw = all_draw_inds, mod = mod_inds) %>% dplyr::left_join(mod_weights, by = 'mod') # Perform multinomial sampling using draw-specific weights fc_draws <- sample( all_draw_inds, size = ndraws, replace = max(all_draw_inds) < ndraws, prob = draw_weights$weight ) # Create weighted ensemble hindcasts and forecasts ens_hcs <- lapply(seq_len(n_series), function(series) { all_hcs <- do.call(rbind, lapply(models, function(x) x$hindcasts[[series]])) all_hcs[fc_draws, ] }) ens_fcs <- lapply(seq_len(n_series), function(series) { all_fcs <- do.call(rbind, lapply(models, function(x) x$forecasts[[series]])) all_fcs[fc_draws, ] }) # Initiate the ensemble forecast object ens_fc <- models[[1]] # Add in hindcasts and forecasts ens_fc$hindcasts <- ens_hcs ens_fc$forecasts <- ens_fcs names(ens_fc$hindcasts) <- names(models[[1]]$hindcasts) names(ens_fc$forecasts) <- names(models[[1]]$forecasts) # Return return(ens_fc) } #'@noRd split_fc_dots = function(x, ..., model_names = NULL, other = TRUE) { dots <- list(x, ...) names <- substitute(list(x, ...), env = parent.frame())[-1] names <- ulapply(names, deparse) if (!is.null(model_names)) { names <- model_names } if (length(names)) { if (!length(names(dots))) { names(dots) <- names } else { has_no_name <- !nzchar(names(dots)) names(dots)[has_no_name] <- names[has_no_name] } } is_mvgam_fc <- unlist(lapply(dots, function(y) inherits(y, 'mvgam_forecast'))) models <- dots[is_mvgam_fc] out <- dots[!is_mvgam_fc] if (length(out)) { stop( "Only mvgam_forecast objects can be passed to '...' for this method.", call. 
= FALSE ) } models } ================================================ FILE: R/evaluate_mvgams.R ================================================ #' Evaluate forecasts from fitted \pkg{mvgam} objects #' #' @importFrom graphics barplot boxplot axis #' @importFrom stats quantile ecdf median predict #' @importFrom grDevices devAskNewPage #' @importFrom utils lsf.str #' #' @param object \code{list} object returned from \code{mvgam} #' #' @param n_samples \code{integer} specifying the number of samples to generate #' from the model's posterior distribution #' #' @param eval_timepoint \code{integer} indexing the timepoint that represents #' our last 'observed' set of outcome data #' #' @param fc_horizon \code{integer} specifying the length of the forecast #' horizon for evaluating forecasts #' #' @param n_cores Deprecated. Parallel processing is no longer supported #' #' @param score \code{character} specifying the type of ranked probability score #' to use for evaluation. Options are: `variogram`, `drps` or `crps` #' #' @param log \code{logical}. Should the forecasts and truths be logged prior #' to scoring? This is often appropriate for comparing performance of models #' when series vary in their observation ranges #' #' @param weights optional \code{vector} of weights (where #' \code{length(weights) == n_series}) for weighting pairwise correlations #' when evaluating the variogram score for multivariate forecasts. Useful for #' down-weighting series that have larger magnitude observations or that are #' of less interest when forecasting. Ignored if \code{score != 'variogram'} #' #' @details `eval_mvgam` may be useful when both repeated fitting of a model #' using \code{\link{update.mvgam}} for exact leave-future-out cross-validation #' and approximate leave-future-out cross-validation using \code{\link{lfo_cv}} #' are impractical. The function generates a set of samples representing fixed #' parameters estimated from the full \code{mvgam} model and latent trend states #' at a given point in time. The trends are rolled forward a total of #' \code{fc_horizon} timesteps according to their estimated state space dynamics #' to generate an 'out-of-sample' forecast that is evaluated against the true #' observations in the horizon window. This function therefore simulates a #' situation where the model's parameters had already been estimated but we have #' only observed data up to the evaluation timepoint and would like to generate #' forecasts from the latent trends that have been observed up to that timepoint. #' Evaluation involves calculating an appropriate Rank Probability Score and a #' binary indicator for whether or not the true value lies within the forecast's #' 90% prediction interval #' #' `roll_eval_mvgam` sets up a sequence of evaluation timepoints along a rolling #' window and iteratively calls \code{eval_mvgam} to evaluate 'out-of-sample' #' forecasts. Evaluation involves calculating the Rank Probability Scores and a #' binary indicator for whether or not the true value lies within the forecast's #' 90% prediction interval #' #' `compare_mvgams` automates the evaluation to compare two fitted models using #' rolling window forecast evaluation and provides a series of summary plots to #' facilitate model selection. It is essentially a wrapper for #' \code{roll_eval_mvgam} #' #' @return For `eval_mvgam`, a \code{list} object containing information on #' specific evaluations for each series (if using `drps` or `crps` as the score) #' or a vector of scores when using `variogram`. 
#' #' For `roll_eval_mvgam`, a \code{list} object containing information on specific #' evaluations for each series as well as a total evaluation summary (taken by #' summing the forecast score for each series at each evaluation and averaging #' the coverages at each evaluation) #' #' For `compare_mvgams`, a series of plots comparing forecast Rank Probability #' Scores for each competing model. A lower score is preferred. Note however #' that it is possible to select a model that ultimately would perform poorly #' in true out-of-sample forecasting. For example if a wiggly smooth function #' of 'year' is included in the model then this function will be learned prior #' to evaluating rolling window forecasts, and the model could generate very #' tight predictions as a result. But when forecasting ahead to timepoints that #' the model has not seen (i.e. next year), the smooth function will end up #' extrapolating, sometimes in very strange and unexpected ways. It is therefore #' recommended to only use smooth functions for covariates that are adequately #' measured in the data (i.e. 'seasonality', for example) to reduce possible #' extrapolation of smooths and let the latent trends in the \code{mvgam} model #' capture any temporal dependencies in the data. These trends are time series #' models and so will provide much more stable forecasts #' #' @seealso \code{\link{forecast}}, \code{\link{score}}, \code{\link{lfo_cv}} #' #' @examples #' \dontrun{ #' # Simulate from a Poisson-AR2 model with a seasonal smooth #' set.seed(1) #' dat <- sim_mvgam( #' T = 75, #' n_series = 1, #' prop_trend = 0.75, #' trend_model = AR(p = 2), #' family = poisson() #' ) #' #' # Fit an appropriate model #' mod_ar2 <- mvgam( #' formula = y ~ s(season, bs = 'cc'), #' trend_model = AR(p = 2), #' family = poisson(), #' data = dat$data_train, #' newdata = dat$data_test, #' chains = 2, #' silent = 2 #' ) #' #' # Fit a less appropriate model #' mod_rw <- mvgam( #' formula = y ~ 1, #' trend_model = RW(), #' family = poisson(), #' data = dat$data_train, #' newdata = dat$data_test, #' chains = 2, #' silent = 2 #' ) #' #' # Compare Discrete Ranked Probability Scores for the testing period #' fc_ar2 <- forecast(mod_ar2) #' fc_rw <- forecast(mod_rw) #' score_ar2 <- score( #' object = fc_ar2, #' score = 'drps' #' ) #' score_rw <- score( #' object = fc_rw, #' score = 'drps' #' ) #' sum(score_ar2$series_1$score) #' sum(score_rw$series_1$score) #' #' # Use rolling evaluation for approximate comparisons of 3-step ahead #' # forecasts across the training period #' compare_mvgams( #' model1 = mod_ar2, #' model2 = mod_rw, #' fc_horizon = 3, #' n_samples = 1000, #' n_evaluations = 5 #' ) #' #' # A more appropriate comparison would be to use approximate #' # leave-future-out CV to compare forecasts (see ?mvgam::lfo_cv()) #' } #' @name evaluate_mvgams NULL #' @rdname evaluate_mvgams #' @export eval_mvgam = function( object, n_samples = 5000, eval_timepoint = 3, fc_horizon = 3, n_cores = 1, score = 'drps', log = FALSE, weights ) { # Check arguments if (!(inherits(object, 'mvgam'))) { stop('argument "object" must be of class "mvgam"') } if (attr(object$model_data, 'trend_model') == 'None') { stop( 'cannot compute rolling forecasts for mvgams that have no trend model', call. 
= FALSE ) } validate_pos_integer(fc_horizon) validate_pos_integer(eval_timepoint) validate_pos_integer(n_cores) if (n_cores > 1L) { message('argument "n_cores" is deprecated') } validate_pos_integer(n_samples) if (eval_timepoint < 3) { stop('argument "eval_timepoint" must be >= 3', call. = FALSE) } #### 1. Prepare the data at the right timepoint #### data_train <- object$obs_data n_series <- NCOL(object$ytimes) # Check evaluation timepoint if (inherits(object$obs_data, 'list')) { all_times <- (data.frame(time = object$obs_data$time) %>% dplyr::select(time) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::mutate(time = dplyr::row_number())) %>% dplyr::pull(time) } else { all_times <- (object$obs_data %>% dplyr::select(time) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::mutate(time = dplyr::row_number())) %>% dplyr::pull(time) } if (!eval_timepoint %in% all_times) { stop('Evaluation timepoint does not exist in original training data') } # Filter training data to correct point (just following evaluation timepoint) data.frame(time = object$obs_data$time, series = object$obs_data$series) %>% dplyr::mutate(row_number = dplyr::row_number()) %>% dplyr::left_join( data.frame(time = object$obs_data$time, series = object$obs_data$series), by = c('time', 'series') ) %>% dplyr::arrange(time, series) %>% dplyr::filter( time > (eval_timepoint) & time <= (eval_timepoint + fc_horizon) ) %>% dplyr::pull(row_number) -> assim_rows if (inherits(object$obs_data, 'list')) { data_assim <- lapply(object$obs_data, function(x) { if (is.matrix(x)) { matrix(x[assim_rows, ], ncol = NCOL(x)) } else { x[assim_rows] } }) } else { object$obs_data[assim_rows, ] -> data_assim } #### 2. Generate the forecast distribution #### draw_fcs <- forecast_draws( object = object, type = 'response', series = 'all', data_test = data_assim, n_samples = n_samples, ending_time = eval_timepoint, n_cores = n_cores ) if (missing(weights)) { weights <- rep(1, NCOL(object$ytimes)) } # Final forecast distribution series_fcs <- lapply(seq_len(n_series), function(series) { indexed_forecasts <- do.call( rbind, lapply(seq_along(draw_fcs), function(x) { draw_fcs[[x]][[series]] }) ) indexed_forecasts }) names(series_fcs) <- levels(data_assim$series) # If variogram score is chosen if (score == 'variogram') { # Get truths (out of sample) into correct format truths <- do.call( rbind, lapply(seq_len(n_series), function(series) { s_name <- levels(data_assim$series)[series] data.frame( series = data_assim$series, y = data_assim$y, time = data_assim$time ) %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y) }) ) series_score <- variogram_mcmc_object( truths = truths, fcs = series_fcs, log = log, weights = weights ) } # If not using variogram score if (score != 'variogram') { # Evaluate against the truth series_truths <- lapply(seq_len(n_series), function(series) { if (class(object$obs_data)[1] == 'list') { data_assim[['y']][which(as.numeric(data_assim$series) == series)] } else { data_assim[which(as.numeric(data_assim$series) == series), 'y'] } }) # Calculate score and interval coverage per series if ( object$family %in% c('poisson', 'negative binomial', 'binomial', 'beta_binomial') ) { series_score <- lapply(seq_len(n_series), function(series) { DRPS <- data.frame(drps_mcmc_object( as.vector(as.matrix(series_truths[[series]])), series_fcs[[series]], log = log )) colnames(DRPS) <- c('score', 'in_interval') DRPS$eval_horizon <- seq(1, fc_horizon) DRPS }) 
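# Aside: a minimal, self-contained sketch (not run) of the discrete rank
# probability score idea behind drps_mcmc_object(), comparing the forecast
# ECDF against a step function located at the observed count; all objects
# here are illustrative only
if (FALSE) {
  truth <- 4
  fc_draws <- rpois(1000, lambda = 5) # toy forecast sample
  Fy <- ecdf(fc_draws)
  support <- 0:max(fc_draws)
  toy_drps <- sum((as.numeric(support >= truth) - Fy(support))^2)
}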
names(series_score) <- levels(data_assim$series) } else { series_score <- lapply(seq_len(n_series), function(series) { CRPS <- data.frame(crps_mcmc_object( as.vector(as.matrix(series_truths[[series]])), series_fcs[[series]], log = log )) colnames(CRPS) <- c('score', 'in_interval') if (log) { CRPS$score <- log(CRPS$score + 0.0001) } CRPS$eval_horizon <- seq(1, fc_horizon) CRPS }) names(series_score) <- levels(data_assim$series) } } return(series_score) } #' @param object \code{list} object returned from \code{mvgam} #' #' @param n_samples \code{integer} specifying the number of samples to generate #' from the model's posterior distribution #' #' @param evaluation_seq Optional \code{integer sequence} specifying the exact #' set of timepoints for evaluating the model's forecasts. This sequence #' cannot have values \code{<3} or #' \code{> max(training timepoints) - fc_horizon} #' #' @param n_evaluations \code{integer} specifying the total number of #' evaluations to perform (ignored if \code{evaluation_seq} is supplied) #' #' @param fc_horizon \code{integer} specifying the length of the forecast #' horizon for evaluating forecasts #' #' @param n_cores Deprecated. Parallel processing is no longer supported #' #' @rdname evaluate_mvgams #' #' @export roll_eval_mvgam = function( object, n_evaluations = 5, evaluation_seq, n_samples = 5000, fc_horizon = 3, n_cores = 1, score = 'drps', log = FALSE, weights ) { # Check arguments if (!(inherits(object, "mvgam"))) { stop('argument "object" must be of class "mvgam"') } if (attr(object$model_data, 'trend_model') == 'None') { stop( 'cannot compute rolling forecasts for mvgams that have no trend model', call. = FALSE ) } validate_pos_integer(n_cores) if (n_cores > 1L) { message('argument "n_cores" is deprecated') } validate_pos_integer(n_evaluations) validate_pos_integer(n_samples) validate_pos_integer(fc_horizon) # Generate time variable from training data if (inherits(object$obs_data, 'list')) { all_timepoints <- (data.frame(time = object$obs_data$index..time..index) %>% dplyr::select(time) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::mutate(time = dplyr::row_number())) %>% dplyr::pull(time) } else { all_timepoints <- (object$obs_data %>% dplyr::select(index..time..index) %>% dplyr::distinct() %>% dplyr::arrange(index..time..index) %>% dplyr::mutate(time = dplyr::row_number())) %>% dplyr::pull(time) } # Generate evaluation sequence if not supplied if (missing(evaluation_seq)) { evaluation_seq <- floor(seq( from = 3, to = (max(all_timepoints) - fc_horizon), length.out = n_evaluations )) } # Check evaluation sequence if (min(evaluation_seq) < 3) { stop('Evaluation sequence cannot start before timepoint 3') } if (max(evaluation_seq) > (max(all_timepoints) - fc_horizon)) { stop( 'Maximum of evaluation sequence is too large for fc_horizon evaluations' ) } # Loop across evaluation sequence and calculate evaluation metrics if (missing(weights)) { weights <- rep(1, NCOL(object$ytimes)) } evals <- lapply(evaluation_seq, function(timepoint) { eval_mvgam( object = object, n_samples = n_samples, n_cores = 1, eval_timepoint = timepoint, fc_horizon = fc_horizon, score = score, log = log, weights = weights ) }) # Take sum of score at each evaluation point for multivariate models sum_or_na = function(x) { if (all(is.na(x))) { NA } else { sum(x, na.rm = T) } } if (score == 'variogram') { eval_horizons <- do.call( rbind, lapply(seq_along(evals), function(x) { data.frame(seq_along(evals[[x]])) }) ) scores <- do.call( rbind, lapply(seq_along(evals), 
function(x) {
        data.frame(evals[[x]])
      })
    )
    evals_df <- data.frame(
      score = scores,
      eval_horizon = eval_horizons,
      in_interval = NA
    )
    colnames(evals_df) <- c('score', 'eval_horizon', 'in_interval')

    # Calculate summary statistics for each series
    out <- list(
      sum_score = sum_or_na(evals_df$score),
      score_summary = summary(evals_df$score),
      score_horizon_summary = evals_df %>%
        dplyr::group_by(eval_horizon) %>%
        dplyr::summarise(median_score = median(score, na.rm = TRUE)),
      interval_coverage = NA,
      series_evals = NA,
      all_scores = evals_df
    )
  } else {
    evals_df <- do.call(rbind, do.call(rbind, evals)) %>%
      dplyr::group_by(eval_horizon) %>%
      dplyr::summarise(
        score = sum_or_na(score),
        in_interval = mean(in_interval, na.rm = TRUE)
      )

    # Calculate summary statistics for each series
    tidy_evals <- lapply(
      seq_len(length(levels(object$obs_data$series))),
      function(series) {
        all_evals <- do.call(
          rbind,
          purrr::map(evals, levels(object$obs_data$series)[series])
        )
        list(
          sum_drps = sum_or_na(all_evals$score),
          score_summary = summary(all_evals$score),
          score_horizon_summary = all_evals %>%
            dplyr::group_by(eval_horizon) %>%
            dplyr::summarise(median_score = median(score, na.rm = TRUE)),
          interval_coverage = mean(all_evals$in_interval, na.rm = TRUE),
          all_scores = all_evals
        )
      }
    )
    names(tidy_evals) <- levels(object$obs_data$series)
    out <- list(
      sum_score = sum_or_na(evals_df$score),
      score_summary = summary(evals_df$score),
      score_horizon_summary = evals_df %>%
        dplyr::group_by(eval_horizon) %>%
        dplyr::summarise(median_score = median(score, na.rm = TRUE)),
      interval_coverage = mean(evals_df$in_interval, na.rm = TRUE),
      series_evals = tidy_evals
    )
  }

  # Return score summary statistics
  return(out)
}


#' @param model1 \code{list} object returned from \code{mvgam} representing
#' the first model to be evaluated
#'
#' @param model2 \code{list} object returned from \code{mvgam} representing
#' the second model to be evaluated
#'
#' @param n_samples \code{integer} specifying the number of samples to generate
#' from the model's posterior distribution
#'
#' @param fc_horizon \code{integer} specifying the length of the forecast
#' horizon for evaluating forecasts
#'
#' @param n_evaluations \code{integer} specifying the total number of
#' evaluations to perform
#'
#' @param n_cores Deprecated. Parallel processing is no longer supported
#'
#' @rdname evaluate_mvgams
#'
#' @export
compare_mvgams = function(
  model1,
  model2,
  n_samples = 1000,
  fc_horizon = 3,
  n_evaluations = 10,
  n_cores = 1,
  score = 'drps',
  log = FALSE,
  weights
) {
  # Check arguments
  if (!(inherits(model1, "mvgam"))) {
    stop('argument "model1" must be of class "mvgam"')
  }
  if (!(inherits(model2, "mvgam"))) {
    stop('argument "model2" must be of class "mvgam"')
  }
  if (attr(model2$model_data, 'trend_model') == 'None') {
    stop(
      'cannot compare rolling forecasts for mvgams that have no trend model',
      call. = FALSE
    )
  }
  if (attr(model1$model_data, 'trend_model') == 'None') {
    stop(
      'cannot compare rolling forecasts for mvgams that have no trend model',
      call.
= FALSE ) } validate_pos_integer(n_evaluations) validate_pos_integer(fc_horizon) validate_pos_integer(n_cores) if (n_cores > 1L) { message('argument "n_cores" is deprecated') } validate_pos_integer(n_samples) # Evaluate the two models if (missing(weights)) { weights <- rep(1, NCOL(model1$ytimes)) } mod1_eval <- roll_eval_mvgam( model1, n_samples = n_samples, fc_horizon = fc_horizon, n_cores = n_cores, n_evaluations = n_evaluations, score = score, log = log, weights = weights ) mod2_eval <- roll_eval_mvgam( model2, n_samples = n_samples, fc_horizon = fc_horizon, n_cores = n_cores, n_evaluations = n_evaluations, score = score, log = log, weights = weights ) # Generate a simple summary of forecast scores for each model model_summary <- rbind(mod1_eval$score_summary, mod2_eval$score_summary) rownames(model_summary) <- c('Model 1', 'Model 2') cat('RPS summaries per model (lower is better)\n') print(model_summary) # Print 90% interval coverages for each model if (score != 'variogram') { cat('\n90% interval coverages per model (closer to 0.9 is better)\n') cat('Model 1', mod1_eval$interval_coverage, '\n') cat('Model 2', mod2_eval$interval_coverage) } # Set up plotting loop and return summary plots of DRPS ask <- TRUE for (i in 1:3) { if (i == 1) { barplot( c('model 1' = mod1_eval$sum_score, 'model 2' = mod2_eval$sum_score), col = c("#B97C7C", "#7C0000"), border = NA, ylab = 'Sum RPS (lower is better)', lwd = 2 ) } else if (i == 2) { boxplot( list( 'model 1' = mod1_eval$score_summary, 'model 2' = mod2_eval$score_summary ), border = c("#B97C7C", "#7C0000"), ylab = 'Sum RPS per evaluation', axes = FALSE ) axis(side = 2, lwd = 2) axis(side = 1, at = c(1, 2), labels = c('model 1', 'model 2'), lwd = 0) } else { plot_dat <- rbind( mod1_eval$score_horizon_summary$median_score, mod2_eval$score_horizon_summary$median_score ) colnames(plot_dat) <- seq(1:NCOL(plot_dat)) ylim = c( min(0, min(plot_dat, na.rm = TRUE)), max(plot_dat, na.rm = T) * 1.4 ) barplot( plot_dat, ylim = ylim, beside = T, xlab = 'Forecast horizon', ylab = 'Median RPS', col = c("#B97C7C", "#7C0000"), lwd = 2, border = NA, legend.text = c('Model 1', 'Model 2'), args.legend = list(x = "top", ncol = 2, border = NA, bty = 'n') ) } if (ask) { oask <- devAskNewPage(TRUE) on.exit(devAskNewPage(oask)) ask <- FALSE } } invisible() } #' @noRd crps_edf <- function(y, dat, w = NULL) { if (is.null(w)) { c_1n <- 1 / length(dat) x <- sort(dat) a <- seq.int(0.5 * c_1n, 1 - 0.5 * c_1n, length.out = length(dat)) f <- function(s) 2 * c_1n * sum(((s < x) - a) * (x - s)) } else { if (!identical(length(dat), length(w)) || any(w < 0, na.rm = TRUE)) { return(rep(NaN, length(y))) } ord <- order(dat) x <- dat[ord] w <- w[ord] p <- cumsum(w) P <- p[length(p)] a <- (p - 0.5 * w) / P f <- function(s) 2 / P * sum(w * ((s < x) - a) * (x - s)) } sapply(y, f) } # Compute CRPS # code borrowed from scoringRules: https://github.com/FK83/scoringRules/blob/master/R/scores_sample_univ.R #' @noRd crps_score <- function( truth, fc, method = "edf", w = NULL, interval_width = 0.9, log = FALSE ) { if (log) { truth <- log(truth + 0.001) fc <- log(fc + 0.001) } if (identical(length(truth), 1L) && is.vector(fc)) { score <- crps_edf(truth, fc, w) } else { score <- sapply( seq_along(truth), function(i) crps_edf(truth[i], fc[i, ], w[i, ]) ) } # Is value within empirical interval? 
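# (the same check is reused by drps_score() and sis_score() below);
# a minimal sketch of the idea, assuming a vector of posterior draws
# 'fc' and a single observed value 'truth':
#   fc <- rnorm(1000); truth <- 0.2
#   interval <- quantile(fc, probs = c(0.05, 0.95))
#   as.numeric(truth >= interval[1] & truth <= interval[2])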
interval <- quantile(
  fc,
  probs = c(
    (1 - interval_width) / 2,
    (interval_width + (1 - interval_width) / 2)
  ),
  na.rm = TRUE
)
in_interval <- ifelse(truth <= interval[2] & truth >= interval[1], 1, 0)
return(c(score, in_interval))
}

# Compute DRPS
#' @noRd
drps_score <- function(truth, fc, interval_width = 0.9, log = FALSE) {
  if (log) {
    truth <- log(truth + 0.001)
    fc <- log(fc + 0.001)
    nsum <- max(c(truth, fc), na.rm = TRUE) + 5
  } else {
    nsum <- max(
      c(truth, quantile(fc, probs = 0.99, na.rm = TRUE)),
      na.rm = TRUE
    ) +
      1000
  }
  Fy <- ecdf(fc)
  ysum <- 0:nsum
  indicator <- ifelse(ysum - truth >= 0, 1, 0)
  score <- sum((indicator - Fy(ysum))^2)

  # Is value within empirical interval?
  interval <- quantile(
    fc,
    probs = c(
      (1 - interval_width) / 2,
      (interval_width + (1 - interval_width) / 2)
    ),
    na.rm = TRUE
  )
  in_interval <- ifelse(truth <= interval[2] & truth >= interval[1], 1, 0)
  return(c(score, in_interval))
}

# Compute the scaled interval score
#' @noRd
sis_score <- function(truth, fc, interval_width = 0.9, log = FALSE) {
  if (log) {
    truth <- log(truth + 0.001)
    fc <- log(fc + 0.001)
  }
  lower_prob <- (1 - interval_width) / 2
  upper_prob <- 1 - lower_prob
  creds <- quantile(fc, probs = c(lower_prob, upper_prob), na.rm = TRUE)
  cred_lower <- creds[1]
  cred_upper <- creds[2]
  alpha <- 2 / (2 * lower_prob)
  cred_interval <- (cred_upper - cred_lower) / 2
  err_up <- truth - cred_upper
  err_low <- cred_lower - truth

  # SIS
  score <- 2 * cred_interval +
    alpha * err_up * (err_up > 0) +
    alpha * err_low * (err_low > 0)

  # Is value within empirical interval?
  interval <- quantile(
    fc,
    probs = c(
      (1 - interval_width) / 2,
      (interval_width + (1 - interval_width) / 2)
    ),
    na.rm = TRUE
  )
  in_interval <- ifelse(truth <= interval[2] & truth >= interval[1], 1, 0)
  return(c(score, in_interval))
}

# Compute the Brier score
#' @noRd
brier_score <- function(truth, fc, interval_width = 0.9) {
  score <- mean((truth - fc)^2)

  # Cannot evaluate coverage for binary truths
  in_interval <- NA
  return(c(score, in_interval))
}

#' Compute the multivariate energy score
#' @noRd
energy_score <- function(truth, fc, log = FALSE) {
  insight::check_if_installed(
    "scoringRules",
    reason = 'to calculate energy scores'
  )

  # es_sample can't handle any NAs
  has_nas <- apply(fc, 2, function(x) any(is.na(x)))
  fc <- fc[, !has_nas]
  if (log) {
    truth <- log(truth + 0.001)
    fc <- log(fc + 0.001)
  }
  es <- scoringRules::es_sample(y = truth, dat = fc)
  return(es)
}

#' Compute the variogram score, using the median pairwise difference
#' from the forecast distribution (scoringRules::vs_sample uses the
#' mean, which is not appropriate for skewed distributions)
#' @noRd
variogram_score = function(truth, fc, log = FALSE, weights) {
  if (log) {
    truth <- log(truth + 0.001)
    fc <- log(fc + 0.001)
  }

  # Use weight of 1 for each pairwise combination if no weights
  # are supplied; else take the average of each pair of weights
  if (missing(weights)) {
    weights <- matrix(1, nrow = length(truth), ncol = length(truth))
  } else {
    weights <- outer(weights, weights, FUN = function(X, Y) {
      (X + Y) / 2
    })
  }
  out <- matrix(NA, length(truth), length(truth))
  for (i in seq_along(truth)) {
    for (j in seq_along(truth)) {
      if (i == j) {
        out[i, j] <- 0
      } else {
        v_fc <- quantile(abs(fc[i, ] - fc[j, ])^0.5, 0.5, na.rm = TRUE)
        v_dat <- abs(truth[i] - truth[j])^0.5
        out[i, j] <- 2 * weights[i, j] * ((v_dat - v_fc)^2)
      }
    }
  }

  # Divide by two as we have (inefficiently) computed each pairwise
  # comparison twice
  score <- sum(out) / 2
  return(score)
}

#' Compute the energy score on all observations in fc_horizon
#' @noRd
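#' A minimal sketch of the expected shapes, using hypothetical objects
#' (requires the scoringRules package): 'truths' is an
#' n_series x fc_horizon matrix and 'fcs' is a list with one
#' draws x fc_horizon matrix per series, e.g.
#'   truths <- matrix(rpois(6, lambda = 5), nrow = 2)
#'   fcs <- replicate(2, matrix(rpois(300, 5), ncol = 3), simplify = FALSE)
#'   energy_mcmc_object(truths, fcs)  # one energy score per horizon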
energy_mcmc_object <- function(truths, fcs, log = FALSE, weights) {
  fc_horizon <- length(fcs[[1]][1, ])
  fcs_per_horizon <- lapply(seq_len(fc_horizon), function(horizon) {
    do.call(
      rbind,
      lapply(seq_along(fcs), function(fc) {
        fcs[[fc]][, horizon]
      })
    )
  })
  unlist(lapply(seq_len(fc_horizon), function(horizon) {
    energy_score(
      truth = truths[, horizon],
      fc = fcs_per_horizon[[horizon]],
      log = log
    )
  }))
}

#' Compute the Brier score on all observations in fc_horizon
#' @noRd
brier_mcmc_object <- function(truth, fc, log = FALSE, weights) {
  indices_keep <- which(!is.na(truth))
  if (length(indices_keep) == 0) {
    scores <- data.frame(
      'brier' = rep(NA, length(truth)),
      'interval' = rep(NA, length(truth))
    )
  } else {
    scores <- matrix(NA, nrow = length(truth), ncol = 2)
    for (i in indices_keep) {
      scores[i, ] <- brier_score(truth = as.vector(truth)[i], fc = fc[, i])
    }
  }
  scores
}

#' Wrapper to calculate variogram score on all observations in fc_horizon
#' @noRd
variogram_mcmc_object <- function(truths, fcs, log = FALSE, weights) {
  fc_horizon <- length(fcs[[1]][1, ])
  fcs_per_horizon <- lapply(seq_len(fc_horizon), function(horizon) {
    do.call(
      rbind,
      lapply(seq_along(fcs), function(fc) {
        fcs[[fc]][, horizon]
      })
    )
  })
  unlist(lapply(seq_len(fc_horizon), function(horizon) {
    variogram_score(
      truth = truths[, horizon],
      fc = fcs_per_horizon[[horizon]],
      log = log,
      weights = weights
    )
  }))
}

# Wrapper to calculate DRPS scores on all observations in fc_horizon
#' @noRd
drps_mcmc_object <- function(truth, fc, interval_width = 0.9, log = FALSE) {
  indices_keep <- which(!is.na(truth))
  if (length(indices_keep) == 0) {
    scores <- data.frame(
      'drps' = rep(NA, length(truth)),
      'interval' = rep(NA, length(truth))
    )
  } else {
    scores <- matrix(NA, nrow = length(truth), ncol = 2)
    for (i in indices_keep) {
      scores[i, ] <- drps_score(
        truth = as.vector(truth)[i],
        fc = fc[, i],
        interval_width,
        log = log
      )
    }
  }
  scores
}

================================================
FILE: R/families.R
================================================
#' # add 'series' information, which is an identifier of site, replicate
#' # and species
#' dplyr::mutate(
#'   series = paste0(
#'     'site_', site,
#'     '_', species,
#'     '_rep_', replicate
#'   ),
#'   time = as.numeric(time),
#'   # add a 'cap' variable that defines the maximum latent N to
#'   # marginalize over when estimating latent abundance; in other words
#'   # how large do we realistically think the true abundance could be?
#' cap = 80 #' ) %>% #' dplyr::select(-replicate) -> testdat #' #' # Now add another species that has a different temporal trend and a #' # smaller detection probability (0.45 for this species) #' testdat <- testdat %>% #' dplyr::bind_rows( #' data.frame( #' site = 1, #' replicate = rep(1:5, 6), #' time = sort(rep(1:6, 5)), #' species = 'sp_2', #' truth = c( #' rep(4, 5), #' rep(7, 5), #' rep(15, 5), #' rep(16, 5), #' rep(19, 5), #' rep(18, 5) #' ), #' obs = c( #' rbinom(5, 4, 0.45), #' rbinom(5, 7, 0.45), #' rbinom(5, 15, 0.45), #' rbinom(5, 16, 0.45), #' rbinom(5, 19, 0.45), #' rbinom(5, 18, 0.45) #' ) #' ) %>% #' dplyr::mutate( #' series = paste0( #' 'site_', site, #' '_', species, #' '_rep_', replicate #' ), #' time = as.numeric(time), #' cap = 50 #' ) %>% #' dplyr::select(-replicate) #' ) #' #' # series identifiers #' testdat$species <- factor( #' testdat$species, #' levels = unique(testdat$species) #' ) #' testdat$series <- factor( #' testdat$series, #' levels = unique(testdat$series) #' ) #' #' # The trend_map to state how replicates are structured #' testdat %>% #' # each unique combination of site*species is a separate process #' dplyr::mutate( #' trend = as.numeric(factor(paste0(site, species))) #' ) %>% #' dplyr::select(trend, series) %>% #' dplyr::distinct() -> trend_map #' trend_map #' #' # Fit a model #' mod <- mvgam( #' # the observation formula sets up linear predictors for #' # detection probability on the logit scale #' formula = obs ~ species - 1, #' #' # the trend_formula sets up the linear predictors for #' # the latent abundance processes on the log scale #' trend_formula = ~ s(time, by = trend, k = 4) + species, #' #' # the trend_map takes care of the mapping #' trend_map = trend_map, #' #' # nmix() family and data #' family = nmix(), #' data = testdat, #' #' # priors can be set in the usual way #' priors = c( #' prior(std_normal(), class = b), #' prior(normal(1, 1.5), class = Intercept_trend) #' ), #' chains = 2 #' ) #' #' # The usual diagnostics #' summary(mod) #' #' # Plotting conditional effects #' library(ggplot2) #' #' plot_predictions( #' mod, #' condition = 'species', #' type = 'detection' #' ) + #' ylab('Pr(detection)') + #' ylim(c(0, 1)) + #' theme_classic() + #' theme(legend.position = 'none') #' #' # ============================================================================= #' # Binomial Models #' # ============================================================================= #' #' # Simulate two time series of Binomial trials #' trials <- sample(c(20:25), 50, replace = TRUE) #' x <- rnorm(50) #' detprob1 <- plogis(-0.5 + 0.9 * x) #' detprob2 <- plogis(-0.1 - 0.7 * x) #' dat <- rbind( #' data.frame( #' y = rbinom(n = 50, size = trials, prob = detprob1), #' time = 1:50, #' series = 'series1', #' x = x, #' ntrials = trials #' ), #' data.frame( #' y = rbinom(n = 50, size = trials, prob = detprob2), #' time = 1:50, #' series = 'series2', #' x = x, #' ntrials = trials #' ) #' ) #' dat <- dplyr::mutate(dat, series = as.factor(series)) #' dat <- dplyr::arrange(dat, time, series) #' #' # Fit a model using the binomial() family; must specify observations #' # and number of trials in the cbind() wrapper #' mod <- mvgam( #' cbind(y, ntrials) ~ series + s(x, by = series), #' family = binomial(), #' data = dat #' ) #' summary(mod) #' } #' #' @export nmix = function(link = 'log') { linktemp <- make.link('log') structure( list( family = "nmix", link = 'log', linkfun = linktemp$linkfun, linkinv = linktemp$linkinv, mu.eta = linktemp$mu.eta, valideta = linktemp$valideta 
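# note: nmix() always uses a fixed log link (any user-supplied 'link'
# argument is ignored), and the returned object is classed as an
# mgcv-style extended.family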
), class = c("extended.family", "family") ) } #### Non-exported functions for performing family-specific tasks #### #' Family options in character format #' @noRd family_char_choices = function() { c( 'negative binomial', "poisson", "binomial", 'beta_binomial', "bernoulli", "nmix", "tweedie", "beta", "gaussian", "lognormal", "student", "Gamma" ) } # Convert location / precision parameters to shape parameters for the beta distribution # Original author: Andrew Heiss (https://www.andrewheiss.com/blog/2021/11/08/beta-regression-guide/) #' @noRd beta_shapes = function(mu, phi) { return(list(shape1 = mu * phi, shape2 = (1 - mu) * phi)) } # Calculate all possible Poisson log-densities for N-mixture simulation #' @noRd pois_dens = function(min_cap, max_cap, lambdas) { # Identify which indices share the exact same lambda AND # k value so that we only need to run dpois once for each group data.frame(lambdas, min_cap, max_cap) %>% dplyr::group_by(lambdas) %>% dplyr::summarise( min_cap = min(min_cap, na.rm = TRUE), max_cap = max(max_cap, na.rm = TRUE) ) -> group_inds l <- mapply(`:`, group_inds$min_cap, group_inds$max_cap) data.frame( k = unlist(l), lambda = group_inds$lambdas[rep(1:nrow(group_inds), lengths(l))] ) %>% dplyr::mutate(pois_dens = dpois(k, lambda, log = TRUE)) -> all_ks return(all_ks) } #' Generic prediction function #' @importFrom stats predict #' @param Xp A `mgcv` linear predictor matrix #' @param family \code{character}. The `family` slot of the model's family argument #' @param betas Vector of regression coefficients of length `NCOL(Xp)` #' @param latent_lambdas Optional vector of latent abundance estimates, only used for N-Mixture models #' @param cap Optional vector of latent abundance maximum capacities, only used for N-Mixture models #' @param type Either `link`, `expected`, `response`, `variance`, #' `latent_N` (only applies to N-mixture distributions) or #' `detection` (only applies to N-mixture distributions) #' @param family_pars Additional arguments for each specific observation process (i.e. #' overdispersion parameter if `family == "nb"`) #' @param density logical. Rather than calculating a prediction, evaluate the log-likelihood. #' Use this option when particle filtering #' @param truth Observation to use for evaluating the likelihood (if `density == TRUE`) #' @details A generic prediction function that will make it easier to add new #' response distributions in future. Use `type = variance` for computing family-level #' variance as a function of the mean #' @noRd mvgam_predict = function( Xp, family, betas, latent_lambdas, cap, min_cap, type = 'link', family_pars, density = FALSE, truth = NULL ) { if (type == 'latent_N' & family != 'nmix') { stop('"latent_N" type only available for N-mixture models', call. = FALSE) } if (type == 'detection' & family != 'nmix') { stop('"detection" type only available for N-mixture models', call. 
= FALSE) } # Poisson-Binomial N-Mixture (requires family parameter # 'cap' as well as 'latent_lambdas' argument) if (family == 'nmix') { insight::check_if_installed( "extraDistr", reason = 'to simulate from N-Mixture distributions' ) insight::check_if_installed( "wrswoR", reason = 'to simulate from N-Mixture distributions' ) # Calculate detection probability and convert to probability scale p <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) p <- plogis(p) # Latent mean of State vector lambdas <- as.vector(latent_lambdas) # User-specified cap on latent abundance cap <- as.vector(cap) if (type == 'detection') { out <- p } else if (type == 'link') { # 'link' predictions are expectations of the latent abundance out <- lambdas if (density) { out <- unlist( lapply(seq_along(truth), function(i) { if (is.na(truth[i])) { output <- NA } else { ks <- truth[i]:cap[i] lik_binom <- dbinom(truth[i], size = ks, prob = p[i], log = TRUE) lik_poisson <- dpois(x = ks, lambda = lambdas[i], log = TRUE) loglik <- lik_binom + lik_poisson output <- log_sum_exp(loglik) } output }), use.names = FALSE ) } } else if (type == 'latent_N') { if (missing(min_cap)) { min_cap <- 0 } min_cap <- as.vector(min_cap) if (missing(truth)) { out <- extraDistr::rtpois( n = length(lambdas), lambda = lambdas, a = min_cap, b = cap ) } else { # If true observed N is supplied, we can calculate the # most likely latent N given the covariates and the estimated # detection probability out <- unlist( lapply(seq_along(truth), function(i) { if (is.na(truth[i])) { output <- NA } else { ks <- min_cap[[i]]:cap[[i]] lik <- exp( dbinom(truth[[i]], size = ks, prob = p[[i]], log = TRUE) + dpois(x = ks, lambda = lambdas[i], log = TRUE) ) probs <- lik / sum(lik) probs[!is.finite(probs)] <- 0 output <- ks[wrswoR::sample_int_ccrank( length(ks), size = 1L, prob = probs )] } output }), use.names = FALSE ) } } else if (type == 'response') { xpred <- extraDistr::rtpois( n = length(lambdas), lambda = lambdas, b = cap ) out <- rbinom(length(lambdas), size = xpred, prob = p) } else if (type == 'variance') { xpred <- extraDistr::rtpois( n = length(lambdas), lambda = lambdas, b = cap ) # Variance of a Binomial distribution using the # weights convention from stats::glm() mu <- p / xpred out <- mu * (1 - mu) } else { # Expectations xpred <- extraDistr::rtpois( n = length(lambdas), lambda = lambdas, b = cap ) out <- xpred * p } } # Gaussian observations (requires family parameter 'sigma_obs') if (family == 'gaussian') { if (type %in% c('link', 'expected')) { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) if (density) { out <- dnorm( truth, mean = out, sd = as.vector(family_pars$sigma_obs), log = TRUE ) } } else if (type == 'variance') { out <- rep.int(1, NROW(Xp)) } else { out <- rnorm( n = NROW(Xp), mean = ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset'), sd = as.vector(family_pars$sigma_obs) ) } } # LogNormal observations (requires family parameter 'sigma_obs') if (family == 'lognormal') { if (type == 'link') { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) if (density) { out <- dlnorm( truth, meanlog = out, sdlog = as.vector(family_pars$sigma_obs), log = TRUE ) } } else if (type == 'response') { out <- rlnorm( n = NROW(Xp), meanlog = ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset'), sdlog = as.vector(family_pars$sigma_obs) ) } else if (type == 'variance') { mu <- ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + 
attr(Xp, 'model.offset') sd <- as.vector(family_pars$sigma_obs) out <- as.vector((exp((sd)^2) - 1) * exp((2 * mu + sd^2))) } else { mu <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) out <- exp(mu + (as.vector(family_pars$sigma_obs)^2 / 2)) } } # Student-T observations (requires family parameters 'nu', 'sigma_obs') if (family == 'student') { if (type %in% c('link', 'expected')) { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) if (density) { out <- dstudent_t( truth, df = as.vector(family_pars$nu), mu = out, sigma = as.vector(family_pars$sigma_obs), log = TRUE ) } } else if (type == 'variance') { out <- as.vector(family_pars$sigma_obs)^2 * as.vector(family_pars$nu) / pmax(1.01, (as.vector(family_pars$nu) - 2)) } else { out <- rstudent_t( n = NROW(Xp), df = family_pars$nu, mu = ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset'), sigma = as.vector(family_pars$sigma_obs) ) } } # Poisson observations if (family == 'poisson') { if (type == 'link') { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) if (density) { out <- dpois(truth, lambda = exp(out), log = TRUE) } } else if (type == 'response') { out <- rpois( n = NROW(Xp), lambda = exp( ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset') ) ) } else if (type == 'variance') { out <- exp(as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') )) } else { out <- exp(as.vector( ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset') )) } } # Bernoulli observations if (family == 'bernoulli') { if (type == 'link') { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) if (density) { out <- dbinom(truth, prob = plogis(out), size = 1, log = TRUE) } } else if (type == 'response') { out <- rbinom( n = NROW(Xp), prob = plogis( ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset') ), size = 1 ) } else if (type == 'variance') { mu <- plogis(as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') )) out <- mu * (1 - mu) } else { out <- plogis(as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') )) } } # Binomial observations (requires argument 'trials') if (family == 'binomial') { if (type == 'link') { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) if (density) { out <- dbinom( truth, prob = plogis(out), size = as.vector(family_pars$trials), log = TRUE ) } } else if (type == 'response') { out <- rbinom( n = NROW(Xp), prob = plogis( ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset') ), size = as.vector(family_pars$trials) ) } else if (type == 'variance') { mu <- plogis(as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') )) / as.vector(family_pars$trials) out <- mu * (1 - mu) } else { out <- plogis( ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset') ) * as.vector(family_pars$trials) } } # Beta_Binomial observations (requires arguments 'trials' and 'phi') if (family == 'beta_binomial') { if (type == 'link') { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) if (density) { out <- dbeta_binomial( truth, mu = plogis(out), phi = as.vector(family_pars$phi), size = as.vector(family_pars$trials), log = TRUE ) } } else if (type == 'response') { out <- rbeta_binomial( n = NROW(Xp), mu = plogis( ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 
'model.offset') ), phi = as.vector(family_pars$phi), size = as.vector(family_pars$trials) ) } else if (type == 'variance') { mu <- plogis(as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') )) # https://en.wikipedia.org/wiki/Beta-binomial_distribution alpha <- mu * as.vector(family_pars$phi) beta <- (1 - mu) * as.vector(family_pars$phi) p <- 1 / (alpha + beta + 1) n <- as.vector(family_pars$trials) out <- ((n * p) * (1 - p)) * ((alpha + beta + n) / (alpha + beta + 1)) } else { out <- as.vector( plogis( ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset') ) * as.vector(family_pars$trials) ) } } # Negative Binomial observations (requires argument 'phi') if (family == 'negative binomial') { if (type == 'link') { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) if (density) { out <- dnbinom( truth, mu = exp(out), size = as.vector(family_pars$phi), log = TRUE ) } } else if (type == 'response') { out <- rnbinom( n = NROW(Xp), mu = exp( ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset') ), size = as.vector(family_pars$phi) ) } else if (type == 'variance') { mu <- exp(as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') )) out <- mu + mu^2 / as.vector(family_pars$phi) } else { out <- as.vector(exp( ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset') )) } } # Beta observations (requires argument 'phi') if (family == 'beta') { shape_pars <- beta_shapes( mu = plogis(as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') )), phi = as.vector(family_pars$phi) ) if (type == 'link') { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) if (density) { out <- dbeta( truth, shape1 = shape_pars$shape1, shape2 = shape_pars$shape2, log = TRUE ) } } else if (type == 'response') { out <- rbeta( n = NROW(Xp), shape1 = shape_pars$shape1, shape2 = shape_pars$shape2 ) } else if (type == 'variance') { mu <- plogis(as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') )) out <- mu * (1 - mu) / (1 + as.vector(family_pars$phi)) } else { out <- plogis(as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') )) } } # Gamma observations (requires argument 'shape') if (family == 'Gamma') { if (type == 'link') { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) if (density) { out <- dgamma( truth, rate = as.vector(family_pars$shape) / exp(out), shape = as.vector(family_pars$shape), log = TRUE ) } } else if (type == 'response') { out <- rgamma( n = NROW(Xp), rate = as.vector(family_pars$shape) / exp(as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') )), shape = as.vector(family_pars$shape) ) } else if (type == 'variance') { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') )^2 } else { out <- as.vector(family_pars$shape) / (as.vector(family_pars$shape) / exp(as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ))) } } # Tweedie observations (requires argument 'phi') if (family == 'tweedie') { if (type == 'link') { out <- as.vector( (matrix(Xp, ncol = NCOL(Xp)) %*% betas) + attr(Xp, 'model.offset') ) if (density) { out <- mgcv::ldTweedie( y = truth, mu = exp(out), # Power parameter is fixed p = 1.5, phi = as.vector(family_pars$phi), all.derivs = F )[, 1] } } else if (type == 'response') { out <- rpois( n = NROW(Xp), lambda = mgcv::rTweedie( mu = exp( ((matrix(Xp, ncol 
= NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset') ), # Power parameter is fixed p = 1.5, phi = as.vector(family_pars$phi) ) ) } else if (type == 'variance') { out <- (exp( ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset') )^1.5) * as.vector(family_pars$phi) } else { out <- as.vector(exp( ((matrix(Xp, ncol = NCOL(Xp)) %*% betas)) + attr(Xp, 'model.offset') )) } } return(out) } #' Set which family to use when calculating default intercept priors #' in brms #' @noRd family_to_brmsfam = function(family) { if (family$family == 'beta') { brms::Beta() } else if (family$family == 'Beta regression') { brms::Beta() } else if (family$family == 'student') { brms::student() } else if (family$family %in% c('tweedie', 'negative binomial')) { brms::negbinomial() } else if (family$family == 'Gamma') { Gamma(link = 'log') } else { family } } #' Set which family to use when setting up the gam object #' Stick to Gaussian where possible to ensure the initial setup #' doesn't fail #' @noRd family_to_mgcvfam = function(family) { if (family$family == 'beta') { mgcv::betar() } else if (family$family == 'student') { gaussian() } else if (family$family %in% c('gamma', 'Gamma', 'lognormal')) { gaussian() } else if (family$family == 'tweedie') { mgcv::Tweedie(p = 1.5, link = 'log') } else if (family$family == 'nmix') { poisson() } else if (family$family %in% c('bernoulli', 'beta_binomial')) { binomial() } else { family } } #' Set which family to use when setting up the jagam object #' @noRd family_to_jagamfam = function(family) { if (family %in% c('gaussian', 'student')) { gaussian() } else { poisson() } } #' Define links used for the mean #' @noRd family_links = function(family) { if (family %in% c('gaussian', 'lognormal', 'student')) { out <- 'identity' } if ( family %in% c('Gamma', 'poisson', 'negative binomial', 'tweedie', 'nmix') ) { out <- 'log' } if (family %in% c('beta', 'binomial', 'bernoulli', 'beta_binomial')) { out <- 'logit' } out } #' @noRd family_invlinks = function(family) { if (family %in% c('gaussian', 'lognormal', 'student')) { out <- function(x) { x } } if (family %in% c('Gamma', 'poisson', 'negative binomial', 'tweedie')) { out <- function(x) { exp(x) } } if (family %in% c('beta', 'binomial', 'bernoulli', 'beta_binomial')) { out <- function(x) { plogis(x) } } out } #' Parameters to monitor / extract depending on the observation family #' @param family Character string of family name #' @return Named list with parameter names and descriptive labels #' @noRd family_param_info = function(family) { # Define family parameter specifications family_specs <- list( gaussian = list( param_names = "sigma_obs", labels = "observation_error" ), lognormal = list( param_names = "sigma_obs", labels = "log_observation_error" ), student = list( param_names = c("sigma_obs", "nu"), labels = c("observation_error", "observation_df") ), Gamma = list( param_names = "shape", labels = "observation_shape" ), beta = list( param_names = "phi", labels = "observation_precision" ), beta_binomial = list( param_names = "phi", labels = "observation_dispersion" ), "negative binomial" = list( param_names = "phi", labels = "observation_dispersion" ), tweedie = list( param_names = "phi", labels = "observation_dispersion" ), nmix = list( param_names = "detprob", labels = "detection_probability" ), poisson = list( param_names = character(0), labels = character(0) ), binomial = list( param_names = character(0), labels = character(0) ), bernoulli = list( param_names = character(0), labels = character(0) ) ) # Return 
specification for the given family spec <- family_specs[[family]] if (is.null(spec)) { # Default for unknown families spec <- list( param_names = character(0), labels = character(0) ) } return(spec) } #' Parameters to monitor / extract depending on the observation family #' (deprecated Use family_param_info() instead) #' @noRd family_par_names = function(family) { # Maintain backward compatibility by extracting param_names from new function family_param_info(family)$param_names } #' Define which parameters to monitor / extract #' @noRd extract_family_pars = function(object, newdata = NULL) { # Get names of parameters to extract pars_to_extract <- family_param_info(object$family)$param_names # Extract into a named list if (length(pars_to_extract) > 0) { out <- vector(mode = 'list') for (i in 1:length(pars_to_extract)) { out[[i]] <- mcmc_chains(object$model_output, params = pars_to_extract[i]) if (NCOL(out[[i]]) == 1) { out[[i]] <- as.vector(out[[i]]) } } } else { out <- list() } names(out) <- pars_to_extract # Return list of extracted posterior parameter samples out } #' Family-specific prior information #' @noRd family_prior_info = function(family, use_stan, data) { if (family == 'gaussian') { prior_df <- data.frame( param_name = c('vector[n_series] sigma_obs;'), param_length = length(unique(data$series)), param_info = c('observation error sd'), prior = c('sigma_obs ~ student_t(3, 0, 2);'), example_change = c( paste0( 'sigma_obs ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ) ) ) } if (family == 'lognormal') { prior_df <- data.frame( param_name = c('vector[n_series] sigma_obs;'), param_length = length(unique(data$series)), param_info = c('log(observation error sd)'), prior = c('sigma_obs ~ student_t(3, 0, 1);'), example_change = c( paste0( 'sigma_obs ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ) ) ) } if (family == 'student') { prior_df <- data.frame( param_name = c( 'vector[n_series] sigma_obs;', 'vector[n_series] nu;' ), param_length = rep(length(unique(data$series)), 2), param_info = c('observation error sd', 'observation degrees of freedom'), prior = c('sigma_obs ~ student_t(3, 0, 2);', 'nu ~ gamma(2, 0.1);'), example_change = c( paste0( 'sigma_obs ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ), paste0( 'nu ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ) ) ) } if (family == 'beta') { prior_df <- data.frame( param_name = c('vector[n_series] phi;'), param_length = length(unique(data$series)), param_info = c('Beta precision parameter'), prior = c('phi ~ gamma(0.01, 0.01);'), example_change = c( paste0( 'phi ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ) ) ) } if (family == 'beta_binomial') { prior_df <- data.frame( param_name = c('vector[n_series] phi;'), param_length = length(unique(data$series)), param_info = c('Beta Binomial precision parameter'), prior = c('phi ~ gamma(0.01, 0.01);'), example_change = c( paste0( 'phi ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ) ) ) } if (family == 'Gamma') { prior_df <- data.frame( param_name = c('vector[n_series] shape;'), param_length = length(unique(data$series)), param_info = c('Gamma shape parameter'), prior = c('shape ~ gamma(0.01, 0.01);'), example_change = c( 
paste0(
  'shape ~ normal(',
  round(runif(min = -1, max = 1, n = 1), 2),
  ', ',
  round(runif(min = 0.1, max = 1, n = 1), 2),
  ');'
)
)
)
}
if (family == 'negative binomial') {
  if (use_stan) {
    prior_df <- data.frame(
      param_name = c('vector[n_series] phi_inv;'),
      param_length = length(unique(data$series)),
      param_info = c('inverse of NB dispersion'),
      prior = c('phi_inv ~ student_t(3, 0, 0.1);'),
      example_change = c(
        paste0(
          'phi_inv ~ normal(',
          round(runif(min = -1, max = 1, n = 1), 2),
          ', ',
          round(runif(min = 0.1, max = 1, n = 1), 2),
          ');'
        )
      )
    )
  } else {
    prior_df <- data.frame(
      param_name = c('vector[n_series] phi_inv;'),
      param_length = length(unique(data$series)),
      param_info = c('inverse of NB dispersion'),
      prior = c('phi_inv[s] ~ dexp(5)'),
      example_change = c(
        paste0(
          'phi_inv[s] ~ dnorm(',
          round(runif(min = -1, max = 1, n = 1), 2),
          ', ',
          round(runif(min = 0.1, max = 1, n = 1), 2),
          ')T(0, )'
        )
      )
    )
  }
}
if (family == 'tweedie') {
  prior_df <- data.frame(
    param_name = c('vector[n_series] phi_raw;'),
    param_length = length(unique(data$series)),
    param_info = c('log of Tweedie dispersion (for each series s)'),
    prior = c('phi_raw[s] ~ dnorm(0, 2)T(-3.5, 3.5)'),
    example_change = c(
      paste0(
        'phi_raw[s] ~ dnorm(',
        round(runif(min = -1, max = 1, n = 1), 2),
        ', ',
        round(runif(min = 0.5, max = 5, n = 1), 2),
        ')'
      )
    )
  )
}
if (family %in% c('nmix', 'poisson', 'binomial', 'bernoulli')) {
  prior_df <- NULL
}
return(prior_df)
}

#' Family-specific Dunn-Smyth residual functions
#' @noRd
ds_resids_nmix = function(truth, fitted, draw, p, N) {
  na_obs <- is.na(truth)
  a_obs <- pbinom(
    ifelse(
      as.vector(truth[!na_obs]) - 1.e-6 > 0,
      as.vector(truth[!na_obs]) - 1.e-6,
      0
    ),
    size = N[!na_obs],
    prob = p[!na_obs]
  )
  b_obs <- pbinom(
    as.vector(truth[!na_obs]),
    size = N[!na_obs],
    prob = p[!na_obs]
  )
  u_obs <- runif(
    n = length(truth[!na_obs]),
    min = pmin(a_obs, b_obs),
    max = pmax(a_obs, b_obs)
  )
  if (any(is.na(truth))) {
    u <- vector(length = length(truth))
    u[na_obs] <- NaN
    u[!na_obs] <- u_obs
  } else {
    u <- u_obs
  }
  dsres_out <- qnorm(u)
  dsres_out[is.infinite(dsres_out)] <- NaN
  dsres_out
}

#' @noRd
ds_resids_binomial = function(truth, fitted, draw, N) {
  na_obs <- is.na(truth)
  a_obs <- pbinom(
    ifelse(
      as.vector(truth[!na_obs]) - 1.e-6 > 0,
      as.vector(truth[!na_obs]) - 1.e-6,
      0
    ),
    size = N[!na_obs],
    prob = fitted[!na_obs]
  )
  b_obs <- pbinom(
    as.vector(truth[!na_obs]),
    size = N[!na_obs],
    prob = fitted[!na_obs]
  )
  u_obs <- runif(
    n = length(truth[!na_obs]),
    min = pmin(a_obs, b_obs),
    max = pmax(a_obs, b_obs)
  )
  if (any(is.na(truth))) {
    u <- vector(length = length(truth))
    u[na_obs] <- NaN
    u[!na_obs] <- u_obs
  } else {
    u <- u_obs
  }
  dsres_out <- qnorm(u)
  dsres_out[is.infinite(dsres_out)] <- NaN
  dsres_out
}

#' @noRd
ds_resids_beta_binomial = function(truth, fitted, draw, N, phi) {
  na_obs <- is.na(truth)
  a_obs <- pbeta_binomial(
    ifelse(
      as.vector(truth[!na_obs]) - 1.e-6 > 0,
      as.vector(truth[!na_obs]) - 1.e-6,
      0
    ),
    size = N[!na_obs],
    mu = fitted[!na_obs],
    phi = phi[!na_obs]
  )
  b_obs <- pbeta_binomial(
    ifelse(as.vector(truth[!na_obs]), as.vector(truth[!na_obs]), 0),
    size = N[!na_obs],
    mu = fitted[!na_obs],
    phi = phi[!na_obs]
  )
  u_obs <- runif(
    n = length(truth[!na_obs]),
    min = pmin(a_obs, b_obs),
    max = pmax(a_obs, b_obs)
  )
  if (any(is.na(truth))) {
    u <- vector(length = length(truth))
    u[na_obs] <- NaN
    u[!na_obs] <- u_obs
  } else {
    u <- u_obs
  }
  dsres_out <- qnorm(u)
  dsres_out[is.infinite(dsres_out)] <- NaN
  dsres_out
}

#' @noRd
ds_resids_nb = function(truth, fitted, draw, size) {
  na_obs <- is.na(truth)
  p <- size[!na_obs] / (fitted[!na_obs] + size[!na_obs])
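# p is the NB success probability under the (mu, size)
# parameterisation, p = size / (mu + size); the CDF can then be
# evaluated through the beta identity
# pnbinom(y, size = size, prob = p) == pbeta(p, size, y + 1),
# which is what the pbeta() calls below rely on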
a_obs <- ifelse( as.vector(truth[!na_obs]) > 0, pbeta(p, size[!na_obs], pmax(as.vector(truth[!na_obs]), 1)), 0 ) b_obs <- pbeta(p, size[!na_obs], as.vector(truth[!na_obs]) + 1) u_obs <- runif(n = length(truth[!na_obs]), min = a_obs, max = b_obs) if (any(is.na(truth))) { u <- vector(length = length(truth)) u[na_obs] <- NaN u[!na_obs] <- u_obs } else { u <- u_obs } dsres_out <- qnorm(u) dsres_out[is.infinite(dsres_out)] <- NaN dsres_out } #' @noRd ds_resids_beta = function(truth, fitted, draw, precision) { shape_pars <- beta_shapes(mu = fitted, phi = precision) na_obs <- is.na(truth) a_obs <- pbeta( as.vector(truth[!na_obs]) - 1.e-6, shape1 = shape_pars$shape1[!na_obs], shape2 = shape_pars$shape2[!na_obs] ) b_obs <- pbeta( as.vector(truth[!na_obs]), shape1 = shape_pars$shape1[!na_obs], shape2 = shape_pars$shape2[!na_obs] ) u_obs <- runif( n = length(truth[!na_obs]), min = pmin(a_obs, b_obs), max = pmax(a_obs, b_obs) ) if (any(is.na(truth))) { u <- vector(length = length(truth)) u[na_obs] <- NaN u[!na_obs] <- u_obs } else { u <- u_obs } dsres_out <- qnorm(u) dsres_out[is.infinite(dsres_out)] <- NaN dsres_out } #' @noRd ds_resids_pois = function(truth, fitted, draw) { na_obs <- is.na(truth) a_obs <- ppois(as.vector(truth[!na_obs]) - 1.e-6, lambda = fitted[!na_obs]) b_obs <- ppois(as.vector(truth[!na_obs]), lambda = fitted[!na_obs]) u_obs <- runif( n = length(truth[!na_obs]), min = pmin(a_obs, b_obs), max = pmax(a_obs, b_obs) ) if (any(is.na(truth))) { u <- vector(length = length(truth)) u[na_obs] <- NaN u[!na_obs] <- u_obs } else { u <- u_obs } dsres_out <- qnorm(u) dsres_out[is.infinite(dsres_out)] <- NaN dsres_out } #' @noRd ds_resids_tw = function(truth, fitted, draw) { na_obs <- is.na(truth) a_obs <- ppois(as.vector(truth[!na_obs]) - 1.e-6, lambda = fitted[!na_obs]) b_obs <- ppois(as.vector(truth[!na_obs]), lambda = fitted[!na_obs]) u_obs <- runif( n = length(truth[!na_obs]), min = pmin(a_obs, b_obs), max = pmax(a_obs, b_obs) ) if (any(is.na(truth))) { u <- vector(length = length(truth)) u[na_obs] <- NaN u[!na_obs] <- u_obs } else { u <- u_obs } dsres_out <- qnorm(u) dsres_out[is.infinite(dsres_out)] <- NaN dsres_out } #' @noRd ds_resids_gaus = function(truth, fitted, sigma, draw) { na_obs <- is.na(truth) a_obs <- pnorm( as.vector(truth[!na_obs]) - 1.e-6, mean = fitted[!na_obs], sd = sigma ) b_obs <- pnorm(as.vector(truth[!na_obs]), mean = fitted[!na_obs], sd = sigma) u_obs <- runif( n = length(truth[!na_obs]), min = pmin(a_obs, b_obs), max = pmax(a_obs, b_obs) ) if (any(is.na(truth))) { u <- vector(length = length(truth)) u[na_obs] <- NaN u[!na_obs] <- u_obs } else { u <- u_obs } dsres_out <- qnorm(u) dsres_out[is.infinite(dsres_out)] <- NaN dsres_out } #' @noRd ds_resids_lnorm = function(truth, fitted, sigma, draw) { na_obs <- is.na(truth) a_obs <- plnorm( as.vector(truth[!na_obs]) - 1.e-6, meanlog = log(fitted[!na_obs]), sdlog = sigma[!na_obs] ) b_obs <- plnorm( as.vector(truth[!na_obs]), meanlog = log(fitted[!na_obs]), sdlog = sigma[!na_obs] ) u_obs <- runif( n = length(truth[!na_obs]), min = pmin(a_obs, b_obs), max = pmax(a_obs, b_obs) ) if (any(is.na(truth))) { u <- vector(length = length(truth)) u[na_obs] <- NaN u[!na_obs] <- u_obs } else { u <- u_obs } dsres_out <- qnorm(u) dsres_out[is.infinite(dsres_out)] <- NaN dsres_out } #' @noRd ds_resids_gamma = function(truth, fitted, shape, draw) { na_obs <- is.na(truth) a_obs <- pgamma( as.vector(truth[!na_obs]) - 1.e-6, shape = shape[!na_obs], rate = shape[!na_obs] / fitted[!na_obs] ) b_obs <- pgamma( as.vector(truth[!na_obs]), shape = 
shape[!na_obs],
    rate = shape[!na_obs] / fitted[!na_obs]
  )
  u_obs <- runif(
    n = length(truth[!na_obs]),
    min = pmin(a_obs, b_obs),
    max = pmax(a_obs, b_obs)
  )
  if (any(is.na(truth))) {
    u <- vector(length = length(truth))
    u[na_obs] <- NaN
    u[!na_obs] <- u_obs
  } else {
    u <- u_obs
  }
  dsres_out <- qnorm(u)
  dsres_out[is.infinite(dsres_out)] <- NaN
  dsres_out
}

#' @noRd
ds_resids_student = function(truth, fitted, sigma, nu, draw) {
  na_obs <- is.na(truth)
  a_obs <- pstudent_t(
    as.vector(truth[!na_obs]) - 1.e-6,
    df = nu[!na_obs],
    mu = fitted[!na_obs],
    sigma = sigma[!na_obs]
  )
  b_obs <- pstudent_t(
    as.vector(truth[!na_obs]),
    df = nu[!na_obs],
    mu = fitted[!na_obs],
    sigma = sigma[!na_obs]
  )
  u_obs <- runif(
    n = length(truth[!na_obs]),
    min = pmin(a_obs, b_obs),
    max = pmax(a_obs, b_obs)
  )
  if (any(is.na(truth))) {
    u <- vector(length = length(truth))
    u[na_obs] <- NaN
    u[!na_obs] <- u_obs
  } else {
    u <- u_obs
  }
  dsres_out <- qnorm(u)
  dsres_out[is.infinite(dsres_out)] <- NaN
  dsres_out
}

#' Residual calculations for a forecast
#' @noRd
get_forecast_resids = function(
  object,
  series,
  truth,
  preds,
  family,
  sample_seq
) {
  if (family == 'poisson') {
    series_residuals <- do.call(
      rbind,
      lapply(sample_seq, function(x) {
        suppressWarnings(ds_resids_pois(
          truth = truth,
          fitted = preds[x, ],
          draw = preds[x, ]
        ))
      })
    )
  }
  if (family == 'negative binomial') {
    size <- mcmc_chains(object$model_output, 'phi')
    series_residuals <- do.call(
      rbind,
      lapply(sample_seq, function(x) {
        suppressWarnings(ds_resids_nb(
          truth = truth,
          fitted = preds[x, ],
          draw = preds[x, ],
          size = size[x]
        ))
      })
    )
  }
  if (family == 'gaussian') {
    sigma <- mcmc_chains(object$model_output, 'sigma_obs')
    series_residuals <- do.call(
      rbind,
      lapply(sample_seq, function(x) {
        suppressWarnings(ds_resids_gaus(
          truth = truth,
          fitted = preds[x, ],
          draw = preds[x, ],
          sigma = sigma[x]
        ))
      })
    )
  }
  if (family == 'lognormal') {
    sigma <- mcmc_chains(object$model_output, 'sigma_obs')
    series_residuals <- do.call(
      rbind,
      lapply(sample_seq, function(x) {
        suppressWarnings(ds_resids_lnorm(
          truth = truth,
          fitted = preds[x, ],
          draw = preds[x, ],
          sigma = sigma[x]
        ))
      })
    )
  }
  if (family == 'student') {
    sigma <- mcmc_chains(object$model_output, 'sigma_obs')
    nu <- mcmc_chains(object$model_output, 'nu')
    series_residuals <- do.call(
      rbind,
      lapply(sample_seq, function(x) {
        suppressWarnings(ds_resids_student(
          truth = truth,
          fitted = preds[x, ],
          draw = preds[x, ],
          sigma = sigma[x],
          nu = nu[x]
        ))
      })
    )
  }
  if (family == 'beta') {
    precision <- mcmc_chains(object$model_output, 'phi')
    series_residuals <- do.call(
      rbind,
      lapply(sample_seq, function(x) {
        suppressWarnings(ds_resids_beta(
          truth = truth,
          fitted = preds[x, ],
          draw = preds[x, ],
          precision = precision[x]
        ))
      })
    )
  }
  if (family == 'Gamma') {
    shapes <- mcmc_chains(object$model_output, 'shape')
    series_residuals <- do.call(
      rbind,
      lapply(sample_seq, function(x) {
        suppressWarnings(ds_resids_gamma(
          truth = truth,
          fitted = preds[x, ],
          draw = preds[x, ],
          shape = shapes[x]
        ))
      })
    )
  }
  if (family == 'tweedie') {
    series_residuals <- do.call(
      rbind,
      lapply(sample_seq, function(x) {
        suppressWarnings(ds_resids_tw(
          truth = truth,
          fitted = preds[x, ],
          draw = preds[x, ]
        ))
      })
    )
  }
  return(series_residuals)
}

#'
#' Residual calculations for a fitted mvgam object
#' @noRd
dsresids_vec = function(object) {
  family <- object$family
  obs_series <- object$obs_data$series
  series_levels <- levels(obs_series)
  fit_engine <- object$fit_engine

  # Need to know which series each observation belongs to so we can
  # pull out appropriate family-level parameters (overdispersions, shapes, etc...)
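# Dunn-Smyth (randomized quantile) residuals map each observation
# through its fitted CDF and then through qnorm(); a minimal sketch
# for a single Poisson observation with hypothetical values:
#   lambda <- 4; y <- 2
#   u <- runif(1, min = ppois(y - 1, lambda), max = ppois(y, lambda))
#   qnorm(u)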
all_dat <- data.frame( series = object$obs_data$series, time = object$obs_data$index..time..index, y = object$obs_data$y ) %>% dplyr::arrange(time, series) truth <- all_dat$y last_train <- NROW(all_dat) series_obs <- as.numeric(all_dat$series) # Extract expectations and necessary generated quantities # and subset to only include training data preds <- posterior_epred(object)[, 1:last_train, drop = FALSE] if (family == 'nmix') { p <- mcmc_chains(object$model_output, 'detprob')[, 1:last_train, drop = FALSE ] N <- mcmc_chains(object$model_output, 'latent_ypred')[, 1:last_train, drop = FALSE ] } if (family %in% c('binomial', 'beta_binomial')) { p <- plogis(mcmc_chains(object$model_output, 'mus')[, 1:last_train, drop = FALSE ]) N <- as.vector(attr(object$mgcv_model, 'trials'))[1:length(truth)] } # Family-specific parameters family_pars <- extract_family_pars(object = object) n_series <- NCOL(object$ytimes) # Family parameters spread into a vector family_extracts <- lapply(seq_along(family_pars), function(j) { if (is.matrix(family_pars[[j]])) { as.vector(family_pars[[j]][, series_obs]) } else { as.vector(matrix( rep(family_pars[[j]], NCOL(preds)), nrow = NROW(preds), byrow = FALSE )) } }) names(family_extracts) <- names(family_pars) # Create a truth matrix for vectorised residual computation truth_mat <- matrix(rep(truth, NROW(preds)), nrow = NROW(preds), byrow = TRUE) # Calculate DS residual distributions if (family == 'gaussian') { resids <- matrix( ds_resids_gaus( truth = as.vector(truth_mat), fitted = as.vector(preds), draw = 1, sigma = family_extracts$sigma_obs ), nrow = NROW(preds) ) } if (family == 'binomial') { N_mat <- matrix(rep(N, NROW(preds)), nrow = NROW(preds), byrow = TRUE) resids <- matrix( ds_resids_binomial( truth = as.vector(truth_mat), fitted = as.vector(p), draw = 1, N = as.vector(N_mat) ), nrow = NROW(preds) ) } if (family == 'beta_binomial') { N_mat <- matrix(rep(N, NROW(preds)), nrow = NROW(preds), byrow = TRUE) resids <- matrix( ds_resids_beta_binomial( truth = as.vector(truth_mat), fitted = as.vector(p), draw = 1, N = as.vector(N_mat), phi = family_extracts$phi ), nrow = NROW(preds) ) } if (family == 'bernoulli') { resids <- matrix( ds_resids_binomial( truth = as.vector(truth_mat), fitted = as.vector(preds), draw = 1, N = rep(1, length(truth_mat)) ), nrow = NROW(preds) ) } if (family == 'nmix') { resids <- matrix( ds_resids_nmix( truth = as.vector(truth_mat), fitted = 1, draw = 1, N = as.vector(N), p = as.vector(p) ), nrow = NROW(preds) ) } if (family == 'student') { resids <- matrix( ds_resids_student( truth = as.vector(truth_mat), fitted = as.vector(preds), draw = 1, sigma = family_extracts$sigma_obs, nu = family_extracts$nu ), nrow = NROW(preds) ) } if (family == 'lognormal') { resids <- matrix( ds_resids_lnorm( truth = as.vector(truth_mat), fitted = as.vector(preds), draw = 1, sigma = family_extracts$sigma_obs ), nrow = NROW(preds) ) } if (family == 'poisson') { resids <- matrix( ds_resids_pois( truth = as.vector(truth_mat), fitted = as.vector(preds), draw = 1 ), nrow = NROW(preds) ) } if (family == 'beta') { resids <- matrix( ds_resids_beta( truth = as.vector(truth_mat), fitted = as.vector(preds), draw = 1, precision = family_extracts$phi ), nrow = NROW(preds) ) } if (family == 'Gamma') { resids <- matrix( ds_resids_gamma( truth = as.vector(truth_mat), fitted = as.vector(preds), draw = 1, shape = family_extracts$shape ), nrow = NROW(preds) ) } if (family == 'negative binomial') { resids <- matrix( ds_resids_nb( truth = as.vector(truth_mat), fitted = 
as.vector(preds),
      draw = 1,
      size = family_extracts$phi
    ),
    nrow = NROW(preds)
  )
}
if (family == 'tweedie') {
  resids <- matrix(
    ds_resids_tw(
      truth = as.vector(truth_mat),
      fitted = as.vector(preds),
      draw = 1
    ),
    nrow = NROW(preds)
  )
}

# Convert to a list of series-level matrices and return
series_resids <- lapply(seq_len(n_series), function(series) {
  inds_keep <- which(series_obs == series)
  resids[, inds_keep]
})
names(series_resids) <- levels(all_dat$series)
return(series_resids)
}

================================================
FILE: R/fevd.mvgam.R
================================================
#' Calculate latent VAR forecast error variance decompositions
#'
#' Compute forecast error variance decompositions from
#' \code{mvgam} models with Vector Autoregressive dynamics
#'
#' @name fevd.mvgam
#'
#' @param object \code{list} object of class \code{mvgam} resulting from a call to [mvgam()]
#' that used a Vector Autoregressive latent process model (either as `VAR(cor = FALSE)` or
#' `VAR(cor = TRUE)`; see [VAR()] for details)
#'
#' @param h Positive \code{integer} specifying the forecast horizon over which to calculate
#' the FEVD
#'
#' @param ... ignored
#'
#' @return See \code{\link{mvgam_fevd-class}} for a full description of the quantities that are
#' computed and returned by this function, along with key references.
#'
#' @author Nicholas J Clark
#'
#' @seealso [VAR()], [irf()], [stability()], \code{\link{mvgam_fevd-class}}
#'
#' @references Lütkepohl, H. (2007).
#' New Introduction to Multiple Time Series Analysis. 2nd ed. Springer-Verlag Berlin Heidelberg.
#'
#' @examples
#' \dontrun{
#' # Simulate some time series that follow a latent VAR(1) process
#' simdat <- sim_mvgam(
#'   family = gaussian(),
#'   n_series = 4,
#'   trend_model = VAR(cor = TRUE),
#'   prop_trend = 1
#' )
#' plot_mvgam_series(data = simdat$data_train, series = "all")
#'
#' # Fit a model that uses a latent VAR(1)
#' mod <- mvgam(
#'   formula = y ~ -1,
#'   trend_formula = ~ 1,
#'   trend_model = VAR(cor = TRUE),
#'   family = gaussian(),
#'   data = simdat$data_train,
#'   chains = 2,
#'   silent = 2
#' )
#'
#' # Plot the autoregressive coefficient distributions;
#' # use 'dir = "v"' to arrange the order of facets
#' # correctly
#' mcmc_plot(
#'   mod,
#'   variable = 'A',
#'   regex = TRUE,
#'   type = 'hist',
#'   facet_args = list(dir = 'v')
#' )
#'
#' # Calculate forecast error variance decompositions for each series
#' fevds <- fevd(mod, h = 12)
#'
#' # Plot median contributions to forecast error variance
#' plot(fevds)
#'
#' # View a summary of the error variance decompositions
#' summary(fevds)
#' }
#' @export
fevd <- function(object, ...) {
  UseMethod("fevd", object)
}

#' @rdname fevd.mvgam
#' @method fevd mvgam
#' @export
fevd.mvgam <- function(object, h = 10, ...) {
  validate_pos_integer(h)
  trend_model <- attr(object$model_data, "trend_model")
  if (!trend_model %in% c("VAR", "VARcor", "VAR1", "VAR1cor")) {
    stop(
      "Only VAR(1) models currently supported for calculating FEVDs",
      call.
= FALSE ) } beta_vars <- mcmc_chains(object$model_output, "A") sigmas <- mcmc_chains(object$model_output, "Sigma") n_series <- object$n_lv if (is.null(n_series)) { n_series <- nlevels(object$obs_data$series) } all_fevds <- lapply(seq_len(NROW(beta_vars)), function(draw) { # Get necessary VAR parameters into a simple list format x <- list( K = n_series, A = matrix( beta_vars[draw, ], nrow = n_series, ncol = n_series, byrow = TRUE ), Sigma = matrix( sigmas[draw, ], nrow = n_series, ncol = n_series, byrow = TRUE ), p = 1 ) # Calculate the FEVD for this draw gen_fevd(x, h = h) }) class(all_fevds) <- "mvgam_fevd" return(all_fevds) } #### Functions to compute forecast error variance decompositions # Much of this code is modified from R code generously provided in the vars # package https://github.com/cran/vars #### #' Forecast error variance decomposition #' @noRd gen_fevd <- function(x, h = 6, ...) { K <- x$K ynames <- paste0("process_", 1:K) msey <- var_fecov(x, h = h) Psi <- var_psi(x, h = h) mse <- matrix(NA, nrow = h, ncol = K) Omega <- array(0, dim = c(h, K, K)) for (i in 1:h) { mse[i, ] <- diag(msey[,, i]) temp <- matrix(0, K, K) for (l in 1:K) { for (m in 1:K) { for (j in 1:i) { temp[l, m] <- temp[l, m] + Psi[l, m, j]^2 } } } temp <- temp / mse[i, ] for (j in 1:K) { Omega[i, , j] <- temp[j, ] } } result <- list() for (i in 1:K) { result[[i]] <- matrix(Omega[,, i], nrow = h, ncol = K) colnames(result[[i]]) <- ynames } names(result) <- ynames return(result) } #' Forecast error covariance matrix #' @noRd var_fecov <- function(x, h) { sigma_yh <- array(NA, dim = c(x$K, x$K, h)) Phi <- var_phi(x, h = h) sigma_yh[,, 1] <- Phi[,, 1] %*% t(Phi[,, 1]) if (h > 1) { for (i in 2:h) { temp <- matrix(0, nrow = x$K, ncol = x$K) for (j in 2:i) { temp <- temp + Phi[,, j] %*% t(Phi[,, j]) } sigma_yh[,, i] <- temp + sigma_yh[,, 1] } } return(sigma_yh) } ================================================ FILE: R/forecast.mvgam.R ================================================ #' @importFrom generics forecast #' @export generics::forecast #' @title Extract or compute hindcasts and forecasts for a fitted #' \code{mvgam} object #' #' @name forecast.mvgam #' #' @method forecast mvgam #' #' @importFrom stats predict #' #' @importFrom rlang missing_arg #' #' @inheritParams predict.mvgam #' #' @param newdata Optional \code{dataframe} or \code{list} of test data #' containing the same variables that were included in the original `data` #' used to fit the model. If included, the covariate information in #' \code{newdata} will be used to generate forecasts from the fitted model #' equations. If this same \code{newdata} was originally included in the call #' to \code{mvgam}, then forecasts have already been produced by the #' generative model and these will simply be extracted and plotted. However #' if no \code{newdata} was supplied to the original model call, an #' assumption is made that the \code{newdata} supplied here comes #' sequentially after the data supplied in the original model (i.e. we #' assume there is no time gap between the last observation of series 1 in #' the original data and the first observation for series 1 in #' \code{newdata}) #' #' @param data_test Deprecated. Still works in place of \code{newdata} but #' users are recommended to use \code{newdata} instead for more seamless #' integration into `R` workflows #' #' @param n_cores Deprecated. Parallel processing is no longer supported #' #' @param ... 
Ignored #' #' @details Posterior predictions are drawn from the fitted \code{mvgam} and #' used to simulate a forecast distribution #' #' @return An object of class \code{mvgam_forecast} containing hindcast and #' forecast distributions. See \code{\link{mvgam_forecast-class}} for #' details. #' #' @seealso [hindcast.mvgam()], [plot.mvgam_forecast()], #' [summary.mvgam_forecast()], [score.mvgam_forecast()] #' [ensemble.mvgam_forecast()] #' #' @examples #' \dontrun{ #' # Simulate data with 3 series and AR trend model #' simdat <- sim_mvgam(n_series = 3, trend_model = AR()) #' #' # Fit mvgam model #' mod <- mvgam( #' y ~ s(season, bs = 'cc', k = 6), #' trend_model = AR(), #' noncentred = TRUE, #' data = simdat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' # Hindcasts on response scale #' hc <- hindcast(mod) #' str(hc) #' #' # Use summary() to extract hindcasts / forecasts for custom plotting #' head(summary(hc), 12) #' #' # Or just use the plot() function for quick plots #' plot(hc, series = 1) #' plot(hc, series = 2) #' plot(hc, series = 3) #' #' # Forecasts on response scale #' fc <- forecast( #' mod, #' newdata = simdat$data_test #' ) #' str(fc) #' head(summary(fc), 12) #' plot(fc, series = 1) #' plot(fc, series = 2) #' plot(fc, series = 3) #' #' # Forecasts as expectations #' fc <- forecast( #' mod, #' newdata = simdat$data_test, #' type = 'expected' #' ) #' head(summary(fc), 12) #' plot(fc, series = 1) #' plot(fc, series = 2) #' plot(fc, series = 3) #' #' # Dynamic trend extrapolations #' fc <- forecast( #' mod, #' newdata = simdat$data_test, #' type = 'trend' #' ) #' head(summary(fc), 12) #' plot(fc, series = 1) #' plot(fc, series = 2) #' plot(fc, series = 3) #' } #' #' @export forecast.mvgam = function( object, newdata, data_test, n_cores = 1, type = 'response', ... ) { # Check arguments validate_pos_integer(n_cores) if (n_cores > 1L) { message('argument "n_cores" is deprecated') } if (!missing("newdata")) { data_test <- newdata } if (missing("newdata") & missing(data_test) & is.null(object$test_data)) { stop('newdata must be supplied to compute forecasts', call. 
= FALSE) } type <- match.arg( arg = type, choices = c( "link", "response", "trend", "expected", "detection", "latent_N" ) ) if (inherits(object, 'jsdgam')) { orig_trend_model <- attr(object$model_data, 'prepped_trend_model') } else { orig_trend_model <- object$trend_model } data_train <- validate_series_time( object$obs_data, trend_model = orig_trend_model ) n_series <- NCOL(object$ytimes) # Check whether a forecast has already been computed forecasts_exist <- FALSE if (!is.null(object$test_data) && !missing(data_test)) { object$test_data <- validate_series_time( object$test_data, trend_model = orig_trend_model ) data_test <- validate_series_time(data_test, trend_model = orig_trend_model) if ( max(data_test$index..time..index) <= max(object$test_data$index..time..index) ) { forecasts_exist <- TRUE } else { data.frame(time = data_test$time) %>% dplyr::mutate(rowid = dplyr::row_number()) %>% dplyr::filter(time > max(object$test_data$index..time..index)) %>% dplyr::pull(rowid) -> idx if (inherits(data_test, 'list')) { data_arranged <- data_test data_arranged <- lapply(data_test, function(x) { if (is.matrix(x)) { matrix(x[idx, ], ncol = NCOL(x)) } else { x[idx] } }) names(data_arranged) <- names(data_test) data_test <- data_arranged } else { data_test <- data_test[idx, ] } } } if (!is.null(object$test_data) && missing(data_test)) { forecasts_exist <- TRUE } if (is.null(object$test_data)) { data_test <- validate_series_time( data_test, name = 'newdata', trend_model = orig_trend_model ) data.frame(series = object$obs_data$series, time = object$obs_data$time) %>% dplyr::group_by(series) %>% dplyr::summarise(maxt = max(time)) -> series_max_ts data.frame(series = data_test$series, time = data_test$time) %>% dplyr::mutate(orig_rows = dplyr::row_number()) %>% dplyr::left_join(series_max_ts, by = 'series') %>% dplyr::filter(time > maxt) %>% dplyr::pull(orig_rows) -> idx if (inherits(data_test, 'list')) { data_arranged <- data_test data_arranged <- lapply(data_test, function(x) { if (is.matrix(x)) { matrix(x[idx, ], ncol = NCOL(x)) } else { x[idx] } }) names(data_arranged) <- names(data_test) data_test <- data_arranged } else { data_test <- data_test[idx, ] } object$test_data <- data_test } # Only compute forecasts if they don't already exist! if (!forecasts_exist) { resp_terms <- as.character(terms(formula(object))[[2]]) if (length(resp_terms) == 1) { out_name <- as.character(terms(formula(object))[[2]]) } else { if (any(grepl('cbind', resp_terms))) { resp_terms <- resp_terms[-grepl('cbind', resp_terms)] out_name <- resp_terms[1] } } if (out_name != 'y') { data_test$y <- data_test[[out_name]] } if (!missing(data_test)) { if (!'y' %in% names(data_test)) { data_test$y <- rep(NA, NROW(data_test)) } data_test <- validate_series_time( data_test, name = 'newdata', trend_model = orig_trend_model ) } # Generate draw-specific forecasts fc_preds <- forecast_draws( object = object, type = type, series = 'all', data_test = data_test, n_cores = n_cores, ... 
) # Extract forecasts into the correct format series_fcs <- lapply(seq_len(n_series), function(series) { indexed_forecasts <- do.call( rbind, lapply(seq_along(fc_preds), function(x) { fc_preds[[x]][[series]] }) ) indexed_forecasts }) names(series_fcs) <- levels(data_test$series) # Extract hindcasts data_train <- validate_series_time( object$obs_data, trend_model = orig_trend_model ) ends <- seq( 0, dim(mcmc_chains(object$model_output, 'ypred'))[2], length.out = NCOL(object$ytimes) + 1 ) starts <- ends + 1 starts <- c(1, starts[-c(1, (NCOL(object$ytimes) + 1))]) ends <- ends[-1] series_hcs <- lapply(seq_len(n_series), function(series) { to_extract <- switch( type, 'link' = 'mus', 'expected' = 'mus', 'response' = 'ypred', 'trend' = 'trend', 'latent_N' = 'mus', 'detection' = 'mus' ) if ( object$family == 'nmix' & type == 'link' ) { to_extract <- 'trend' } if (object$fit_engine == 'stan') { preds <- mcmc_chains(object$model_output, to_extract)[, seq( series, dim(mcmc_chains(object$model_output, 'ypred'))[2], by = NCOL(object$ytimes) ), drop = FALSE ] } else { preds <- mcmc_chains(object$model_output, to_extract)[, starts[series]:ends[series], drop = FALSE ] } if ( object$family == 'nmix' & type == 'link' ) { preds <- exp(preds) } if (type %in% c('expected', 'latent_N', 'detection')) { # Compute expectations as one long vector Xpmat <- matrix(as.vector(preds)) attr(Xpmat, 'model.offset') <- 0 family_pars <- extract_family_pars(object = object) par_extracts <- lapply(seq_along(family_pars), function(j) { if (is.matrix(family_pars[[j]])) { as.vector(matrix( rep(as.vector(family_pars[[j]][, series]), NCOL(preds)), nrow = NROW(preds), byrow = FALSE )) } else { as.vector(matrix( rep(family_pars[[j]], NCOL(preds)), nrow = NROW(preds), byrow = FALSE )) } }) names(par_extracts) <- names(family_pars) # Add trial information if this is a Binomial model if (object$family %in% c('binomial', 'beta_binomial')) { trials <- as.vector(matrix( rep( as.vector(attr(object$mgcv_model, 'trials')[, series]), NROW(preds) ), nrow = NROW(preds), byrow = TRUE )) par_extracts$trials <- trials } if (object$family == 'nmix') { preds <- mcmc_chains(object$model_output, 'detprob')[, object$ytimes[, series], drop = FALSE ] Xpmat <- matrix(qlogis(as.vector(preds))) attr(Xpmat, 'model.offset') <- 0 latent_lambdas <- as.vector(mcmc_chains( object$model_output, 'trend' )[, seq( series, dim(mcmc_chains(object$model_output, 'ypred'))[2], by = NCOL(object$ytimes) ), drop = FALSE ]) latent_lambdas <- exp(latent_lambdas) n_draws <- dim(mcmc_chains(object$model_output, 'ypred'))[1] cap <- as.vector(t(replicate( n_draws, object$obs_data$cap[which( as.numeric(object$obs_data$series) == series )] ))) } else { latent_lambdas <- NULL cap <- NULL } if (type == 'latent_N') { preds <- mcmc_chains(object$model_output, 'latent_ypred')[, seq( series, dim(mcmc_chains(object$model_output, 'ypred'))[2], by = NCOL(object$ytimes) ), drop = FALSE ] } else { preds <- matrix( as.vector(mvgam_predict( family = object$family, Xp = Xpmat, latent_lambdas = latent_lambdas, cap = cap, type = type, betas = 1, family_pars = par_extracts )), nrow = NROW(preds) ) } } preds }) names(series_hcs) <- levels(data_test$series) # Extract observations series_obs <- lapply(seq_len(n_series), function(series) { s_name <- levels(object$obs_data$series)[series] data.frame( series = object$obs_data$series, time = object$obs_data$index..time..index, y = object$obs_data$y ) %>% dplyr::filter(series == s_name) %>% dplyr::arrange(time) %>% dplyr::pull(y) }) names(series_obs) <- 
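    # Column layout note: for Stan fits, mcmc_chains(object$model_output,
    # 'ypred') stores posterior draws with series interleaved across time,
    # so the columns belonging to a given series are series,
    # series + n_series, series + 2 * n_series, ... For example, with 3
    # series the columns for series 1 would be:
    #   seq(1, dim(mcmc_chains(object$model_output, 'ypred'))[2], by = 3)
    # which is what the seq(series, ..., by = NCOL(object$ytimes)) calls
    # here extract.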
levels(data_test$series)

    series_test <- lapply(seq_len(n_series), function(series) {
      s_name <- levels(object$obs_data$series)[series]
      data.frame(
        series = data_test$series,
        time = data_test$index..time..index,
        y = data_test$y
      ) %>%
        dplyr::filter(series == s_name) %>%
        dplyr::arrange(time) %>%
        dplyr::pull(y)
    })
    names(series_test) <- levels(data_test$series)
  } else {
    # If forecasts already exist, simply extract them
    data_test <- validate_series_time(
      object$test_data,
      trend_model = orig_trend_model
    )
    last_train <- max(object$obs_data$index..time..index) -
      (min(object$obs_data$index..time..index) - 1)
    data_train <- validate_series_time(
      object$obs_data,
      trend_model = orig_trend_model
    )
    ends <- seq(
      0,
      dim(mcmc_chains(object$model_output, 'ypred'))[2],
      length.out = NCOL(object$ytimes) + 1
    )
    starts <- ends + 1
    starts <- c(1, starts[-c(1, (NCOL(object$ytimes) + 1))])
    ends <- ends[-1]
    series_fcs <- lapply(seq_len(n_series), function(series) {
      to_extract <- switch(
        type,
        'link' = 'mus',
        'expected' = 'mus',
        'response' = 'ypred',
        'trend' = 'trend',
        'latent_N' = 'mus',
        'detection' = 'mus'
      )
      if (object$family == 'nmix' & type == 'link') {
        to_extract <- 'trend'
      }
      if (object$fit_engine == 'stan') {
        preds <- mcmc_chains(object$model_output, to_extract)[, seq(
          series,
          dim(mcmc_chains(object$model_output, 'ypred'))[2],
          by = NCOL(object$ytimes)
        ), drop = FALSE]
      } else {
        preds <- mcmc_chains(object$model_output, to_extract)[,
          starts[series]:ends[series],
          drop = FALSE
        ]
      }
      if (object$family == 'nmix' & type == 'link') {
        preds <- exp(preds)
      }
      if (type %in% c('expected', 'latent_N', 'detection')) {
        # Compute expectations as one long vector
        Xpmat <- matrix(as.vector(preds))
        attr(Xpmat, 'model.offset') <- 0
        family_pars <- extract_family_pars(object = object)
        par_extracts <- lapply(seq_along(family_pars), function(j) {
          if (is.matrix(family_pars[[j]])) {
            family_pars[[j]][, series]
          } else {
            family_pars[[j]]
          }
        })
        names(par_extracts) <- names(family_pars)

        # Add trial information if this is a Binomial model
        if (object$family %in% c('binomial', 'beta_binomial')) {
          trials <- as.vector(matrix(
            rep(
              as.vector(attr(object$mgcv_model, 'trials')[, series]),
              NROW(preds)
            ),
            nrow = NROW(preds),
            byrow = TRUE
          ))
          par_extracts$trials <- trials
        }
        if (object$family == 'nmix') {
          preds <- mcmc_chains(object$model_output, 'detprob')[,
            object$ytimes[, series],
            drop = FALSE
          ]
          Xpmat <- matrix(qlogis(as.vector(preds)))
          attr(Xpmat, 'model.offset') <- 0
          n_draws <- dim(mcmc_chains(object$model_output, 'ypred'))[1]
          n_cols <- dim(mcmc_chains(object$model_output, 'ypred'))[2]
          latent_lambdas <- as.vector(mcmc_chains(
            object$model_output,
            'trend'
          )[, seq(series, n_cols, by = NCOL(object$ytimes)), drop = FALSE])
          latent_lambdas <- exp(latent_lambdas)
          cap <- as.vector(t(replicate(
            n_draws,
            c(
              object$obs_data$cap[which(
                as.numeric(object$obs_data$series) == series
              )],
              object$test_data$cap[which(
                as.numeric(object$test_data$series) == series
              )]
            )
          )))
        } else {
          latent_lambdas <- NULL
          cap <- NULL
        }
        if (type == 'latent_N') {
          preds <- mcmc_chains(object$model_output, 'latent_ypred')[, seq(
            series,
            dim(mcmc_chains(object$model_output, 'ypred'))[2],
            by = NCOL(object$ytimes)
          ), drop = FALSE]
        } else {
          preds <- matrix(
            as.vector(mvgam_predict(
              family = object$family,
              Xp = Xpmat,
              latent_lambdas = latent_lambdas,
              cap = cap,
              type = type,
              betas = 1,
              family_pars = par_extracts
            )),
            nrow = NROW(preds)
          )
        }
      }
      preds[, (last_train + 1):NCOL(preds)]
    })
    names(series_fcs) <- levels(data_train$series)

    # Extract hindcasts for storing in the returned object
    series_hcs <-
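    # The hindcast extraction below mirrors the forecast extraction above,
    # but trims each draw matrix to columns 1:last_train (the training
    # period), whereas the forecasts kept columns (last_train + 1) onwards.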
lapply(seq_len(n_series), function(series) {
      to_extract <- switch(
        type,
        'link' = 'mus',
        'expected' = 'mus',
        'response' = 'ypred',
        'trend' = 'trend',
        'latent_N' = 'mus',
        'detection' = 'mus'
      )
      if (object$family == 'nmix' & type == 'link') {
        to_extract <- 'trend'
      }
      if (object$fit_engine == 'stan') {
        preds <- mcmc_chains(object$model_output, to_extract)[, seq(
          series,
          dim(mcmc_chains(object$model_output, 'ypred'))[2],
          by = NCOL(object$ytimes)
        ), drop = FALSE][, 1:last_train]
      } else {
        preds <- mcmc_chains(object$model_output, to_extract)[,
          starts[series]:ends[series],
          drop = FALSE
        ][, 1:last_train]
      }
      if (object$family == 'nmix' & type == 'link') {
        preds <- exp(preds)
      }
      if (type %in% c('expected', 'latent_N', 'detection')) {
        # Compute expectations as one long vector
        Xpmat <- matrix(as.vector(preds))
        attr(Xpmat, 'model.offset') <- 0
        family_pars <- extract_family_pars(object = object)
        par_extracts <- lapply(seq_along(family_pars), function(j) {
          if (is.matrix(family_pars[[j]])) {
            family_pars[[j]][, series]
          } else {
            family_pars[[j]]
          }
        })
        names(par_extracts) <- names(family_pars)

        # Add trial information if this is a Binomial model
        if (object$family %in% c('binomial', 'beta_binomial')) {
          trials <- as.vector(matrix(
            rep(
              as.vector(attr(object$mgcv_model, 'trials')[
                1:last_train,
                series
              ]),
              NROW(preds)
            ),
            nrow = NROW(preds),
            byrow = TRUE
          ))
          par_extracts$trials <- trials
        }
        if (object$family == 'nmix') {
          preds <- mcmc_chains(object$model_output, 'detprob')[,
            object$ytimes[1:last_train, series],
            drop = FALSE
          ]
          Xpmat <- matrix(qlogis(as.vector(preds)))
          attr(Xpmat, 'model.offset') <- 0
          n_draws <- dim(mcmc_chains(object$model_output, 'ypred'))[1]
          n_cols <- dim(mcmc_chains(object$model_output, 'ypred'))[2]
          latent_lambdas <- as.vector(mcmc_chains(
            object$model_output,
            'trend'
          )[, seq(series, n_cols, by = NCOL(object$ytimes)), drop = FALSE][,
            1:last_train
          ])
          latent_lambdas <- exp(latent_lambdas)
          cap <- as.vector(t(replicate(
            n_draws,
            object$obs_data$cap[which(
              as.numeric(object$obs_data$series) == series
            )]
          )))
        } else {
          latent_lambdas <- NULL
          cap <- NULL
        }
        if (type == 'latent_N') {
          preds <- mcmc_chains(object$model_output, 'latent_ypred')[, seq(
            series,
            dim(mcmc_chains(object$model_output, 'ypred'))[2],
            by = NCOL(object$ytimes)
          ), drop = FALSE][, 1:last_train]
        } else {
          preds <- matrix(
            as.vector(mvgam_predict(
              family = object$family,
              Xp = Xpmat,
              latent_lambdas = latent_lambdas,
              cap = cap,
              type = type,
              betas = 1,
              family_pars = par_extracts
            )),
            nrow = NROW(preds)
          )
        }
      }
      preds
    })
    names(series_hcs) <- levels(data_train$series)

    series_obs <- lapply(seq_len(n_series), function(series) {
      s_name <- levels(object$obs_data$series)[series]
      data.frame(
        series = object$obs_data$series,
        time = object$obs_data$index..time..index,
        y = object$obs_data$y
      ) %>%
        dplyr::filter(series == s_name) %>%
        dplyr::arrange(time) %>%
        dplyr::pull(y)
    })
    names(series_obs) <- levels(data_train$series)

    series_test <- lapply(seq_len(n_series), function(series) {
      s_name <- levels(object$obs_data$series)[series]
      data.frame(
        series = object$test_data$series,
        time = object$test_data$index..time..index,
        y = object$test_data$y
      ) %>%
        dplyr::filter(series == s_name) %>%
        dplyr::arrange(time) %>%
        dplyr::pull(y)
    })
    names(series_test) <- levels(data_train$series)
  }

  series_train_times <- lapply(seq_len(n_series), function(series) {
    s_name <- levels(object$obs_data$series)[series]
    data.frame(
      series = object$obs_data$series,
      time = object$obs_data$time,
      y = object$obs_data$y
    ) %>%
      dplyr::filter(series == s_name) %>%
      dplyr::arrange(time) %>%
      dplyr::pull(time)
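      # (each element holds that series' training times, sorted, aligning
      # with the columns of the hindcast draw matrices extracted above)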
}) names(series_train_times) <- levels(data_train$series) series_test_times <- lapply(seq_len(n_series), function(series) { s_name <- levels(object$obs_data$series)[series] data.frame( series = data_test$series, time = data_test$time ) %>% dplyr::filter(series == s_name) %>% dplyr::arrange(time) %>% dplyr::pull(time) }) names(series_test_times) <- levels(data_train$series) series_fcs <- structure( list( call = object$call, trend_call = object$trend_call, family = object$family, family_pars = if (type == 'link') { extract_family_pars(object = object) } else { NULL }, trend_model = object$trend_model, drift = object$drift, use_lv = object$use_lv, fit_engine = object$fit_engine, type = type, series_names = factor( levels(data_train$series), levels = levels(data_train$series) ), train_observations = series_obs, train_times = series_train_times, test_observations = series_test, test_times = series_test_times, hindcasts = series_hcs, forecasts = series_fcs ), class = 'mvgam_forecast' ) return(series_fcs) } #'Compute forecasts using a posterior distribution #'@noRd forecast_draws = function( object, type = 'response', series = 'all', data_test, n_cores = 1, n_samples, ending_time, b_uncertainty = TRUE, trend_uncertainty = TRUE, obs_uncertainty = TRUE ) { # Check arguments validate_pos_integer(n_cores) if (inherits(object, 'jsdgam')) { orig_trend_model <- attr(object$model_data, 'prepped_trend_model') } else { orig_trend_model <- object$trend_model } data_test <- validate_series_time( data_test, name = 'newdata', trend_model = orig_trend_model ) data_test <- sort_data(data_test) n_series <- NCOL(object$ytimes) use_lv <- object$use_lv if (series != 'all') { s_name <- levels(data_test$series)[series] } # Generate the observation model linear predictor matrix, # ensuring the test data is sorted correctly (by time and then series) if (inherits(data_test, 'list')) { Xp <- obs_Xp_matrix( newdata = sort_data(data_test), mgcv_model = object$mgcv_model ) if (series != 'all') { obs_keep <- data.frame( y = data_test$y, series = data_test$series, time = data_test$index..time..index, rowid = 1:length(data_test$y) ) %>% dplyr::filter(series == s_name) %>% dplyr::arrange(time) %>% dplyr::pull(rowid) series_test <- data.frame( y = data_test$y, series = data_test$series, time = data_test$index..time..index, rowid = 1:length(data_test$y) ) %>% dplyr::filter(series == s_name) %>% dplyr::arrange(time) Xp <- Xp[obs_keep, ] } else { series_test <- NULL } } else { if (series != 'all') { series_test <- data_test %>% dplyr::filter(series == s_name) %>% dplyr::arrange(index..time..index) Xp <- obs_Xp_matrix(newdata = series_test, mgcv_model = object$mgcv_model) } else { Xp <- obs_Xp_matrix( newdata = sort_data(data_test), mgcv_model = object$mgcv_model ) series_test <- NULL } } # Generate linear predictor matrix from trend mgcv model, ensuring # the test data is sorted correctly (by time and then series) if (!is.null(object$trend_call)) { Xp_trend <- trend_Xp_matrix( newdata = sort_data(data_test), trend_map = object$trend_map, series = series, mgcv_model = object$trend_mgcv_model, forecast = TRUE ) # For trend_formula models with autoregressive processes, # the process model operates as: AR * (process[t - 1] - mu[t-1]]) # We therefore need the values of mu at the end of the training set # to correctly propagate the process model forward if (use_lv & attr(object$model_data, 'trend_model') != 'GP') { # Get the observed trend predictor matrix newdata <- trend_map_data_prep( object$obs_data, object$trend_map, forecast = TRUE 
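      # Why the training-set linear predictor is needed here: for an AR(p)
      # process model the states propagate as, e.g. with p = 1,
      #   process[t] = mu[t] + ar1 * (process[t - 1] - mu[t - 1]) + error[t],
      # so forecasting requires mu at the final training times (the last 3
      # per process, enough to cover AR(3)); those rows are bound to the
      # forecast design matrix below.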
) Xp_trend_last <- predict( object$trend_mgcv_model, newdata = newdata, type = 'lpmatrix' ) # Ensure the last three values are used, in case the obs_data # was not supplied in order data.frame( time = newdata$index..time..index, series = newdata$series, row_id = 1:length(newdata$index..time..index) ) %>% dplyr::arrange(time, series) %>% dplyr::pull(row_id) -> sorted_inds n_processes <- length(unique(object$trend_map$trend)) linpred_order <- tail(sorted_inds, 3 * n_processes) # Deal with any offsets if (!all(attr(Xp_trend_last, 'model.offset') == 0)) { offset_vec <- attr(Xp_trend_last, 'model.offset') offset_last <- offset_vec[linpred_order] offset_last[is.na(offset_last)] <- 0 full_offset <- c(offset_last, attr(Xp_trend, 'model.offset')) } else { full_offset <- 0 } # Bind the last 3 linpred rows with the forecast linpred rows Xp_trend <- rbind(Xp_trend_last[linpred_order, , drop = FALSE], Xp_trend) attr(Xp_trend, 'model.offset') <- full_offset } } else { Xp_trend <- NULL } # No need to compute in parallel if there was no trend model nmix_notrend <- FALSE if ( !inherits(orig_trend_model, 'mvgam_trend') & object$family == 'nmix' ) { nmix_notrend <- TRUE } if ( attr(object$model_data, 'trend_model') == 'None' | nmix_notrend ) { if (type == 'trend' & !nmix_notrend & !use_lv) { stop('No trend_model was used in this model', call. = FALSE) } all_preds <- predict( object, type = type, newdata = data_test, summary = FALSE ) fc_preds <- lapply(seq_len(NROW(all_preds)), function(draw) { lapply(seq_len(n_series), function(series) { all_preds[ draw, which(data_test$series == levels(data_test$series)[series]) ] }) }) } else { # Else compute forecasts including dynamic trend components # Set forecast horizon if (series != 'all') { fc_horizon <- NROW(series_test) } else { fc_horizon <- length(unique(data_test$index..time..index)) } # Beta coefficients for GAM observation component betas <- mcmc_chains(object$model_output, 'b') # Generate sample sequence for n_samples if (missing(n_samples)) { sample_seq <- 1:dim(betas)[1] } else { if (n_samples < dim(betas)[1]) { sample_seq <- sample( seq_len(dim(betas)[1]), size = n_samples, replace = FALSE ) } else { sample_seq <- sample( seq_len(dim(betas)[1]), size = n_samples, replace = TRUE ) } } # Beta coefficients for GAM trend component if (!is.null(object$trend_call)) { betas_trend <- mcmc_chains(object$model_output, 'b_trend') } else { betas_trend <- NULL } # Family of model family <- object$family # Family-specific parameters family_pars <- extract_family_pars(object = object, newdata = data_test) # Add trial information if this is a Binomial model if (object$family %in% c('binomial', 'beta_binomial')) { resp_terms <- as.character(terms(formula(object$call))[[2]]) resp_terms <- resp_terms[-grepl('cbind', resp_terms)] trial_name <- resp_terms[2] if (!exists(trial_name, data_test)) { stop( paste0('Variable ', trial_name, ' not found in newdata'), call. 
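        # (for binomial-type families the model formula response is
        # cbind(successes, trials), e.g. cbind(y, ntrials) ~ s(season), so
        # resp_terms[2] ('ntrials' in that sketch) names the trials
        # variable, which must also be present in newdata)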
= FALSE
          )
        }
        trial_df <- data.frame(
          series = data_test$series,
          time = data_test$index..time..index,
          trial = data_test[[trial_name]]
        )
        trials <- matrix(NA, nrow = fc_horizon, ncol = n_series)
        for (i in 1:n_series) {
          trials[, i] <- trial_df %>%
            dplyr::filter(series == levels(data_test$series)[i]) %>%
            dplyr::arrange(time) %>%
            dplyr::pull(trial)
        }
      } else {
        trials <- NULL
      }

      # Trend model
      trend_model <- attr(object$model_data, 'trend_model')

      # Calculate time_dis if this is a CAR1 model
      if (trend_model == 'CAR1') {
        data_test$index..time..index <- data_test$index..time..index +
          max(object$obs_data$index..time..index)
        time_dis <- add_corcar(
          model_data = list(),
          data_train = object$obs_data,
          data_test = data_test
        )[[1]]
        time_dis <- time_dis[-c(1:max(object$obs_data$index..time..index)), ]
      } else {
        time_dis <- NULL
      }

      # Trend-specific parameters
      if (missing(ending_time)) {
        trend_pars <- extract_trend_pars(
          object = object,
          keep_all_estimates = FALSE
        )
      } else {
        trend_pars <- extract_trend_pars(
          object = object,
          keep_all_estimates = FALSE,
          ending_time = ending_time
        )
      }

      # Any model in which an autoregressive process was included should be
      # considered as VAR1 for forecasting purposes, as this will make use
      # of the faster C++ functions
      if (trend_model == 'CAR1') {
        if (!'last_lvs' %in% names(trend_pars)) {
          trend_pars$last_lvs <- trend_pars$last_trends
        }
      } else {
        if (
          'Sigma' %in% names(trend_pars) |
            'sigma' %in% names(trend_pars) |
            'tau' %in% names(trend_pars)
        ) {
          trend_model <- 'VAR1'
          if (!'last_lvs' %in% names(trend_pars)) {
            trend_pars$last_lvs <- trend_pars$last_trends
          }
        }
      }

      # Loop over draws and compute forecasts (in serial at the moment)
      fc_preds <- lapply(seq_len(dim(betas)[1]), function(i) {
        # Sample index
        samp_index <- i

        # Sample beta coefs
        if (b_uncertainty) {
          betas <- betas[samp_index, ]
        } else {
          betas <- betas[1, ]
        }
        if (!is.null(betas_trend)) {
          if (b_uncertainty) {
            betas_trend <- betas_trend[samp_index, ]
          } else {
            betas_trend <- betas_trend[1, ]
          }
        }

        # Sample general trend-specific parameters
        if (trend_uncertainty) {
          general_trend_pars <- extract_general_trend_pars(
            trend_pars = trend_pars,
            samp_index = samp_index
          )
        } else {
          general_trend_pars <- extract_general_trend_pars(
            trend_pars = trend_pars,
            samp_index = 1
          )
        }

        if (
          use_lv ||
            trend_model %in% c('VAR1', 'PWlinear', 'PWlogistic', 'CAR1')
        ) {
          if (trend_model == 'PWlogistic') {
            if (!(exists('cap', where = data_test))) {
              stop(
                'Capacities must also be supplied in "newdata" for logistic growth predictions',
                call. = FALSE
              )
            }
            family_links <- eval(parse(text = family))()
            if (family_links$family == 'Gamma') {
              family_links <- Gamma(link = 'log')
            }
            cap <- data.frame(
              series = data_test$series,
              time = data_test$index..time..index,
              cap = suppressWarnings(linkfun(
                data_test$cap,
                link = family_links$link
              ))
            )
            if (any(is.na(cap$cap)) | any(is.infinite(cap$cap))) {
              stop(
                paste0(
                  'Missing or infinite values found for some "cap" terms\n',
                  'after transforming to the ',
                  family_links$link,
                  ' link scale'
                ),
                call.
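            # (for example, under a poisson() observation model the log
            # link applies, so a supplied carrying capacity of cap = 100 is
            # carried through as log(100), roughly 4.6, on the latent scale)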
= FALSE ) } } else { cap <- NULL } # Propagate all trends / lvs forward jointly using sampled trend parameters trends <- forecast_trend( trend_model = trend_model, use_lv = use_lv, trend_pars = general_trend_pars, h = fc_horizon, betas_trend = betas_trend, Xp_trend = Xp_trend, time = unique( data_test$index..time..index - min(object$obs_data$index..time..index) + 1 ), cap = cap, time_dis = time_dis ) } # Loop across series and produce the next trend estimate trend_states <- lapply(seq_len(n_series), function(series) { # Sample series- and trend-specific parameters trend_extracts <- extract_series_trend_pars( series = series, samp_index = samp_index, trend_pars = trend_pars, use_lv = use_lv ) if ( use_lv || trend_model %in% c('VAR1', 'PWlinear', 'PWlogistic', 'CAR1') ) { if (use_lv) { # Multiply lv states with loadings to generate the series' forecast trend state out <- as.numeric(trends %*% trend_extracts$lv_coefs) } else if ( trend_model %in% c('VAR1', 'PWlinear', 'PWlogistic', 'CAR1') ) { out <- trends[, series] } } else { # Propagate the series-specific trends forward out <- forecast_trend( trend_model = trend_model, use_lv = FALSE, trend_pars = trend_extracts, h = fc_horizon, betas_trend = betas_trend, Xp_trend = Xp_trend, time = sort(unique(data_test$index..time..index)), time_dis = NULL ) } out }) if (type == 'trend') { out <- trend_states } else { trend_states <- do.call(cbind, trend_states) out <- lapply(seq_len(n_series), function(series) { if (family == 'nmix') { Xpmat <- Xp[which(as.numeric(data_test$series) == series), ] latent_lambdas <- exp(trend_states[, series]) pred_betas <- betas cap <- data_test$cap[which(as.numeric(data_test$series) == series)] } else { Xpmat <- cbind( Xp[which(as.numeric(data_test$series) == series), ], trend_states[, series] ) latent_lambdas <- NULL pred_betas <- c(betas, 1) cap <- NULL } if (!is.null(attr(Xp, 'model.offset'))) { attr(Xpmat, 'model.offset') <- attr(Xp, 'model.offset')[which( as.numeric(data_test$series) == series )] attr(Xpmat, 'model.offset')[is.na(attr(Xpmat, 'model.offset'))] <- 0 } # Family-specific parameters family_extracts <- lapply(seq_along(family_pars), function(x) { if (is.matrix(family_pars[[x]])) { if (obs_uncertainty) { family_pars[[x]][samp_index, series] } else { family_pars[[x]][1, series] } } else { if (obs_uncertainty) { family_pars[[x]][samp_index] } else { family_pars[[x]][1] } } }) names(family_extracts) <- names(family_pars) # Add trial information if this is a Binomial model if (family %in% c('binomial', 'beta_binomial')) { family_extracts$trials <- trials[, series] } mvgam_predict( family = family, Xp = Xpmat, latent_lambdas = latent_lambdas, cap = cap, type = type, betas = pred_betas, family_pars = family_extracts ) }) } out }) } return(fc_preds) } ================================================ FILE: R/formula.mvgam.R ================================================ #'Extract formulae from \pkg{mvgam} objects #' #'@rdname formula.mvgam #'@param x `mvgam`, `jsdgam` or `mvgam_prefit` object #'@param trend_effects \code{logical}, return the formula from the #'observation model (if \code{FALSE}) or from the underlying process #'model (if\code{TRUE}) #'@param ... Ignored #'@author Nicholas J Clark #'@return A \code{formula} object #'@export formula.mvgam = function(x, trend_effects = FALSE, ...) 
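# A minimal usage sketch (assuming 'fit' is an mvgam fitted with a
# trend_formula):
#   formula(fit)                        # observation-model formula
#   formula(fit, trend_effects = TRUE)  # process-model formula, returned
#                                       # with the dummy response 'trend_y'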
{ # Check trend_effects if (trend_effects) { if (is.null(x$trend_call)) { stop('no trend_formula exists so there is no trend-level model.frame') } } if (!trend_effects) { out <- x$call } else { out <- x$trend_call out <- update(out, as.formula(paste('trend_y', '~.'))) } return(out) } #'@rdname formula.mvgam #'@export formula.mvgam_prefit = function(x, trend_effects = FALSE, ...) { # Check trend_effects if (trend_effects) { if (is.null(x$trend_call)) { stop('no trend_formula exists so there is no trend-level model.frame') } } if (!trend_effects) { out <- x$call } else { out <- x$trend_call out <- update(out, as.formula(paste('trend_y', '~.'))) } return(out) } ================================================ FILE: R/get_linear_predictors.R ================================================ #' Function to prepare observation model linear predictor matrix #' @importFrom brms brmsterms #' @noRd obs_Xp_matrix = function(newdata, mgcv_model) { suppressWarnings( Xp <- try( predict(mgcv_model, newdata = newdata, type = 'lpmatrix'), silent = TRUE ) ) if (inherits(Xp, 'try-error')) { testdat <- data.frame(time = newdata$time) terms_include <- insight::find_predictors(mgcv_model)$conditional if (any(terms_include %in% names(newdata) == FALSE)) { stop( 'not all required variables have been supplied in newdata!', call. = FALSE ) } if (length(terms_include) > 0L) { newnames <- vector() newnames[1] <- 'time' for (i in 1:length(terms_include)) { testdat <- cbind(testdat, data.frame(newdata[[terms_include[i]]])) newnames[i + 1] <- terms_include[i] } colnames(testdat) <- newnames } suppressWarnings( Xp <- predict(mgcv_model, newdata = testdat, type = 'lpmatrix') ) } # Check for any gp() terms and update the design matrix # accordingly if (!is.null(attr(mgcv_model, 'gp_att_table'))) { # Compute the gp() eigenfunctions for newdata using the supplied brms_mock object # Requires a dataframe of all relevant variables for the gp effects mock_terms <- brms::brmsterms(attr(mgcv_model, 'brms_mock')$formula) terms_needed <- unique(all.vars(mock_terms$formula)[-1]) newdata_mock <- data.frame(newdata[[terms_needed[1]]]) if (length(terms_needed) > 1L) { for (i in 2:length(terms_needed)) { newdata_mock <- cbind( newdata_mock, data.frame(newdata[[terms_needed[i]]]) ) } } colnames(newdata_mock) <- terms_needed newdata_mock$.fake_gp_y <- rnorm(NROW(newdata_mock)) brms_mock_data <- brms::standata( attr(mgcv_model, 'brms_mock'), newdata = newdata_mock, internal = TRUE ) # Extract GP attributes gp_att_table <- attr(mgcv_model, 'gp_att_table') bys <- unlist(purrr::map(gp_att_table, 'by'), use.names = FALSE) lvls <- unlist(purrr::map(gp_att_table, 'level'), use.names = FALSE) # Extract eigenfunctions for each gp effect eigenfuncs <- eigenfunc_list( stan_data = brms_mock_data, mock_df = newdata_mock, by = bys, level = lvls ) # Find indices to replace in the design matrix and replace with # the computed eigenfunctions starts <- purrr::map(gp_att_table, 'first_coef') ends <- purrr::map(gp_att_table, 'last_coef') for (i in seq_along(starts)) { Xp[, c(starts[[i]]:ends[[i]])] <- eigenfuncs[[i]] } } return(Xp) } #' Function to prepare trend linear predictor matrix in the presence of a #' trend_map #' @noRd trend_map_data_prep = function(newdata, trend_map, forecast = FALSE) { trend_test <- newdata trend_indicators <- vector(length = length(trend_test$series)) for (i in 1:length(trend_test$series)) { trend_indicators[i] <- trend_map$trend[which( as.character(trend_map$series) == as.character(trend_test$series[i]) )] } trend_indicators <- 
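  # Illustration of the mapping constructed here: with
  #   trend_map <- data.frame(series = c('s1', 's2', 's3'),
  #                           trend = c(1, 1, 2))
  # observations from series s1 and s2 are relabelled 'trend1' and those
  # from s3 'trend2', so the trend-level design matrix is evaluated per
  # latent process rather than per observed series.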
factor( paste0('trend', trend_indicators), levels = paste0('trend', unique(trend_map$trend)) ) trend_test$series <- trend_indicators trend_test$y <- NULL # Only keep one time observation per trend, in case this is a reduced dimensionality # State-Space model (with a trend_map) and we are forecasting ahead if (forecast) { data.frame( series = trend_test$series, time = trend_test$index..time..index, row_num = 1:length(trend_test$index..time..index) ) %>% dplyr::group_by(series, time) %>% dplyr::slice_head(n = 1) %>% dplyr::pull(row_num) -> inds_keep inds_keep <- sort(inds_keep) if (inherits(trend_test, 'list')) { trend_test <- lapply(trend_test, function(x) { if (is.matrix(x)) { matrix(x[inds_keep, ], ncol = NCOL(x)) } else { x[inds_keep] } }) } else { trend_test <- trend_test[inds_keep, ] } } return(trend_test) } #' Function to prepare trend linear predictor matrix, ensuring ordering and #' indexing is correct with respect to the model structure #' @noRd trend_Xp_matrix = function( newdata, trend_map, series = 'all', mgcv_model, forecast = FALSE ) { trend_test <- trend_map_data_prep(newdata, trend_map, forecast = forecast) suppressWarnings( Xp_trend <- try( predict(mgcv_model, newdata = trend_test, type = 'lpmatrix'), silent = TRUE ) ) if (inherits(Xp_trend, 'try-error')) { testdat <- data.frame(series = trend_test$series) terms_include <- insight::find_predictors(mgcv_model)$conditional if (any(terms_include %in% names(trend_test) == FALSE)) { stop( 'not all required variables have been supplied in newdata!', call. = FALSE ) } if (length(terms_include) > 0) { newnames <- vector() newnames[1] <- 'series' for (i in 1:length(terms_include)) { testdat <- cbind(testdat, data.frame(trend_test[[terms_include[i]]])) newnames[i + 1] <- terms_include[i] } colnames(testdat) <- newnames } suppressWarnings( Xp_trend <- predict(mgcv_model, newdata = testdat, type = 'lpmatrix') ) } # Check for any gp() terms and update the design matrix # accordingly if (!is.null(attr(mgcv_model, 'gp_att_table'))) { # Compute the gp() eigenfunctions for newdata using the supplied brms_mock object # Requires a dataframe of all relevant variables for the gp effects mock_terms <- brms::brmsterms(attr(mgcv_model, 'brms_mock')$formula) terms_needed <- unique(all.vars(mock_terms$formula)[-1]) newdata_mock <- data.frame(trend_test[[terms_needed[1]]]) if (length(terms_needed) > 1L) { for (i in 2:length(terms_needed)) { newdata_mock <- cbind( newdata_mock, data.frame(trend_test[[terms_needed[i]]]) ) } } colnames(newdata_mock) <- terms_needed newdata_mock$.fake_gp_y <- rnorm(NROW(newdata_mock)) brms_mock_data <- brms::standata( attr(mgcv_model, 'brms_mock'), newdata = newdata_mock, internal = TRUE ) # Extract GP attributes gp_att_table <- attr(mgcv_model, 'gp_att_table') bys <- unlist(purrr::map(gp_att_table, 'by'), use.names = FALSE) lvls <- unlist(purrr::map(gp_att_table, 'level'), use.names = FALSE) # Extract eigenfunctions for each gp effect eigenfuncs <- eigenfunc_list( stan_data = brms_mock_data, mock_df = newdata_mock, by = bys, level = lvls ) # Find indices to replace in the design matrix and replace with # the computed eigenfunctions starts <- purrr::map(gp_att_table, 'first_coef') ends <- purrr::map(gp_att_table, 'last_coef') for (i in seq_along(starts)) { Xp_trend[, c(starts[[i]]:ends[[i]])] <- eigenfuncs[[i]] } } return(Xp_trend) } ================================================ FILE: R/get_monitor_pars.R ================================================ #' Return parameters to monitor during modelling #' #' #' @param 
family \code{character}
#' @param smooths_included Logical. Are smooth terms included in the model formula?
#' @param use_lv Logical (use latent variable trends or not)
#' @param trend_model The type of trend model used
#' @param drift Logical (was a drift term estimated or not)
#' @return A \code{character} vector of parameter names to monitor
#' @noRd
get_monitor_pars = function(
  family,
  smooths_included = TRUE,
  use_lv,
  trend_model,
  drift
) {
  family <- match.arg(
    arg = family,
    choices = c(
      "negative binomial",
      "poisson",
      "tweedie",
      "beta",
      "gaussian",
      "lognormal",
      "student",
      "Gamma",
      "nmix",
      "binomial",
      "bernoulli",
      "beta_binomial"
    )
  )
  if (smooths_included) {
    param <- c('rho', 'b', 'ypred', 'mus', 'lp__', 'lambda')
  } else {
    param <- c('b', 'ypred', 'mus', 'lp__')
  }

  # Family-specific parameters to monitor
  param <- c(param, family_param_info(family)$param_names)

  # Trend-specific parameters
  param <- c(
    param,
    trend_par_names(trend_model = trend_model, use_lv = use_lv, drift = drift)
  )
  return(param)
}


================================================
FILE: R/get_mvgam_priors.R
================================================


#' Extract information on default prior distributions for an \pkg{mvgam} model
#'
#' This function lists the parameters that can have their prior distributions
#' changed for a given model, as well as listing their default distributions
#'
#' @inheritParams mvgam
#'
#' @inheritParams jsdgam
#'
#' @param ... Not currently used
#'
#' @param factor_formula Can be supplied instead of `trend_formula` to match
#' syntax from [jsdgam]
#'
#' @details Users can supply a model formula, prior to fitting the model, so
#' that default priors can be inspected and altered. To make alterations,
#' change the contents of the `prior` column and supply this
#' \code{data.frame} to the \code{\link{mvgam}} or \code{\link{jsdgam}}
#' functions using the argument `priors`. If using `Stan` as the backend,
#' users can also modify the parameter bounds by editing the
#' `new_lowerbound` and/or `new_upperbound` columns. This will be necessary
#' when using restrictive distributions for some parameters, such as a Beta
#' distribution for the trend sd parameters (Beta only has support on
#' \code{(0,1)}, so the upper bound cannot be above `1`). Another option is
#' to make use of the prior modification functions in \pkg{brms}
#' (i.e. \code{\link[brms]{prior}}) to change prior distributions and bounds
#' (just use the name of the parameter that you'd like to change as the
#' `class` argument; see examples below)
#'
#' @note Only the `prior`, `new_lowerbound` and/or `new_upperbound` columns of
#' the output should be altered when defining user-defined priors for the
#' model. Use only if you are familiar with the underlying probabilistic
#' programming language. There are no sanity checks done to ensure that the
#' code is legal (i.e.
to check that lower bounds are smaller than upper #' bounds, for example) #' #' @author Nicholas J Clark #' #' @seealso \code{\link{mvgam}}, \code{\link{mvgam_formulae}}, #' \code{\link[brms]{prior}} #' #' @return either a \code{data.frame} containing the prior definitions (if any #' suitable priors can be altered by the user) or \code{NULL}, indicating #' that no priors in the model can be modified #' #' @examples #' \dontrun{ #' # ======================================================================== #' # Example 1: Simulate data and inspect default priors #' # ======================================================================== #' #' dat <- sim_mvgam(trend_rel = 0.5) #' #' # Get a model file that uses default mvgam priors for inspection (not #' # always necessary, but this can be useful for testing whether your #' # updated priors are written correctly) #' mod_default <- mvgam( #' y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, #' family = nb(), #' data = dat$data_train, #' trend_model = AR(p = 2), #' run_model = FALSE #' ) #' #' # Inspect the model file with default mvgam priors #' stancode(mod_default) #' #' # Look at which priors can be updated in mvgam #' test_priors <- get_mvgam_priors( #' y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, #' family = nb(), #' data = dat$data_train, #' trend_model = AR(p = 2) #' ) #' test_priors #' #' # ======================================================================== #' # Example 2: Modify priors manually #' # ======================================================================== #' #' # Make a few changes; first, change the population mean for the #' # series-level random intercepts #' test_priors$prior[2] <- "mu_raw ~ normal(0.2, 0.5);" #' #' # Now use stronger regularisation for the series-level AR2 coefficients #' test_priors$prior[5] <- "ar2 ~ normal(0, 0.25);" #' #' # Check that the changes are made to the model file without any warnings #' # by setting 'run_model = FALSE' #' mod <- mvgam( #' y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, #' family = nb(), #' data = dat$data_train, #' trend_model = AR(p = 2), #' priors = test_priors, #' run_model = FALSE #' ) #' stancode(mod) #' #' # No warnings, the model is ready for fitting now in the usual way with #' # the addition of the 'priors' argument #' #' # ======================================================================== #' # Example 3: Use brms syntax for prior modification #' # ======================================================================== #' #' # The same can be done using 'brms' functions; here we will also change #' # the ar1 prior and put some bounds on the ar coefficients to enforce #' # stationarity; we set the prior using the 'class' argument in all brms #' # prior functions #' brmsprior <- c( #' prior(normal(0.2, 0.5), class = mu_raw), #' prior(normal(0, 0.25), class = ar1, lb = -1, ub = 1), #' prior(normal(0, 0.25), class = ar2, lb = -1, ub = 1) #' ) #' brmsprior #' #' mod <- mvgam( #' y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, #' family = nb(), #' data = dat$data_train, #' trend_model = AR(p = 2), #' priors = brmsprior, #' run_model = FALSE #' ) #' stancode(mod) #' #' # ======================================================================== #' # Example 4: Error handling example #' # ======================================================================== #' #' # Look at what is returned when an incorrect spelling is used #' test_priors$prior[5] <- "ar2_bananas ~ normal(0, 0.25);" #' mod <- mvgam( #' y ~ s(series, bs = "re") + 
s(season, bs = "cc") - 1, #' family = nb(), #' data = dat$data_train, #' trend_model = AR(p = 2), #' priors = test_priors, #' run_model = FALSE #' ) #' stancode(mod) #' #' # ======================================================================== #' # Example 5: Parametric (fixed effect) priors #' # ======================================================================== #' #' simdat <- sim_mvgam() #' #' # Add a fake covariate #' simdat$data_train$cov <- rnorm(NROW(simdat$data_train)) #' #' priors <- get_mvgam_priors( #' y ~ cov + s(season), #' data = simdat$data_train, #' family = poisson(), #' trend_model = AR() #' ) #' #' # Change priors for the intercept and fake covariate effects #' priors$prior[1] <- "(Intercept) ~ normal(0, 1);" #' priors$prior[2] <- "cov ~ normal(0, 0.1);" #' #' mod2 <- mvgam( #' y ~ cov + s(season), #' data = simdat$data_train, #' trend_model = AR(), #' family = poisson(), #' priors = priors, #' run_model = FALSE #' ) #' stancode(mod2) #' #' # ======================================================================== #' # Example 6: Alternative brms syntax for fixed effects #' # ======================================================================== #' #' # Likewise using 'brms' utilities (note that you can use Intercept rather #' # than `(Intercept)`) to change priors on the intercept #' brmsprior <- c( #' prior(normal(0.2, 0.5), class = cov), #' prior(normal(0, 0.25), class = Intercept) #' ) #' brmsprior #' #' mod2 <- mvgam( #' y ~ cov + s(season), #' data = simdat$data_train, #' trend_model = AR(), #' family = poisson(), #' priors = brmsprior, #' run_model = FALSE #' ) #' stancode(mod2) #' #' # ======================================================================== #' # Example 7: Bulk prior assignment #' # ======================================================================== #' #' # The "class = 'b'" shortcut can be used to put the same prior on all #' # 'fixed' effect coefficients (apart from any intercepts) #' set.seed(0) #' dat <- mgcv::gamSim(1, n = 200, scale = 2) #' dat$time <- 1:NROW(dat) #' mod <- mvgam( #' y ~ x0 + x1 + s(x2) + s(x3), #' priors = prior(normal(0, 0.75), class = "b"), #' data = dat, #' family = gaussian(), #' run_model = FALSE #' ) #' stancode(mod) #' } #' #' @export get_mvgam_priors = function( formula, trend_formula, factor_formula, knots, trend_knots, trend_model = 'None', family = poisson(), data, unit = time, species = series, use_lv = FALSE, n_lv, trend_map, ... ) { # Validate the data dots <- list(...) if (missing("data")) { if ('data_train' %in% names(dots)) { message('argument "data_train" is deprecated; supply as "data" instead') data <- dots$data_train dots$data_train <- NULL } else { stop('Argument "data" is missing with no default', call. 
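  # (mirroring mvgam(), the legacy 'data_train' argument is remapped to
  # 'data' with a deprecation message rather than a hard error)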
= FALSE) } } if (!missing("data")) { data_train <- data } orig_data <- data_train # Set trend_formula if (!missing(factor_formula)) { if (missing(n_lv)) { n_lv <- 2 } validate_pos_integer(n_lv) unit <- deparse0(substitute(unit)) subgr <- deparse0(substitute(species)) prepped_trend <- prep_jsdgam_trend(unit = unit, subgr = subgr, data = data) trend_model <- 'None' data_train <- validate_series_time(data = data, trend_model = prepped_trend) trend_map <- prep_jsdgam_trendmap(data_train, n_lv) if (!missing(trend_formula)) { warning( 'Both "trend_formula" and "factor_formula" supplied\nUsing "factor_formula" as default' ) } trend_formula <- factor_formula } # Validate the trend arguments if ('drift' %in% names(dots)) { message( 'The "drift" argument is deprecated; use fixed effects of "time" instead' ) dots$drift <- NULL } drift <- FALSE orig_trend_model <- trend_model trend_model <- validate_trend_model( orig_trend_model, drift = drift, noncentred = FALSE, warn = FALSE ) # Ensure series and time variables are present data_train <- validate_series_time( data_train, name = 'data', trend_model = orig_trend_model ) # Validate the formula to convert any dynamic() terms formula <- interpret_mvgam(formula, N = max(data_train$time), family = family) # Check for gp terms in the validated formula list2env( check_gp_terms(formula, data_train, family = family), envir = environment() ) # Check for missing rhs in formula list2env(check_obs_intercept(formula, orig_formula), envir = environment()) # Validate observation formula formula <- interpret_mvgam(formula, N = max(data_train$time)) data_train <- validate_obs_formula(formula, data = data_train, refit = FALSE) # Validate the family argument use_stan <- TRUE family <- validate_family(family, use_stan = use_stan) family_char <- match.arg(arg = family$family, choices = family_char_choices()) # Nmixture additions? list2env( check_nmix( family, family_char, trend_formula, trend_model, trend_map, data_train, priors = TRUE ), envir = environment() ) # Validate remaining trend arguments trend_val <- validate_trend_restrictions( trend_model = trend_model, formula = formula, trend_formula = trend_formula, trend_map = trend_map, drift = drift, drop_obs_intercept = drop_obs_intercept, use_lv = use_lv, n_lv = n_lv, data_train = data_train, use_stan = use_stan, priors = TRUE ) list2env(trend_val, envir = environment()) if (is.null(trend_map)) { trend_map <- rlang::missing_arg() } if (is.null(n_lv)) { n_lv <- rlang::missing_arg() } # If trend_formula supplied, first run get_mvgam_priors for the observation model # and then modify the resulting output if (!missing(trend_formula)) { if (trend_model == 'None') { trend_model <- 'RW' } validate_trend_formula(trend_formula) prior_df <- get_mvgam_priors( formula = orig_formula, data = data_train, family = family, use_lv = FALSE, trend_model = if (trend_model == 'None') { RW() } else { orig_trend_model }, trend_map = trend_map, knots = knots ) # Replace any terms labelled 'trend' with 'series' for creating the necessary # structures trend_formula <- formula(paste( gsub('trend', 'series', as.character(trend_formula), fixed = TRUE), collapse = " " )) # Drop any intercept from the formula if this is not an N-mixture model or if a # trend_map was not originally supplied if (family_char == 'nmix') { drop_trend_int <- FALSE } else { drop_trend_int <- TRUE } if (!missing(trend_map)) { drop_trend_int <- FALSE } if (drop_trend_int) { if (attr(terms(trend_formula), 'intercept') == 1) { trend_formula <- update(trend_formula, trend_y ~ . 
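      # Example of the rewriting applied here: a user-supplied
      #   trend_formula = ~ s(time, by = trend)
      # becomes trend_y ~ s(time, by = series) - 1, where 'trend_y' is a
      # Gaussian dummy response (simulated with rnorm below) used only so
      # that the observation-level prior machinery can be reused for the
      # process model.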
- 1) } else { trend_formula <- update(trend_formula, trend_y ~ .) } } else { trend_formula <- update(trend_formula, trend_y ~ .) } trend_train <- data_train trend_train$time <- trend_train$index..time..index trend_train$trend_y <- rnorm(length(trend_train$time)) # Add indicators of trend names as factor levels using the trend_map trend_indicators <- vector(length = length(trend_train$time)) for (i in 1:length(trend_train$time)) { trend_indicators[i] <- trend_map$trend[which( trend_map$series == trend_train$series[i] )] } trend_indicators <- as.factor(paste0('trend', trend_indicators)) trend_train$series <- trend_indicators trend_train$y <- NULL # Only keep one time observation per trend data.frame( series = trend_train$series, time = trend_train$time, row_num = 1:length(trend_train$time) ) %>% dplyr::group_by(series, time) %>% dplyr::slice_head(n = 1) %>% dplyr::pull(row_num) -> inds_keep if (inherits(trend_train, 'list')) { trend_train <- lapply(trend_train, function(x) { if (is.matrix(x)) { matrix(x[inds_keep, ], ncol = NCOL(x)) } else { x[inds_keep] } }) } else { trend_train <- trend_train[inds_keep, ] } # Now get the priors related to the trend model trend_prior_df <- get_mvgam_priors( trend_formula, data = trend_train, family = gaussian(), trend_model = 'None', knots = trend_knots ) # Modify some of the term names and return if (any(grepl('fixed effect', trend_prior_df$param_info))) { para_lines <- grep('fixed effect', trend_prior_df$param_info) for (i in para_lines) { trend_prior_df$param_name[i] <- paste0( trend_prior_df$param_name[i], '_trend' ) trend_prior_df$prior[i] <- paste0( trimws(strsplit(trend_prior_df$prior[i], "[~]")[[1]][1]), '_trend ~ student_t(3, 0, 2);' ) trend_prior_df$example_change[i] <- paste0( trimws(strsplit(trend_prior_df$example_change[i], "[~]")[[1]][1]), '_trend ~ normal(0, 1);' ) } } if (any(grepl('(Intercept)', trend_prior_df$param_info))) { para_lines <- grep('(Intercept)', trend_prior_df$param_info) for (i in para_lines) { trend_prior_df$param_name[i] <- paste0( trend_prior_df$param_name[i], '_trend' ) trend_prior_df$prior[i] <- paste0( trimws(strsplit(trend_prior_df$prior[i], "[~]")[[1]][1]), '_trend ~ student_t(3, 0, 2);' ) trend_prior_df$example_change[i] <- paste0( trimws(strsplit(trend_prior_df$example_change[i], "[~]")[[1]][1]), '_trend ~ normal(0, 1);' ) trend_prior_df$param_info[i] <- '(Intercept) for the trend' } } trend_prior_df[] <- lapply( trend_prior_df, function(x) gsub("lambda", "lambda_trend", x) ) trend_prior_df[] <- lapply( trend_prior_df, function(x) gsub("n_sp", "n_sp_trend", x) ) trend_prior_df[] <- lapply( trend_prior_df, function(x) gsub("mu_raw", "mu_raw_trend", x) ) trend_prior_df[] <- lapply( trend_prior_df, function(x) gsub("sigma_raw", "sigma_raw_trend", x) ) trend_prior_df[] <- lapply( trend_prior_df, function(x) gsub("n_series", "n_lv", x) ) trend_prior_df[] <- lapply( trend_prior_df, function(x) gsub("series", "trend", x) ) trend_prior_df[] <- lapply( trend_prior_df, function(x) gsub("alpha_gp", "alpha_gp_trend", x) ) trend_prior_df[] <- lapply( trend_prior_df, function(x) gsub("rho_gp", "rho_gp_trend", x) ) trend_prior_df <- trend_prior_df[ !trend_prior_df$param_info == 'observation error sd', ] out <- rbind(prior_df, trend_prior_df) out[] <- lapply(out, function(x) gsub("trend sd", "process error sd", x)) out[] <- lapply(out, function(x) gsub("trend AR1", "process model AR1", x)) out[] <- lapply(out, function(x) gsub("trend AR2", "process model AR2", x)) out[] <- lapply(out, function(x) gsub("trend AR3", "process model 
AR3", x)) out[] <- lapply( out, function(x) gsub("trend drift", "process model drift", x) ) out[] <- lapply( out, function(x) { gsub( "vector[n_series] sigma;", "vector[n_lv] sigma;", x, fixed = TRUE ) } ) # Remove intercept prior if an intercept was suppressed from the # observation model if (drop_obs_intercept) { if (any(grepl('Intercept', out$param_name))) { which_obs_int <- grep('Intercept', out$param_name) & !grep('(Intercept)_trend', out$param_name) if (length(which_obs_int) > 0L) out <- out[-which_obs_int, ] } } # Remove sigma prior if this is an N-mixture with no dynamics if (add_nmix & trend_model == 'None') { out <- out[ -grep('vector[n_lv] sigma;', out$param_name, fixed = TRUE), ] } } else { # JAGS cannot support latent GP, VAR or piecewise trends if ( !use_stan & trend_model %in% c('GP', 'VAR1', 'PWlinear', 'PWlogistic') ) { stop( 'Gaussian Process, VAR and piecewise trends not supported for JAGS', call. = FALSE ) } if (use_stan & family_char == 'tweedie') { warning('Tweedie family not supported for Stan; reverting to JAGS') use_stan <- FALSE } # Number of latent variables cannot be greater than number of series if (use_lv) { if (missing(n_lv)) { n_lv <- min(2, floor(length(unique(data_train$series)) / 2)) } if (n_lv > length(unique(data_train$series))) { stop( 'number of latent variables cannot be greater than number of series', call. = FALSE ) } } # # No point in latent variables if trend model is None # if(trend_model == 'None' & use_lv){ # use_lv <- FALSE # warning('No point in latent variables if trend model is None; changing use_lv to FALSE') # } # Fill in missing observations in data_train so the size of the dataset is correct when # building the initial JAGS model resp_terms <- as.character(terms(formula(formula))[[2]]) if (length(resp_terms) == 1) { out_name <- as.character(terms(formula(formula))[[2]]) } else { if (any(grepl('cbind', resp_terms))) { resp_terms <- resp_terms[-grepl('cbind', resp_terms)] out_name <- resp_terms[1] } } data_train[[out_name]] <- replace_nas(data_train[[out_name]]) # Some general family-level restrictions can now be checked validate_family_restrictions( response = data_train[[out_name]], family = family ) # Use a small fit from mgcv to extract relevant information on smooths included # in the model ss_gam <- try( mvgam_setup( formula = formula, family = family_to_mgcvfam(family), dat = data_train, knots = knots ), silent = TRUE ) if (inherits(ss_gam, 'try-error')) { if (grepl('missing values', ss_gam[1])) { stop( paste( 'Missing values found in data predictors:\n', attr(ss_gam, 'condition') ), call. = FALSE ) } else { stop(paste(ss_gam[1]), call. 
= FALSE) } } # Parametric effect priors if (use_stan) { smooth_labs <- do.call( rbind, lapply(seq_along(ss_gam$smooth), function(x) { data.frame( label = ss_gam$smooth[[x]]$label, term = paste(ss_gam$smooth[[x]]$term, collapse = ','), class = class(ss_gam$smooth[[x]])[1] ) }) ) lpmat <- suppressWarnings(predict( ss_gam, type = 'lpmatrix', exclude = smooth_labs$label )) para_indices <- which(apply(lpmat, 2, function(x) !all(x == 0)) == TRUE) int_included <- attr(ss_gam$pterms, 'intercept') == 1L if (int_included) { other_pterms <- names(para_indices)[-1] } else { other_pterms <- names(para_indices) } all_paras <- other_pterms para_priors <- c() para_info <- c() if (length(other_pterms) > 0) { para_priors <- c( para_priors, paste(other_pterms, '~ student_t(3, 0, 2);') ) para_info <- c(para_info, paste(other_pterms, 'fixed effect')) } if (int_included) { all_paras <- c('(Intercept)', all_paras) # Compute default intercept prior using brms def_int <- make_default_int(response = data_train$y, family = family) para_priors <- c( paste0(def_int$class, ' ~ ', def_int$prior, ';'), para_priors ) para_info <- c('(Intercept)', para_info) } if (length(all_paras) == 0) { para_df <- NULL } else { para_df <- data.frame( param_name = all_paras, param_length = 1, param_info = para_info, prior = para_priors, example_change = c( paste0(all_paras, ' ~ normal(0, 1);') ) ) } } else { para_df <- NULL } # Extract information on the number of smoothing parameters and # random effects smooth_labs <- do.call( rbind, lapply(seq_along(ss_gam$smooth), function(x) { data.frame( label = ss_gam$smooth[[x]]$label, class = class(ss_gam$smooth[[x]])[1], nsp = ss_gam$smooth[[x]]$last.sp - ss_gam$smooth[[x]]$first.sp + 1 ) }) ) # Check for gp() terms if (!is.null(gp_terms)) { gp_additions <- make_gp_additions( gp_details = gp_details, orig_formula = orig_formula, data = data_train, newdata = NULL, model_data = list(X = t(predict(ss_gam, type = 'lpmatrix'))), mgcv_model = ss_gam, gp_terms = gp_terms, family = family ) gp_names <- unlist( purrr::map(gp_additions$gp_att_table, 'name'), use.names = FALSE ) gp_isos <- unlist( purrr::map(gp_additions$gp_att_table, 'iso'), use.names = FALSE ) abbv_names <- vector(mode = 'list', length = length(gp_names)) full_names <- vector(mode = 'list', length = length(gp_names)) for (i in seq_len(length(gp_names))) { if (gp_isos[i]) { abbv_names[[i]] <- gp_names[i] full_names[[i]] <- paste0(gp_names[i], '[1]') } else { abbv_names[[i]] <- paste0(gp_names[i], '[1][', 1:2, ']') full_names[[i]] <- paste0(gp_names[i], '[1][', 1:2, ']') } } full_names <- unlist(full_names, use.names = FALSE) abbv_names <- unlist(abbv_names, use.names = FALSE) alpha_priors <- unlist( purrr::map(gp_additions$gp_att_table, 'def_alpha'), use.names = FALSE ) rho_priors <- unlist( purrr::map(gp_additions$gp_att_table, 'def_rho'), use.names = FALSE ) rho_2_priors <- unlist( purrr::map(gp_additions$gp_att_table, 'def_rho_2'), use.names = FALSE ) full_priors <- vector(mode = 'list', length = length(gp_names)) for (i in seq_len(length(gp_names))) { if (gp_isos[i]) { full_priors[[i]] <- rho_priors[i] } else { full_priors[[i]] <- c(rho_priors[i], rho_2_priors[i]) } } full_priors <- unlist(full_priors, use.names = FALSE) smooth_labs <- smooth_labs %>% dplyr::filter( !label %in% gsub('gp(', 's(', gp_names, fixed = TRUE) ) alpha_df <- data.frame( param_name = paste0('real alpha_', gp_names, ';'), param_length = 1, param_info = paste(gp_names, 'marginal deviation'), prior = paste0('alpha_', gp_names, ' ~ ', alpha_priors, ';'), 
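        # For a term like gp(time), this table exposes an amplitude prior
        # (alpha_gp, the marginal deviation) and a length-scale prior
        # (rho_gp); a user could, for instance, swap in
        #   priors$prior[i] <- 'rho_gp(time) ~ normal(0, 0.75);'
        # where the row index 'i' and exact parameter label should be read
        # off the returned data.frame.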
example_change = paste0( 'alpha_', gp_names, ' ~ ', 'normal(0, ', round(runif(length(gp_names), 0.5, 1), 2), ');' ) ) rho_df <- data.frame( param_name = paste0('real rho_', abbv_names, ';'), param_length = 1, param_info = paste(abbv_names, 'length scale'), prior = paste0('rho_', full_names, ' ~ ', full_priors, ';'), example_change = paste0( 'rho_', full_names, ' ~ ', 'normal(0, ', round(runif(length(full_names), 0.5, 1), 2), ');' ) ) gp_df <- rbind(alpha_df, rho_df) } else { gp_df <- NULL } # Smoothing parameter priors for non-random effect smooths if (any(smooth_labs$class != 'random.effect')) { n_smooth_params <- smooth_labs %>% dplyr::filter(class != 'random.effect') %>% dplyr::pull(nsp) nonre_smooths <- smooth_labs %>% dplyr::filter(class != 'random.effect') %>% dplyr::pull(label) if (use_stan) { sp_df <- data.frame( param_name = 'vector[n_sp] lambda;', param_length = sum(smooth_labs$nsp), param_info = c(paste( nonre_smooths, 'smooth parameters', collapse = ', ' )), prior = 'lambda ~ normal(5, 30);', # Add an example for changing the prior; note that it is difficult to # understand how to change individual smoothing parameter priors because each # one acts on a different subset of the smooth function parameter space example_change = c( paste0( 'lambda ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) ) } else { # Not recommended to alter smoothing parameter priors for JAGS as the Gibbs sampler # needs to have informative priors to have any hope of convergence sp_df <- NULL } } else { sp_df <- NULL } # Population mean and sd priors for random effect smooths if (any(smooth_labs$class == 'random.effect')) { re_smooths <- smooth_labs %>% dplyr::filter(class == 'random.effect') %>% dplyr::pull(label) n_re_terms <- length(re_smooths) if (use_stan) { re_df <- data.frame( param_name = c( paste0('vector[', n_re_terms, '] mu_raw;'), paste0('vector[', n_re_terms, '] sigma_raw;') ), param_length = rep(n_re_terms, 2), param_info = c( paste(re_smooths, 'pop mean', collapse = ', '), paste(re_smooths, 'pop sd', collapse = ', ') ), prior = c('mu_raw ~ std_normal();', 'sigma_raw ~ exponential(0.5);') ) # Add example change that users could implement to put different priors # on each re's mean and sd if (n_re_terms > 1) { re_df <- cbind( re_df, data.frame( example_change = c( paste( paste0( 'mu_raw[', 1:n_re_terms, '] ~ normal(', round(runif(min = -1, max = 1, n = n_re_terms), 2), ', ', round(runif(min = 0.1, max = 1, n = n_re_terms), 2), ');' ), collapse = '\n' ), paste( paste0( 'sigma_raw[', 1:n_re_terms, '] ~ exponential(', round(runif(min = 0.01, max = 1, n = n_re_terms), 2), ');' ), collapse = '\n' ) ) ) ) } else { re_df <- cbind( re_df, data.frame( example_change = c( paste0( 'mu_raw ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ), paste0( 'sigma_raw ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) ) ) } } else { # If using JAGS as the backend re_df <- data.frame( param_name = c( paste0('mu_raw', 1:n_re_terms), paste0('sigma_raw', 1:n_re_terms, '') ), param_length = 1, param_info = c( paste(re_smooths, 'pop mean'), paste(re_smooths, 'pop sd') ), prior = c( paste0('mu_raw', 1:n_re_terms, ' ~ dnorm(0, 1)'), paste0('sigma_raw', 1:n_re_terms, ' ~ dexp(0.5)') ) ) # Add example change that users could implement to put different priors # on each re's mean and sd if (n_re_terms > 1) { re_df <- cbind( re_df, data.frame( example_change = c( paste(paste0( 'mu_raw', 1:n_re_terms, ' ~ dnorm(', 
round(runif(min = -1, max = 1, n = n_re_terms), 2), ', ', round(runif(min = 0.1, max = 10, n = n_re_terms), 2), ')' )), paste(paste0( 'sigma_raw', 1:n_re_terms, ' ~ dexp(', round(runif(min = 0.01, max = 1, n = n_re_terms), 2), ')' )) ) ) ) } else { re_df <- cbind( re_df, data.frame( example_change = c( paste0( 'mu_raw ~ dnorm(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ')' ), paste0( 'sigma_raw ~ dexp(', round(runif(min = 0.01, max = 1, n = 1), 2), ')' ) ) ) ) } } } else { re_df <- NULL } # Extract information on priors for trend components trend_df <- NULL if (trend_model %in% c('PWlinear', 'PWlogistic')) { # Need to fix this as a next priority # trend_df <- NULL trend_df <- data.frame( param_name = c( 'vector[n_series] k_trend;', 'vector[n_series] m_trend;' ), param_length = length(unique(data_train$series)), param_info = c('base trend growth rates', 'trend offset parameters'), prior = c('k_trend ~ std_normal();', 'm_trend ~ student_t(3, 0, 2.5);'), example_change = c( paste0( 'k ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ), paste0( 'm ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ) ) ) } if (trend_model == 'GP') { if (use_lv) { trend_df <- data.frame( param_name = c('vector[n_lv] rho_gp;'), param_length = n_lv, param_info = c('trend length scale'), prior = c('rho_gp ~ inv_gamma(1.499007, 5.670433);'), example_change = paste0( 'rho_gp ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) } else { trend_df <- data.frame( param_name = c( 'vector[n_series] alpha_gp;', 'vector[n_series] rho_gp;' ), param_length = length(unique(data_train$series)), param_info = c('trend amplitude', 'trend length scale'), prior = c( 'alpha_gp ~ normal(0, 0.5);', 'rho_gp ~ inv_gamma(1.499007, 5.670433);' ), example_change = c( paste0( 'alpha_gp ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ), paste0( 'rho_gp ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) ) } trend_df <- rbind( trend_df, data.frame( param_name = c('int num_gp_basis;'), param_length = 1, param_info = c('basis dimension for approximate GP'), prior = c('num_gp_basis = min(20, n);'), example_change = 'num_gp_basis = 12;' ) ) } if (trend_model %in% c('ZMVN', 'ZMVNhiercor')) { trend_df <- data.frame( param_name = c(paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] sigma;' )), param_length = ifelse(use_lv, n_lv, length(unique(data_train$series))), param_info = c('residual sd'), prior = c('sigma ~ exponential(2);'), example_change = c( paste0( 'sigma ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) ) } if (trend_model %in% c('RW', 'RWcor', 'RWhiercor')) { if (use_stan) { trend_df <- data.frame( param_name = c(paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] sigma;' )), param_length = ifelse( use_lv, n_lv, length(unique(data_train$series)) ), param_info = c('trend sd'), prior = c('sigma ~ exponential(2);'), example_change = c( paste0( 'sigma ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) ) } else { trend_df <- data.frame( param_name = c('vector[n_series] sigma'), param_length = length(unique(data_train$series)), param_info = 'trend sd (for each series s)', prior = c('sigma[s] ~ dexp(1)T(0.075, 5)'), example_change = c( paste0( 'sigma[s] ~ dexp(', round(runif(min = 0.01, max = 1, n = 1), 2), ')' ) ) ) } } if 
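  # The VAR blocks below expose the hyperparameters (es, fs, gs, hs) of the
  # hierarchical prior over autoregressive coefficients: population means
  # (es), population variances (fs) and precision shape parameters (gs, hs),
  # split by diagonal versus off-diagonal elements. For example, a user
  # could centre the diagonal autocorrelations away from zero with:
  #   priors$prior[which(priors$param_name == 'real es[1];')] <- 'es[1] = 0.5;'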
(trend_model == 'VAR1') { trend_df <- data.frame( param_name = c('vector[n_series] sigma;'), param_length = c(length(unique(data_train$series))), param_info = c('trend sd'), prior = c('sigma ~ inv_gamma(2.3693353, 0.7311319);'), example_change = c( paste0( 'sigma ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) ) trend_df <- rbind( trend_df, data.frame( param_name = c( "real es[1];", "real es[2];", "real fs[1];", "real fs[2];", "real gs[1];", "real gs[2];", "real hs[1];", "real hs[2];" ), param_length = 1, param_info = c( 'diagonal autocorrelation population mean', 'off-diagonal autocorrelation population mean', 'diagonal autocorrelation population variance', 'off-diagonal autocorrelation population variance', 'shape1 for diagonal autocorrelation precision', 'shape1 for off-diagonal autocorrelation precision', 'shape2 for diagonal autocorrelation precision', 'shape2 for off-diagonal autocorrelation precision' ), prior = c( "es[1] = 0;", "es[2] = 0;", "fs[1] = sqrt(0.455);", "fs[2] = sqrt(0.455);", "gs[1] = 1.365;", "gs[2] = 1.365;", "hs[1] = 0.071175;", "hs[2] = 0.071175;" ), example_change = c( "es[1] = 0.5;", "es[2] = 0.1;", "fs[1] = 0.6;", "fs[2] = 0.3;", "gs[1] = 1.1;", "gs[2] = 1.07;", "hs[1] = 0.08;", "hs[2] = 0.1;" ) ) ) } if (trend_model %in% c('VAR1cor', 'VARhiercor', 'VARMA1,1cor')) { trend_df <- data.frame( param_name = c('vector[n_series] sigma;'), param_length = c(length(unique(data_train$series))), param_info = c('trend sd'), prior = c('sigma ~ inv_gamma(2.3693353, 0.7311319);'), example_change = c( paste0( 'sigma ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) ) trend_df <- rbind( trend_df, data.frame( param_name = c( "real es[1];", "real es[2];", "real fs[1];", "real fs[2];", "real gs[1];", "real gs[2];", "real hs[1];", "real hs[2];", "real L_Omega;" ), param_length = 1, param_info = c( 'diagonal autocorrelation population mean', 'off-diagonal autocorrelation population mean', 'diagonal autocorrelation population variance', 'off-diagonal autocorrelation population variance', 'shape1 for diagonal autocorrelation precision', 'shape1 for off-diagonal autocorrelation precision', 'shape2 for diagonal autocorrelation precision', 'shape2 for off-diagonal autocorrelation precision', 'LKJ prior on trend error correlations' ), prior = c( "es[1] = 0;", "es[2] = 0;", "fs[1] = sqrt(0.455);", "fs[2] = sqrt(0.455);", "gs[1] = 1.365;", "gs[2] = 1.365;", "hs[1] = 0.071175;", "hs[2] = 0.071175;", "L_Omega ~ lkj_corr_cholesky(2);" ), example_change = c( "es[1] = 0.5;", "es[2] = 0.1;", "fs[1] = 0.6;", "fs[2] = 0.3;", "gs[1] = 1.1;", "gs[2] = 1.07;", "hs[1] = 0.08;", "hs[2] = 0.1;", "L_Omega ~ lkj_corr_cholesky(4);" ) ) ) } if (trend_model == 'CAR1') { trend_df <- data.frame( param_name = c( paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar1;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] sigma;' ) ), param_length = ifelse(use_lv, n_lv, length(unique(data_train$series))), param_info = c('trend AR1 coefficient', 'trend sd'), prior = c('ar1 ~ std_normal();', 'sigma ~ exponential(2);'), example_change = c( paste0( 'ar1 ~ normal(', round(runif(min = 0.1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ), paste0( 'sigma ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) ) } if (trend_model %in% c('AR1', 'AR1cor', 'AR1hiercor')) { if (use_stan) { trend_df <- data.frame( param_name = c( paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar1;' ), paste0( 'vector[', 
ifelse(use_lv, 'n_lv', 'n_series'), '] sigma;' ) ), param_length = ifelse( use_lv, n_lv, length(unique(data_train$series)) ), param_info = c('trend AR1 coefficient', 'trend sd'), prior = c('ar1 ~ std_normal();', 'sigma ~ exponential(2);'), example_change = c( paste0( 'ar1 ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ), paste0( 'sigma ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) ) } else { trend_df <- data.frame( param_name = c( paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar1;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] sigma;' ) ), param_length = ifelse( use_lv, n_lv, length(unique(data_train$series)) ), param_info = c( 'trend AR1 coefficient (for each series s)', 'trend sd (for each series s)' ), prior = c('ar1[s] ~ dnorm(0, 10)', 'sigma[s] ~ dexp(2)T(0.075, 5)'), example_change = c( paste0( 'ar1[s] ~ dnorm(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ')' ), paste0( 'sigma[s] ~ dexp(', round(runif(min = 0.01, max = 1, n = 1), 2), ')' ) ) ) } } if (trend_model %in% c('AR2', 'AR2cor', 'AR2hiercor')) { if (use_stan) { trend_df <- data.frame( param_name = c( paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar1;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar2;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] sigma;' ) ), param_length = ifelse( use_lv, n_lv, length(unique(data_train$series)) ), param_info = c( 'trend AR1 coefficient', 'trend AR2 coefficient', 'trend sd' ), prior = c( 'ar1 ~ std_normal();', 'ar2 ~ std_normal();', 'sigma ~ exponential(2);' ), example_change = c( paste0( 'ar1 ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ), paste0( 'ar2 ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ), paste0( 'sigma ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) ) } else { trend_df <- data.frame( param_name = c( paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar1;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar2;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] sigma;' ) ), param_length = ifelse( use_lv, n_lv, length(unique(data_train$series)) ), param_info = c( 'trend AR1 coefficient (for each series s)', 'trend AR2 coefficient (for each series s)', 'trend sd (for each series s)' ), prior = c( 'ar1[s] ~ dnorm(0, 10)', 'ar2[s] ~ dnorm(0, 10)', 'sigma[s] ~ dexp(2)T(0.075, 5)' ), example_change = c( paste0( 'ar1[s] ~ dnorm(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ')' ), paste0( 'ar2[s] ~ dnorm(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ')' ), paste0( 'sigma[s] ~ dexp(', round(runif(min = 0.01, max = 1, n = 1), 2), ')' ) ) ) } } if (trend_model %in% c('AR3', 'AR3cor', 'AR3hiercor')) { if (use_stan) { trend_df <- data.frame( param_name = c( paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar1;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar2;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar3;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] sigma;' ) ), param_length = ifelse( use_lv, n_lv, length(unique(data_train$series)) ), param_info = c( 'trend AR1 coefficient', 'trend AR2 coefficient', 'trend AR3 coefficient', 'trend sd' ), prior = c( 
'ar1 ~ std_normal();', 'ar2 ~ std_normal();', 'ar3 ~ std_normal();', 'sigma ~ exponential(2);' ), example_change = c( paste0( 'ar1 ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ), paste0( 'ar2 ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ), paste0( 'ar3 ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ), paste0( 'sigma ~ exponential(', round(runif(min = 0.01, max = 1, n = 1), 2), ');' ) ) ) } else { trend_df <- data.frame( param_name = c( paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar1;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar2;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] ar3;' ), paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] sigma;' ) ), param_length = ifelse( use_lv, n_lv, length(unique(data_train$series)) ), param_info = c( 'trend AR1 coefficient (for each series s)', 'trend AR2 coefficient (for each series s)', 'trend AR3 coefficient (for each series s)', 'trend sd (for each series s)' ), prior = c( 'ar1[s] ~ dnorm(0, 10)', 'ar2[s] ~ dnorm(0, 10)', 'ar3[s] ~ dnorm(0, 10)', 'sigma[s] ~ dexp(2)T(0.075, 5)' ), example_change = c( paste0( 'ar1[s] ~ dnorm(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ')' ), paste0( 'ar2[s] ~ dnorm(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ')' ), paste0( 'ar3[s] ~ dnorm(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ')' ), paste0( 'sigma[s] ~ dexp(', round(runif(min = 0.01, max = 1, n = 1), 2), ')' ) ) ) } } # Remove options for trend variance priors if using a dynamic factor model if (use_lv) { if (missing(trend_map)) { trend_df %>% dplyr::filter( !grepl( paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] sigma;' ), param_name, fixed = TRUE ) ) -> trend_df } if (use_stan) { if (missing(trend_map)) { trend_df <- rbind( trend_df, data.frame( param_name = c('vector[M] L;'), param_length = n_lv * length(unique(data_train$series)), param_info = c('factor loadings'), prior = c('L ~ student_t(5, 0, 1);'), example_change = 'L ~ std_normal();' ) ) } } } # Extract drift parameter information if (drift) { if (use_stan) { drift_df <- data.frame( param_name = paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] drift;' ), param_length = ifelse( use_lv, n_lv, length(unique(data_train$series)) ), param_info = c('trend drift'), prior = c('drift ~ std_normal();'), example_change = c( paste0( 'drift ~ normal(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ');' ) ) ) } else { drift_df <- data.frame( param_name = paste0( 'vector[', ifelse(use_lv, 'n_lv', 'n_series'), '] drift;' ), param_length = ifelse( use_lv, n_lv, length(unique(data_train$series)) ), param_info = c('trend drift (for each series s)'), prior = c('drift ~ dnorm(0, 10)'), example_change = c( paste0( 'drift ~ dnorm(', round(runif(min = -1, max = 1, n = 1), 2), ', ', round(runif(min = 0.1, max = 1, n = 1), 2), ')' ) ) ) } } else { drift_df <- NULL } # Extract information for family-specific parameters family_df <- family_prior_info( family = family_char, use_stan = use_stan, data = data_train ) # Return the dataframe of prior information prior_df <- rbind( para_df, gp_df, sp_df, re_df, trend_df, drift_df, family_df ) prior_df$new_lowerbound <- NA 
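# Editor's illustration (not package code; the formula, data and index used
# below are hypothetical): the 'new_lowerbound' / 'new_upperbound'
# placeholder columns being added here let users supply hard bounds when
# passing an edited prior data frame back to mvgam(). A minimal sketch,
# guarded so it never executes:
if (FALSE) {
  priors <- get_mvgam_priors(
    y ~ s(season),
    data = dat,
    family = poisson(),
    trend_model = AR()
  )
  # Tighten a scale prior and add a lower bound before fitting
  priors$prior[1] <- 'sigma ~ exponential(1);'
  priors$new_lowerbound[1] <- 0
  fit <- mvgam(
    y ~ s(season),
    data = dat,
    family = poisson(),
    trend_model = AR(),
    priors = priors
  )
}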
prior_df$new_upperbound <- NA # Final update to use more brms-like default priors on # scale parameters def_scale_prior <- update_default_scales( response = replace_nas(data_train[[out_name]]), family = family ) # Update in priors df if (any(grepl('sigma|sigma_raw|sigma_obs', prior_df$prior))) { lines_with_scales <- grep('sigma|sigma_raw|sigma_obs', prior_df$prior) for (i in lines_with_scales) { prior_df$prior[i] <- paste0( trimws(strsplit(prior_df$prior[i], "[~]")[[1]][1]), ' ~ ', def_scale_prior, ';' ) } } out <- prior_df } return(out) } #' @export #' @importFrom brms prior brms::prior #' @export #' @importFrom brms prior_ brms::prior_ #' @export #' @importFrom brms set_prior brms::set_prior #' @export #' @importFrom brms prior_string brms::prior_string #' Use informative scale and intercept priors following the approach used in brms #' @importFrom stats mad qcauchy setNames #' @importFrom brms logm1 prior_string get_prior #' @noRd make_default_scales = function(response, family) { def_scale_prior <- update_default_scales(response, family) c( prior_string(def_scale_prior, class = 'sigma'), prior_string(def_scale_prior, class = 'sigma_raw'), prior_string(def_scale_prior, class = 'sigma_obs') ) } #' @noRd make_default_int = function(response, family) { if (all(is.na(response))) { out <- prior_string("student_t(3, 0, 3.5)", class = '(Intercept)') } else if (family$family == 'nmix') { # Intercept prior in N-mixture models applies to the average detection probability out <- prior_string("normal(0, 1.5)", class = '(Intercept)') } else { resp_dat <- data.frame(y = response[!is.na(response)]) int_prior <- get_prior( y ~ 1, data = resp_dat, family = family_to_brmsfam(family) ) out <- prior_string( int_prior$prior[which(int_prior$class == 'Intercept')], class = '(Intercept)' ) } return(out) } #' Apply a link (not inverse link) function to x #' @noRd linkfun = function(x, link) { switch( link, identity = x, log = log(x), logm1 = logm1(x), log1p = log1p(x), inverse = 1 / x, sqrt = sqrt(x), `1/mu^2` = 1 / x^2, tan_half = tan(x / 2), logit = qlogis(x), probit = qnorm(x), cauchit = qcauchy(x), probit_approx = qnorm(x), squareplus = (x^2 - 1) / x, stop("Link '", link, "' is not supported.", call. = FALSE) ) } #' @noRd update_default_scales = function( response, family, df = 3, center = TRUE ) { if (all(is.na(response))) { out <- paste0( "student_t(", paste0(as.character(c(df, '0', '3')), collapse = ", "), ")" ) } else { y <- response[!is.na(response)] link <- family$link if (link %in% c("log", "inverse", "1/mu^2")) { # avoid Inf in link(y) y <- ifelse(y == 0, y + 0.1, y) } y_link <- suppressWarnings(linkfun(y, link = link)) scale_y <- round(mad(y_link, na.rm = TRUE), 1) if (scale_y <= 5) { out <- 'inv_gamma(1.418, 0.452)' } if ( scale_y > 5 & scale_y <= 20 ) { out <- 'inv_gamma(0.9187, 0.3516)' } if (scale_y > 20) { out <- paste0( "student_t(", paste0(as.character(c(df, 0, scale_y)), collapse = ", "), ")" ) } } return(out) } ================================================ FILE: R/globals.R ================================================ #' Prevent R CMD check notes about missing global variables due to #' dplyr mutates etc...
#' @noRd utils::globalVariables(c( "y", "year", "smooth_vals", "smooth_num", "series", "season", "rowid", "row_number", "nsp", "last_assim", "index", "in_interval", "assimilated", "eval_horizon", "label", "mod_call", "particles", "obs", "mgcv_model", "param_name", "outcome", "mgcv_plottable", "term", "data_test", "object", "row_num", "trends_test", "trend", "trend_series", "trend_y", ".", "gam", "group", "mod", "row_id", "byvar", "direction", "index..time..index", "trend_test", "Var2", "add_cor", "add_ma", "add_nmix", "binomial", "current", "drop_obs_intercept", "gp_details", "gp_terms", "k", "mus", "name", "needed", "nmix_trendmap", "orig_formula", "trial", "use_var1", "use_var1cor", "xcols_drop", "time_lag", "dis_time", "maxt", "orig_rows", "matches", "time.", "file_name", ".data", "horizon", "target", "Series", "evd", "mean_evd", "total_evd", "smooth_label", "by_variable", "gr", "tot_subgrs", "subgr", "lambda", "level", "sim_hilbert_gp", "trend_model", "jags_path", "x", "elpds", "pareto_ks", "value", "threshold", "colour", "resids", "c_dark", "eval_timepoints", "yqlow", "ymidlow", "ymidhigh", "yqhigh", "preds", "yhigh", "ylow", "weight", "orig_weight", "imp_resp", "lower1", "lower2", "lower3", "lower4", "med", "resp_var", "upper1", "upper2", "upper3", "upper4", "draw", "fevd_Qlower", "fevd_Qupper", "fevd_median", "shock", "irf_median", "irf_Qlower", "irf_Qupper", "fevdQ50", "parameter", "pred_median", "pred_Qupper", "pred_Qlower", "lower_deriv", "med_deriv", "upper_deriv", "sum_contribution", "total", "Contribution", "Var1", "lv", "correlation", "Factor", "contribution", "unit", "lw", "data" )) ================================================ FILE: R/gp.R ================================================ #' Re-label gp terms inside an mgcv gam object for nicer plotting #' @noRd relabel_gps = function(mgcv_model) { if (length(mgcv_model$smooth) > 0L) { # Get classes of all smooths smooth_classes <- purrr::map(mgcv_model$smooth, class) # Check for gp() terms for (x in seq_along(smooth_classes)) { if (any(smooth_classes[[x]] %in% 'hilbert.smooth')) { mgcv_model$smooth[[x]]$label <- gsub('s\\(|ti\\(', 'gp(', mgcv_model$smooth[[x]]$label) } } } return(mgcv_model) } #' @noRd seq_cols <- function(x) { seq_len(NCOL(x)) } #' Make gp() attributes table and necessary stan lines #' @importFrom brms brm standata #' @noRd make_gp_additions = function( gp_details, orig_formula, data, newdata, model_data, mgcv_model, gp_terms, family = gaussian(), rho_names ) { by <- gp_details$by gp_details$row_id <- 1:NROW(gp_details) gp_covariates <- gp_details$gp_covariates gp_details_orig <- gp_details if (any(!is.na(by))) { for (i in 1:length(by)) { if (!is.na(by[i])) { if (is.factor(data[[by[i]]])) { nlevels <- length(levels(droplevels(data[[by[i]]]))) new_details <- do.call( rbind, lapply(1:nlevels, function(x) { gp_details_orig[i, ] }) ) new_details$level <- levels(droplevels(data[[by[i]]])) rows_drop <- which( gp_details$gp_covariates == gp_covariates[i] & gp_details$by == by[i] ) gp_details <- gp_details[-rows_drop, ] gp_details <- rbind(gp_details, new_details) } } } } # Preserve ordering of terms gp_details %>% dplyr::arrange(row_id, level) %>% dplyr::select(-row_id) -> gp_details # Initiate a brms GP model using the 'mock' backend so it doesn't actually fit; terms_needed <- unique(c( unlist(strsplit(gp_details$gp_covariates, ", |\\n")), unlist(strsplit(gp_details$by, ", |\\n")) )) terms_needed <- terms_needed[!is.na(terms_needed)] terms_needed <- terms_needed[!terms_needed %in% c('series', 'time')] 
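# Editor's sketch (not package code): the 'mock' backend used just below lets
# brms set up all of its Stan data structures without compiling or sampling,
# which is how the Hilbert-basis eigenfunctions and eigenvalues are obtained
# here. A standalone illustration, guarded so it never executes (the data are
# made up, and the 'Xgp_1' / 'slambda_1' element names may differ across brms
# versions):
if (FALSE) {
  df <- data.frame(.fake_gp_y = rnorm(50), x = runif(50))
  mock <- brms::brm(
    .fake_gp_y ~ gp(x, k = 10),
    data = df,
    mock_fit = 1,
    backend = "mock",
    rename = FALSE
  )
  sdat <- brms::standata(mock)
  str(sdat$Xgp_1)      # approximate GP eigenfunctions (design matrix)
  str(sdat$slambda_1)  # the matching eigenvalues
}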
brms_fake_df <- data.frame( .fake_gp_y = rnorm(length(data[[1]])), series = data$series, time = data$index..time..index ) for (i in seq_along(terms_needed)) { brms_fake_df <- cbind(brms_fake_df, data[[terms_needed[i]]]) } colnames(brms_fake_df) <- c('.fake_gp_y', 'series', 'time', terms_needed) brms_fake_df <- brms_fake_df %>% dplyr::arrange(time, series) # Build the gp formula to pass to the mock brms gp_formula <- reformulate( attr(terms(attr(gp_details, 'gp_formula')), 'term.labels'), '.fake_gp_y' ) brms_mock <- brms::brm( gp_formula, data = brms_fake_df, mock_fit = 1, backend = "mock", rename = FALSE ) brms_mock <- trim_mockbrms(brms_mock) # Eigenfunction design matrices (to be inserted into Xp matrices) brms_mock_data <- brms::standata(brms_mock) eigenfuncs <- eigenfunc_list( stan_data = brms_mock_data, mock_df = brms_fake_df, by = gp_details$by, level = gp_details$level ) # Eigenvalues (l_gp in mvgam stancode) eigenvals <- eigenval_list(brms_mock_data) # If newdata supplied, compute the eigenfunctions for these out of # sample data points if (!is.null(newdata)) { brms_fake_df_new <- data.frame( .fake_gp_y = rnorm(length(newdata[[1]])), series = newdata$series, time = newdata$index..time..index ) for (i in seq_along(terms_needed)) { brms_fake_df_new <- cbind(brms_fake_df_new, newdata[[terms_needed[i]]]) } colnames(brms_fake_df_new) <- c( '.fake_gp_y', 'series', 'time', terms_needed ) brms_fake_df_new <- brms_fake_df_new %>% dplyr::arrange(time, series) # Compute eigenfunctions for these new data and bind to the # training data eigenfunctions brms_mock_data_new <- brms::standata( brms_mock, newdata = brms_fake_df_new, internal = TRUE ) eigenfuncs_new <- eigenfunc_list( stan_data = brms_mock_data_new, mock_df = brms_fake_df_new, by = gp_details$by, level = gp_details$level ) for (i in seq_along(eigenfuncs)) { eigenfuncs[[i]] <- rbind(eigenfuncs[[i]], eigenfuncs_new[[i]]) } } # Numbers of basis functions (k_gp in mvgam stancode) k_gps <- lapply(eigenvals, function(x) NROW(x)) # Put all relevant data into a list gp_data <- lapply(seq_along(eigenvals), function(x) { byname <- ifelse(is.na(gp_details$by[x]), '', paste0(':', gp_details$by[x])) covariate_name <- paste0('gp(', gp_details$gp_covariates[x], ')', byname) if (!is.na(gp_details$level[x])) { covariate_name <- paste0(covariate_name, gp_details$level[x]) } orig_name <- if (gp_details$dim[x] > 1L) { paste0('ti(', gp_details$gp_covariates[x], ')', byname) } else { paste0('s(', gp_details$gp_covariates[x], ')', byname) } if (!is.na(gp_details$level[x])) { orig_name <- paste0(orig_name, gp_details$level[x]) } att_table <- list( effect = 'gp', name = covariate_name, orig_name = orig_name, dim = gp_details$dim[x], iso = gp_details$iso[x], kernel = gp_details$kernel[x], covariate = gp_details$gp_covariates[x], by = gp_details$b[x], level = gp_details$level[x], k = k_gps[[x]], def_rho = gp_details$def_rho[x], def_rho_2 = gp_details$def_rho_2[x], def_rho_3 = gp_details$def_rho_3[x], def_rho_4 = gp_details$def_rho_4[x], def_alpha = gp_details$def_alpha[x], eigenvalues = eigenvals[[x]] ) # Items to add to Stan data # Number of basis functions covariate_name <- clean_gpnames(covariate_name) data_lines <- paste0( 'int k_', covariate_name, '; // basis functions for approximate gp\n' ) append_dat <- list(k = k_gps[[x]]) names(append_dat) <- paste0('k_', covariate_name, '') # Approximate GP eigenvalues data_lines <- paste0( data_lines, paste0( 'array[', 'k_', covariate_name, '] vector[', gp_details$dim[x], '] l_', covariate_name, '; // approximate gp 
eigenvalues\n' ), collapse = '\n' ) append_dat2 <- list(slambda = eigenvals[[x]]) names(append_dat2) <- paste0('l_', covariate_name, '') append_dat <- append(append_dat, append_dat2) # Return necessary objects in a list list( att_table = att_table, data_lines = data_lines, data_append = append_dat, eigenfunctions = eigenfuncs[[x]] ) }) # Consolidate Stan data objects and add to model_data gp_stan_data <- do.call(c, purrr::map(gp_data, 'data_append')) model_data <- append(model_data, gp_stan_data) # Consolidate attribute tables gp_att_table <- purrr::map(gp_data, 'att_table') # Create updated design matrix by replacing the s() basis functions with # the gp() eigenfunctions coefs_replace <- list() for (x in gp_terms) { label <- attr(terms(formula(mgcv_model), keep.order = TRUE), 'term.labels')[ x ] s_attributes <- eval(rlang::parse_expr(label)) if (s_attributes$by != 'NA') { if (grepl('ti(', label, fixed = TRUE)) { coef_name <- paste0( 'ti(', paste(s_attributes$term, collapse = ','), '):', s_attributes$by ) } else { coef_name <- paste0('s(', s_attributes$term, '):', s_attributes$by) } } else { if (grepl('ti(', label, fixed = TRUE)) { coef_name <- paste0( 'ti(', paste(s_attributes$term, collapse = ','), ')' ) } else { coef_name <- paste0('s(', s_attributes$term, ')') } } which_replace <- grep(coef_name, names(coef(mgcv_model)), fixed = TRUE) names(mgcv_model$coefficients)[which_replace] <- if (grepl('ti(', label, fixed = TRUE)) { gsub( 'ti(', 'gp(', names(mgcv_model$coefficients)[which_replace], fixed = TRUE ) } else { gsub( 's(', 'gp(', names(mgcv_model$coefficients)[which_replace], fixed = TRUE ) } coefs_replace[[x]] <- which_replace } # Replace basis functions with gp() eigenfunctions newX <- model_data$X # Add eigenfunctions to the GAM design matrix eigenfuncs <- do.call(cbind, purrr::map(gp_data, 'eigenfunctions')) newX[unlist(coefs_replace), ] <- t(eigenfuncs) model_data$X <- newX # Consolidate Stan data lines gp_stan_lines <- paste0(purrr::map(gp_data, 'data_lines'), collapse = '') # Add coefficient indices to attribute table and to Stan data for (covariate in seq_along(gp_att_table)) { clean_name <- gsub(' ', '', gp_att_table[[covariate]]$name) clean_coefs <- sub("^(.*)[.].*", "\\1", names(coef(mgcv_model))) coef_indices <- which( clean_coefs %in% clean_name & !grepl( paste0(gp_att_table[[covariate]]$name, ':'), names(coef(mgcv_model)), fixed = TRUE ) == TRUE ) gp_att_table[[covariate]]$first_coef <- min(coef_indices) gp_att_table[[covariate]]$last_coef <- max(coef_indices) gp_names <- clean_gpnames(gp_att_table[[covariate]]$name) gp_stan_lines <- paste0( gp_stan_lines, paste0( 'array[', gp_att_table[[covariate]]$k, '] int b_idx_', gp_names, '; // gp basis coefficient indices\n' ) ) gp_idx_data <- list(coef_indices) names(gp_idx_data) <- paste0('b_idx_', gp_names) model_data <- append(model_data, gp_idx_data) } # Add the GP attribute table and mock brmsfit object to the mgcv_model attr(mgcv_model, 'gp_att_table') <- gp_att_table attr(mgcv_model, 'brms_mock') <- brms_mock # Assign GP labels to smooths gp_assign <- data.frame( label = unlist(purrr::map(gp_att_table, 'name')), first.para = unlist(purrr::map(gp_att_table, 'first_coef')), last.para = unlist(purrr::map(gp_att_table, 'last_coef')), by = unlist(purrr::map(gp_att_table, 'by')) ) for (i in seq_along(mgcv_model$smooth)) { if ( mgcv_model$smooth[[i]]$label %in% gsub('gp(', 's(', gsub(' ', '', gp_assign$label[i]), fixed = TRUE) || mgcv_model$smooth[[i]]$label %in% gsub('gp(', 'ti(', gsub(' ', '', gp_assign$label[i]), fixed = TRUE) 
& mgcv_model$smooth[[i]]$first.para %in% gp_assign$first.para ) { mgcv_model$smooth[[i]]$gp_term <- TRUE class(mgcv_model$smooth[[i]]) <- c( class(mgcv_model$smooth[[i]])[1], 'hilbert.smooth', 'mgcv.smooth' ) } else { mgcv_model$smooth[[i]]$gp_term <- FALSE } } # Update smoothing parameter names and return if (!missing(rho_names)) { gp_names <- unlist(purrr::map(gp_att_table, 'name')) gp_names_new <- vector() for (i in seq_along(gp_names)) { if (any(grepl(',', gp_names[i]))) { gp_names_new[i] <- gsub( ' ', '', gsub('gp(', 'ti(', gp_names[i], fixed = TRUE) ) } else { gp_names_new[i] <- gsub('gp(', 's(', gp_names[i], fixed = TRUE) } } rhos_change <- list() for (i in seq_along(gp_names_new)) { rhos_change[[i]] <- grep(gp_names_new[i], rho_names, fixed = TRUE) } rho_names[c(unique(unlist(rhos_change)))] <- gsub( 's\\(|ti\\(', 'gp(', rho_names[c(unique(unlist(rhos_change)))] ) } else { rho_names <- NULL } # Return return(list( model_data = model_data, mgcv_model = mgcv_model, gp_stan_lines = gp_stan_lines, gp_att_table = gp_att_table, rho_names )) } #' Reduce the size of the brmsfit object #' @noRd trim_mockbrms = function(brms_mock) { brms_mock$opencl <- NULL brms_mock$data.name <- NULL brms_mock$algorithm <- NULL brms_mock$backend <- NULL brms_mock$stan_args <- NULL brms_mock$model <- NULL brms_mock$stan_funs <- NULL brms_mock$threads <- NULL brms_mock$prior <- NULL brms_mock$family <- NULL brms_mock$save_pars <- NULL brms_mock } #' Extract eigenfunctions for gp() terms and pad with zeros if necessary #' @noRd eigenfunc_list = function(stan_data, mock_df, by = NA, level = NA) { eigenfuncs <- stan_data[which( grepl('Xgp_', names(stan_data), fixed = TRUE) & !grepl('_old', names(stan_data), fixed = TRUE) & !grepl('_prior', names(stan_data), fixed = TRUE) )] # We need to pad the eigenfunctions with zeros # for the observations where the by is a different level; padded_eigenfuncs <- lapply(seq_along(eigenfuncs), function(x) { if (!is.na(by[x])) { if (!is.na(level[x])) { sorted_by <- mock_df[[by[x]]] full_eigens <- matrix( 0, nrow = length(sorted_by), ncol = NCOL(eigenfuncs[[x]]) ) full_eigens[ (seq_along(sorted_by))[ sorted_by == level[x] ], ] <- eigenfuncs[[x]] } else { # Numeric by variables should be multiplied by the # spectral eigenfunctions full_eigens <- eigenfuncs[[x]] * mock_df[[by[x]]] } } else { full_eigens <- eigenfuncs[[x]] } full_eigens }) padded_eigenfuncs } #' Extract eigenvalues for gp() terms #' @noRd eigenval_list = function(stan_data) { stan_data[which( grepl('slambda_', names(stan_data), fixed = TRUE) & !grepl('_old', names(stan_data), fixed = TRUE) )] } #' Which terms are gp() terms? 
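#' (Editor's note, illustrative example) which_are_gp(y ~ gp(x) + s(z))
#' returns 1L, the position of the gp() term among the formula's term labels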
#' @noRd which_are_gp = function(formula) { termlabs <- attr(terms(formula, keep.order = TRUE), 'term.labels') return(grep('gp(', termlabs, fixed = TRUE)) } #' Convert gp() terms to s() terms for initial model construction #' @importFrom stats drop.terms #' @noRd gp_to_s <- function(formula, data, family) { # Extract details of gp() terms gp_details <- get_gp_attributes(formula, data, family) termlabs <- attr(terms(formula, keep.order = TRUE), 'term.labels') # Check for offsets as well off_names <- grep( 'offset', rownames(attr(terms.formula(formula), 'factors')), value = TRUE ) if (length(off_names) > 0L) { termlabs <- c(termlabs, off_names) } # Replace the gp() terms with s() for constructing the initial model which_gp <- which_are_gp(formula) response <- rlang::f_lhs(formula) s_terms <- vector() for (i in 1:NROW(gp_details)) { if (!is.na(gp_details$by[i])) { if (is.factor(data[[gp_details$by[i]]])) { # For terms with factor by variables, identifiability constraints are in place, # so we either need one additional # value for k (for unidimensional terms) or must use mc = 0 for all # marginals in a ti() call (for multidimensional terms) s_terms[i] <- paste0( if (gp_details$dim[i] < 2L) { 's(' } else { 'ti(' }, gp_details$gp_covariates[i], ', by = ', gp_details$by[i], ', k = ', if (gp_details$dim[i] > 1L) { paste0( gp_details$k[i], ', mc = c(', paste(rep(0, gp_details$dim[i]), collapse = ', '), ')' ) } else { gp_details$k[i] + 1 }, ')' ) } else { # No constraints are used when numeric by variables are in smooths, # so the number of coefficients will match those from the brms gp s_terms[i] <- paste0( if (gp_details$dim[i] < 2L) { 's(' } else { 'ti(' }, gp_details$gp_covariates[i], ', by = ', gp_details$by[i], ', k = ', gp_details$k[i], ')' ) } } else { # For terms with no by-variable, we either need one additional # value for k (for unidimensional terms) or must use mc = 0 for all # marginals in a ti() call (for multidimensional terms) s_terms[i] <- paste0( if (gp_details$dim[i] < 2L) { 's(' } else { 'ti(' }, gp_details$gp_covariates[i], ', k = ', if (gp_details$dim[i] > 1L) { paste0( gp_details$k[i], ', mc = c(', paste(rep(0, gp_details$dim[i]), collapse = ', '), ')' ) } else { gp_details$k[i] + 1 }, ')' ) } termlabs[which_gp[i]] <- s_terms[i] } newformula <- reformulate(termlabs, rlang::f_lhs(formula)) attr(newformula, '.Environment') <- attr(formula, '.Environment') return(newformula) } #' Store attributes of the gp terms #' @importFrom rlang parse_expr #' @noRd get_gp_attributes = function(formula, data, family = gaussian()) { gp_terms <- rownames(attr(terms(formula), 'factors'))[ grep('gp(', rownames(attr(terms(formula), 'factors')), fixed = TRUE) ] # Term details and default priors gp_attributes <- lapply(seq_along(gp_terms), function(x) { eval(rlang::parse_expr(gp_terms[x])) }) gp_isos <- unlist(purrr::map(gp_attributes, 'iso'), use.names = FALSE) gp_kernels <- unlist(purrr::map(gp_attributes, 'cov'), use.names = FALSE) gp_cmcs <- unlist(purrr::map(gp_attributes, 'cmc'), use.names = FALSE) if (any(gp_cmcs == FALSE)) { rlang::warn( paste0( "gp effects in mvgam cannot yet handle contrast coding\n", "resetting all instances of 'cmc = FALSE' to 'cmc = TRUE'" ), .frequency = "once", .frequency_id = 'gp_cmcs' ) } gp_grs <- unlist(purrr::map(gp_attributes, 'gr'), use.names = FALSE) if (any(gp_grs == TRUE)) { rlang::warn( paste0( "gp effects in mvgam cannot yet handle autogrouping\n", "resetting all instances of 'gr = TRUE' to 'gr = FALSE'" ), .frequency = "once", .frequency_id = 'gp_grs' ) } newgp_terms <- unlist(
lapply(seq_along(gp_terms), function(x) { lbl <- paste0( 'gp(', paste(gp_attributes[[x]]$term, collapse = ', '), if (gp_attributes[[x]]$by != 'NA') { paste0(', by = ', gp_attributes[[x]]$by) } else { NULL }, ', k = ', gp_attributes[[x]]$k, ', cov = "', gp_attributes[[x]]$cov, '", iso = ', gp_attributes[[x]]$iso, ', scale = ', gp_attributes[[x]]$scale, ', c = ', gp_attributes[[x]]$c[1], ', gr = FALSE, cmc = TRUE)' ) }), use.names = FALSE ) gp_formula <- reformulate(newgp_terms, rlang::f_lhs(formula)) gp_def_priors <- do.call( rbind, lapply(seq_along(gp_terms), function(x) { def_gp_prior <- suppressWarnings(brms::get_prior( reformulate(newgp_terms[x], rlang::f_lhs(formula)), family = family_to_brmsfam(family), data = data )) def_gp_prior <- def_gp_prior[def_gp_prior$prior != '', ] def_rho <- def_gp_prior$prior[which(def_gp_prior$class == 'lscale')] def_alpha <- def_gp_prior$prior[min(which(def_gp_prior$class == 'sdgp'))] if (def_alpha == '') { def_alpha <- 'student_t(3, 0, 2.5);' } if (length(def_rho) > 1L) { def_rho_1 <- def_rho[1] def_rho_2 <- def_rho[2] out <- data.frame( def_rho = def_rho_1, def_rho_2 = def_rho_2, def_rho_3 = NA, def_rho_4 = NA, def_alpha = def_alpha ) if (length(def_rho) > 2L) { out$def_rho_3 <- def_rho[3] } if (length(def_rho) > 3L) out$def_rho_4 <- def_rho[4] } else { out <- data.frame( def_rho = def_rho, def_rho_2 = NA, def_rho_3 = NA, def_rho_4 = NA, def_alpha = def_alpha ) } out }) ) # Extract information necessary to construct the GP terms gp_terms <- purrr::map(gp_attributes, 'term') gp_dims <- unlist(lapply(gp_terms, length), use.names = FALSE) gp_covariates <- unlist( lapply(gp_terms, function(x) { paste(x, collapse = ', ') }), use.names = FALSE ) k <- unlist(purrr::map(gp_attributes, 'k')) if (any(is.na(k))) { stop('argument "k" must be supplied for any gp() terms', call. 
= FALSE) } # We no longer need boundary or scale information, as # brms handles this internally by <- unlist(purrr::map(gp_attributes, 'by'), use.names = FALSE) if (any(by == 'NA')) { by[by == 'NA'] <- NA } ret_dat <- data.frame( gp_covariates, dim = gp_dims, kernel = gp_kernels, iso = gp_isos, k = k, by, level = NA, def_alpha = gp_def_priors$def_alpha, def_rho = gp_def_priors$def_rho, def_rho_2 = gp_def_priors$def_rho_2, def_rho_3 = gp_def_priors$def_rho_3, def_rho_4 = gp_def_priors$def_rho_4 ) attr(ret_dat, 'gp_formula') <- gp_formula # Return as a data.frame return(ret_dat) } #' Clean GP names so no illegal characters are used in Stan code #' @noRd clean_gpnames = function(gp_names) { gp_names_clean <- gsub(' ', '_', gp_names, fixed = TRUE) gp_names_clean <- gsub('(', '_', gp_names_clean, fixed = TRUE) gp_names_clean <- gsub(')', '_', gp_names_clean, fixed = TRUE) gp_names_clean <- gsub(',', 'by', gp_names_clean, fixed = TRUE) gp_names_clean <- gsub(':', 'by', gp_names_clean, fixed = TRUE) gp_names_clean <- gsub('.', '_', gp_names_clean, fixed = TRUE) gp_names_clean <- gsub(']', '_', gp_names_clean, fixed = TRUE) gp_names_clean <- gsub('[', '_', gp_names_clean, fixed = TRUE) gp_names_clean <- gsub(';', '_', gp_names_clean, fixed = TRUE) gp_names_clean <- gsub("'", "", gp_names_clean, fixed = TRUE) gp_names_clean <- gsub("\"", "", gp_names_clean, fixed = TRUE) gp_names_clean <- gsub("%", "percent", gp_names_clean, fixed = TRUE) #gp_names_clean <- gsub("’", "", gp_names_clean, fixed = TRUE) gp_names_clean } #' Update a Stan file with GP information #' @noRd add_gp_model_file = function(model_file, model_data, mgcv_model, gp_additions) { rho_priors <- unlist( purrr::map(gp_additions$gp_att_table, 'def_rho'), use.names = FALSE ) rho_2_priors <- unlist( purrr::map(gp_additions$gp_att_table, 'def_rho_2'), use.names = FALSE ) rho_3_priors <- unlist( purrr::map(gp_additions$gp_att_table, 'def_rho_3'), use.names = FALSE ) rho_4_priors <- unlist( purrr::map(gp_additions$gp_att_table, 'def_rho_4'), use.names = FALSE ) alpha_priors <- unlist( purrr::map(gp_additions$gp_att_table, 'def_alpha'), use.names = FALSE ) # Add data lines model_file[grep( 'int ytimes[n, n_series];', model_file, fixed = TRUE )] <- paste0( model_file[grep( 'int ytimes[n, n_series];', model_file, fixed = TRUE )], '\n', gp_additions$gp_stan_lines ) model_file <- readLines(textConnection(model_file), n = -1) # Replace the multi_normal_prec lines with the relevant spd function gp_kernels <- unlist( purrr::map(attr(mgcv_model, 'gp_att_table'), 'kernel'), use.names = FALSE ) gp_names <- unlist( purrr::map(attr(mgcv_model, 'gp_att_table'), 'name'), use.names = FALSE ) gp_isos <- unlist( purrr::map(attr(mgcv_model, 'gp_att_table'), 'iso'), use.names = FALSE ) gp_dims <- unlist( purrr::map(attr(mgcv_model, 'gp_att_table'), 'dim'), use.names = FALSE ) orig_names <- unlist( purrr::map(attr(mgcv_model, 'gp_att_table'), 'orig_name'), use.names = FALSE ) gp_names_clean <- clean_gpnames(gp_names) s_to_remove <- list() for (i in seq_along(gp_names)) { i_rho_priors <- c( rho_priors[i], rho_2_priors[i], rho_3_priors[i], rho_4_priors[i] ) i_rho_priors <- i_rho_priors[!is.na(i_rho_priors)] s_name <- gsub(' ', '', orig_names[i]) to_replace <- grep( paste0('// prior for ', s_name, '...'), model_file, fixed = TRUE ) + 1 pattern <- "S\s*(.*?)\s*\["
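# Editor's illustration (not package code): the pattern above captures the
# text between 'S' and '[' so that the numeric index of an mgcv penalty
# matrix can be recovered from a generated Stan line. Guarded example, using
# a hypothetical line of generated Stan code:
if (FALSE) {
  line <- "b_raw[5:12] ~ multi_normal_prec(zero[5:12], S1[1:8,1:8] * lambda[1]);"
  regmatches(line, regexec("S\\s*(.*?)\\s*\\[", line))[[1]]
  # returns c("S1[", "1"); the captured digits identify penalty matrix S1
}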
result <- regmatches( model_file[to_replace], regexec(pattern, model_file[to_replace]) )[[1]] s_to_remove[[i]] <- unique(unlist(regmatches( result, gregexpr("[[:digit:]]+", result) ))) model_file[grep( paste0('// prior for ', s_name, '...'), model_file, fixed = TRUE )] <- gsub( 's\\(|ti\\(', 'gp(', model_file[grep( paste0('// prior for ', s_name, '...'), model_file, fixed = TRUE )] ) rho_prior_lines <- paste( paste0( 'rho_', gp_names_clean[i], if (!gp_isos[i]) { '[1]' } else { NULL }, '[', if (gp_isos[i]) { 1 } else { seq(1:gp_dims[i]) }, ']', ' ~ ', if (gp_isos[i]) { rho_priors[i] } else { i_rho_priors }, ';\n' ), collapse = '\n' ) model_file[to_replace] <- paste0( 'z_', gp_names_clean[i], ' ~ std_normal();\n', 'alpha_', gp_names_clean[i], ' ~ ', alpha_priors[i], ';\n', rho_prior_lines, 'b_raw[b_idx_', gp_names_clean[i], '] ~ std_normal();\n' ) } b_line <- max(which( grepl('b[', model_file, fixed = TRUE) & grepl('] =', model_file, fixed = TRUE) )) b_edits <- paste0( 'b[b_idx_', gp_names_clean, add_gp_spd_calls(gp_kernels), gp_names_clean, ', alpha_', gp_names_clean, ', rho_', gp_names_clean, '[1])) .* z_', gp_names_clean, ';', collapse = '\n' ) model_file[b_line] <- paste0(model_file[b_line], '\n', b_edits) model_file <- readLines(textConnection(model_file), n = -1) # Remove un-needed penalty matrices from the model file and the # model data for (i in seq_along(unique(unlist(s_to_remove)))) { model_data[[paste0('S', unique(unlist(s_to_remove))[i])]] <- NULL model_file <- model_file[ -grep( paste0( '\\bmgcv smooth penalty matrix S', unique(unlist(s_to_remove))[i], '\\b' ), model_file ) ] } # Add alpha, rho and z lines in parameters and model blocks alpha_names <- paste( paste0('real alpha_', gp_names_clean, ';'), collapse = '\n' ) rho_names <- paste( paste0( 'array[1] vector[', ifelse(gp_isos, 1, gp_dims), '] rho_', gp_names_clean, ';' ), collapse = '\n' ) z_names <- paste( paste0('vector[k_', gp_names_clean, '] z_', gp_names_clean, ';'), collapse = '\n' ) model_file[grep("vector[num_basis] b_raw;", model_file, fixed = TRUE)] <- paste0( "vector[num_basis] b_raw;\n\n", '// gp term sd parameters\n', alpha_names, '\n\n// gp term length scale parameters\n', rho_names, '\n\n// gp term latent variables\n', z_names, '\n' ) model_file <- readLines(textConnection(model_file), n = -1) # Add spd_ functions from brms code kerns_add <- rev(gp_kernels) for (i in seq_along(kerns_add)) { model_file <- add_gp_spd_funs(model_file, kerns_add[i]) } return(list(model_file = model_file, model_data = model_data)) } #' Add GP SPD functions to a stan model file #' @noRd add_gp_spd_calls = function(kernels) { kern_calls <- vector(length = length(kernels)) for (i in seq_along(kern_calls)) { if (kernels[i] == 'exp_quad') { kern_calls[i] <- '] = sqrt(spd_gp_exp_quad(l_' } if (kernels[i] == 'exponential') { kern_calls[i] <- '] = sqrt(spd_gp_exponential(l_' } if (kernels[i] == 'matern32') { kern_calls[i] <- '] = sqrt(spd_gp_matern32(l_' } if (kernels[i] == 'matern52') { kern_calls[i] <- '] = sqrt(spd_gp_matern52(l_' } } return(kern_calls) } #' @noRd add_gp_spd_funs = function(model_file, kernel) { if (kernel == 'exp_quad') { if ( !any(grepl( '/* Spectral density of a squared exponential Gaussian process', model_file, fixed = TRUE )) ) { fun_lines <- paste0( '/* Spectral density of a squared exponential Gaussian process\n', '* Args:\n', '* x: array of numeric values of dimension NB x D\n', '* sdgp: marginal SD parameter\n', '* lscale: vector of length-scale parameters\n', '* Returns:\n', "* numeric vector of length NB of 
the SPD evaluated at 'x'\n", '*/\n', 'vector spd_gp_exp_quad(data array[] vector x, real sdgp, vector lscale) {\n', 'int NB = dims(x)[1];\n', 'int D = dims(x)[2];\n', 'int Dls = rows(lscale);\n', 'real constant = square(sdgp) * sqrt(2 * pi())^D;\n', 'vector[NB] out;\n', 'if (Dls == 1) {\n', '// one dimensional or isotropic GP\n', 'real neg_half_lscale2 = -0.5 * square(lscale[1]);\n', 'constant = constant * lscale[1]^D;\n', 'for (m in 1:NB) {\n', 'out[m] = constant * exp(neg_half_lscale2 * dot_self(x[m]));\n', '}\n', '} else {\n', '// multi-dimensional non-isotropic GP\n', 'vector[Dls] neg_half_lscale2 = -0.5 * square(lscale);\n', 'constant = constant * prod(lscale);\n', 'for (m in 1:NB) {\n', 'out[m] = constant * exp(dot_product(neg_half_lscale2, square(x[m])));\n', '}\n', '}\n', 'return out;\n', '}' ) } else { fun_lines <- NULL } } if (kernel %in% c('exponential', 'matern12')) { if ( !any(grepl( '/* Spectral density of an exponential Gaussian process', model_file, fixed = TRUE )) ) { fun_lines <- paste0( '/* Spectral density of an exponential Gaussian process\n', '* also known as the Matern 1/2 kernel\n', '* Args:\n', '* x: array of numeric values of dimension NB x D\n', '* sdgp: marginal SD parameter\n', '* lscale: vector of length-scale parameters\n', '* Returns:\n', "* numeric vector of length NB of the SPD evaluated at 'x'\n", '*/\n', 'vector spd_gp_exponential(data array[] vector x, real sdgp, vector lscale) {\n', 'int NB = dims(x)[1];\n', 'int D = dims(x)[2];\n', 'int Dls = rows(lscale);\n', 'real constant = square(sdgp) *\n', '(2^D * pi()^(D / 2.0) * tgamma((D + 1.0) / 2)) / sqrt(pi());\n', 'real expo = -(D + 1.0) / 2;\n', 'vector[NB] out;\n', 'if (Dls == 1) {\n', '// one dimensional or isotropic GP\n', 'real lscale2 = square(lscale[1]);\n', 'constant = constant * lscale[1]^D;\n', 'for (m in 1:NB) {\n', 'out[m] = constant * (1 + lscale2 * dot_self(x[m]))^expo;\n', '}\n', '} else {\n', '// multi-dimensional non-isotropic GP\n', 'vector[Dls] lscale2 = square(lscale);\n', 'constant = constant * prod(lscale);\n', 'for (m in 1:NB) {\n', 'out[m] = constant * (1 + dot_product(lscale2, square(x[m])))^expo;\n', '}\n', '}\n', 'return out;\n', '}' ) } else { fun_lines <- NULL } } if (kernel == 'matern32') { if ( !any(grepl( '/* Spectral density of a Matern 3/2 Gaussian process', model_file, fixed = TRUE )) ) { fun_lines <- paste0( '/* Spectral density of a Matern 3/2 Gaussian process\n', '* Args:\n', '* x: array of numeric values of dimension NB x D\n', '* sdgp: marginal SD parameter\n', '* lscale: vector of length-scale parameters\n', '* Returns:\n', "* numeric vector of length NB of the SPD evaluated at 'x'\n", '*/\n', 'vector spd_gp_matern32(data array[] vector x, real sdgp, vector lscale) {\n', 'int NB = dims(x)[1];\n', 'int D = dims(x)[2];\n', 'int Dls = rows(lscale);\n', 'real constant = square(sdgp) *\n', '(2^D * pi()^(D / 2.0) * tgamma((D + 3.0) / 2) * 3^(3.0 / 2)) /\n', '(0.5 * sqrt(pi()));\n', 'real expo = -(D + 3.0) / 2;\n', 'vector[NB] out;\n', 'if (Dls == 1) {\n', '// one dimensional or isotropic GP\n', 'real lscale2 = square(lscale[1]);\n', 'constant = constant * lscale[1]^D;\n', 'for (m in 1:NB) {\n', 'out[m] = constant * (3 + lscale2 * dot_self(x[m]))^expo;\n', '}\n', '} else {\n', '// multi-dimensional non-isotropic GP\n', 'vector[Dls] lscale2 = square(lscale);\n', 'constant = constant * prod(lscale);\n', 'for (m in 1:NB) {\n', 'out[m] = constant * (3 + dot_product(lscale2, square(x[m])))^expo;\n', '}\n', '}\n', 'return out;\n', '}' ) } else { fun_lines <- NULL } } if (kernel 
== 'matern52') { if ( !any(grepl( '/* Spectral density of a Matern 5/2 Gaussian process', model_file, fixed = TRUE )) ) { fun_lines <- paste0( '/* Spectral density of a Matern 5/2 Gaussian process\n', '* Args:\n', '* x: array of numeric values of dimension NB x D\n', '* sdgp: marginal SD parameter\n', '* lscale: vector of length-scale parameters\n', '* Returns:\n', "* numeric vector of length NB of the SPD evaluated at 'x'\n", '*/\n', 'vector spd_gp_matern52(data array[] vector x, real sdgp, vector lscale) {\n', 'int NB = dims(x)[1];\n', 'int D = dims(x)[2];\n', 'int Dls = rows(lscale);\n', 'real constant = square(sdgp) *\n', '(2^D * pi()^(D / 2.0) * tgamma((D + 5.0) / 2) * 5^(5.0 / 2)) /\n', '(0.75 * sqrt(pi()));\n', 'real expo = -(D + 5.0) / 2;\n', 'vector[NB] out;\n', 'if (Dls == 1) {\n', '// one dimensional or isotropic GP\n', 'real lscale2 = square(lscale[1]);\n', 'constant = constant * lscale[1]^D;\n', 'for (m in 1:NB) {\n', 'out[m] = constant * (5 + lscale2 * dot_self(x[m]))^expo;\n', '}\n', '} else {\n', '// multi-dimensional non-isotropic GP\n', 'vector[Dls] lscale2 = square(lscale);\n', 'constant = constant * prod(lscale);\n', 'for (m in 1:NB) {\n', 'out[m] = constant * (5 + dot_product(lscale2, square(x[m])))^expo;\n', '}\n', '}\n', 'return out;\n', '}' ) } else { fun_lines <- NULL } } if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0('functions {\n', fun_lines) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', fun_lines, '\n}\n' ) } model_file <- readLines(textConnection(model_file), n = -1) } #' Evaluate Laplacian eigenfunction for a given GP basis function #' @noRd phi = function(boundary, m, centred_covariate) { 1 / sqrt(boundary) * sin((m * pi) / (2 * boundary) * (centred_covariate + boundary)) } #' Evaluate eigenvalues for a given GP basis function #' @noRd lambda = function(boundary, m) { ((m * pi) / (2 * boundary))^2 } #' Spectral density squared exponential Gaussian Process kernel #' @noRd spd = function(alpha_gp, rho_gp, eigenvalues) { (alpha_gp^2) * sqrt(2 * pi) * rho_gp * exp(-0.5 * (rho_gp^2) * (eigenvalues^2)) } #' @noRd sim_hilbert_gp = function( alpha_gp, rho_gp, b_gp, last_trends, fc_times, train_times, mean_train_times ) { num_gp_basis <- length(b_gp) # Get vector of eigenvalues of covariance matrix eigenvalues <- vector() for (m in 1:num_gp_basis) { eigenvalues[m] <- lambda( boundary = (5.0 / 4) * (max(train_times) - min(train_times)), m = m ) } # Get vector of eigenfunctions eigenfunctions <- matrix(NA, nrow = length(fc_times), ncol = num_gp_basis) for (m in 1:num_gp_basis) { eigenfunctions[, m] <- phi( boundary = (5.0 / 4) * (max(train_times) - min(train_times)), m = m, centred_covariate = fc_times - mean_train_times ) } # Compute diagonal of covariance matrix diag_SPD <- sqrt(spd(alpha_gp = alpha_gp, rho_gp = rho_gp, sqrt(eigenvalues))) # Compute GP trend forecast as.vector((diag_SPD * b_gp) %*% t(eigenfunctions)) } #### Old gp() prepping functions; these are now redundant because # brms is used to evaluate gp() effects and produce the relevant # eigenfunctions / eigenvalues, but keeping the functions here for # now in case they are needed for later work #### #' #' Compute the mth eigen function of an approximate GP #' #' Credit to Paul Burkner from brms: https://github.com/paul-buerkner/brms/R/formula-gp.R#L289 #' #' @noRd #' eigen_fun_cov_exp_quad <- function(x, m, L) { #' x <- as.matrix(x) #' D 
<- ncol(x) #' stopifnot(length(m) == D, length(L) == D) #' out <- vector("list", D) #' for (i in seq_cols(x)) { #' out[[i]] <- 1 / sqrt(L[i]) * #' sin((m[i] * pi) / (2 * L[i]) * (x[, i] + L[i])) #' } #' Reduce("*", out) #' } #' #' Compute squared differences #' #' Credit to Paul Burkner from brms: https://github.com/paul-buerkner/brms/R/formula-gp.R#L241 #' #' @param x vector or matrix #' #' @param x_new optional vector of matrix with the same ncol as x #' #' @return an nrow(x) times nrow(x_new) matrix #' #' @details if matrices are passed results are summed over the columns #' #' @noRd #' diff_quad <- function(x, x_new = NULL) { #' x <- as.matrix(x) #' if (is.null(x_new)) { #' x_new <- x #' } else { #' x_new <- as.matrix(x_new) #' } #' .diff_quad <- function(x1, x2) (x1 - x2)^2 #' out <- 0 #' for (i in seq_cols(x)) { #' out <- out + outer(x[, i], x_new[, i], .diff_quad) #' } #' out #' } #' #' Extended range of input data for which predictions should be made #' #' Credit to Paul Burkner from brms: https://github.com/paul-buerkner/brms/R/formula-gp.R#L301 #' #' @noRd #' choose_L <- function(x, c) { #' if (!length(x)) { #' range <- 1 #' } else { #' range <- max(1, max(x, na.rm = TRUE) - min(x, na.rm = TRUE)) #' } #' c * range #' } #' #' Mean-center and scale the particular covariate of interest #' #' so that the maximum Euclidean distance between any two points is 1 #' #' @noRd #' scale_cov <- function(data, covariate, by, level, #' mean, max_dist){ #' Xgp <- data[[covariate]] #' if(!is.na(by) & #' !is.na(level)){ #' Xgp <- data[[covariate]][data[[by]] == level] #' } #' #' # Compute max Euclidean distance if not supplied #' if(is.na(max_dist)){ #' Xgp_max_dist <- sqrt(max(diff_quad(Xgp))) #' } else { #' Xgp_max_dist <- max_dist #' } #' #' # Scale #' Xgp <- Xgp / Xgp_max_dist #' #' # Compute mean if not supplied (after scaling) #' if(is.na(mean)){ #' Xgp_mean <- mean(Xgp, na.rm = TRUE) #' } else { #' Xgp_mean <- mean #' } #' #' # Center #' Xgp <- Xgp - Xgp_mean #' #' return(list(Xgp = Xgp, #' Xgp_mean = Xgp_mean, #' Xgp_max_dist = Xgp_max_dist)) #' } #' #' #' Prep GP eigenfunctions #' #' @noRd #' prep_eigenfunctions = function(data, #' covariate, #' by = NA, #' level = NA, #' k, #' boundary, #' mean = NA, #' max_dist = NA, #' scale = TRUE, #' L, #' initial_setup = FALSE){ #' #' # Extract and scale covariate (scale set to FALSE if this is a prediction #' # step so that we can scale by the original training covariate values supplied #' # in mean and max_dist) #' covariate_cent <- scale_cov(data = data, #' covariate = covariate, #' by = by, #' level = level, #' mean = mean, #' max_dist = max_dist)$Xgp #' #' # Construct matrix of eigenfunctions #' eigenfunctions <- matrix(NA, nrow = length(covariate_cent), #' ncol = k) #' if(missing(L)){ #' L <- choose_L(covariate_cent, boundary) #' } #' #' for(m in 1:k){ #' eigenfunctions[, m] <- eigen_fun_cov_exp_quad(x = matrix(covariate_cent), #' m = m, #' L = L) #' } #' #' # Multiply eigenfunctions by the 'by' variable if one is supplied #' if(!is.na(by)){ #' if(!is.na(level)){ #' # no multiplying needed as this is a factor by variable, #' # but we need to pad the eigenfunctions with zeros #' # for the observations where the by is a different level; #' # the design matrix is always sorted by time and then by series #' # in mvgam #' if(initial_setup){ #' sorted_by <- data.frame(time = data$time, #' series = data$series, #' byvar = data[[by]]) %>% #' dplyr::arrange(time, series) %>% #' dplyr::pull(byvar) #' } else { #' sorted_by <- data[[by]] #' } #' #' 
full_eigens <- matrix(0, nrow = length(data[[by]]), #' ncol = NCOL(eigenfunctions)) #' full_eigens[(1:length(data[[by]]))[ #' sorted_by == level],] <- eigenfunctions #' eigenfunctions <- full_eigens #' } else { #' eigenfunctions <- eigenfunctions * data[[by]] #' } #' } #' eigenfunctions #' } #' #' #' Prep Hilbert Basis GP covariates #' #' @noRd #' prep_gp_covariate = function(data, #' response, #' covariate, #' by = NA, #' level = NA, #' scale = TRUE, #' boundary = 5.0/4, #' k = 20, #' family = gaussian()){ #' #' # Get default gp param priors from a call to brms::get_prior() #' def_gp_prior <- suppressWarnings(brms::get_prior(formula(paste0(response, #' ' ~ gp(', covariate, #' ifelse(is.na(by), ', ', #' paste0(', by = ', by, ', ')), #' 'k = ', k, #' ', scale = ', #' scale, #' ', c = ', #' boundary, #' ')')), data = data, #' family = family)) #' def_gp_prior <- def_gp_prior[def_gp_prior$prior != '',] #' def_rho <- def_gp_prior$prior[min(which(def_gp_prior$class == 'lscale'))] #' if(def_rho == ''){ #' def_rho <- 'inv_gamma(1.5, 5);' #' } #' def_alpha <- def_gp_prior$prior[min(which(def_gp_prior$class == 'sdgp'))] #' if(def_alpha == ''){ #' def_alpha<- 'student_t(3, 0, 2.5);' #' } #' #' # Prepare the covariate #' if(scale){ #' max_dist <- NA #' } else { #' max_dist <- 1 #' } #' #' covariate_cent <- scale_cov(data = data, #' covariate = covariate, #' by = by, #' mean = NA, #' max_dist = max_dist, #' level = level) #' #' covariate_mean <- covariate_cent$Xgp_mean #' covariate_max_dist <- covariate_cent$Xgp_max_dist #' covariate_cent <- covariate_cent$Xgp #' #' # Construct vector of eigenvalues for GP covariance matrix; the #' # same eigenvalues are always used in prediction, so we only need to #' # create them when prepping the data. They will need to be included in #' # the Stan data list #' L <- choose_L(covariate_cent, boundary) #' eigenvalues <- vector() #' for(m in 1:k){ #' eigenvalues[m] <- sqrt(lambda(boundary = L, #' m = m)) #' } #' #' # Construct matrix of eigenfunctions; this will change depending on the values #' # of the covariate, so it needs to be computed and included as data but also needs #' # to be computed to make predictions #' eigenfunctions <- prep_eigenfunctions(data = data, #' covariate = covariate, #' by = by, #' level = level, #' L = L, #' k = k, #' boundary = boundary, #' mean = covariate_mean, #' max_dist = covariate_max_dist, #' scale = scale, #' initial_setup = TRUE) #' #' # Make attributes table using a cleaned version of the covariate #' # name to ensure there are no illegal characters in the Stan code #' byname <- ifelse(is.na(by), '', paste0(':', by)) #' covariate_name <- paste0('gp(', covariate, ')', byname) #' if(!is.na(level)){ #' covariate_name <- paste0(covariate_name, level) #' } #' att_table <- list(effect = 'gp', #' name = covariate_name, #' covariate = covariate, #' by = by, #' level = level, #' k = k, #' boundary = boundary, #' L = L, #' scale = scale, #' def_rho = def_rho, #' def_alpha = def_alpha, #' mean = covariate_mean, #' max_dist = covariate_max_dist, #' eigenvalues = eigenvalues) #' #' # Items to add to Stan data #' # Number of basis functions #' covariate_name <- clean_gpnames(covariate_name) #' data_lines <- paste0('int k_', covariate_name, #' '; // basis functions for approximate gp\n') #' append_dat <- list(k = k) #' names(append_dat) <- paste0('k_', covariate_name, '') #' #' # Approximate GP eigenvalues #' data_lines <- paste0(data_lines, paste0( #' 'vector[', #' 'k_', covariate_name, #' '] l_', covariate_name, '; // approximate gp 
eigenvalues\n'), #' collapse = '\n') #' append_dat2 <- list(slambda = eigenvalues) #' names(append_dat2) <- paste0('l_', covariate_name, '') #' append_dat <- append(append_dat, append_dat2) #' #' # Return necessary objects in a list #' list(att_table = att_table, #' data_lines = data_lines, #' data_append = append_dat, #' eigenfunctions = eigenfunctions) #' } ================================================ FILE: R/gratia_methods.R ================================================ #### Functions to ensure gratia methods work with mvgam, using the Enhance functionality # in the Description #### # Add eval_smooth and draw methods to gratia namespace # on load .onLoad <- function(libname, pkgname) { if (requireNamespace("gratia", quietly = TRUE)) { registerS3method( "eval_smooth", "moi.smooth", eval_smoothDotmoiDotsmooth, envir = asNamespace("gratia") ) registerS3method( "eval_smooth", "mod.smooth", eval_smoothDotmodDotsmooth, envir = asNamespace("gratia") ) registerS3method( "eval_smooth", "hilbert.smooth", eval_smoothDothilbertDotsmooth, envir = asNamespace("gratia") ) registerS3method( "draw", "mvgam", drawDotmvgam, envir = asNamespace("gratia") ) } } #' Enhance post-processing of \pkg{mvgam} models using \pkg{gratia} functionality #' #' These evaluation and plotting functions exist to allow some popular `gratia` #' methods to work with `mvgam` or `jsdgam` models #' #' @name gratia_mvgam_enhancements #' #' @param object a fitted mvgam, the result of a call to [mvgam()] #' #' @param model a fitted `mgcv` model of clas `gam` or `bam` #' #' @param data a data frame of covariate values at which to evaluate the #' model's smooth functions #' #' @param smooth a smooth object of class `"gp.smooth"` (returned from a model using either the #' `dynamic()` function or the `gp()` function) or of class `"moi.smooth"` or `"mod.smooth"` #' (returned from a model using the 'moi' or 'mod' basis) #' #' @param trend_effects logical specifying whether smooth terms from the `trend_formula` should #' be drawn. If `FALSE`, only terms from the observation formula are drawn. If `TRUE`, only #' terms from the `trend_formula` are drawn #' #' @param select character, logical, or numeric; which smooths to plot. If #' `NULL`, the default, then all model smooths are drawn. #' Character `select` matches the labels for smooths #' as shown for example in the output from `summary(object)`. Logical #' `select` operates as per numeric `select` in the order that smooths are #' stored #' #' @param parametric logical; plot parametric terms also? Note that `select` is #' used for selecting which smooths to plot. The `terms` argument is used to #' select which parametric effects are plotted. The default, as with #' [mgcv::plot.gam()], is to not draw parametric effects #' #' @param terms character; which model parametric terms should be drawn? The #' Default of `NULL` will plot all parametric terms that can be drawn. #' @param residuals currently ignored for `mvgam` models #' #' @param scales character; should all univariate smooths be plotted with the #' same y-axis scale? If `scales = "free"`, the default, each univariate #' smooth has its own y-axis scale. If `scales = "fixed"`, a common y axis #' scale is used for all univariate smooths. #' #' Currently does not affect the y-axis scale of plots of the parametric #' terms #' #' @param constant numeric; a constant to add to the estimated values of the #' smooth. 
`constant`, if supplied, will be added to the estimated value #' before the confidence band is computed #' #' @param fun function; a function that will be applied to the estimated values #' and confidence interval before plotting. Can be a function or the name of a #' function. Function `fun` will be applied after adding any `constant`, if #' provided #' #' @param ci_level numeric between 0 and 1; the coverage of the credible interval. #' @param n numeric; the number of points over the range of the covariate at #' which to evaluate the smooth #' #' @param n_3d,n_4d numeric; the number of points over the range of the last #' covariate in a 3D or 4D smooth. The default is `NULL` which achieves the #' standard behaviour of using `n` points over the range of all covariates, #' resulting in `n^d` evaluation points, where `d` is the dimension of the #' smooth. For `d > 2` this can result in very many evaluation points and slow #' performance. For smooths of `d > 4`, the value of `n_4d` will be used for #' all dimensions `> 4`, unless this is `NULL`, in which case the default #' behaviour (using `n` for all dimensions) will be observed #' #' @param unconditional ignored for `mvgam` models as all appropriate #' uncertainties are already included in the posterior estimates #' #' @param overall_uncertainty ignored for `mvgam` models as all appropriate #' uncertainties are already included in the posterior estimates #' #' @param dist numeric; if greater than 0, this is used to determine when #' a location is too far from data to be plotted when plotting 2-D smooths. #' The data are scaled into the unit square before deciding what to exclude, #' and `dist` is a distance within the unit square. See #' [mgcv::exclude.too.far()] for further details #' #' @param rug logical; draw a rug plot at the bottom of each plot for 1-D #' smooths or plot locations of data for higher dimensions. #' @param contour logical; should contours be drawn on the plot using #' [ggplot2::geom_contour()] #' #' @param grouped_by logical; should factor by smooths be drawn as one panel #' per level of the factor (`FALSE`, the default), or should the individual #' smooths be combined into a single panel containing all levels (`TRUE`)? #' #' @param ci_alpha numeric; alpha transparency for confidence or simultaneous #' interval #' #' @param ci_col colour specification for the confidence/credible intervals #' band. Affects the fill of the interval #' #' @param smooth_col colour specification for the smooth line #' #' @param resid_col colour specification for residual points. Ignored #' #' @param contour_col colour specification for contour lines #' #' @param n_contour numeric; the number of contour bins. Will result in #' `n_contour - 1` contour lines being drawn. See [ggplot2::geom_contour()] #' #' @param partial_match logical; should smooths be selected by partial matches #' with `select`? If `TRUE`, `select` can only be a single string to match #' against #' #' @param discrete_colour a suitable colour scale to be used when plotting #' discrete variables #' #' @param discrete_fill a suitable fill scale to be used when plotting #' discrete variables.
#' @param continuous_colour a suitable colour scale to be used when plotting #' continuous variables #' #' @param continuous_fill a suitable fill scale to be used when plotting #' continuous variables #' #' @param position Position adjustment, either as a string, or the result of a #' call to a position adjustment function #' #' @param angle numeric; the angle at which the x axis tick labels are to be #' drawn, passed to the `angle` argument of [ggplot2::guide_axis()] #' #' @param ncol,nrow numeric; the numbers of rows and columns over which to #' spread the plots #' #' @param guides character; one of `"keep"` (the default), `"collect"`, or #' `"auto"`. Passed to [patchwork::plot_layout()] #' #' @param widths,heights The relative widths and heights of each column and #' row in the grid. Will get repeated to match the dimensions of the grid. If #' there is more than 1 plot and `widths = NULL`, the value of `widths` will #' be set internally to `widths = 1` to accommodate plots of smooths that #' use a fixed aspect ratio. #' #' @param crs the coordinate reference system (CRS) to use for the plot. All #' data will be projected into this CRS. See [ggplot2::coord_sf()] for #' details #' #' @param default_crs the coordinate reference system (CRS) to use for the #' non-sf layers in the plot. If left at the default `NULL`, the CRS used is #' 4326 (WGS84), which is appropriate for spline-on-the-sphere smooths, which #' are parameterized in terms of latitude and longitude as coordinates. See #' [ggplot2::coord_sf()] for more details #' #' @param lims_method character; affects how the axis limits are determined. See #' [ggplot2::coord_sf()]. Be careful; in testing of some examples, changing #' this to `"orthogonal"` for example with the chlorophyll-a example from #' Simon Wood's GAM book quickly used up all the RAM in my test system and the #' OS killed R. This could be incorrect usage on my part; right now the grid #' of points at which SOS smooths are evaluated (if not supplied by the user) #' can produce invalid coordinates for the corners of tiles as the grid is #' generated for tile centres without respect to the spacing of those tiles #' #' @param wrap logical; wrap plots as a patchwork? If \code{FALSE}, a list of #' ggplot objects is returned, 1 per term plotted #' #' @param envir an environment to look up the data within #' #' @param ...
additional arguments passed to other methods #' #' @details These methods allow `mvgam` models to be *Enhanced* if users have the `gratia` #' package installed, making available the popular `draw()` function to plot partial effects #' of `mvgam` smooth functions using [ggplot2::ggplot()] utilities #' #' @author Nicholas J Clark #' #' @examples #' \dontrun{ #' # Fit a simple GAM and draw partial effects of smooths using 'gratia' #' set.seed(0) #' dat <- mgcv::gamSim( #' eg = 1, #' n = 200, #' scale = 2 #' ) #' #' mod <- mvgam( #' formula = y ~ s(x1, bs = 'moi') + #' te(x0, x2), #' data = dat, #' family = gaussian(), #' chains = 2, #' silent = 2 #' ) #' #' if (require("gratia")) { #' gratia::draw(mod) #' } #' } NULL #' @rdname gratia_mvgam_enhancements #' @aliases draw.mvgam #' @export #' `drawDotmvgam` <- function( object, trend_effects = FALSE, data = NULL, select = NULL, parametric = FALSE, terms = NULL, residuals = FALSE, scales = c("free", "fixed"), ci_level = 0.95, n = 100, n_3d = 16, n_4d = 4, unconditional = FALSE, overall_uncertainty = TRUE, constant = NULL, fun = NULL, dist = 0.1, rug = TRUE, contour = TRUE, grouped_by = FALSE, ci_alpha = 0.2, ci_col = "black", smooth_col = "black", resid_col = "steelblue3", contour_col = "black", n_contour = NULL, partial_match = FALSE, discrete_colour = NULL, discrete_fill = NULL, continuous_colour = NULL, continuous_fill = NULL, position = "identity", angle = NULL, ncol = NULL, nrow = NULL, guides = "keep", widths = NULL, heights = NULL, crs = NULL, default_crs = NULL, lims_method = "cross", wrap = TRUE, envir = environment(formula(object)), ... ) { if (trend_effects) { if (is.null(object$trend_call)) { stop('no trend_formula exists so there are no trend-level terms to plot') } object$trend_mgcv_model <- relabel_gps(object$trend_mgcv_model) object$trend_mgcv_model$call$data <- NULL object$trend_mgcv_model$cmX <- object$trend_mgcv_model$coefficients sm_plots <- gratia::draw( object = object$trend_mgcv_model, data = data, select = select, parametric = parametric, terms = terms, residuals = FALSE, scales = scales, ci_level = ci_level, n = n, n_3d = n_3d, n_4d = n_4d, unconditional = FALSE, overall_uncertainty = FALSE, constant = constant, fun = fun, dist = dist, rug = rug, contour = contour, grouped_by = grouped_by, ci_alpha = ci_alpha, ci_col = ci_col, smooth_col = smooth_col, resid_col = "steelblue3", contour_col = contour_col, n_contour = n_contour, partial_match = partial_match, discrete_colour = discrete_colour, discrete_fill = discrete_fill, continuous_colour = continuous_colour, continuous_fill = continuous_fill, position = position, angle = angle, ncol = ncol, nrow = nrow, guides = guides, widths = widths, heights = heights, crs = crs, default_crs = default_crs, lims_method = lims_method, wrap = wrap, envir = envir, ... 
) } else { object$mgcv_model <- relabel_gps(object$mgcv_model) object$mgcv_model$call$data <- NULL object$mgcv_model$cmX <- object$mgcv_model$coefficients sm_plots <- gratia::draw( object = object$mgcv_model, data = data, select = select, parametric = parametric, terms = terms, residuals = FALSE, scales = scales, ci_level = ci_level, n = n, n_3d = n_3d, n_4d = n_4d, unconditional = FALSE, overall_uncertainty = FALSE, constant = constant, fun = fun, dist = dist, rug = rug, contour = contour, grouped_by = grouped_by, ci_alpha = ci_alpha, ci_col = ci_col, smooth_col = smooth_col, resid_col = "steelblue3", contour_col = contour_col, n_contour = n_contour, partial_match = partial_match, discrete_colour = discrete_colour, discrete_fill = discrete_fill, continuous_colour = continuous_colour, continuous_fill = continuous_fill, position = position, angle = angle, ncol = ncol, nrow = nrow, guides = guides, widths = widths, heights = heights, crs = crs, default_crs = default_crs, lims_method = lims_method, wrap = wrap, envir = envir, ... ) } sm_plots } #' @rdname gratia_mvgam_enhancements #' @aliases eval_smooth.hilbert.smooth #' @export eval_smoothDothilbertDotsmooth = function( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ... ) { insight::check_if_installed("gratia") model$cmX <- model$coefficients # deal with data if supplied data <- process_user_data_for_eval( data = data, model = model, n = n, n_3d = n_3d, n_4d = n_4d, id = which_smooth( model, gratia::smooth_label(smooth) ) ) # by variables by_var <- gratia::by_variable(smooth) if (by_var == "NA") { by_var <- NA_character_ } # Compute the gp() eigenfunctions for newdata using the supplied brms_mock object # Requires a dataframe of all relevant variables for the gp effects mock_terms <- brms::brmsterms(attr(model, 'brms_mock')$formula) terms_needed <- unique(all.vars(mock_terms$formula)[-1]) # Only use actual values of those covariates needed for this smooth terms_smooth <- intersect(terms_needed, colnames(data)) newdata_mock <- data.frame(data[[terms_smooth[1]]]) if (length(terms_smooth) > 1L) { for (i in 2:length(terms_smooth)) { newdata_mock <- cbind(newdata_mock, data.frame(data[[terms_smooth[i]]])) } } colnames(newdata_mock) <- terms_smooth newdata_mock$.fake_gp_y <- rnorm(NROW(newdata_mock)) # Fill in other covariates as fixed values from the original data other_terms <- setdiff(terms_needed, colnames(data)) if (length(other_terms) > 0) { newdata_mock <- cbind( newdata_mock, do.call( cbind, lapply(seq_along(other_terms), function(x) { df <- data.frame( var = rep(model$model[[other_terms[x]]][1], NROW(newdata_mock)) ) colnames(df) <- other_terms[x] df }) ) ) } brms_mock_data <- brms::standata( attr(model, 'brms_mock'), newdata = newdata_mock, internal = TRUE ) # Extract GP attributes gp_att_table <- attr(model, 'gp_att_table') bys <- unlist(purrr::map(gp_att_table, 'by'), use.names = FALSE) lvls <- unlist(purrr::map(gp_att_table, 'level'), use.names = FALSE) # Extract eigenfunctions for each gp effect eigenfuncs <- eigenfunc_list( stan_data = brms_mock_data, mock_df = newdata_mock, by = bys, level = lvls ) # Which GP term are we plotting? 
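# Each entry of gp_att_table stores, for one gp() term, its cleaned label, any 'by' variable and factor level, and the positions of its basis coefficients in the model coefficient vector ('first_coef' to 'last_coef'). The steps below match the current smooth against that table (using the by-level when one exists; e.g. an illustrative gp(time, by = series) term yields one table entry per factor level of series) before extracting the corresponding eigenfunctions and coefficient block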
gp_covariate <- smooth$term level <- ifelse(is.null(smooth$by.level), NA, smooth$by.level) gp_names <- gsub(' ', '', unlist(purrr::map(gp_att_table, 'name'))) if (!is.na(level)) { gp_select <- which( gp_names == smooth$label & unlist(purrr::map(gp_att_table, 'level')) == level ) } else { gp_select <- which( gp_names == smooth$label & bys %in% by_var ) } # Extract the eigenfunctions for this GP term X <- eigenfuncs[[gp_select]] # Extract mean coefficients start <- purrr::map(gp_att_table, 'first_coef')[[gp_select]] end <- purrr::map(gp_att_table, 'last_coef')[[gp_select]] betas <- model$coefficients[start:end] fit <- as.vector(X %*% betas) ## want full vcov for component-wise CI V <- model$Vp ## variables for component-wise CIs for smooths column_means <- model[["cmX"]] lcms <- length(column_means) nc <- ncol(V) meanL1 <- smooth[["meanL1"]] eta_idx <- lss_eta_index(model) para.seq <- start:end if (isTRUE(overall_uncertainty) && attr(smooth, "nCons") > 0L) { if (lcms < nc) { column_means <- c(column_means, rep(0, nc - lcms)) } Xcm <- matrix(column_means, nrow = nrow(X), ncol = nc, byrow = TRUE) if (!is.null(meanL1)) { Xcm <- Xcm / meanL1 } Xcm[, para.seq] <- X # only apply the uncertainty from linear predictors of which this smooth # is a part idx <- vapply( eta_idx, function(i, beta) any(beta %in% i), FUN.VALUE = logical(1L), beta = para.seq ) idx <- unlist(eta_idx[idx]) rs <- rowSums( (Xcm[, idx, drop = FALSE] %*% V[idx, idx, drop = FALSE]) * Xcm[, idx, drop = FALSE] ) } else { rs <- rowSums((X %*% V[para.seq, para.seq, drop = FALSE]) * X) } ## standard error of the estimate se.fit <- sqrt(pmax(0, rs)) # convert to the gratia tidy format label <- smooth$label ## identify which vars are needed for this smooth... keep_vars <- c(smooth$term, smooth$by) keep_vars <- keep_vars[!keep_vars %in% 'NA'] ## ... then keep only those vars data <- dplyr::select(data, dplyr::all_of(keep_vars)) ## tibble object tbl <- tibble::tibble( .smooth = rep(label, nrow(X)), .estimate = fit, .se = se.fit ) ## bind on the data tbl <- dplyr::bind_cols(tbl, data) ## nest all columns with varying data eval_sm <- tidyr::nest( tbl, data = tidyr::all_of(c(".estimate", ".se", names(data))) ) ## add on info regarding by variable eval_sm <- add_by_var_column(eval_sm, by_var = by_var) ## add on spline type info eval_sm <- add_smooth_type_column(eval_sm, sm_type = "GP") # set some values to NA if too far from the data if (gratia::smooth_dim(smooth) == 2L && (!is.null(dist) && dist > 0)) { eval_sm <- gratia::too_far_to_na( smooth, input = eval_sm, reference = model[["model"]], cols = c(".estimate", ".se"), dist = dist ) } return(eval_sm) } #' @rdname gratia_mvgam_enhancements #' @aliases eval_smooth.mod.smooth #' @export eval_smoothDotmodDotsmooth = function( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ...
) { insight::check_if_installed("gratia") model$cmX <- model$coefficients ## deal with data if supplied data <- process_user_data_for_eval( data = data, model = model, n = n, n_3d = n_3d, n_4d = n_4d, id = which_smooth( model, gratia::smooth_label(smooth) ) ) by_var <- gratia::by_variable(smooth) # even if not a by as we want NA later if (by_var == "NA") { by_var <- NA_character_ } ## values of spline at data eval_sm <- gratia::spline_values( smooth, data = data, unconditional = unconditional, model = model, overall_uncertainty = overall_uncertainty ) ## add on info regarding by variable eval_sm <- add_by_var_column(eval_sm, by_var = by_var) ## add on spline type info eval_sm <- add_smooth_type_column(eval_sm, sm_type = "Mono dec P spline") # set some values to NA if too far from the data if (gratia::smooth_dim(smooth) == 2L && (!is.null(dist) && dist > 0)) { eval_sm <- gratia::too_far_to_na( smooth, input = eval_sm, reference = model[["model"]], cols = c(".estimate", ".se"), dist = dist ) } ## return eval_sm } #' @rdname gratia_mvgam_enhancements #' @aliases eval_smooth.moi.smooth #' @export eval_smoothDotmoiDotsmooth = function( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ... ) { insight::check_if_installed("gratia") model$cmX <- model$coefficients ## deal with data if supplied data <- process_user_data_for_eval( data = data, model = model, n = n, n_3d = n_3d, n_4d = n_4d, id = which_smooth( model, gratia::smooth_label(smooth) ) ) by_var <- gratia::by_variable(smooth) # even if not a by as we want NA later if (by_var == "NA") { by_var <- NA_character_ } ## values of spline at data eval_sm <- gratia::spline_values( smooth, data = data, unconditional = unconditional, model = model, overall_uncertainty = overall_uncertainty ) ## add on info regarding by variable eval_sm <- add_by_var_column(eval_sm, by_var = by_var) ## add on spline type info eval_sm <- add_smooth_type_column(eval_sm, sm_type = "Mono inc P spline") # set some values to NA if too far from the data if (gratia::smooth_dim(smooth) == 2L && (!is.null(dist) && dist > 0)) { eval_sm <- gratia::too_far_to_na( smooth, input = eval_sm, reference = model[["model"]], cols = c(".estimate", ".se"), dist = dist ) } ## return eval_sm } #' Utility functions; full credit goes to Gavin Simpson, the developer and #' maintainer of the gratia package #' @noRd `is.gamm` <- function(object) { inherits(object, "gamm") } #' @noRd `is.gamm4` <- function(object) { is.list(object) & (!is.null(object[["gam"]])) } #' @noRd `is.gam` <- function(object) { inherits(object, "gam") } #' @noRd `is.bam` <- function(object) { inherits(object, "bam") } #' @noRd `which_smooth` <- function(object, term) { if (is.gamm(object) || is.gamm4(object)) { object <- object[["gam"]] } smooths <- gratia::smooths(object) which(term == smooths) } #' @noRd `process_user_data_for_eval` <- function( data, model, n, n_3d, n_4d, id, var_order = NULL ) { if (is.null(data)) { data <- gratia::smooth_data( model = model, n = n, n_3d = n_3d, n_4d = n_4d, id = id, var_order = var_order ) } else { smooth <- gratia::get_smooths_by_id(model, id)[[1L]] vars <- smooth_variable(smooth) by_var <- gratia::by_variable(smooth) if (!identical(by_var, "NA")) { vars <- append(vars, by_var) } ## if this is a by variable, filter the by variable for the required ## level now if (gratia::is_factor_by_smooth(smooth)) { data <- data %>% dplyr::filter(.data[[by_var]] == gratia::by_level(smooth)) } } data } #' @noRd 
`add_by_var_column` <- function(object, by_var, n = NULL) { if (is.null(n)) { n <- NROW(object) } insight::check_if_installed("tibble") tibble::add_column(object, .by = rep(by_var, times = n), .after = 1L) } #' @noRd `add_smooth_type_column` <- function(object, sm_type, n = NULL) { if (is.null(n)) { n <- NROW(object) } insight::check_if_installed("tibble") tibble::add_column(object, .type = rep(sm_type, times = n), .after = 1L) } #' @noRd lss_eta_index <- function(object) { lpi <- attr(formula(object), "lpi") if (is.null(lpi)) { lpi <- list(seq_along(coef(object))) } attr(lpi, "overlap") <- NULL lpi } #' @noRd smooth_variable <- function(smooth) { gratia::check_is_mgcv_smooth(smooth) smooth[["term"]] } ================================================ FILE: R/hindcast.mvgam.R ================================================ #'@title Extract hindcasts for a fitted \code{mvgam} object #'@name hindcast.mvgam #'@importFrom stats predict #'@inheritParams predict.mvgam #'@param ... Ignored #'@details Posterior hindcasts (i.e. retrodictions) are drawn from the fitted \code{mvgam} and #'organized into a convenient format for plotting #'@return An object of class \code{mvgam_forecast} containing hindcast distributions. #'See \code{\link{mvgam_forecast-class}} for details. #' #'@seealso [plot.mvgam_forecast()], [summary.mvgam_forecast()], #'[forecast.mvgam()], [fitted.mvgam()], [predict.mvgam()] #'@export hindcast <- function(object, ...) { UseMethod("hindcast", object) } #'@rdname hindcast.mvgam #'@method hindcast mvgam #' @examples #' \dontrun{ #' simdat <- sim_mvgam(n_series = 3, trend_model = AR()) #' mod <- mvgam(y ~ s(season, bs = 'cc'), #' trend_model = AR(), #' noncentred = TRUE, #' data = simdat$data_train, #' chains = 2, #' silent = 2) #' #' # Hindcasts on response scale #' hc <- hindcast(mod) #' str(hc) #' head(summary(hc), 12) #' plot(hc, series = 1) #' plot(hc, series = 2) #' plot(hc, series = 3) #' #' # Hindcasts as expectations #' hc <- hindcast(mod, type = 'expected') #' head(summary(hc), 12) #' plot(hc, series = 1) #' plot(hc, series = 2) #' plot(hc, series = 3) #' #' # Estimated latent trends #' hc <- hindcast(mod, type = 'trend') #' head(summary(hc), 12) #' plot(hc, series = 1) #' plot(hc, series = 2) #' plot(hc, series = 3) #' } #'@export hindcast.mvgam = function(object, type = 'response', ...)
{ # Check arguments series <- 'all' type <- match.arg( arg = type, choices = c( "link", "response", "trend", "expected", "latent_N", "detection" ) ) data_train <- object$obs_data data_train <- validate_series_time( data_train, trend_model = attr(object$model_data, 'trend_model') ) last_train <- max(data_train$index..time..index) - (min(data_train$index..time..index) - 1) n_series <- NCOL(object$ytimes) n_predcols <- dim(mcmc_chains(object$model_output, 'ypred')) ends <- seq(0, n_predcols[2], length.out = NCOL(object$ytimes) + 1) starts <- ends + 1 starts <- c(1, starts[-c(1, (NCOL(object$ytimes) + 1))]) ends <- ends[-1] # Extract hindcasts for storing in the returned object series_hcs <- lapply(seq_len(n_series), function(series) { to_extract <- switch( type, 'link' = 'mus', 'expected' = 'mus', 'response' = 'ypred', 'trend' = 'trend', 'latent_N' = 'mus', 'detection' = 'mus' ) if (object$family == 'nmix' & type == 'link') { to_extract <- 'trend' } if (object$fit_engine == 'stan') { preds <- mcmc_chains(object$model_output, to_extract)[, seq(series, n_predcols[2], by = n_series), drop = FALSE ][, 1:last_train] } else { preds <- mcmc_chains(object$model_output, to_extract)[, starts[series]:ends[series], drop = FALSE ][, 1:last_train] } if (object$family == 'nmix' & type == 'link') { preds <- exp(preds) } if (type %in% c('expected', 'latent_N', 'detection')) { # Extract family-specific parameters for this series family_pars <- extract_family_pars(object = object) par_extracts <- lapply(seq_along(family_pars), function(j) { if (is.matrix(family_pars[[j]])) { as.vector(matrix( rep(as.vector(family_pars[[j]][, series]), NCOL(preds)), nrow = NROW(preds), byrow = FALSE )) } else { as.vector(matrix( rep(family_pars[[j]], NCOL(preds)), nrow = NROW(preds), byrow = FALSE )) } }) names(par_extracts) <- names(family_pars) # Add trial information if this is a Binomial model if (object$family %in% c('binomial', 'beta_binomial')) { trials <- as.vector(matrix( rep( as.vector(attr(object$mgcv_model, 'trials')[, series]), NROW(preds) ), nrow = NROW(preds), byrow = TRUE )) par_extracts$trials <- trials } # Compute expectations as one long vector Xpmat <- matrix(as.vector(preds)) attr(Xpmat, 'model.offset') <- 0 if (object$family == 'nmix') { preds <- mcmc_chains(object$model_output, 'detprob')[, object$ytimes[1:last_train, series], drop = FALSE ] Xpmat <- matrix(qlogis(as.vector(preds))) attr(Xpmat, 'model.offset') <- 0 latent_lambdas <- as.vector(mcmc_chains(object$model_output, 'trend')[, seq(series, n_predcols[2], by = n_series), drop = FALSE ][, 1:last_train]) latent_lambdas <- exp(latent_lambdas) cap <- as.vector(t(replicate( n_predcols[1], object$obs_data$cap[which( as.numeric(object$obs_data$series) == series )] ))) } else { latent_lambdas <- NULL cap <- NULL } if (type == 'latent_N') { preds <- mcmc_chains(object$model_output, 'latent_ypred')[, seq(series, n_predcols[2], by = n_series), drop = FALSE ][, 1:last_train] } else { preds <- matrix( as.vector(mvgam_predict( family = object$family, Xp = Xpmat, latent_lambdas = latent_lambdas, cap = cap, type = type, betas = 1, family_pars = par_extracts )), nrow = NROW(preds) ) } } preds }) names(series_hcs) <- levels(data_train$series) series_obs <- lapply(seq_len(n_series), function(series) { s_name <- levels(data_train$series)[series] data.frame( series = data_train$series, time = data_train$index..time..index, y = data_train$y ) %>% dplyr::filter(series == s_name) %>% dplyr::arrange(time) %>% dplyr::pull(y) }) names(series_obs) <- levels(data_train$series) 
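# Also record the observed training times for each series (sorted to match the stored observations) so that downstream plotting and summary methods can align hindcast draws with their timepoints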
series_train_times <- lapply(seq_len(n_series), function(series) { s_name <- levels(data_train$series)[series] data.frame( series = data_train$series, time = data_train$time, y = data_train$y ) %>% dplyr::filter(series == s_name) %>% dplyr::arrange(time) %>% dplyr::pull(time) }) names(series_train_times) <- levels(data_train$series) series_fcs <- structure( list( call = object$call, trend_call = object$trend_call, family = object$family, trend_model = object$trend_model, drift = object$drift, use_lv = object$use_lv, fit_engine = object$fit_engine, type = type, series_names = levels(data_train$series), train_observations = series_obs, train_times = series_train_times, test_observations = NULL, test_times = NULL, hindcasts = series_hcs, forecasts = NULL ), class = 'mvgam_forecast' ) return(series_fcs) } ================================================ FILE: R/how_to_cite.R ================================================ #' Generate a methods description for \pkg{mvgam} models #' #' Create a brief but fully referenced methods description, along with a useful #' list of references, for fitted \code{mvgam} and \code{jsdgam} models. #' #' @name how_to_cite.mvgam #' #' @param object \code{list} object of class \code{mvgam} resulting from a call #' to [mvgam()] or [jsdgam()] #' #' @param ... ignored #' #' @details This function uses the model's structure to come up with a very #' basic but hopefully useful methods description that can help users to #' appropriately acknowledge the hard work of developers and champion open #' science. Please do not consider the text returned by this function to be a #' completely adequate methods section; it is only meant to get you started. #' #' @return An object of class \code{how_to_cite} containing a text description #' of the methods as well as lists of both primary and additional references. #' #' @author Nicholas J Clark #' #' @seealso \code{\link[utils]{citation}}, \code{\link{mvgam}}, #' \code{\link{jsdgam}} #' #' @examples #' \dontrun{ #' #-------------------------------------------------- #' # Simulate 4 time series with hierarchical seasonality #' # and a VAR(1) dynamic process #' #-------------------------------------------------- #' set.seed(0) #' #' simdat <- sim_mvgam( #' seasonality = 'hierarchical', #' trend_model = VAR(cor = TRUE), #' family = gaussian() #' ) #' #' # Fit an appropriate model #' mod1 <- mvgam( #' y ~ s(season, bs = 'cc', k = 6), #' data = simdat$data_train, #' family = gaussian(), #' trend_model = VAR(cor = TRUE), #' chains = 2, #' silent = 2 #' ) #' #' how_to_cite(mod1) #' #' #-------------------------------------------------- #' # For a GP example, simulate data using the mgcv package #' #-------------------------------------------------- #' dat <- mgcv::gamSim(1, n = 30, scale = 2) #' #' # Fit a model that uses an approximate GP from brms #' mod2 <- mvgam( #' y ~ gp(x2, k = 12), #' data = dat, #' family = gaussian(), #' chains = 2, #' silent = 2 #' ) #' #' how_to_cite(mod2) #' } #' #'@export how_to_cite <- function(object, ...) { UseMethod("how_to_cite", object) } #'@export print.how_to_cite = function(x, ...)
{ cat("Methods text skeleton\n") cat(insight::format_message(x$methods_text)) cat('\n') cat("\nPrimary references\n") for (i in seq_along(x$citations)) { cat(insight::format_message(x$citations[[i]])) cat('\n') } cat("\nOther useful references\n") for (i in seq_along(x$other_citations)) { cat(insight::format_message(x$other_citations[[i]])) cat('\n') } invisible(x) } #'@rdname how_to_cite.mvgam #'@method how_to_cite mvgam #'@export how_to_cite.mvgam <- function(object, ...) { current_year <- format(Sys.Date(), "%Y") citations <- vector(mode = 'list') # mvgam-specific methods mvgam_text <- paste0( "We used the R package mvgam (version ", utils::packageVersion("mvgam"), "; Clark & Wells, 2023) to construct, fit and interrogate the model.", " mvgam fits Bayesian State-Space models that can include flexible", " predictor effects in both the process and observation components", " by incorporating functionalities from the brms (Burkner 2017),", " mgcv (Wood 2017) and splines2 (Wang & Yan, 2021) packages." ) citations[[ 1 ]] <- "Clark, NJ and Wells K (2023). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 14, 771-784. doi.org/10.1111/2041-210X.13974" citations[[ 2 ]] <- "Burkner, PC (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1-28. doi:10.18637/jss.v080.i01" citations[[ 3 ]] <- "Wood, SN (2017). Generalized Additive Models: An Introduction with R (2nd edition). Chapman and Hall/CRC." citations[[ 4 ]] <- "Wang W and Yan J (2021). Shape-Restricted Regression Splines with R Package splines2. Journal of Data Science, 19(3), 498-517. doi:10.6339/21-JDS1020." # Any specials; first check whether this model used a VAR / VARMA process specials_text <- NULL trend_model <- attr(object$model_data, 'trend_model') if ( trend_model %in% c( 'VAR', 'VARcor', 'VARhiercor', 'VAR1', 'VAR1cor', 'VAR1hiercor', 'VARMA', 'VARMAcor', 'VARMA1,1cor' ) ) { specials_text <- c( specials_text, " To encourage stability and prevent forecast variance from increasing indefinitely, we enforced stationarity of the Vector Autoregressive process following methods described by Heaps (2023) and Clark et al. (2025)." ) citations <- append( citations, list( "Heaps, SE (2023). Enforcing stationarity through the prior in vector autoregressions. Journal of Computational and Graphical Statistics 32, 74-83." ) ) citations <- append( citations, list( "Clark NJ, Ernest SKM, Senyondo H, Simonis J, White EP, Yenni GM, Karunarathna KANK (2025). Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ 13:e18929." ) ) } # Check for approximate GPs if ( !is.null(attr(object$mgcv_model, 'gp_att_table')) | !is.null(attr(object$trend_mgcv_model, 'gp_att_table')) | trend_model == 'GP' ) { specials_text <- c( specials_text, " Gaussian Process functional effects were estimated using a low-rank Hilbert space approximation following methods described by Riutort-Mayol et al. (2023)." ) citations <- append( citations, list( "Riutort-Mayol G, Burkner PC, Andersen MR, Solin A and Vehtari A (2023). Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Statistics and Computing 33, 1.
https://doi.org/10.1007/s11222-022-10167-2" ) ) } # Check for piecewise trends if ( trend_model %in% c('PWlogistic', 'PWlinear') ) { specials_text <- c( specials_text, " Piecewise dynamic trends were parameterized and estimated following methods described by Taylor and Letham (2018)." ) citations <- append( citations, list( "Taylor S and Letham B (2018). Forecasting at scale. The American Statistician 72(1) 37-45. https://doi.org/10.1080/00031305.2017.1380080" ) ) } # Was this a jsdgam? if (inherits(object, 'jsdgam')) { specials_text <- c( specials_text, " To ensure identifiability of factors, factor loadings were constrained following Lopes & West (2004)." ) citations <- append( citations, list( "Lopes HF and West M (2004). Bayesian model assessment in factor analysis. Statistica Sinica 14(1) 41-67. https://www.jstor.org/stable/24307179" ) ) } # Stan-specific methods citations <- append( citations, list( "Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P and Riddell A (2017). Stan: A probabilistic programming language. Journal of Statistical Software 76." ) ) stan_text <- paste0( " The mvgam-constructed model and observed data", " were passed to the probabilistic programming environment Stan" ) if (object$backend == 'cmdstanr') { stan_text <- paste0( stan_text, " (version ", cmdstanr::cmdstan_version(), "; Carpenter et al. 2017, Stan Development Team ", current_year, "), specifically through the cmdstanr interface (Gabry & Cesnovar, 2021)." ) citations <- append( citations, list(paste0( "Gabry J, Cesnovar R, Johnson A, and Bronder S (", current_year, "). cmdstanr: R Interface to 'CmdStan'. https://mc-stan.org/cmdstanr/, https://discourse.mc-stan.org." )) ) } else { stan_text <- paste0( stan_text, " (version ", rstan::stan_version(), "; Carpenter et al. 2017)", ", specifically through the rstan interface (Stan Development Team ", current_year, ")." ) citations <- append( citations, list(paste0( "Stan Development Team (", current_year, "). RStan: the R interface to Stan. R package version ", utils::packageVersion("rstan"), ". https://mc-stan.org/." )) ) } if (object$algorithm == 'sampling') { stan_text <- paste0( stan_text, " We ran ", object$model_output@sim$chains, " Hamiltonian Monte Carlo chains for ", object$model_output@sim$warmup, " warmup iterations and ", object$model_output@sim$iter - object$model_output@sim$warmup, " sampling iterations for joint posterior estimation.", " Rank normalized split Rhat (Vehtari et al. 2021) and effective", " sample sizes were used to monitor convergence." ) citations <- append( citations, list( "Vehtari A, Gelman A, Simpson D, Carpenter B, and Burkner P (2021). Rank-normalization, folding, and localization: An improved Rhat for assessing convergence of MCMC (with discussion). Bayesian Analysis 16(2) 667-718. https://doi.org/10.1214/20-BA1221." ) ) } if (object$algorithm %in% c('meanfield', 'fullrank')) { stan_text <- paste0( stan_text, " We used Stan's Automatic Differentiation Variational Inference algorithm", " (Kucukelbir et al. 2017) for posterior approximation, specifically using the ", object$algorithm, " algorithm to draw ", object$model_output@sim$iter, " samples from the approximate joint posterior." ) citations <- append( citations, list( "Kucukelbir A, Tran D, Ranganath R, Gelman A, and Blei DM (2017). Automatic Differentiation Variational Inference. Journal of Machine Learning Research 18 1-45."
) ) } if (object$algorithm == 'laplace') { stan_text <- paste0( stan_text, " We used Stan's Laplace approximation algorithm", " to draw ", object$model_output@sim$iter, " samples from the approximate joint posterior." ) } if (object$algorithm == 'pathfinder') { stan_text <- paste0( stan_text, " We used Stan's Pathfinder variational approximation algorithm (Zhang et al. 2022)", " to draw ", object$model_output@sim$iter, " samples from the approximate joint posterior." ) citations <- append( citations, list( "Zhang L, Carpenter B, Gelman A, and Vehtari A (2022). Pathfinder: parallel Quasi-Newton variational inference. Journal of Machine Learning Research 23(306), 1-49. http://jmlr.org/papers/v23/21-0889.html." ) ) } # Append texts all_text <- paste0(mvgam_text, specials_text, stan_text) # List of additional, possibly very useful references other_citations <- vector(mode = 'list') other_citations[[ 1 ]] <- "Arel-Bundock V, Greifer N, and Heiss A (2024). How to interpret statistical models using marginaleffects for R and Python. Journal of Statistical Software, 111(9), 1-32. https://doi.org/10.18637/jss.v111.i09" other_citations[[ 2 ]] <- "Gabry J, Simpson D, Vehtari A, Betancourt M, and Gelman A (2019). Visualization in Bayesian workflow. Journal of the Royal Statistical Society A, 182, 389-402. doi:10.1111/rssa.12378." other_citations[[ 3 ]] <- "Vehtari A, Gelman A, and Gabry J (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413-1432. doi:10.1007/s11222-016-9696-4." other_citations[[ 4 ]] <- "Burkner PC, Gabry J, and Vehtari A (2020). Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation, 90(14), 2499-2523. https://doi.org/10.1080/00949655.2020.1783262" out <- structure( list( methods_text = all_text, citations = citations, other_citations = other_citations ), class = 'how_to_cite' ) return(out) } ================================================ FILE: R/index-mvgam.R ================================================ #' Index \code{mvgam} objects #' #' @aliases variables #' #' Index variables and their `mgcv` coefficient names #' #' @param x A \code{mvgam} object or another \R object for which #' the methods are defined. #' #' @param ... Arguments passed to individual methods (if applicable). #' #' @name index-mvgam NULL #' @rdname index-mvgam #' #' @importFrom posterior variables #' #' @param x \code{list} object returned from \code{mvgam}. See [mvgam()] #' #' @method variables mvgam #' #' @return a `list` object of the variables that can be extracted, along #' with their aliases #' #' @author Nicholas J Clark #' #' @examples #' \dontrun{ #' # Simulate data and fit a model #' simdat <- sim_mvgam( #' n_series = 1, #' trend_model = AR() #' ) #' #' mod <- mvgam( #' y ~ s(season, bs = 'cc', k = 6), #' trend_model = AR(), #' data = simdat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' # Extract model variables #' variables(mod) #' } #' #' @export #' @export variables variables.mvgam = function(x, ...)
{ parnames <- dimnames(x$model_output)$parameters # Observation distribution parameters if ( any(grepl( paste(c('sigma_obs', 'phi', 'nu', 'shape'), collapse = '|'), parnames )) ) { observation_pars <- data.frame( orig_name = parnames[grepl( paste(c('sigma_obs', 'phi', 'nu', 'shape'), collapse = '|'), parnames )], alias = NA ) } else { observation_pars <- NULL } # Linear predictor parameters observation_linpreds <- data.frame( orig_name = parnames[ grepl('mus[', parnames, fixed = TRUE) & !grepl('trend_mus[', parnames, fixed = TRUE) ], alias = NA ) if (!is.null(x$trend_call) & !inherits(x, 'jsdgam')) { trend_linpreds <- data.frame( orig_name = parnames[grepl('trend_mus[', parnames, fixed = TRUE)], alias = NA ) } else { trend_linpreds <- NULL } # Posterior predictions posterior_preds <- data.frame( orig_name = parnames[grepl('ypred[', parnames, fixed = TRUE)], alias = NA ) # Beta coefficient parameters b_names <- colnames(mcmc_chains(x$model_output, 'b')) mgcv_names <- names(coef(x$mgcv_model)) observation_betas <- data.frame(orig_name = b_names, alias = mgcv_names) if (!is.null(x$trend_call) & !inherits(x, 'jsdgam')) { b_names <- colnames(mcmc_chains(x$model_output, 'b_trend')) mgcv_names <- gsub( 'series', 'trend', paste0(names(coef(x$trend_mgcv_model)), '_trend') ) trend_betas <- data.frame(orig_name = b_names, alias = mgcv_names) } else { trend_betas <- NULL } # Population parameters from hierarchical (random) effects if (any(unlist(purrr::map(x$mgcv_model$smooth, inherits, 'random.effect')))) { re_labs <- unlist(lapply( purrr::map(x$mgcv_model$smooth, 'term'), paste, collapse = ',' ))[ unlist(purrr::map(x$mgcv_model$smooth, inherits, 'random.effect')) ] observation_re_params <- data.frame( orig_name = c( rownames(mcmc_summary(x$model_output, 'mu_raw', ISB = TRUE)), rownames(mcmc_summary(x$model_output, 'sigma_raw', ISB = TRUE)) ), alias = c(paste0('mean(', re_labs, ')'), paste0('sd(', re_labs, ')')) ) } else { observation_re_params <- NULL } trend_re_params <- NULL if (!is.null(x$trend_call) & !inherits(x, 'jsdgam')) { if ( any(unlist(purrr::map( x$trend_mgcv_model$smooth, inherits, 'random.effect' ))) ) { re_labs <- unlist(lapply( purrr::map(x$trend_mgcv_model$smooth, 'term'), paste, collapse = ',' ))[ unlist(purrr::map(x$trend_mgcv_model$smooth, inherits, 'random.effect')) ] re_labs <- gsub('series', 'trend', re_labs) trend_re_params <- data.frame( orig_name = c( rownames(mcmc_summary(x$model_output, 'mu_raw_trend', ISB = TRUE)), rownames(mcmc_summary(x$model_output, 'sigma_raw_trend', ISB = TRUE)) ), alias = c( paste0('mean(', re_labs, ')_trend'), paste0('sd(', re_labs, ')_trend') ) ) } else { trend_re_params <- NULL } } # Smoothing parameters if (any(grepl('rho[', parnames, fixed = TRUE))) { observation_smoothpars <- data.frame( orig_name = parnames[grepl('rho[', parnames, fixed = TRUE)], alias = paste0(x$sp_names, '_rho') ) } else { observation_smoothpars <- NULL } if ( any(grepl('rho_trend[', parnames, fixed = TRUE)) & !inherits(x, 'jsdgam') ) { trend_smoothpars <- data.frame( orig_name = parnames[grepl('rho_trend[', parnames, fixed = TRUE)], alias = paste0(x$trend_sp_names, '_rho_trend') ) } else { trend_smoothpars <- NULL } # Trend state parameters if ( any( grepl('trend[', parnames, fixed = TRUE) & !grepl('_trend[', parnames, fixed = TRUE) ) & !inherits(x, 'jsdgam') ) { trend_states <- grepl('trend[', parnames, fixed = TRUE) & !grepl('_trend[', parnames, fixed = TRUE) trends <- data.frame(orig_name = parnames[trend_states], alias = NA) } else { trends <- NULL } # Trend dynamics 
parameters if ( any( grepl( paste( c( 'sigma', 'alpha_gp', 'rho_gp', 'ar1', 'ar2', 'ar3', 'A', 'Sigma', 'error', 'alpha_cor', 'theta', 'k_trend', 'delta_trend', 'm_trend' ), collapse = '|' ), parnames ) & !grepl('sigma_obs', parnames, fixed = TRUE) & !grepl('sigma_raw', parnames, fixed = TRUE) ) ) { trend_pars <- grepl( paste( c( 'sigma', 'alpha_gp', 'rho_gp', 'ar1', 'ar2', 'ar3', 'A', 'Sigma', 'error', 'alpha_cor', 'theta', 'k_trend', 'delta_trend', 'm_trend' ), collapse = '|' ), parnames ) & !grepl('sigma_obs', parnames, fixed = TRUE) & !grepl('sigma_raw', parnames, fixed = TRUE) trend_pars <- data.frame(orig_name = parnames[trend_pars], alias = NA) } else { trend_pars <- NULL } return(list( observation_pars = observation_pars, observation_linpreds = observation_linpreds, observation_betas = observation_betas, observation_smoothpars = observation_smoothpars, observation_re_params = observation_re_params, posterior_preds = posterior_preds, trend_pars = trend_pars, trend_linpreds = trend_linpreds, trend_betas = trend_betas, trend_smoothpars = trend_smoothpars, trend_re_params = trend_re_params, trends = trends )) } ================================================ FILE: R/interpret_mvgam.R ================================================ #' Interpret the formula specified to mvgam and replace any dynamic terms #' with the correct Gaussian Process smooth specification #' @importFrom stats formula terms as.formula terms.formula #' @noRd interpret_mvgam = function(formula, N, family) { # Check for proper binomial specification if (!missing(family)) { if (is.character(family)) { if (family == 'beta') { family <- betar() } family <- try(eval(parse(text = family)), silent = TRUE) if (inherits(family, 'try-error')) { stop("family not recognized", call. = FALSE) } } if (is.function(family)) { family <- family() } if (family$family %in% c('binomial', 'beta_binomial')) { # Check that response terms use the cbind() syntax resp_terms <- as.character(terms(formula(formula))[[2]]) if (length(resp_terms) == 1) { stop( 'Binomial family requires cbind() syntax in the formula left-hand side', call. = FALSE ) } else { if (any(grepl('cbind', resp_terms))) {} else { stop( 'Binomial family requires cbind() syntax in the formula left-hand side', call. 
= FALSE ) } } facs <- colnames(attr(terms.formula(formula), 'factors')) # Check if formula has an intercept keep_intercept <- attr(terms(formula), 'intercept') == 1 # Re-arrange so that random effects always come last if (any(grepl('bs = \"re\"', facs, fixed = TRUE))) { newfacs <- facs[!grepl('bs = \"re\"', facs, fixed = TRUE)] refacs <- facs[grepl('bs = \"re\"', facs, fixed = TRUE)] int <- attr(terms.formula(formula), 'intercept') # Preserve offset if included if (!is.null(attr(terms(formula(formula)), 'offset'))) { newformula <- as.formula(paste( dimnames(attr(terms(formula), 'factors'))[[1]][1], '~', grep( 'offset', rownames(attr(terms.formula(formula), 'factors')), value = TRUE ), '+', paste( paste(newfacs, collapse = '+'), '+', paste(refacs, collapse = '+'), collapse = '+' ), ifelse(int == 0, ' - 1', '') )) } else { newformula <- as.formula(paste( dimnames(attr(terms(formula), 'factors'))[[1]][1], '~', paste( paste(newfacs, collapse = '+'), '+', paste(refacs, collapse = '+'), collapse = '+' ), ifelse(int == 0, ' - 1', '') )) } } else { newformula <- formula } attr(newformula, '.Environment') <- attr(formula, '.Environment') # Check if any terms use the gp wrapper response <- terms.formula(newformula)[[2]] tf <- terms.formula(newformula, specials = c("gp")) which_gp <- attr(tf, "specials")$gp if (length(which_gp) != 0L) { gp_details <- vector(length = length(which_gp), mode = 'list') for (i in seq_along(which_gp)) { gp_details[[i]] <- eval(parse( text = rownames(attr(tf, "factors"))[which_gp[i]] )) } } # Check if any terms use the dynamic wrapper response <- terms.formula(newformula)[[2]] tf <- attr(terms.formula(newformula, keep.order = TRUE), 'term.labels') which_dynamics <- grep('dynamic(', tf, fixed = TRUE) # Update the formula to the correct Gaussian Process implementation if (length(which_dynamics) != 0L) { dyn_details <- vector(length = length(which_dynamics), mode = 'list') if (length(which_dynamics) > 0) { for (i in seq_along(which_dynamics)) { dyn_details[[i]] <- eval(parse(text = tf[which_dynamics[i]])) } } # k is set based on the number of timepoints available; want to ensure # it is large enough to capture the expected wiggliness of the latent GP # (a smaller rho will require more basis functions for accurate approximation) dyn_to_gpspline = function(term, N) { if (term$rho > N - 1) { stop( 'Argument "rho" in dynamic() cannot be larger than (max(time) - 1)', call.
= FALSE ) } k <- term$k if (is.null(k)) { if (N > 8) { k <- min( 50, min(N, max(8, ceiling(N / (term$rho - (term$rho / 10))))) ) } else { k <- N } } paste0( "s(time,by=", term$term, ",bs='gp',m=c(", ifelse(term$stationary, '-', ''), "2,", term$rho, ",2),", "k=", k, ")" ) } dyn_to_gphilbert = function(term, N) { k <- term$k if (is.null(k)) { if (N > 8) { k <- min(40, min(N - 1, max(8, N - 1))) } else { k <- N - 1 } } paste0( "gp(time,by=", term$term, ",c=5/4,", "k=", k, ",scale=", term$scale, ")" ) } # Replace dynamic terms with the correct specification termlabs <- attr(terms(newformula, keep.order = TRUE), 'term.labels') for (i in seq_along(which_dynamics)) { if (is.null(dyn_details[[i]]$rho)) { termlabs[which_dynamics[i]] <- dyn_to_gphilbert(dyn_details[[i]], N = N) } else { termlabs[which_dynamics[i]] <- dyn_to_gpspline(dyn_details[[i]], N = N) } } # Return the updated formula for passing to mgcv updated_formula <- reformulate(termlabs, rlang::f_lhs(newformula)) attr(updated_formula, '.Environment') <- attr(newformula, '.Environment') } else { updated_formula <- newformula } if (!keep_intercept) { updated_formula <- update(updated_formula, . ~ . - 1) attr(updated_formula, '.Environment') <- attr(newformula, '.Environment') } return(updated_formula) } ================================================ FILE: R/irf.mvgam.R ================================================ #' Calculate latent VAR impulse response functions #' #' Compute Generalized or Orthogonalized Impulse Response Functions (IRFs) from #' \code{mvgam} models with Vector Autoregressive dynamics #' #' @name irf.mvgam #' @param object \code{list} object of class \code{mvgam} resulting from a call to [mvgam()] #' that used a Vector Autoregressive latent process model (either as `VAR(cor = FALSE)` or #' `VAR(cor = TRUE)`; see [VAR()] for details) #' @param h Positive \code{integer} specifying the forecast horizon over which to calculate #' the IRF #' @param cumulative \code{Logical} flag indicating whether the IRF should be cumulative #' @param orthogonal \code{Logical} flag indicating whether orthogonalized IRFs should be #' calculated. Note that the order of the variables matters when calculating these #' @param ... ignored #' @details #' See \code{\link{mvgam_irf-class}} for a full description of the quantities that are #' computed and returned by this function, along with key references. #' @return An object of \code{\link{mvgam_irf-class}} containing the posterior IRFs. 
This #' object can be used with the supplied S3 functions [plot.mvgam_irf()] #' and [summary.mvgam_irf()] #' @author Nicholas J Clark #' @seealso \code{\link{mvgam_irf-class}}, [VAR()], [plot.mvgam_irf()], [stability()], [fevd()] #' @examples #' \dontrun{ #' # Fit a model to the portal time series that uses a latent VAR(1) #' mod <- mvgam( #' formula = captures ~ -1, #' trend_formula = ~ trend, #' trend_model = VAR(cor = TRUE), #' family = poisson(), #' data = portal_data, #' chains = 2, #' silent = 2 #' ) #' #' # Plot the autoregressive coefficient distributions; #' # use 'dir = "v"' to arrange the order of facets #' # correctly #' mcmc_plot( #' mod, #' variable = 'A', #' regex = TRUE, #' type = 'hist', #' facet_args = list(dir = 'v') #' ) #' #' # Calculate Generalized IRFs for each series #' irfs <- irf( #' mod, #' h = 12, #' cumulative = FALSE #' ) #' #' # Plot them #' plot(irfs, series = 1) #' plot(irfs, series = 2) #' plot(irfs, series = 3) #' plot(irfs, series = 4) #' #' # Calculate posterior median, upper and lower 95th quantiles #' # of the impulse responses #' summary(irfs) #' } #' @export irf <- function(object, ...) { UseMethod("irf", object) } #' @rdname irf.mvgam #' @method irf mvgam #' @export irf.mvgam <- function( object, h = 10, cumulative = FALSE, orthogonal = FALSE, ... ) { validate_pos_integer(h) trend_model <- attr(object$model_data, "trend_model") if (!trend_model %in% c("VAR", "VARcor", "VAR1", "VAR1cor")) { stop( "Only VAR(1) models currently supported for calculating IRFs", call. = FALSE ) } beta_vars <- mcmc_chains(object$model_output, "A") sigmas <- mcmc_chains(object$model_output, "Sigma") n_series <- object$n_lv if (is.null(n_series)) { n_series <- nlevels(object$obs_data$series) } all_irfs <- lapply(seq_len(NROW(beta_vars)), function(draw) { # Get necessary VAR parameters into a simple list format x <- list( K = n_series, A = matrix( beta_vars[draw, ], nrow = n_series, ncol = n_series, byrow = TRUE ), Sigma = matrix( sigmas[draw, ], nrow = n_series, ncol = n_series, byrow = TRUE ), p = 1 ) # Calculate the IRF gen_irf(x, h = h, cumulative = cumulative, orthogonal = orthogonal) }) class(all_irfs) <- "mvgam_irf" attr(all_irfs, "irf_type") <- ifelse( orthogonal, "Orthogonalized", "Generalized" ) return(all_irfs) } #### Functions to compute Generalized Impulse Response functions # Much of this code is modified from R code generously provided by Clinton Watkins: # https://www.clintonwatkins.com/posts/2021-generalised-impulse-response-function-R/ #### #' Calculate impulse response functions #' @noRd gen_irf <- function(x, h = 6, cumulative = TRUE, orthogonal = FALSE) { impulse <- paste0("process_", 1:x$K) # Create arrays to hold calculations IRF_o <- array( data = 0, dim = c(h, x$K, x$K), dimnames = list(NULL, impulse, impulse) ) IRF_g <- array( data = 0, dim = c(h, x$K, x$K), dimnames = list(NULL, impulse, impulse) ) IRF_g1 <- array(data = 0, dim = c(h, x$K, x$K)) # Estimation of orthogonalised or generalised IRFs if (orthogonal) { var_ma <- var_psi(x, h) } else { var_ma <- var_phi(x, h) } sigma_u <- x$Sigma P <- t(chol(sigma_u)) sig_jj <- diag(sigma_u) for (jj in 1:x$K) { indx_ <- matrix(0, x$K, 1) indx_[jj, 1] <- 1 for (kk in 1:h) { IRF_o[kk, , jj] <- var_ma[,, kk] %*% P %*% indx_ # Pesaran-Shin eqn 7 (OIRF) IRF_g1[kk, , jj] <- var_ma[,, kk] %*% sigma_u %*% indx_ IRF_g[kk, , jj] <- sig_jj[jj]^(-0.5) * IRF_g1[kk, , jj] # Pesaran-Shin eqn 10 (GIRF) } } if (orthogonal == TRUE) { irf <- IRF_o } else if (orthogonal == FALSE) { irf <- IRF_g } else { stop("\nError!
Orthogonalised or generalised IRF?\n") } idx <- length(impulse) irs <- list() for (ii in 1:idx) { irs[[ii]] <- matrix(irf[1:(h), impulse, impulse[ii]], nrow = h) colnames(irs[[ii]]) <- impulse if (cumulative) { if (length(impulse) > 1) { irs[[ii]] <- apply(irs[[ii]], 2, cumsum) } if (length(impulse) == 1) { tmp <- matrix(cumsum(irs[[ii]])) colnames(tmp) <- impulse irs[[ii]] <- tmp } } } names(irs) <- impulse return(irs) } #' Convert a VAR A matrix to its moving average representation #' @noRd var_phi <- function(x, h = 10) { h <- abs(as.integer(h)) K <- x$K p <- x$p A <- as.array(x$A) if (h >= p) { As <- array(0, dim = c(K, K, h + 1)) for (i in (p + 1):(h + 1)) { As[,, i] <- matrix(0, nrow = K, ncol = K) } } else { As <- array(0, dim = c(K, K, p)) } As[,, 1] <- A Phi <- array(0, dim = c(K, K, h + 1)) Phi[,, 1] <- diag(K) Phi[,, 2] <- Phi[,, 1] %*% As[,, 1] if (h > 1) { for (i in 3:(h + 1)) { tmp1 <- Phi[,, 1] %*% As[,, i - 1] tmp2 <- matrix(0, nrow = K, ncol = K) idx <- (i - 2):1 for (j in 1:(i - 2)) { tmp2 <- tmp2 + Phi[,, j + 1] %*% As[,, idx[j]] } Phi[,, i] <- tmp1 + tmp2 } } return(Phi) } #' Convert a VAR A matrix to its orthogonalised moving average representation #' @noRd var_psi <- function(x, h = 10) { h <- abs(as.integer(h)) Phi <- var_phi(x, h = h) Psi <- array(0, dim = dim(Phi)) sigma_u <- x$Sigma P <- t(chol(sigma_u)) dim3 <- dim(Phi)[3] for (i in 1:dim3) { Psi[,, i] <- Phi[,, i] %*% P } return(Psi) } ================================================ FILE: R/jsdgam.R ================================================ #'Fit Joint Species Distribution Models in \pkg{mvgam} #' #'This function sets up a Joint Species Distribution Model whereby the residual #'associations among species can be modelled in a reduced-rank format using a #'set of latent factors. The factor specification is extremely flexible, #'allowing users to include spatial, temporal or any other type of predictor #'effects to more efficiently capture unmodelled residual associations, while #'the observation model can also be highly flexible (including all smooth, GP #'and other effects that \pkg{mvgam} can handle) #' #'@inheritParams mvgam #'@inheritParams ZMVN #' #'@param formula A \code{formula} object specifying the GAM observation model #' formula. These are exactly like the formula for a GLM except that smooth #' terms, `s()`, `te()`, `ti()`, `t2()`, as well as time-varying `dynamic()` #' terms, nonparametric `gp()` terms and offsets using `offset()`, can be #' added to the right hand side to specify that the linear predictor depends #' on smooth functions of predictors (or linear functionals of these). #' Details of the formula syntax used by \pkg{mvgam} can be found in #' \code{\link{mvgam_formulae}} #' #'@param factor_formula A \code{formula} object specifying the linear predictor #' effects for the latent factors. Use `by = trend` within calls to functional #' terms (i.e. `s()`, `te()`, `ti()`, `t2()`, `dynamic()`, or `gp()`) to #' ensure that each factor captures a different axis of variation. See the #' example below as an illustration #' #'@param factor_knots An optional \code{list} containing user specified knot #' values to be used for basis construction of any smooth terms in #' `factor_formula`. For most bases the user simply supplies the knots to be #' used, which must match up with the `k` value supplied (note that the #' number of knots is not always just `k`). 
Different terms can use different #' numbers of knots, unless they share a covariate #' #'@param data A \code{dataframe} or \code{list} containing the model response #' variable and covariates required by the GAM \code{formula} and #' \code{factor_formula} objects #' #'@param family \code{family} specifying the observation family for the #' outcomes. Currently supported families are: #'\itemize{ #' \item`gaussian()` for real-valued data #' \item`betar()` for proportional data on `(0,1)` #' \item`lognormal()` for non-negative real-valued data #' \item`student_t()` for real-valued data #' \item`Gamma()` for non-negative real-valued data #' \item`bernoulli()` for binary data #' \item`poisson()` for count data #' \item`nb()` for overdispersed count data #' \item`binomial()` for count data with imperfect detection when the number #' of trials is known; note that the `cbind()` function must be used to #' bind the discrete observations and the discrete number of trials #' \item`beta_binomial()` as for `binomial()` but allows for overdispersion} #'Default is `poisson()`. See \code{\link{mvgam_families}} for more details #' #' @param species The unquoted name of the `factor` variable that indexes the #' different response units in `data` (usually `'species'` in a JSDM). #' Defaults to `series` to be consistent with other `mvgam` models #' #'@param n_lv \code{integer} the number of latent factors to use for modelling #' residual associations. Cannot be `> n_species`. Defaults arbitrarily to `2` #' #'@param threads \code{integer} Experimental option to use multithreading for #' within-chain parallelisation in \code{Stan}. We recommend its use only if #' you are experienced with \code{Stan}'s `reduce_sum` function and have a #' slow running model that cannot be sped up by any other means. Currently #' works for all families when using \pkg{cmdstanr} as the backend #' #'@param priors An optional \code{data.frame} with prior definitions (in Stan #' syntax) or, preferentially, a vector containing objects of class #' `brmsprior` (see \code{\link[brms]{prior}} for details). See #' [get_mvgam_priors] for more information on changing default prior #' distributions #' #'@param ... Other arguments to pass to [mvgam] #' #'@author Nicholas J Clark #' #'@details Joint Species Distribution Models allow for responses of multiple #'species to be learned hierarchically, whereby responses to environmental #'variables in `formula` can be partially pooled and any latent, unmodelled #'residual associations can also be learned. In \pkg{mvgam}, both of these #'effects can be modelled with the full power of latent factor Hierarchical #'GAMs, providing unmatched flexibility to model full communities of species. #'When calling [jsdgam], an initial State-Space model using `trend = 'None'` is #'set up and then modified to include the latent factors and their linear #'predictors. Consequently, you can inspect priors for these models using #'[get_mvgam_priors] by supplying the relevant `formula`, `factor_formula`, #'`data` and `family` arguments and keeping the default `trend = 'None'`.
#' #' In a JSDGAM, the expectation of response \eqn{Y_{ij}} is modelled with #' #' \deqn{g(\mu_{ij}) = X_i\beta + u_i\theta_j,} #' #' where \eqn{g(.)} is a known link function, #' \eqn{X} is a design matrix of linear predictors (with associated \eqn{\beta} #' coefficients), \eqn{u} are \eqn{n_{lv}}-variate latent factors #' (\eqn{n_{lv}}<<\eqn{n_{species}}) and \eqn{\theta_j} are species-specific #' loadings on the latent factors, respectively. The design matrix \eqn{X} and #' \eqn{\beta} coefficients are constructed and modelled using `formula` and #' can contain any of `mvgam`'s predictor effects, including random intercepts #' and slopes, multidimensional penalized smooths, GP effects etc... The factor #' loadings \eqn{\theta_j} are constrained for identifiability but can be used #' to reconstruct an estimate of the species' residual variance-covariance #' matrix using \eqn{\Theta \Theta'} (see the example below and #' [residual_cor()] for details). The latent factors are further modelled using: #'\deqn{ #'u_i \sim \text{Normal}(Q_i\beta_{factor}, 1) #'} #'where the second design matrix \eqn{Q} and associated \eqn{\beta_{factor}} #'coefficients are constructed and modelled using `factor_formula`. Again, the #'effects that make up this linear predictor can contain any of `mvgam`'s #'allowed predictor effects, providing enormous flexibility for modelling #'species' communities. #' #'@seealso [mvgam()], [residual_cor()] #' #'@references Nicholas J Clark & Konstans Wells (2023). Dynamic generalised #'additive models (DGAMs) for forecasting discrete ecological time series. #'Methods in Ecology and Evolution. 14:3, 771-784. #' \cr #' \cr #'David I Warton, F Guillaume Blanchet, Robert B O'Hara, Otso Ovaskainen, Sara #'Taskinen, Steven C Walker & Francis KC Hui (2015). So many variables: joint #'modeling in community ecology. Trends in Ecology & Evolution 30:12, 766-779. #' #'@return A \code{list} object of class \code{mvgam} containing model output, #'the text representation of the model file, the mgcv model output (for easily #'generating simulations at unsampled covariate values), Dunn-Smyth residuals #'for each species and key information needed for other functions in the #'package. See \code{\link{mvgam-class}} for details. 
Use #'`methods(class = "mvgam")` for an overview on available methods #' #'@examples #'\dontrun{ #' # ======================================================================== #' # Example 1: Basic JSDGAM with Portal Data #' # ======================================================================== #' #' # Fit a JSDGAM to the portal_data captures #' mod <- jsdgam( #' formula = captures ~ #' # Fixed effects of NDVI and mintemp, row effect as a GP of time #' ndvi_ma12:series + mintemp:series + gp(time, k = 15), #' factor_formula = ~ -1, #' data = portal_data, #' unit = time, #' species = series, #' family = poisson(), #' n_lv = 2, #' silent = 2, #' chains = 2 #' ) #' #' # Plot covariate effects #' library(ggplot2); theme_set(theme_bw()) #' plot_predictions( #' mod, #' condition = c('ndvi_ma12', 'series', 'series') #' ) #' #' plot_predictions( #' mod, #' condition = c('mintemp', 'series', 'series') #' ) #' #' # A residual correlation plot #' plot(residual_cor(mod)) #' #' # An ordination biplot can also be constructed #' # from the factor scores and their loadings #' if(requireNamespace('ggrepel', quietly = TRUE)){ #' ordinate(mod, alpha = 0.7) #' } #' #' # ======================================================================== #' # Example 2: Advanced JSDGAM with Spatial Predictors #' # ======================================================================== #' #' # Simulate latent count data for 500 spatial locations and 10 species #' set.seed(0) #' N_points <- 500 #' N_species <- 10 #' #' # Species-level intercepts (on the log scale) #' alphas <- runif(N_species, 2, 2.25) #' #' # Simulate a covariate and species-level responses to it #' temperature <- rnorm(N_points) #' betas <- runif(N_species, -0.5, 0.5) #' #' # Simulate points uniformly over a space #' lon <- runif(N_points, min = 150, max = 155) #' lat <- runif(N_points, min = -20, max = -19) #' #' # Set up spatial basis functions as a tensor product of lat and lon #' sm <- mgcv::smoothCon( #' mgcv::te(lon, lat, k = 5), #' data = data.frame(lon, lat), #' knots = NULL #' )[[1]] #' #' # The design matrix for this smooth is in the 'X' slot #' des_mat <- sm$X #' dim(des_mat) #' #' # Function to generate a random covariance matrix where all variables #' # have unit variance (i.e. 
diagonals are all 1) #' random_Sigma = function(N){ #' L_Omega <- matrix(0, N, N); #' L_Omega[1, 1] <- 1; #' for (i in 2 : N) { #' bound <- 1; #' for (j in 1 : (i - 1)) { #' L_Omega[i, j] <- runif(1, -sqrt(bound), sqrt(bound)); #' bound <- bound - L_Omega[i, j] ^ 2; #' } #' L_Omega[i, i] <- sqrt(bound); #' } #' Sigma <- L_Omega %*% t(L_Omega); #' return(Sigma) #' } #' #' # Simulate a variance-covariance matrix for the correlations among #' # basis coefficients #' Sigma <- random_Sigma(N = NCOL(des_mat)) #' #' # Now simulate the species-level basis coefficients hierarchically, where #' # spatial basis function correlations are a convex sum of a base correlation #' # matrix and a species-level correlation matrix #' basis_coefs <- matrix(NA, nrow = N_species, ncol = NCOL(Sigma)) #' base_field <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = Sigma) #' for(t in 1:N_species){ #' corOmega <- (cov2cor(Sigma) * 0.7) + #' (0.3 * cov2cor(random_Sigma(N = NCOL(des_mat)))) #' basis_coefs[t, ] <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = corOmega) #' } #' #' # Simulate the latent spatial processes #' st_process <- do.call(rbind, lapply(seq_len(N_species), function(t){ #' data.frame( #' lat = lat, #' lon = lon, #' species = paste0('species_', t), #' temperature = temperature, #' process = alphas[t] + #' betas[t] * temperature + #' des_mat %*% basis_coefs[t,] #' ) #' })) #' #' # Now take noisy observations at some of the points (60) #' obs_points <- sample(1:N_points, size = 60, replace = FALSE) #' obs_points <- data.frame( #' lat = lat[obs_points], #' lon = lon[obs_points], #' site = 1:60 #' ) #' #' # Keep only the process data at these points #' st_process %>% #' dplyr::inner_join(obs_points, by = c('lat', 'lon')) %>% #' # now take noisy Poisson observations of the process #' dplyr::mutate(count = rpois(NROW(.), lambda = exp(process))) %>% #' dplyr::mutate(species = factor( #' species, #' levels = paste0('species_', 1:N_species) #' )) %>% #' dplyr::group_by(lat, lon) -> dat #' #' # View the count distributions for each species #' ggplot(dat, aes(x = count)) + #' geom_histogram() + #' facet_wrap(~ species, scales = 'free') #' #' ggplot(dat, aes(x = lon, y = lat, col = log(count + 1))) + #' geom_point(size = 2.25) + #' facet_wrap(~ species, scales = 'free') + #' scale_color_viridis_c() #' #' # ------------------------------------------------------------------------ #' # Model Fitting with Custom Priors #' # ------------------------------------------------------------------------ #' #' # Inspect default priors for a joint species model with three spatial factors #' priors <- get_mvgam_priors( #' formula = count ~ #' # Environmental model includes random slopes for #' # a linear effect of temperature #' s(species, bs = 're', by = temperature), #' #' # Each factor estimates a different nonlinear spatial process, using #' # 'by = trend' as in other mvgam State-Space models #' factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1, #' n_lv = 3, #' #' # The data and grouping variables #' data = dat, #' unit = site, #' species = species, #' #' # Poisson observations #' family = poisson() #' ) #' head(priors) #' #' # Fit a JSDM that estimates hierarchical temperature responses #' # and that uses three latent spatial factors #' mod <- jsdgam( #' formula = count ~ #' # Environmental model includes random slopes for a #' # linear effect of temperature #' s(species, bs = 're', by = temperature), #' #' # Each factor estimates a different nonlinear spatial process, using #' # 'by = trend' as in other mvgam State-Space 
models #' factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1, #' n_lv = 3, #' #' # Change default priors for fixed random effect variances and #' # factor GP marginal deviations to standard normal #' priors = c( #' prior(std_normal(), class = sigma_raw), #' prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend1`), #' prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend2`), #' prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend3`) #' ), #' #' # The data and the grouping variables #' data = dat, #' unit = site, #' species = species, #' #' # Poisson observations #' family = poisson(), #' chains = 2, #' silent = 2 #' ) #' #' # ------------------------------------------------------------------------ #' # Model Visualization and Diagnostics #' # ------------------------------------------------------------------------ #' #' # Plot the implicit species-level intercept estimates #' plot_predictions(mod, condition = 'species', type = 'link') #' #' # Plot species' hierarchical responses to temperature #' plot_predictions( #' mod, #' condition = c('temperature', 'species', 'species'), #' type = 'link' #' ) #' #' # Plot posterior median estimates of the latent spatial factors #' plot(mod, type = 'smooths', trend_effects = TRUE) #' #' # Or using gratia, if you have it installed #' if(requireNamespace('gratia', quietly = TRUE)){ #' gratia::draw(mod, trend_effects = TRUE, dist = 0) #' } #' #' # Plot species' randomized quantile residual distributions #' # as a function of latitude #' pp_check( #' mod, #' type = 'resid_ribbon_grouped', #' group = 'species', #' x = 'lat', #' ndraws = 200 #' ) #' #' # ------------------------------------------------------------------------ #' # Residual Correlation Analysis #' # ------------------------------------------------------------------------ #' #' # Calculate residual spatial correlations #' post_cors <- residual_cor(mod) #' names(post_cors) #' #' # Look at lower and upper credible interval estimates for #' # some of the estimated correlations #' post_cors$cor[1:5, 1:5] #' post_cors$cor_upper[1:5, 1:5] #' post_cors$cor_lower[1:5, 1:5] #' #' # Plot of the posterior median correlations for those estimated #' # to be non-zero #' plot(post_cors, cluster = TRUE) #' #' # An ordination biplot can also be constructed #' # from the factor scores and their loadings #' if(requireNamespace('ggrepel', quietly = TRUE)){ #' ordinate(mod) #' } #' #' # ------------------------------------------------------------------------ #' # Model Validation and Prediction #' # ------------------------------------------------------------------------ #' #' # Posterior predictive checks and ELPD-LOO can ascertain model fit #' pp_check( #' mod, #' type = "pit_ecdf_grouped", #' group = "species", #' ndraws = 200 #' ) #' loo(mod) #' #' # Forecast log(counts) for entire region (site value doesn't matter as long #' # as each spatial location has a different and unique site identifier); #' # note this calculation takes a few minutes because of the need to calculate #' # draws from the stochastic latent factors #' newdata <- st_process %>% #' dplyr::mutate(species = factor( #' species, #' levels = paste0('species_', 1:N_species) #' )) %>% #' dplyr::group_by(lat, lon) %>% #' dplyr::mutate(site = dplyr::cur_group_id()) %>% #' dplyr::ungroup() #' preds <- predict(mod, newdata = newdata) #' #' # Plot the median log(count) predictions on a grid #' newdata$log_count <- preds[,1] #' ggplot(newdata, aes(x = lon, y = lat, col = log_count)) + #' geom_point(size = 1.5) + #' 
facet_wrap(~ species, scales = 'free') + #' scale_color_viridis_c() + #' theme_classic() #' #' # Not needed for general use; cleans up connections for automated testing #' closeAllConnections() #' } #'@export jsdgam = function( formula, factor_formula = ~ -1, knots, factor_knots, data, newdata, family = poisson(), unit = time, species = series, share_obs_params = FALSE, priors, n_lv = 2, backend = getOption("brms.backend", "cmdstanr"), algorithm = getOption("brms.algorithm", "sampling"), control = list(max_treedepth = 10, adapt_delta = 0.8), chains = 4, burnin = 500, samples = 500, thin = 1, parallel = TRUE, threads = 1, silent = 1, run_model = TRUE, return_model_data = FALSE, residuals = TRUE, ... ) { #### Validate arguments and initialise the model skeleton #### validate_pos_integer(n_lv) # Prep the trend so that the data can be structured in the usual # mvgam fashion (with 'time' and 'series' variables) unit <- deparse0(substitute(unit)) subgr <- deparse0(substitute(species)) prepped_trend <- prep_jsdgam_trend(unit = unit, subgr = subgr, data = data) data_train <- validate_series_time(data = data, trend_model = prepped_trend) # Set up a simple trend_map to get the model dimensions correct; # this requires that we only have n_lv trends and that each series # only maps to one distinct trend, resulting in a loading matrix of # the correct size (n_series x n_lv) trend_map <- prep_jsdgam_trendmap(data_train, n_lv) # Set up the model structure but leave autoformat off so that the # model file can be easily modified mod <- suppressWarnings(mvgam( formula = formula, trend_formula = factor_formula, knots = knots, trend_knots = factor_knots, family = family, share_obs_params = share_obs_params, priors = priors, trend_model = 'None', trend_map = trend_map, data = data_train, noncentred = TRUE, run_model = FALSE, autoformat = FALSE, backend = backend, ... 
  ))
  model_file <- mod$model_file

  #### Modify model data and model file ####
  # Remove Z from supplied data
  model_file <- model_file[
    -grep(
      "matrix[n_series, n_lv] Z; // matrix mapping series to latent states",
      model_file,
      fixed = TRUE
    )
  ]

  # Add M to data block
  model_file[grep(
    'int n_lv; // number of dynamic factors',
    model_file,
    fixed = TRUE
  )] <- paste0(
    'int n_lv; // number of dynamic factors\n',
    'int M; // number of nonzero lower-triangular factor loadings'
  )
  model_file <- readLines(textConnection(model_file), n = -1)

  # Update parameters
  model_file <- model_file[
    -grep("// latent state SD terms", model_file, fixed = TRUE)
  ]
  model_file <- model_file[
    -grep("vector[n_lv] sigma;", model_file, fixed = TRUE)
  ]
  model_file[grep(
    "matrix[n, n_lv] LV_raw;",
    model_file,
    fixed = TRUE
  )] <- paste0(
    "matrix[n, n_lv] LV_raw;\n\n",
    "// factor lower triangle loadings\n",
    "vector[M] L_lower;\n",
    "// factor diagonal loadings\n",
    "vector[n_lv] L_diag;"
  )
  model_file <- readLines(textConnection(model_file), n = -1)

  # Update transformed parameters
  model_file <- model_file[-grep("// latent states", model_file, fixed = TRUE)]
  model_file <- model_file[-grep("lv_coefs = Z;", model_file, fixed = TRUE)]
  model_file <- model_file[
    -grep("matrix[n, n_lv] LV;", model_file, fixed = TRUE)
  ]
  model_file <- model_file[
    -grep("trend_mus = X_trend * b_trend;", model_file, fixed = TRUE)
  ]
  model_file[grep(
    "matrix[n_series, n_lv] lv_coefs;",
    model_file,
    fixed = TRUE
  )] <- paste0(
    "matrix[n_series, n_lv] lv_coefs = rep_matrix(0, n_series, n_lv);\n",
    'matrix[n, n_lv] LV;\n'
  )
  starts <- grep(
    "LV = LV_raw .* rep_matrix(sigma', rows(LV_raw));",
    model_file,
    fixed = TRUE
  )
  ends <- starts + 5
  model_file <- model_file[-(starts:ends)]

  # Simplified latent variable creation if no terms in factor_formula
  if (
    is.null(rownames(attr(terms.formula(factor_formula), 'factors'))) &
      is.null(colnames(attr(terms.formula(factor_formula), 'factors')))
  ) {
    model_file[grep(
      "// latent process linear predictors",
      model_file,
      fixed = TRUE
    )] <- paste0(
      "// latent process linear predictors\n",
      "trend_mus = X_trend * b_trend;\n\n",
      "// constraints allow identifiability of loadings\n",
      "{\n",
      "int idx;\n",
      "idx = 0;\n",
      "for(j in 1 : n_lv) lv_coefs[j, j] = L_diag[j];\n",
      "for(j in 1 : n_lv) {\n",
      "for(k in (j + 1) : n_series) {\n",
      "idx = idx + 1;\n",
      "lv_coefs[k, j] = L_lower[idx];\n",
      "}\n",
      "}\n",
      "}\n\n",
      "// raw latent factors\n",
      "LV = LV_raw;\n"
    )
  } else {
    model_file[grep(
      "// latent process linear predictors",
      model_file,
      fixed = TRUE
    )] <- paste0(
      "// latent process linear predictors\n",
      "trend_mus = X_trend * b_trend;\n\n",
      "// constraints allow identifiability of loadings\n",
      "{\n",
      "int idx;\n",
      "idx = 0;\n",
      "for(j in 1 : n_lv) lv_coefs[j, j] = L_diag[j];\n",
      "for(j in 1 : n_lv) {\n",
      "for(k in (j + 1) : n_series) {\n",
      "idx = idx + 1;\n",
      "lv_coefs[k, j] = L_lower[idx];\n",
      "}\n",
      "}\n",
      "}\n\n",
      "// raw latent factors (with linear predictors)\n",
      "for (j in 1 : n_lv) {\n",
      "for (i in 1 : n) {\n",
      "LV[i, j] = trend_mus[ytimes_trend[i, j]] + LV_raw[i, j];\n",
      "}\n}\n"
    )
  }
  model_file <- model_file[
    -grep("// derived latent states", model_file, fixed = TRUE)
  ]
  model_file <- readLines(textConnection(model_file), n = -1)

  # Update model block
  sigma_prior <- grep(
    "// priors for latent state SD parameters",
    model_file,
    fixed = TRUE
  ) + 1
  model_file <- model_file[-sigma_prior]

  # Use standard normal for loadings in most models, apart from
  # those using identity link
  if (family_links(mod$family) != 'identity') {
    model_file[grep(
      "// priors for latent state SD parameters",
      model_file,
      fixed = TRUE
    )] <- paste0(
      "// priors for factors and loading coefficients\n",
      "L_lower ~ std_normal();\n",
      "L_diag ~ std_normal();"
    )
    model_file <- readLines(textConnection(model_file), n = -1)
  } else {
    model_file[grep(
      "// priors for latent state SD parameters",
      model_file,
      fixed = TRUE
    )] <- paste0(
      "// priors for factors and loading coefficients\n",
      "L_lower ~ student_t(3, 0, 1);\n",
      "L_diag ~ student_t(3, 0, 1);"
    )
    model_file <- readLines(textConnection(model_file), n = -1)
  }

  # Update generated quantities
  model_file[grep(
    'matrix[n, n_series] mus;',
    model_file,
    fixed = TRUE
  )] <- paste0(
    'matrix[n, n_series] mus;\n',
    'vector[n_lv] sigma;'
  )
  model_file[grep(
    "penalty = 1.0 / (sigma .* sigma);",
    model_file,
    fixed = TRUE
  )] <- paste0(
    "penalty = rep_vector(1.0, n_lv);\n",
    "sigma = rep_vector(1.0, n_lv);"
  )
  model_file <- readLines(textConnection(model_file), n = -1)
  model_file <- sanitise_modelfile(model_file)

  # Remove Z from model_data as it is no longer needed
  model_data <- mod$model_data
  model_data$Z <- NULL

  # Add M to model_data
  n_series <- NCOL(model_data$ytimes)
  model_data$M <- n_lv * (n_series - n_lv) + n_lv * (n_lv - 1) / 2

  #### Autoformat the Stan code ####
  if (requireNamespace('cmdstanr', quietly = TRUE) & backend == 'cmdstanr') {
    if (
      requireNamespace('cmdstanr') &
        cmdstanr::cmdstan_version() >= "2.29.0"
    ) {
      model_file <- .autoformat(
        model_file,
        overwrite_file = FALSE,
        backend = 'cmdstanr',
        silent = silent >= 1L
      )
    }
    model_file <- readLines(textConnection(model_file), n = -1)
  } else {
    model_file <- .autoformat(
      model_file,
      overwrite_file = FALSE,
      backend = 'rstan',
      silent = silent >= 1L
    )
    model_file <- readLines(textConnection(model_file), n = -1)
  }

  # Remove lp__ from monitor params if VB is to be used
  param <- unique(c(mod$monitor_pars, 'Sigma', 'LV'))
  if (algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')) {
    param <- param[!param %in% 'lp__']
  }

  #### Determine what to return ####
  if (!run_model) {
    mod$model_file <- model_file
    mod$monitor_pars <- param
    attr(model_data, 'trend_model') <- 'None'
    attr(model_data, 'prepped_trend_model') <- prepped_trend
    attr(model_data, 'noncentred') <- NULL
    attr(model_data, 'threads') <- threads
    mod$model_data <- model_data
    out <- mod
  } else {
    # Check if cmdstan is accessible; if not, use rstan.
    # Initialise the flag first so it is defined when backend = 'rstan'
    use_cmdstan <- FALSE
    if (backend == 'cmdstanr') {
      if (!requireNamespace('cmdstanr', quietly = TRUE)) {
        if (silent < 2) {
          message('cmdstanr library not found; defaulting to rstan')
        }
        use_cmdstan <- FALSE
      } else {
        use_cmdstan <- TRUE
        if (is.null(cmdstanr::cmdstan_version(error_on_NA = FALSE))) {
          warning(
            'cmdstanr library found but Cmdstan not found. Defaulting to rstan'
          )
          use_cmdstan <- FALSE
        }
      }
    }

    #### Run the model ####
    if (use_cmdstan) {
      # Prepare threading and generate the model
      cmd_mod <- .model_cmdstanr(model_file, threads = threads, silent = silent)

      # Condition the model using Cmdstan
      out_gam_mod <- .sample_model_cmdstanr(
        model = cmd_mod,
        algorithm = algorithm,
        prior_simulation = FALSE,
        data = model_data,
        chains = chains,
        parallel = parallel,
        silent = silent,
        max_treedepth = control$max_treedepth,
        adapt_delta = control$adapt_delta,
        threads = threads,
        burnin = burnin,
        samples = samples,
        param = param,
        save_all_pars = FALSE,
        ...
) } else { # Condition the model using rstan requireNamespace('rstan', quietly = TRUE) out_gam_mod <- .sample_model_rstan( model = model_file, algorithm = algorithm, prior_simulation = FALSE, data = model_data, chains = chains, parallel = parallel, silent = silent, max_treedepth = control$max_treedepth, adapt_delta = control$adapt_delta, threads = threads, burnin = burnin, samples = samples, thin = thin, ... ) } # After modeling (add a new class to make predictions and other post-processing # simpler) out1 <- mod out1$model_output <- out_gam_mod class(out1) <- c('mvgam') if (residuals) { mod_residuals <- dsresids_vec(out1) } else { mod_residuals <- NULL } rm(out1) # Add the posterior median coefficients to the mgcv objects ss_gam <- mod$mgcv_model V <- cov(mcmc_chains(out_gam_mod, 'b')) ss_gam$Vp <- ss_gam$Vc <- V p <- mcmc_summary( out_gam_mod, 'b', variational = algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace') )[, c(4)] names(p) <- names(ss_gam$coefficients) ss_gam$coefficients <- p trend_mgcv_model <- mod$trend_mgcv_model V <- cov(mcmc_chains(out_gam_mod, 'b_trend')) trend_mgcv_model$Vp <- trend_mgcv_model$Vc <- V p <- mcmc_summary( out_gam_mod, 'b_trend', variational = algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace') )[, c(4)] names(p) <- names(trend_mgcv_model$coefficients) trend_mgcv_model$coefficients <- p #### Return the output as class mvgam #### trim_data <- list() attr(trim_data, 'threads') <- threads attr(trim_data, 'noncentred') <- NULL attr(trim_data, 'trend_model') <- 'None' attr(trim_data, 'prepped_trend_model') <- prepped_trend # Extract sampler arguments dots <- list(...) if ('adapt_delta' %in% names(dots)) { message( 'argument "adapt_delta" should be supplied as an element in "control"' ) adapt_delta <- dots$adapt_delta dots$adapt_delta <- NULL } else { adapt_delta <- control$adapt_delta if (is.null(adapt_delta)) adapt_delta <- 0.8 } if ('max_treedepth' %in% names(dots)) { message( 'argument "max_treedepth" should be supplied as an element in "control"' ) max_treedepth <- dots$max_treedepth dots$max_treedepth <- NULL } else { max_treedepth <- control$max_treedepth if (is.null(max_treedepth)) max_treedepth <- 10 } out <- structure( list( call = mod$call, trend_call = factor_formula, family = mod$family, share_obs_params = mod$share_obs_params, trend_model = 'None', trend_map = trend_map, drift = FALSE, priors = mod$priors, model_output = out_gam_mod, model_file = model_file, model_data = if (return_model_data) { model_data } else { trim_data }, inits = NULL, monitor_pars = param, sp_names = mod$sp_names, trend_sp_names = mod$trend_sp_names, mgcv_model = ss_gam, trend_mgcv_model = trend_mgcv_model, ytimes = mod$ytimes, resids = mod_residuals, use_lv = TRUE, n_lv = n_lv, upper_bounds = mod$upper_bounds, obs_data = mod$obs_data, test_data = mod$test_data, fit_engine = 'stan', backend = backend, algorithm = algorithm, max_treedepth = max_treedepth, adapt_delta = adapt_delta ), class = c('mvgam', 'jsdgam') ) } return(out) } #' Prep trend for jsdgam #' @noRd prep_jsdgam_trend = function(data, unit, subgr) { unit <- as_one_character(unit) subgr <- as_one_character(subgr) validate_var_exists( data = data, variable = unit, type = 'num/int', name = 'data', trend_char = 'ZMVN' ) validate_var_exists( data = data, variable = subgr, type = 'factor', name = 'data', trend_char = 'ZMVN' ) out <- structure( list( trend_model = 'ZMVN', ma = FALSE, cor = TRUE, unit = unit, gr = "NA", subgr = subgr, label = NULL ), class = 'mvgam_trend' ) } #' @noRd 
prep_jsdgam_trendmap = function(data, n_lv) { if (n_lv > nlevels(data$series)) { stop( 'Number of factors must be <= number of levels in species', call. = FALSE ) } data.frame( trend = rep(1:n_lv, nlevels(data$series))[1:nlevels(data$series)], series = factor(levels(data$series), levels = levels(data$series)) ) } ================================================ FILE: R/lfo_cv.mvgam.R ================================================ #'@title Approximate leave-future-out cross-validation of fitted \pkg{mvgam} objects #'@name lfo_cv.mvgam #'@importFrom stats update logLik #'@param object \code{list} object of class \code{mvgam}. See [mvgam()] #'@param data A \code{dataframe} or \code{list} containing the model response variable and covariates #'required by the GAM \code{formula}. Should include columns: #''series' (character or factor index of the series IDs) #''time' (numeric index of the time point for each observation). #'Any other variables to be included in the linear predictor of \code{formula} must also be present #'@param min_t Integer specifying the minimum training time required before making predictions #'from the data. Default is either the `30`th timepoint in the observational data, #'or whatever training time allows for at least #'`10` lfo-cv calculations, if possible. #'This value is essentially arbitrary so it is highly recommended to change it #'to something that is more suitable to the #'data and models being evaluated. #'@param fc_horizon Integer specifying the number of time steps ahead for evaluating forecasts #'@param pareto_k_threshold Proportion specifying the threshold over which the Pareto shape parameter #'is considered unstable, triggering a model refit. Default is `0.7` #'@param silent Verbosity level between `0` and `2`. If `1` (the default), most of the informational #'messages of compiler and sampler are suppressed. If `2`, even more messages are suppressed. The #'actual sampling progress is still printed. Set `refresh = 0` to turn this off as well. If using #'`backend = "rstan"` you can also set open_progress = FALSE to prevent opening additional #'progress bars. #'@param ... Ignored #'@details Approximate leave-future-out cross-validation uses an expanding training window scheme #' to evaluate a model on its forecasting ability. The steps used in this function mirror those laid out #' in the [lfo vignette from the `loo` package](https://mc-stan.org/loo/articles/loo2-lfo.html), #' written by Paul Bürkner, Jonah Gabry, Aki Vehtari. First, we refit the model using the first `min_t` #' observations to perform a single exact `fc_horizon`-ahead forecast step. This forecast is evaluated against #' the `min_t + fc_horizon` out of sample observations using the Expected Log Predictive Density (ELPD). #' Next, we approximate each successive round of #' expanding window forecasts by moving forward one step at a time `for i in 1:N_evaluations` and re-weighting #' draws from the model's posterior predictive distribution using Pareto Smoothed #' Importance Sampling (PSIS). In each iteration `i`, PSIS weights are obtained for the next observation #' that would have been included in the model if we had re-fit (i.e. the last observation that would have #' been in the training data, or `min_t + i`). If these importance ratios are stable, we consider the #' approximation adequate and use the re-weighted posterior's forecast for evaluating the next holdout #' set of testing observations (`(min_t + i + 1):(min_t + i + fc_horizon)`). 
At some point the
#' importance ratio variability will become too large and importance sampling will fail.
#' This is indicated by the estimated shape parameter `k` of the generalized Pareto
#' distribution crossing a certain threshold `pareto_k_threshold`. Only then do we refit
#' the model using all of the observations up to the time of the failure. We then restart
#' the process and iterate forward until the next refit is triggered (Bürkner et al. 2020).
#'@return A `list` of class `mvgam_lfo` containing the approximate ELPD scores,
#'the Pareto-k shape values and the specified `pareto_k_threshold`
#'@seealso \code{\link{forecast}}, \code{\link{score}}, \code{\link{compare_mvgams}}
#'@references Paul-Christian Bürkner, Jonah Gabry & Aki Vehtari (2020). Approximate
#'leave-future-out cross-validation for Bayesian time series models.
#'Journal of Statistical Computation and Simulation. 90:14, 2499-2523.
#'@examples
#'\dontrun{
#'# Simulate from a Poisson-AR2 model with a seasonal smooth
#'set.seed(100)
#'dat <- sim_mvgam(T = 75,
#'                 n_series = 1,
#'                 prop_trend = 0.75,
#'                 trend_model = 'AR2',
#'                 family = poisson())
#'
#'# Plot the time series
#'plot_mvgam_series(data = dat$data_train,
#'                  newdata = dat$data_test,
#'                  series = 1)
#'
#'# Fit an appropriate model
#'mod_ar2 <- mvgam(y ~ s(season, bs = 'cc', k = 6),
#'                 trend_model = AR(p = 2),
#'                 family = poisson(),
#'                 data = dat$data_train,
#'                 newdata = dat$data_test,
#'                 chains = 2,
#'                 silent = 2)
#'
#'# Fit a less appropriate model
#'mod_rw <- mvgam(y ~ s(season, bs = 'cc', k = 6),
#'                trend_model = RW(),
#'                family = poisson(),
#'                data = dat$data_train,
#'                newdata = dat$data_test,
#'                chains = 2,
#'                silent = 2)
#'
#'# Compare Discrete Ranked Probability Scores for the testing period
#'fc_ar2 <- forecast(mod_ar2)
#'fc_rw <- forecast(mod_rw)
#'score_ar2 <- score(fc_ar2, score = 'drps')
#'score_rw <- score(fc_rw, score = 'drps')
#'sum(score_ar2$series_1$score)
#'sum(score_rw$series_1$score)
#'
#'# Now use approximate leave-future-out CV to compare
#'# rolling forecasts; start at time point 40 to reduce
#'# computational time and to ensure enough data is available
#'# for estimating model parameters
#'lfo_ar2 <- lfo_cv(mod_ar2,
#'                  min_t = 40,
#'                  fc_horizon = 3,
#'                  silent = 2)
#'lfo_rw <- lfo_cv(mod_rw,
#'                 min_t = 40,
#'                 fc_horizon = 3,
#'                 silent = 2)
#'
#'# Plot Pareto-K values and ELPD estimates
#'plot(lfo_ar2)
#'plot(lfo_rw)
#'
#'# Proportion of timepoints in which AR2 model gives better forecasts
#'length(which((lfo_ar2$elpds - lfo_rw$elpds) > 0)) /
#'  length(lfo_ar2$elpds)
#'
#'# A higher total ELPD is preferred
#'lfo_ar2$sum_ELPD
#'lfo_rw$sum_ELPD
#'}
#'@author Nicholas J Clark
#'@export
lfo_cv <- function(object, ...) {
  UseMethod("lfo_cv", object)
}

#'@rdname lfo_cv.mvgam
#'@method lfo_cv mvgam
#'@export
lfo_cv.mvgam = function(
  object,
  data,
  min_t,
  fc_horizon = 1,
  pareto_k_threshold = 0.7,
  silent = 1,
  ...
) {
  validate_proportional(pareto_k_threshold)
  validate_pos_integer(fc_horizon)
  if (missing(data)) {
    all_data <- object$obs_data
  } else {
    all_data <- validate_series_time(
      data,
      name = 'data',
      trend_model = object$trend_model
    )
  }
  N <- max(all_data$index..time..index)
  all_unique_times <- sort(unique(all_data$index..time..index))

  # Default minimum training time is the 30th timepoint, or
  # whatever training time allows for at least 10 lfo_cv calculations
  # (use >= comparisons so that series with exactly 30, 20 or 10 unique
  # timepoints do not fall through to min_t = 1)
  if (missing(min_t)) {
    if (length(all_unique_times) >= 30) {
      min_t <- pmin(max(1, N - 10 - fc_horizon), all_unique_times[30])
    } else if (length(all_unique_times) >= 20) {
      min_t <- pmin(max(1, N - 10 - fc_horizon), all_unique_times[20])
    } else if (length(all_unique_times) >= 10) {
      min_t <- pmin(max(1, N - 10 - fc_horizon), all_unique_times[10])
    } else {
      min_t <- 1
    }
  }
  if (min_t < 0) {
    min_t <- 1
  }
  validate_pos_integer(min_t)
  if (min_t >= N) {
    stop('Argument "min_t" is >= the maximum training time', call. = FALSE)
  }

  # Store the Expected Log Predictive Density (ELPD) at each time point
  approx_elpds <- rep(NA, N)

  # Initialize the process for i = min_t, generating a
  # conditional forecast for all of the future data
  data_splits <- cv_split(all_data, last_train = min_t, fc_horizon = fc_horizon)

  # Fit model to training and forecast all remaining testing observations
  noncentred <- !is.null(attr(object$model_data, 'noncentred'))
  if (silent < 1L) {
    cat('Approximating elpd for training point', min_t, '...\n')
  }
  fit_past <- update(
    object,
    data = data_splits$data_train,
    newdata = data_splits$data_test,
    lfo = TRUE,
    noncentred = noncentred,
    silent = silent
  )

  # Calculate log likelihoods of forecast observations for the next
  # fc_horizon ahead observations
  fc_indices <- which(
    c(data_splits$data_train$time, data_splits$data_test$time) %in%
      (min_t + 1):(min_t + fc_horizon)
  )
  loglik_past <- logLik(fit_past)

  # Store the ELPD estimate
  approx_elpds[min_t + 1] <- log_mean_exp(sum_rows(loglik_past[, fc_indices]))

  # Iterate over i > min_t
  i_refit <- min_t
  refits <- min_t
  ks <- 0
  for (i in (min_t + 1):(N - fc_horizon)) {
    if (silent < 1L) {
      cat('Approximating elpd for training point', i, '...\n')
    }

    # Get log likelihoods of what would be the
    # last training observations for calculating Pareto k values
    last_obs_indices <- which(
      c(data_splits$data_train$time, data_splits$data_test$time) %in%
        (i_refit + 1):i
    )
    logratio <- sum_rows(loglik_past[, last_obs_indices])

    # Use PSIS to estimate whether the Pareto shape parameter of the
    # importance weights is below the specified threshold; a lower value
    # indicates the importance ratios have finite variance and can be
    # used for approximating prediction error
    psis_obj <- suppressWarnings(loo::psis(logratio))
    k <- loo::pareto_k_values(psis_obj)
    ks <- c(ks, k)

    # If k is too high, refit the model based on the first i observations;
    # in other words, the last refit did not provide stable enough predictions
    # of what would be the last set of training observations; we instead need
    # to include these in the training data, resulting in a slightly larger
    # model
    if (k > pareto_k_threshold) {
      i_refit <- i
      refits <- c(refits, i)

      # Subset the data to now include the last set of training observations
      data_splits <- cv_split(all_data, last_train = i, fc_horizon = fc_horizon)

      # Re-fit the model
      fit_past <- update(
        fit_past,
        data = data_splits$data_train,
        newdata = data_splits$data_test,
        lfo = TRUE,
        noncentred = noncentred,
        silent = silent
      )

      # Calculate ELPD as
before fc_indices <- which( c(data_splits$data_train$time, data_splits$data_test$time) %in% (i + 1):(i + fc_horizon) ) loglik_past <- logLik(fit_past) approx_elpds[i + 1] <- log_mean_exp(sum_rows(loglik_past[, fc_indices])) } else { # If k below threshold, calculate log likelihoods for the # forecast observations using the normalised importance weights # to weight the posterior draws fc_indices <- which( c(data_splits$data_train$time, data_splits$data_test$time) %in% (i + 1):(i + fc_horizon) ) lw <- loo::weights.importance_sampling(psis_obj, normalize = TRUE)[, 1] approx_elpds[i + 1] <- log_sum_exp( lw + sum_rows(loglik_past[, fc_indices]) ) } } return(structure( list( elpds = approx_elpds[(min_t + 1):(N - fc_horizon)], sum_ELPD = sum(approx_elpds, na.rm = TRUE), pareto_ks = ks[-1], eval_timepoints = (min_t + 1):(N - fc_horizon), pareto_k_threshold = pareto_k_threshold ), class = 'mvgam_lfo' )) } #' Plot Pareto-k and ELPD values from a `mvgam_lfo` object #' #' This function takes an object of class `mvgam_lfo` and creates several #' informative diagnostic plots #' @importFrom graphics layout axis lines abline polygon points #' @param x An object of class `mvgam_lfo` #' @param ... Ignored #' @return A `ggplot` object presenting Pareto-k and ELPD values over the #' evaluation timepoints. For the Pareto-k plot, a dashed red line indicates the #' specified threshold chosen for triggering model refits. For the ELPD plot, #' a dashed red line indicates the bottom 10% quantile of ELPD values. Points below #' this threshold may represent outliers that were more difficult to forecast #' @export plot.mvgam_lfo = function(x, ...) { object <- x # Plot Pareto-k values over time object$pareto_ks[which(is.infinite(object$pareto_ks))] <- max(object$pareto_ks[which(!is.infinite(object$pareto_ks))]) dplyr::tibble( eval_timepoints = object$eval_timepoints, elpds = object$elpds, pareto_ks = object$pareto_ks ) -> obj_tribble # Hack so we don't have to import tidyr just to use pivot_longer once dplyr::bind_rows( obj_tribble %>% dplyr::select(eval_timepoints, elpds) %>% dplyr::mutate(name = 'elpds', value = elpds) %>% dplyr::select(-elpds), obj_tribble %>% dplyr::select(eval_timepoints, pareto_ks) %>% dplyr::mutate(name = 'pareto_ks', value = pareto_ks) %>% dplyr::select(-pareto_ks) ) %>% dplyr::left_join( dplyr::tribble( ~name , ~threshold , "elpds" , quantile(object$elpds, probs = 0.15, na.rm = TRUE) , "pareto_ks" , object$pareto_k_threshold ), by = "name" ) %>% dplyr::rowwise() %>% dplyr::mutate( colour = dplyr::case_when( name == 'elpds' & value < threshold ~ "outlier", name == 'pareto_ks' & value > threshold ~ "outlier", TRUE ~ "inlier" ) ) %>% dplyr::ungroup() %>% ggplot2::ggplot(ggplot2::aes(eval_timepoints, value)) + ggplot2::facet_wrap( ~ factor( name, levels = c("pareto_ks", "elpds"), labels = c("Pareto K", "ELPD") ), ncol = 1, scales = "free_y" ) + ggplot2::geom_hline( ggplot2::aes(yintercept = threshold), colour = "#A25050", linetype = "dashed", linewidth = 1 ) + ggplot2::geom_line(linewidth = 0.5, col = "grey30") + ggplot2::geom_point(shape = 16, colour = 'white', size = 2) + ggplot2::geom_point( ggplot2::aes(colour = colour), shape = 16, show.legend = F, size = 1.5 ) + ggplot2::scale_colour_manual(values = c("grey30", "#8F2727")) + ggplot2::labs(x = "Evaluation time", y = NULL) + ggplot2::theme_bw() } #' Function to generate training and testing splits #' @noRd cv_split = function(data, last_train, fc_horizon = 1) { if (inherits(data, 'list')) { # Find indices of training and testing splits 
temp_dat = data.frame( time = data$index..time..index, series = data$series ) %>% dplyr::mutate(index = dplyr::row_number()) %>% dplyr::arrange(time, series) indices_train <- temp_dat %>% dplyr::filter(time <= last_train) %>% dplyr::pull(index) indices_test <- temp_dat %>% dplyr::filter(time > last_train) %>% dplyr::pull(index) # Split data_train <- lapply(data, function(x) { if (is.matrix(x)) { matrix(x[indices_train, ], ncol = NCOL(x)) } else { x[indices_train] } }) data_test <- lapply(data, function(x) { if (is.matrix(x)) { matrix(x[indices_test, ], ncol = NCOL(x)) } else { x[indices_test] } }) } else { data_train <- data %>% dplyr::filter(index..time..index <= last_train) %>% dplyr::arrange(index..time..index, series) data_test <- data %>% dplyr::filter(index..time..index > last_train) %>% dplyr::arrange(index..time..index, series) } return(list(data_train = data_train, data_test = data_test)) } #' More stable version of log(sum(exp(x))) #' @noRd log_sum_exp <- function(x) { max_x <- max(x) max_x + log(sum(exp(x - max_x))) } #' More stable version of log(mean(exp(x))) #' @noRd log_mean_exp <- function(x) { log_sum_exp(x) - log(length(x)) } #' Summing without NAs #' @noRd sum_rows = function(x) { if (NCOL(x) > 1) { out <- rowSums(x, na.rm = TRUE) } else { out <- x[!is.na(x)] } return(out) } ================================================ FILE: R/logLik.mvgam.R ================================================ #' @title Compute pointwise Log-Likelihoods from fitted \pkg{mvgam} objects #' #' @importFrom parallel setDefaultCluster stopCluster #' #' @param object \code{list} object of class \code{mvgam} or \code{jsdgam} #' #' @param linpreds Optional `matrix` of linear predictor draws to use for #' calculating pointwise log-likelihoods. #' #' @param newdata Optional `data.frame` or `list` object specifying which series #' each column in `linpreds` belongs to. If `linpreds` is supplied, then #' `newdata` must also be supplied. #' #' @param family_pars Optional `list` containing posterior draws of #' family-specific parameters (i.e. shape, scale or overdispersion parameters). #' Required if `linpreds` and `newdata` are supplied. #' #' @param include_forecast Logical. If `newdata` were fed to the model to #' compute forecasts, should the log-likelihood draws for these observations #' also be returned. Defaults to `TRUE`. #' #' @param ... Ignored #' #' @return A `matrix` of dimension `n_samples x n_observations` containing the #' pointwise log-likelihood draws for all observations in `newdata`. If no #' `newdata` is supplied, log-likelihood draws are returned for all observations #' that were originally fed to the model (training observations and, if supplied #' to the original model via the `newdata` argument in \code{\link{mvgam}}, #' testing observations). #' #' @author Nicholas J Clark #' #' @examples #' \dontrun{ #' # Simulate some data and fit a model #' simdat <- sim_mvgam( #' n_series = 1, #' trend_model = AR() #' ) #' #' mod <- mvgam( #' y ~ s(season, bs = 'cc', k = 6), #' trend_model = AR(), #' data = simdat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' # Extract log-likelihood values #' lls <- logLik(mod) #' str(lls) #' } #' #' @export logLik.mvgam = function( object, linpreds, newdata, family_pars, include_forecast = TRUE, ... 
) { if (!missing(linpreds) & missing(newdata)) { stop('argument "newdata" must be supplied when "linpreds" is supplied') } if (!missing(linpreds) & missing(family_pars)) { stop('argument "family_pars" must be supplied when "linpreds" is supplied') } if (!missing(newdata) & missing(linpreds)) { stop('argument "linpreds" must be supplied when "newdata" is supplied') } if (!missing(family_pars) & missing(linpreds)) { stop('argument "linpreds" must be supplied when "family_pars" is supplied') } # Extract the linear predictor draws if (missing(linpreds)) { mus <- mcmc_chains(object$model_output, 'mus') } else { mus <- linpreds } # Need to know which series each observation belongs to so we can # pull out appropriate family-level parameters (overdispersions, shapes, etc...) if (!missing(newdata)) { all_dat <- data.frame(series = newdata$series, y = newdata$y) if (object$family == 'nmix') { all_dat$cap <- newdata$cap } } else { if (is.null(object$test_data)) { all_dat <- data.frame( series = object$obs_data$series, time = object$obs_data$time, y = object$obs_data$y ) %>% dplyr::arrange(time, series) if (object$family == 'nmix') { all_dat$cap <- data.frame( series = object$obs_data$series, time = object$obs_data$time, cap = object$obs_data$cap ) %>% dplyr::select(series, time, cap) %>% dplyr::arrange(time, series) %>% dplyr::pull(cap) } } else { all_dat <- data.frame( series = c(object$obs_data$series, object$test_data$series), time = c(object$obs_data$time, object$test_data$time), y = c(object$obs_data$y, object$test_data$y) ) %>% dplyr::arrange(time, series) if (object$family == 'nmix') { all_dat$cap <- data.frame( series = c(object$obs_data$series, object$test_data$series), time = c(object$obs_data$time, object$test_data$time), cap = c(object$obs_data$cap, object$test_data$cap) ) %>% dplyr::select(series, time, cap) %>% dplyr::arrange(time, series) %>% dplyr::pull(cap) } } } obs <- all_dat$y series_obs <- as.numeric(all_dat$series) # Supply forecast NAs if include_forecast is FALSE if (!is.null(object$test_data) & !include_forecast & missing(newdata)) { n_fc_obs <- length(object$test_data$y) n_obs <- length(obs) obs[((n_obs - n_fc_obs) + 1):n_obs] <- NA } # Family-specific parameters family <- object$family if (missing(family_pars)) { family_pars <- extract_family_pars(object = object) n_series <- NCOL(object$ytimes) } else { n_series <- length(object$series_names) } # Family parameters spread into a vector family_extracts <- lapply(seq_along(family_pars), function(j) { if (is.matrix(family_pars[[j]])) { as.vector(family_pars[[j]][, series_obs]) } else { as.vector(matrix( rep(family_pars[[j]], NCOL(mus)), nrow = NROW(mus), byrow = FALSE )) } }) names(family_extracts) <- names(family_pars) # Add trial information if this is a Binomial model if (object$family %in% c('binomial', 'beta_binomial')) { trials <- as.vector(matrix( rep(as.vector(attr(object$mgcv_model, 'trials')), NROW(mus)), nrow = NROW(mus), byrow = TRUE )) family_extracts$trials <- trials } # Create a truth matrix that can also be spread to a vector truth_mat <- matrix(rep(obs, NROW(mus)), nrow = NROW(mus), byrow = TRUE) # Log-likelihood as a vector Xp <- as.matrix(as.vector(mus)) attr(Xp, 'model.offset') <- 0 if (family == 'nmix') { Xp <- as.matrix(qlogis(as.vector(mcmc_chains( object$model_output, 'detprob' )))) attr(Xp, 'model.offset') <- 0 latent_lambdas <- exp(as.vector(mcmc_chains(object$model_output, 'trend'))) cap_mat <- matrix( rep(all_dat$cap, NROW(mus)), nrow = NROW(mus), byrow = TRUE ) cap <- as.vector(cap_mat) } else { 
latent_lambdas <- NULL cap <- NULL } log_lik_vec <- mvgam_predict( family = family, family_pars = family_extracts, truth = as.vector(truth_mat), latent_lambdas = latent_lambdas, cap = cap, type = 'link', Xp = Xp, betas = 1, density = TRUE ) # Convert back to matrix and return log_lik_mat <- matrix(log_lik_vec, nrow = NROW(mus)) return(log_lik_mat) } ================================================ FILE: R/loo.mvgam.R ================================================ #' LOO information criteria for \pkg{mvgam} models #' #' Extract the LOOIC (leave-one-out information criterion) using [loo::loo()]. #' #' @importFrom loo loo is.loo #' #' @param x Object of class `mvgam` or `jsdgam` #' #' @param incl_dynamics Deprecated and currently ignored #' #' @param ... Additional arguments for [loo::loo()] #' #' @rdname loo.mvgam #' #' @return For `loo.mvgam`, an object of class `psis_loo` (see [loo::loo()] #' for details). For `loo_compare.mvgam`, an object of class `compare.loo` #' (see [loo::loo_compare()] for details). #' #' @details #' When comparing two (or more) fitted `mvgam` models, we can estimate the #' difference in their in-sample predictive accuracies using the Expected Log #' Predictive Density (ELPD). This metric can be approximated using Pareto #' Smoothed Importance Sampling (PSIS), which re-weights posterior draws to #' approximate predictions for a datapoint had it not been included in the #' original model fit (i.e. leave-one-out cross-validation). #' #' See [loo::loo()] and [loo::loo_compare()] for further details on how this #' importance sampling works. #' #' Note: In-sample predictive metrics such as PSIS-LOO can sometimes be overly #' optimistic for models that include process error components (e.g. those with #' `trend_model`, `trend_formula`, or `factor_formula`). Consider using #' out-of-sample evaluations for further scrutiny (see #' \code{\link{forecast.mvgam}}, \code{\link{score.mvgam_forecast}}, #' \code{\link{lfo_cv}}). 
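#'
#' As a minimal sketch (not run; `mod1` and `mod2` are assumed to be fitted
#' `mvgam` objects), candidate models can be ranked on their approximate ELPD:
#' ```r
#' loo_compare(mod1, mod2)
#' ```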
#' #' @author Nicholas J Clark #' #' @examples #' \dontrun{ #' #-------------------------------------------------- #' # Simulate 4 time series with hierarchical seasonality #' # and independent AR1 dynamic processes #' #-------------------------------------------------- #' set.seed(111) #' #' simdat <- sim_mvgam( #' seasonality = 'hierarchical', #' trend_model = AR(), #' family = gaussian() #' ) #' #' # Fit a model with shared seasonality #' mod1 <- mvgam( #' y ~ s(season, bs = 'cc', k = 6), #' data = rbind(simdat$data_train, simdat$data_test), #' family = gaussian(), #' chains = 2, #' silent = 2 #' ) #' #' conditional_effects(mod1) #' #' mc.cores.def <- getOption('mc.cores') #' options(mc.cores = 1) #' loo(mod1) #' #' # Fit a model with hierarchical seasonality #' mod2 <- update( #' mod1, #' formula = y ~ s(season, bs = 'cc', k = 6) + #' s(season, series, bs = 'fs', xt = list(bs = 'cc'), k = 4), #' chains = 2, #' silent = 2 #' ) #' #' conditional_effects(mod2) #' loo(mod2) #' #' # Add AR1 dynamic errors to mod2 #' mod3 <- update( #' mod2, #' trend_model = AR(), #' chains = 2, #' silent = 2 #' ) #' #' conditional_effects(mod3) #' plot(mod3, type = 'trend') #' loo(mod3) #' #' #-------------------------------------------------- #' # Compare models using LOO #' #-------------------------------------------------- #' loo_compare(mod1, mod2, mod3) #' options(mc.cores = mc.cores.def) #' #' #-------------------------------------------------- #' # Compare forecast abilities using LFO-CV #' #-------------------------------------------------- #' #' lfo_mod2 <- lfo_cv(mod2, min_t = 92) #' lfo_mod3 <- lfo_cv(mod3, min_t = 92) #' #' # Plot forecast ELPD differences #' plot( #' y = lfo_mod2$elpds - lfo_mod3$elpds, #' x = lfo_mod2$eval_timepoints, #' pch = 16, #' ylab = 'ELPD_mod2 - ELPD_mod3', #' xlab = 'Evaluation timepoint' #' ) #' #' abline(h = 0, lty = 'dashed') #' } #' #' @export loo.mvgam <- function(x, incl_dynamics = FALSE, ...) { # Families with observation error components can give strange log-likelihood estimates # if process error components were also included in the model (this is because the # observation error estimates may be very small); use incl_dynamics = TRUE for these # families to ensure all errors are propagated appropriately when calculating the # log-likelihood incl_dynamics <- FALSE if ( x$family %in% c( "gaussian", "lognormal", "student" ) ) { incl_dynamics <- TRUE } if (x$family == 'nmix' | incl_dynamics) { logliks <- logLik(x, include_forecast = FALSE) } else { x$series_names <- levels(x$obs_data$series) logliks <- logLik( x, linpreds = predict( x, newdata = x$obs_data, type = 'link', summary = FALSE, process_error = FALSE ), newdata = x$obs_data, family_pars = extract_family_pars(x), include_forecast = FALSE ) } logliks <- clean_ll(x, logliks) releffs <- loo::relative_eff( exp(logliks), chain_id = sort(rep( 1:x$model_output@sim$chains, (NROW(logliks) / x$model_output@sim$chains) )) ) loo::loo(logliks, r_eff = releffs, ...) } #' @importFrom loo loo_compare #' #' @param x Object of class `mvgam` #' #' @param ... More \code{mvgam} objects #' #' @param model_names If `NULL` (the default) will use model names derived #' from deparsing the call. 
Otherwise will use the passed values as model names #' #' @param incl_dynamics Deprecated and currently ignored #' #' @rdname loo.mvgam #' #' @export loo_compare.mvgam <- function( x, ..., model_names = NULL, incl_dynamics = FALSE ) { models <- split_mod_dots(x, ..., model_names = model_names) loos <- named_list(names(models)) for (i in seq_along(models)) { loos[[i]] <- loo(models[[i]], incl_dynamics = incl_dynamics) } loo_compare(loos) } #' @export #' @importFrom loo loo loo::loo #' @export #' @importFrom loo loo_compare loo::loo_compare #'@noRd split_mod_dots = function(x, ..., model_names = NULL, other = TRUE) { dots <- list(x, ...) names <- substitute(list(x, ...), env = parent.frame())[-1] names <- ulapply(names, deparse) if (!is.null(model_names)) { names <- model_names } if (length(names)) { if (!length(names(dots))) { names(dots) <- names } else { has_no_name <- !nzchar(names(dots)) names(dots)[has_no_name] <- names[has_no_name] } } is_mvgam <- unlist(lapply(dots, function(y) inherits(y, 'mvgam'))) models <- dots[is_mvgam] out <- dots[!is_mvgam] if (length(out)) { stop( "Only model objects can be passed to '...' for this method.", call. = FALSE ) } models } #'@noRd named_list = function(names, values = NULL) { if (!is.null(values)) { if (length(values) <= 1L) { values <- replicate(length(names), values) } values <- as.list(values) stopifnot(length(values) == length(names)) } else { values <- vector("list", length(names)) } setNames(values, names) } #'@noRd clean_ll = function(x, logliks) { # First remove any columns that are all NA (these had missing observations) logliks <- logliks[, !apply(logliks, 2, function(x) all(!is.finite(x)))] # Next resample any remaining non-finite values (occasionally happens with # some observation families) samp_noinf = function(x) { x_finite <- x[is.finite(x)] x[!is.finite(x)] <- sample( x_finite, length(x[!is.finite(x)]), replace = TRUE ) x } logliks <- apply(logliks, 2, samp_noinf) # return logliks } ================================================ FILE: R/lv_correlations.R ================================================ #' Calculate trend correlations based on latent factor loadings for #' \pkg{mvgam} models #' #' This function uses factor loadings from a fitted dynamic factor #' \code{mvgam} model to calculate temporal correlations among series' trends. #' #' @importFrom stats cov2cor cov #' #' @param object \code{list} object of class \code{mvgam} that used latent #' factors, either with `use_lv = TRUE` or by supplying a `trend_map`. See #' [mvgam()] for details and for an example. #' #' @return A \code{list} object containing the mean posterior correlations and #' the full array of posterior correlations. #' #' @details Although this function will still work, it is now recommended to use #' [residual_cor()] to obtain residual correlation information in a more #' user-friendly format that allows for a deeper investigation of relationships #' among the time series. 
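#'
#' The quantity being summarised can be sketched with toy values (not run;
#' `trends` stands in for a single posterior draw of the latent trend matrix):
#' ```r
#' # 100 timepoints x 3 series of latent trend values for one draw
#' trends <- matrix(rnorm(300), ncol = 3)
#' # correlations among the series' trends implied by this draw
#' cov2cor(cov(trends))
#' ```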
#' #' @seealso [residual_cor()], [plot.mvgam_residcor()] #' #' @examples #' \dontrun{ #' #-------------------------------------------------- #' # Fit a model that uses two AR(1) dynamic factors to model #' # the temporal dynamics of the four rodent species in the portal_data #' #-------------------------------------------------- #' mod <- mvgam( #' captures ~ series, #' trend_model = AR(), #' use_lv = TRUE, #' n_lv = 2, #' data = portal_data, #' chains = 2, #' silent = 2 #' ) #' #' # Plot the two dynamic factors #' plot(mod, type = 'factors') #' #' # Calculate correlations among the series #' lvcors <- lv_correlations(mod) #' names(lvcors) #' lapply(lvcors, class) #' #' # Recommended: use residual_cor() instead #' lvcors <- residual_cor(mod) #' names(lvcors) #' lvcors$cor #' #' # Plot credible correlations as a matrix #' plot(lvcors, cluster = TRUE) #' #' # Not needed for general use; cleans up connections for automated testing #' closeAllConnections() #' } #' #' @export lv_correlations = function(object) { # Check arguments if (!(inherits(object, "mvgam"))) { stop('argument "object" must be of class "mvgam"') } # Series start and end indices ends <- seq( 0, dim(mcmc_chains(object$model_output, 'ypred'))[2], length.out = NCOL(object$ytimes) + 1 ) starts <- ends + 1 starts <- c(1, starts[-c(1, (NCOL(object$ytimes) + 1))]) ends <- ends[-1] # Total number of MCMC samples n_preds <- dim(mcmc_chains(object$model_output, 'trend')[, starts[1]:ends[1] ])[1] data_train <- object$obs_data # Total number of observations per series if (inherits(data_train, 'list')) { n_obs <- length(data_train$y) / NCOL(object$ytimes) } else { n_obs <- NROW(data_train) / NCOL(object$ytimes) } # Extract series trends series_trends <- lapply(seq_len(length(ends)), function(y) { if (object$fit_engine == 'stan') { # For stan objects, trend is stored as a vector in column-major order mcmc_chains(object$model_output, 'trend')[, seq( y, dim(mcmc_chains(object$model_output, 'trend'))[2], by = NCOL(object$ytimes) )] } else { mcmc_chains(object$model_output, 'trend')[, starts[y]:ends[y]][, 1:n_obs] } }) # Get list of trend correlation estimates all_trend_cors <- lapply(seq_len(n_preds), function(x) { cov2cor(cov(do.call( cbind, lapply(series_trends, function(y) { y[x, ] }) ))) }) # Calculate posterior mean correlations mean_correlations <- Reduce(`+`, all_trend_cors) / length(all_trend_cors) rownames(mean_correlations) <- colnames(mean_correlations) <- levels( data_train$series ) list( mean_correlations = mean_correlations, posterior_correlations = all_trend_cors ) } ================================================ FILE: R/marginaleffects.mvgam.R ================================================ #' Helper functions for \pkg{marginaleffects} calculations in \pkg{mvgam} models #' #' @importFrom stats coef model.frame #' @importFrom insight find_predictors get_data #' @importFrom marginaleffects get_coef set_coef get_vcov get_predict #' @importFrom utils getFromNamespace #' #' @inheritParams marginaleffects::get_coef #' @inheritParams marginaleffects::set_coef #' @inheritParams marginaleffects::get_vcov #' @inheritParams marginaleffects::get_predict #' @inheritParams insight::get_data #' @inheritParams insight::find_predictors #' #' @param mfx Ignored #' @param newparams Ignored #' @param ndraws Ignored #' @param se.fit Ignored #' #' @param trend_effects `logical`, extract from the process model component #' (only applicable if a `trend_formula` was specified in the model) #' #' @param process_error `logical`. 
If `TRUE`, uncertainty in the latent #' process (or trend) model is incorporated in predictions #' #' @return Objects suitable for internal 'marginaleffects' functions to proceed. #' See [marginaleffects::get_coef()], [marginaleffects::set_coef()], #' [marginaleffects::get_vcov()], [marginaleffects::get_predict()], #' [insight::get_data()] and [insight::find_predictors()] for details #' #' @name mvgam_marginaleffects #' #' @author Nicholas J Clark NULL #' @export #' @importFrom marginaleffects predictions marginaleffects::predictions #' @export #' @importFrom marginaleffects avg_predictions marginaleffects::avg_predictions #' @export #' @importFrom marginaleffects plot_predictions marginaleffects::plot_predictions #' @export #' @importFrom marginaleffects slopes marginaleffects::slopes #' @export #' @importFrom marginaleffects plot_slopes marginaleffects::plot_slopes #' @export #' @importFrom marginaleffects comparisons marginaleffects::comparisons #' @export #' @importFrom marginaleffects plot_comparisons marginaleffects::plot_comparisons #' @export #' @importFrom marginaleffects datagrid marginaleffects::datagrid #' @export #' @importFrom marginaleffects hypotheses marginaleffects::hypotheses #' @export #' @importFrom marginaleffects get_predict marginaleffects::get_predict #' @export #' @importFrom insight get_data insight::get_data #' Functions needed for working with \pkg{marginaleffects} #' @rdname mvgam_marginaleffects #' @export get_coef.mvgam <- function(model, trend_effects = FALSE, ...) { # Check trend_effects if (trend_effects) { if (is.null(model$trend_call)) { stop('no trend_formula exists so there no trend-level coefficients') } } if (!trend_effects) { b <- coef(model$mgcv_model) } else { b <- coef(model$trend_mgcv_model) } return(b) } #' @rdname mvgam_marginaleffects #' @export set_coef.mvgam <- function(model, coefs, trend_effects = FALSE, ...) { # Check trend_effects if (trend_effects) { if (is.null(model$trend_call)) { stop('no trend_formula exists so there no trend-level coefficients') } } out <- model if (!trend_effects) { out$mgcv_model$coefficients <- coefs } else { out$trend_mgcv_model$coefficients <- coefs } return(out) } #' @rdname mvgam_marginaleffects #' @export get_vcov.mvgam <- function(model, vcov = NULL, ...) { if (!is.null(vcov) && !is.logical(vcov)) { insight::format_warning( "The `vcov` argument is not supported for models of this class." ) } return(NULL) } #' @rdname mvgam_marginaleffects #' @export get_predict.mvgam <- function( model, newdata, type = 'response', mfx, newparams, ndraws, se.fit, process_error = FALSE, ... ) { preds <- predict( object = model, newdata = newdata, type = type, process_error = process_error, summary = FALSE, ... ) if ("rowid" %in% colnames(newdata)) { out <- data.frame( rowid = newdata[["rowid"]], estimate = apply(preds, 2, median) ) } else { out <- data.frame( rowid = seq_len(NCOL(preds)), estimate = apply(preds, 2, median) ) } attr(out, "posterior_draws") <- t(preds) return(out) } #' Functions needed for getting data / objects with \pkg{insight} #' @rdname mvgam_marginaleffects #' @export get_data.mvgam = function(x, source = "environment", verbose = TRUE, ...) 
{ resp_terms <- as.character(rlang::f_lhs(x$call)) if (length(resp_terms) == 1L) { resp <- resp_terms } else { if (any(grepl('cbind', resp_terms))) { resp_terms <- resp_terms[-grepl('cbind', resp_terms)] resp <- resp_terms[1] } } mf <- tryCatch( { # Drop response observations if a trend call was used because often # there won't be an easy way to match them up (for example if multiple # series depend on a shared latent trend) if (!is.null(x$trend_call)) { # Original series, time and outcomes orig_dat <- data.frame( series = x$obs_data$series, time = x$obs_data$index..time..index, y = x$obs_data$y ) # Add indicators of trend names as factor levels using the trend_map trend_indicators <- vector(length = length(orig_dat$time)) for (i in 1:length(orig_dat$time)) { trend_indicators[i] <- x$trend_map$trend[which( x$trend_map$series == orig_dat$series[i] )] } trend_indicators <- factor( paste0('trend', trend_indicators), levels = paste0('trend', unique(x$trend_map$trend)) ) # Trend-level data, before any slicing that took place orig_dat %>% dplyr::bind_cols(data.frame( trend_series = trend_indicators, row_num = 1:length(x$obs_data$index..time..index) )) -> trend_level_data # # We only kept one time observation per trend trend_level_data %>% dplyr::group_by(trend_series, time) %>% dplyr::slice_head(n = 1) %>% dplyr::pull(row_num) -> idx # Extract model.frame for trend_level effects and add the # trend indicators mf_data <- model.frame(x, trend_effects = TRUE) mf_data$trend_series <- trend_level_data$trend_series[idx] mf_data$time <- trend_level_data$time[idx] if ('series' %in% names(mf_data)) { mf_data %>% dplyr::select(-series) -> mf_data } # Now join with the original data so the original observations can # be included trend_level_data %>% dplyr::left_join(mf_data, by = c('trend_series', 'time')) %>% dplyr::select(-trend_series, -row_num, -trend_y) -> mf_data # Extract any predictors from the observation level model and # bind to the trend level model.frame mf_obs <- model.frame(x, trend_effects = FALSE) mf_data <- cbind(mf_obs, mf_data) %>% subset(., select = which(!duplicated(names(.)))) # Now get the observed response, in case there are any # NAs there that need to be updated mf_data[, resp] <- x$obs_data$y } else { mf_data <- model.frame(x, trend_effects = FALSE) mf_data[, resp] <- x$obs_data[[resp]] } mf_data }, error = function(x) { NULL } ) prep_data <- utils::getFromNamespace(".prepare_get_data", "insight") out <- prep_data(x, mf, effects = "all", verbose = verbose) # Remove colnames with cbind in them as these can be artifacts in # binomial model data preparations if (any(grepl('cbind', colnames(out)))) { out %>% dplyr::select(-matches('cbind')) -> out } return(out) } #' @rdname mvgam_marginaleffects #' @export get_data.mvgam_prefit = function( x, source = "environment", verbose = TRUE, ... 
) { resp_terms <- as.character(rlang::f_lhs(x$call)) if (length(resp_terms) == 1L) { resp <- resp_terms } else { if (any(grepl('cbind', resp_terms))) { resp_terms <- resp_terms[-grepl('cbind', resp_terms)] resp <- resp_terms[1] } } mf <- tryCatch( { # Drop response observations if a trend call was used because often # there won't be an easy way to match them up (for example if multiple # series depend on a shared latent trend) if (!is.null(x$trend_call)) { # Original series, time and outcomes orig_dat <- data.frame( series = x$obs_data$series, time = x$obs_data$index..time..index, y = x$obs_data$y ) # Add indicators of trend names as factor levels using the trend_map trend_indicators <- vector(length = length(orig_dat$time)) for (i in 1:length(orig_dat$time)) { trend_indicators[i] <- x$trend_map$trend[which( x$trend_map$series == orig_dat$series[i] )] } trend_indicators <- as.factor(paste0('trend', trend_indicators)) # Trend-level data, before any slicing that took place trend_level_data <- data.frame( trend_series = trend_indicators, series = orig_dat$series, time = orig_dat$time, y = orig_dat$y, row_num = 1:length(x$obs_data$index..time..index) ) # # We only kept one time observation per trend trend_level_data %>% dplyr::group_by(trend_series, time) %>% dplyr::slice_head(n = 1) %>% dplyr::pull(row_num) -> idx # Extract model.frame for trend_level effects and add the # trend indicators mf_data <- model.frame(x, trend_effects = TRUE) mf_data$trend_series <- trend_level_data$trend_series[idx] mf_data$time <- trend_level_data$time[idx] if ('series' %in% names(mf_data)) { mf_data %>% dplyr::select(-series) -> mf_data } # Now join with the original data so the original observations can # be included trend_level_data %>% dplyr::left_join(mf_data, by = c('trend_series', 'time')) %>% dplyr::select(-trend_series, -row_num, -trend_y) -> mf_data # Extract any predictors from the observation level model and # bind to the trend level model.frame mf_obs <- model.frame(x, trend_effects = FALSE) mf_data <- cbind(mf_obs, mf_data) %>% subset(., select = which(!duplicated(names(.)))) # Now get the observed response, in case there are any # NAs there that need to be updated mf_data[, resp] <- x$obs_data$y } else { mf_data <- model.frame(x, trend_effects = FALSE) mf_data[, resp] <- x$obs_data[[resp]] } mf_data }, error = function(x) { NULL } ) prep_data <- utils::getFromNamespace(".prepare_get_data", "insight") out <- prep_data(x, mf, effects = "all", verbose = verbose) # Remove colnames with cbind in them as these can be artifacts in # binomial model data preparations if (any(grepl('cbind', colnames(out)))) { out %>% dplyr::select(-matches('cbind')) -> out } return(out) } #' @rdname mvgam_marginaleffects #' @export find_predictors.mvgam = function( x, effects = c('fixed', 'random', 'all'), component = c( 'all', 'conditional', 'zi', 'zero_inflated', 'dispersion', 'instruments', 'correlation', 'smooth_terms' ), flatten = FALSE, verbose = TRUE, ... 
) { obs_preds <- insight::find_predictors( x$mgcv_model, effects = effects, component = component, flatten = flatten ) if (!is.null(x$trend_call)) { preds <- list() trend_preds <- insight::find_predictors( x$trend_mgcv_model, effects = effects, component = component, flatten = flatten ) if (flatten) { preds <- unique(c(obs_preds, trend_preds)) } else { preds$conditional <- unique(c( obs_preds$conditional, trend_preds$conditional )) } } else { preds <- obs_preds } if (x$family == 'nmix') { if (flatten) { preds <- c(preds, 'cap') } else { preds$conditional <- c(preds$conditional, 'cap') } } # Check for offsets and add appropriately if (!is.null(attr(terms(x$call), "offset"))) { off_names <- grep( 'offset', rownames(attr(terms.formula(x$call), 'factors')), value = TRUE ) off_names <- sub( "offset\\((.*)\\)$", "\\1", grep('offset', off_names, value = TRUE) ) if (flatten) { for (i in 1:length(off_names)) { preds <- c(preds, off_names[i]) } } else { for (i in 1:length(off_names)) { preds$conditional <- c(preds$conditional, off_names[i]) } } } # Any other required variables, needed for grouped models if (!inherits(attr(x$model_data, 'trend_model'), 'mvgam_trend')) { trend_model <- list( trend_model = attr(x$model_data, 'trend_model'), unit = 'time', gr = 'NA', subgr = 'series' ) } else { trend_model <- attr(x$model_data, 'trend_model') } other_vars <- c(trend_model$unit, trend_model$gr, trend_model$subgr) if (!is.null(attr(x$model_data, 'prepped_trend_model'))) { prepped_model <- attr(x$model_data, 'prepped_trend_model') other_vars <- c( other_vars, c(prepped_model$unit, prepped_model$gr, prepped_model$subgr) ) } if (flatten) { other_vars <- setdiff(unique(other_vars), c('NA', preds)) preds <- c(preds, other_vars) } else { other_vars <- setdiff(unique(other_vars), c('NA', preds$conditional)) preds$conditional <- c(preds$conditional, other_vars) } return(preds) } #' @rdname mvgam_marginaleffects #' @export find_predictors.mvgam_prefit = function( x, effects = c('fixed', 'random', 'all'), component = c( 'all', 'conditional', 'zi', 'zero_inflated', 'dispersion', 'instruments', 'correlation', 'smooth_terms' ), flatten = FALSE, verbose = TRUE, ... 
) { obs_preds <- insight::find_predictors( x$mgcv_model, effects = effects, component = component, flatten = flatten ) if (!is.null(x$trend_call)) { preds <- list() trend_preds <- insight::find_predictors( x$trend_mgcv_model, effects = effects, component = component, flatten = flatten ) if (flatten) { preds <- unique(c(obs_preds, trend_preds)) } else { preds$conditional <- unique(c( obs_preds$conditional, trend_preds$conditional )) } } else { preds <- obs_preds } if (x$family == 'nmix') { if (flatten) { preds <- c(preds, 'cap') } else { preds$conditional <- c(preds$conditional, 'cap') } } # Any other required variables, needed for grouped models if (!inherits(attr(x$model_data, 'trend_model'), 'mvgam_trend')) { trend_model <- list( trend_model = attr(x$model_data, 'trend_model'), unit = 'time', gr = 'NA', subgr = 'series' ) } else { trend_model <- attr(x$model_data, 'trend_model') } other_vars <- c(trend_model$unit, trend_model$gr, trend_model$subgr) if (!is.null(attr(x$model_data, 'prepped_trend_model'))) { prepped_model <- attr(x$model_data, 'prepped_trend_model') other_vars <- c( other_vars, c(prepped_model$unit, prepped_model$gr, prepped_model$subgr) ) } if (flatten) { other_vars <- setdiff(unique(other_vars), c('NA', preds)) preds <- c(preds, other_vars) } else { other_vars <- setdiff(unique(other_vars), c('NA', preds$conditional)) preds$conditional <- c(preds$conditional, other_vars) } return(preds) } ================================================ FILE: R/mcmc_plot.mvgam.R ================================================ #' MCMC plots of \pkg{mvgam} parameters, as implemented in \pkg{bayesplot} #' #' Convenient way to call MCMC plotting functions #' implemented in the \pkg{bayesplot} package for \pkg{mvgam} models #' @importFrom bayesplot color_scheme_set color_scheme_get #' @inheritParams brms::mcmc_plot #' @inheritParams as.data.frame.mvgam #' @param type The type of the plot. #' Supported types are (as names) \code{hist}, \code{dens}, #' \code{hist_by_chain}, \code{dens_overlay}, #' \code{violin}, \code{intervals}, \code{areas}, #' \code{areas_ridges}, \code{combo}, \code{acf}, #' \code{acf_bar}, \code{trace}, \code{trace_highlight}, #' \code{scatter}, \code{hex}, \code{pairs}, \code{violin}, #' \code{rhat}, \code{rhat_hist}, \code{neff}, \code{neff_hist} #' and \code{nuts_energy}. #' For an overview on the various plot types see #' \code{\link[bayesplot:MCMC-overview]{MCMC-overview}}. #' @return A \code{\link[ggplot2:ggplot]{ggplot}} object #' that can be further customized using the \pkg{ggplot2} package. #' @seealso \code{\link{mvgam_draws}} for an overview of some of the shortcut strings #' that can be used for argument `variable` #' @examples #' \dontrun{ #' simdat <- sim_mvgam(n_series = 1, trend_model = AR()) #' mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), #' trend_model = AR(), #' noncentred = TRUE, #' data = simdat$data_train, #' chains = 2, #' silent = 2) #' mcmc_plot(mod) #' mcmc_plot(mod, type = 'neff_hist') #' mcmc_plot(mod, variable = 'betas', type = 'areas') #' mcmc_plot(mod, variable = 'trend_params', type = 'combo') #' } #' @export mcmc_plot.mvgam = function( object, type = 'intervals', variable = NULL, regex = FALSE, use_alias = TRUE, ... ) { # Set red colour scheme col_scheme <- attr(color_scheme_get(), 'scheme_name') color_scheme_set('red') # Check type validity valid_types <- as.character(bayesplot::available_mcmc("")) valid_types <- sub("^mcmc_", "", valid_types) if (!type %in% valid_types) { stop( "Invalid plot type. 
Valid plot types are: \n", paste0("'", valid_types, "'", collapse = ", "), call. = FALSE ) } # Set default params to plot # By default, don't plot the Betas as there can be hundreds # of them in spline models if (is.null(variable)) { all_pars <- variables(object) variable <- c( all_pars$observation_pars[, 1], all_pars$observation_smoothpars[, 1], all_pars$observation_re_params[, 1], all_pars$trend_pars[, 1], all_pars$trend_smoothpars[, 1], all_pars$trend_re_params[, 1] ) regex <- FALSE } # Formal arguments mcmc_fun <- get(paste0("mcmc_", type), asNamespace("bayesplot")) mcmc_arg_names <- names(formals(mcmc_fun)) mcmc_args <- list(...) if ("x" %in% mcmc_arg_names) { if (grepl("^nuts_", type)) { # x refers to a molten data.frame of NUTS parameters mcmc_args$x <- brms::nuts_params(object$model_output) } else { # x refers to a data.frame of draws draws <- as.array( object, variable = variable, regex = regex, use_alias = use_alias ) sel_variables <- dimnames(draws)$variable if (type %in% c("scatter", "hex") && length(sel_variables) != 2L) { stop( "Exactly 2 parameters must be selected for this type.", "\nParameters selected: ", paste0("'", sel_variables, "'", collapse = ", "), call. = FALSE ) } if (type == 'pairs' && length(sel_variables) == 1L) { stop( "2 or more parameters must be selected for this type.", "\nParameters selected: ", paste0("'", sel_variables, "'", collapse = ", "), call. = FALSE ) } mcmc_args$x <- draws } } if ("np" %in% mcmc_arg_names) { mcmc_args$np <- brms::nuts_params(object$model_output) } interval_type <- type %in% c("intervals", "areas") if ("rhat" %in% mcmc_arg_names && !interval_type) { mcmc_args$rhat <- rhat(object) } if ("ratio" %in% mcmc_arg_names) { mcmc_args$ratio <- neff_ratio(object) } # Generate plot and reset colour scheme out_plot <- do.call(mcmc_fun, args = mcmc_args) color_scheme_set(col_scheme) # Return the plot return(out_plot) } #' @export #' @importFrom brms mcmc_plot brms::mcmc_plot ================================================ FILE: R/model.frame.mvgam.R ================================================ #' Extract model.frame from a fitted \pkg{mvgam} object #' #' @inheritParams stats::model.frame #' #' @param trend_effects \code{logical}, return the model.frame from the #' observation model (if \code{FALSE}) or from the underlying process #' model (if \code{TRUE}) #' #' @param ... Ignored #' #' @method model.frame mvgam #' #' @author Nicholas J Clark #' #' @return A \code{matrix} containing the fitted model frame #' #' @export model.frame.mvgam = function(formula, trend_effects = FALSE, ...) 
{ # Check trend_effects if (trend_effects) { if (is.null(formula$trend_call)) { out <- NULL } else { out <- stats::model.frame(formula$trend_mgcv_model) } } else { out <- stats::model.frame(formula$mgcv_model) # Identify response variable resp_terms <- as.character(rlang::f_lhs(formula$call)) if (length(resp_terms) == 1L) { resp <- resp_terms } else { if (any(grepl('cbind', resp_terms))) { resp_terms <- resp_terms[-grepl('cbind', resp_terms)] resp <- resp_terms[1] } } # Now add the observed response, in case there are any # NAs there that need to be updated out[, resp] <- formula$obs_data$y # Ensure 'cap' is included if this is an N-mixture model if (formula$family == 'nmix') { out$cap <- formula$obs_data$cap } # Check for offsets and add appropriately if (!is.null(attr(terms(formula$call), "offset"))) { off_names <- grep( 'offset', rownames(attr(terms.formula(formula$call), 'factors')), value = TRUE ) off_names <- sub( "offset\\((.*)\\)$", "\\1", grep('offset', off_names, value = TRUE) ) for (i in 1:length(off_names)) { out[[off_names[i]]] <- formula$obs_data[[off_names[i]]] } } # Any other required variables, needed for grouped models if ( !inherits(attr(formula$model_data, 'trend_model'), 'mvgam_trend') & !inherits(formula$trend_model, 'mvgam_trend') ) { trend_model <- list( trend_model = attr(formula$model_data, 'trend_model'), unit = 'time', gr = 'NA', subgr = 'series' ) } if (inherits(attr(formula$model_data, 'trend_model'), 'mvgam_trend')) { trend_model <- attr(formula$model_data, 'trend_model') } if (inherits(formula$trend_model, 'mvgam_trend')) { trend_model <- formula$trend_model } other_vars <- c(trend_model$unit, trend_model$gr, trend_model$subgr) if (!is.null(attr(formula$model_data, 'prepped_trend_model'))) { prepped_model <- attr(formula$model_data, 'prepped_trend_model') other_vars <- c( other_vars, c(prepped_model$unit, prepped_model$gr, prepped_model$subgr) ) } other_vars <- setdiff(unique(other_vars), c('NA', colnames(out))) if (length(other_vars)) { orig_names <- colnames(out) for (i in 1:length(other_vars)) { out <- cbind(out, formula$obs_data[[other_vars[i]]]) } colnames(out) <- c(orig_names, other_vars) } } return(out) } #' @inheritParams model.frame.mvgam #' @rdname model.frame.mvgam #' @method model.frame mvgam_prefit #' @export model.frame.mvgam_prefit = function(formula, trend_effects = FALSE, ...) 
{ # Check trend_effects if (trend_effects) { if (is.null(formula$trend_call)) { out <- NULL } else { out <- stats::model.frame(formula$trend_mgcv_model) } } else { out <- stats::model.frame(formula$mgcv_model) # Identify response variable resp_terms <- as.character(rlang::f_lhs(formula$call)) if (length(resp_terms) == 1L) { resp <- resp_terms } else { if (any(grepl('cbind', resp_terms))) { resp_terms <- resp_terms[-grepl('cbind', resp_terms)] resp <- resp_terms[1] } } # Now add the observed response, in case there are any # NAs there that need to be updated out[, resp] <- formula$obs_data$y # Ensure 'cap' is included if this is an N-mixture model if (formula$family == 'nmix') { out$cap <- formula$obs_data$cap } # Check for offsets and add appropriately if (!is.null(attr(terms(formula$call), "offset"))) { off_names <- grep( 'offset', rownames(attr(terms.formula(formula$call), 'factors')), value = TRUE ) off_names <- sub( "offset\\((.*)\\)$", "\\1", grep('offset', off_names, value = TRUE) ) for (i in 1:length(off_names)) { out[[off_names[i]]] <- formula$obs_data[[off_names[i]]] } } # Any other required variables, needed for grouped models if (!inherits(attr(formula$model_data, 'trend_model'), 'mvgam_trend')) { trend_model <- list( trend_model = attr(formula$model_data, 'trend_model'), unit = 'time', gr = 'NA', subgr = 'series' ) } else { trend_model <- attr(formula$model_data, 'trend_model') } other_vars <- c(trend_model$unit, trend_model$gr, trend_model$subgr) if (!is.null(attr(formula$model_data, 'prepped_trend_model'))) { prepped_model <- attr(formula$model_data, 'prepped_trend_model') other_vars <- c( other_vars, c(prepped_model$unit, prepped_model$gr, prepped_model$subgr) ) } other_vars <- setdiff(unique(other_vars), c('NA', colnames(out))) if (length(other_vars)) { orig_names <- colnames(out) for (i in 1:length(other_vars)) { out <- cbind(out, formula$obs_data[[other_vars[i]]]) } colnames(out) <- c(orig_names, other_vars) } } return(out) } ================================================ FILE: R/monotonic.R ================================================ #' Monotonic splines in \pkg{mvgam} models #' #' Uses constructors from package \pkg{splines2} to build monotonically increasing #' or decreasing splines. Details also in Wang & Yan (2021). #' #' @inheritParams mgcv::smooth.construct.bs.smooth.spec #' #' @param object A smooth specification object, usually generated by a term #' `s(x, bs = "moi", ...)` or `s(x, bs = "mod", ...)` #' #' @details The constructor is not normally called directly, #' but is rather used internally by [mvgam]. If knots are not supplied, they are #' placed evenly throughout the covariate values to which the term refers: for #' example, if fitting 101 data points with an 11-knot spline of x, there would #' be a knot at every 10th (ordered) x value. #' The spline is an implementation of the closed-form I-spline basis based #' on the recursion formula given by Ramsay (1988), in which the basis coefficients #' must be constrained to either be non-negative (for monotonically increasing #' functions) or non-positive (for monotonically decreasing functions). #' \cr #' \cr #' Take note that when using either monotonic basis, the number of basis functions #' `k` must be supplied as an even integer due to the manner in #' which monotonic basis functions are constructed #' #' @return An object of class `"moi.smooth"` or `"mod.smooth"`.
In addition to #' the usual elements of a smooth class documented under \code{\link[mgcv]{smooth.construct}}, #' this object will contain a slot called `boundary` that defines the endpoints beyond #' which the spline will begin extrapolating (extrapolation is flat due to the first #' order penalty placed on the smooth function) #' #' @note This constructor will result in a valid smooth if using a call to #' \code{\link[mgcv]{gam}} or \code{\link[mgcv]{bam}}, however, the resulting #' functions will not be guaranteed to be monotonic because constraints on #' basis coefficients will not be enforced #' #' @references #' Wang, Wenjie, and Jun Yan. "Shape-Restricted Regression Splines with R Package splines2." #' Journal of Data Science 19.3 (2021). #' \cr #' \cr #' Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science, 3(4), 425--441. #' #' @importFrom mgcv smooth.construct #' #' @author Nicholas J Clark #' #' @name monotonic #' #' @examples #' \dontrun{ #' # Simulate data from a monotonically increasing function #' set.seed(123123) #' #' x <- runif(80) * 4 - 1 #' x <- sort(x) #' f <- exp(4 * x) / (1 + exp(4 * x)) #' y <- f + rnorm(80) * 0.1 #' plot(x, y) #' #' # A standard TPRS smooth doesn't capture monotonicity #' library(mgcv) #' #' mod_data <- data.frame(y = y, x = x) #' mod <- gam( #' y ~ s(x, k = 16), #' data = mod_data, #' family = gaussian() #' ) #' #' library(marginaleffects) #' plot_predictions( #' mod, #' by = 'x', #' newdata = data.frame( #' x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100) #' ), #' points = 0.5 #' ) #' #' # Using the 'moi' basis in mvgam rectifies this #' mod_data$time <- 1:NROW(mod_data) #' mod2 <- mvgam( #' y ~ s(x, bs = 'moi', k = 18), #' data = mod_data, #' family = gaussian(), #' chains = 2, #' silent = 2 #' ) #' #' plot_predictions( #' mod2, #' by = 'x', #' newdata = data.frame( #' x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100) #' ), #' points = 0.5 #' ) #' #' plot(mod2, type = 'smooth', realisations = TRUE) #' #' # 'by' terms that produce a different smooth for each level of the 'by' #' # factor are also allowed #' #' x <- runif(80) * 4 - 1 #' x <- sort(x) #' #' # Two different monotonic smooths, one for each factor level #' f <- exp(4 * x) / (1 + exp(4 * x)) #' f2 <- exp(3.5 * x) / (1 + exp(3 * x)) #' fac <- c(rep('a', 80), rep('b', 80)) #' y <- c( #' f + rnorm(80) * 0.1, #' f2 + rnorm(80) * 0.2 #' ) #' #' plot(x, y[1:80]) #' plot(x, y[81:160]) #' #' # Gather all data into a data.frame, including the factor 'by' variable #' mod_data <- data.frame(y, x, fac = as.factor(fac)) #' mod_data$time <- 1:NROW(mod_data) #' #' # Fit a model with different smooths per factor level #' mod <- mvgam( #' y ~ s(x, bs = 'moi', by = fac, k = 8), #' data = mod_data, #' family = gaussian(), #' chains = 2, #' silent = 2 #' ) #' #' # Visualise the different monotonic functions #' plot_predictions( #' mod, #' condition = c('x', 'fac', 'fac'), #' points = 0.5 #' ) #' #' plot(mod, type = 'smooth', realisations = TRUE) #' #' # First derivatives (on the link scale) should never be #' # negative for either factor level #' (derivs <- slopes( #' mod, #' variables = 'x', #' by = c('x', 'fac'), #' type = 'link' #' )) #' #' all(derivs$estimate > 0) #' } #' @export smooth.construct.moi.smooth.spec <- function(object, data, knots) { insight::check_if_installed("splines2") # Check arguments object$p.order <- 1 if (object$bs.dim < 0) { object$bs.dim <- 10 } `k(bs = 'moi')` <- object$bs.dim if (`k(bs = 'moi')` <= 1) { stop("Basis dimension is too small", 
call. = FALSE) } validate_pos_integer(`k(bs = 'moi')`) validate_even(`k(bs = 'moi')`) # Number of knots must be k / 2 nk <- object$bs.dim / 2L if (!is.null(object$id)) { stop("Monotonic splines don't work with ids", call. = FALSE) } # Check basis dimension if (length(object$term) != 1) { stop("Monotonic basis only handles 1D smooths", call. = FALSE) } # Find the data and specified knots x <- data[[object$term]] k <- knots[[object$term]] if (length(unique(x)) < nk) { warning("basis dimension is larger than number of unique covariates") } # Checks on knots if (is.null(k)) { xl <- min(x) xu <- max(x) } else { xl <- min(k) xu <- max(k) if (xl > min(x) || xu < max(x)) { stop("knot range does not include data", call. = FALSE) } if (length(k) != (nk - 1)) { stop(paste("there should be ", nk - 1, " supplied knots"), call. = FALSE) } } if (!is.null(k)) { if (sum(colSums(object$X) == 0) > 0) { warning("there is *no* information about some basis coefficients") } } if (is.null(k)) { # Generate knots if missing k <- seq(xl, xu, length.out = nk + 2)[2:nk + 1] } # Set anchor points beyond which extrapolation will occur xr <- xu - xl boundary <- c(xl - xr * 0.01, xu + xr * 0.01) # Generate basis functions i_spline_basis <- splines2::iSpline( x, knots = k, degree = nk, Boundary.knots = boundary, intercept = TRUE ) nbasis <- dim(i_spline_basis)[2] # Fill in object object$boundary <- boundary object$bs.dim <- nbasis object$knots <- k class(object) <- c("moi.smooth") object$X <- i_spline_basis if (!is.null(object$xt$S)) { stop( 'Cannot accept supplied penalty matrices for monotonic splines', call. = FALSE ) } object$S <- list(diag(object$bs.dim)) object$rank <- object$bs.dim object$null.space.dim <- 0 object$C <- matrix(0, 0, ncol(object$X)) object$random <- TRUE return(object) } #' @export #' @author Nicholas J Clark #' @rdname monotonic smooth.construct.mod.smooth.spec <- function(object, data, knots) { insight::check_if_installed("splines2") # Check arguments object$p.order <- 1 if (object$bs.dim < 0) { object$bs.dim <- 10 } `k(bs = 'moi')` <- object$bs.dim if (`k(bs = 'moi')` <= 1) { stop("Basis dimension is too small", call. = FALSE) } validate_pos_integer(`k(bs = 'moi')`) validate_even(`k(bs = 'moi')`) # Number of knots must be k / 2 nk <- object$bs.dim / 2L if (!is.null(object$id)) { stop("Monotonic splines don't work with ids", call. = FALSE) } # Check basis dimension if (length(object$term) != 1) { stop("Monotonic basis only handles 1D smooths", call. = FALSE) } # Find the data and specified knots x <- data[[object$term]] k <- knots[[object$term]] if (length(unique(x)) < nk) { warning("basis dimension is larger than number of unique covariates") } # Checks on knots if (is.null(k)) { xl <- min(x) xu <- max(x) } else { xl <- min(k) xu <- max(k) if (xl > min(x) || xu < max(x)) { stop("knot range does not include data", call. = FALSE) } if (length(k) != (nk - 1)) { stop(paste("there should be ", nk - 1, " supplied knots"), call. 
= FALSE) } } if (!is.null(k)) { if (sum(colSums(object$X) == 0) > 0) { warning("there is *no* information about some basis coefficients") } } if (is.null(k)) { # Generate knots if missing k <- seq(xl, xu, length.out = nk + 2)[2:nk + 1] } # Set anchor points beyond which extrapolation will occur xr <- xu - xl boundary <- c(xl - xr * 0.01, xu + xr * 0.01) # Generate basis functions i_spline_basis <- splines2::iSpline( x, knots = k, degree = nk, Boundary.knots = boundary, intercept = TRUE ) nbasis <- dim(i_spline_basis)[2] # Fill in object object$boundary <- boundary object$bs.dim <- nbasis object$knots <- k class(object) <- c("mod.smooth") object$X <- i_spline_basis if (!is.null(object$xt$S)) { stop( 'Cannot accept supplied penalty matrices for monotonic splines', call. = FALSE ) } object$S <- list(diag(object$bs.dim)) object$rank <- object$bs.dim object$null.space.dim <- 0 object$C <- matrix(0, 0, ncol(object$X)) object$random <- TRUE return(object) } # Prediction function for the `moi' smooth class #' @rdname monotonic #' @importFrom mgcv Predict.matrix #' @export Predict.matrix.moi.smooth <- function(object, data) { insight::check_if_installed("splines2") # Ensure extrapolation is flat (1st degree penalty behaviour) x <- data[[object$term]] boundary <- object$boundary x[x > max(boundary)] <- max(boundary) x[x < min(boundary)] <- min(boundary) Xp <- suppressWarnings(splines2::iSpline( x, knots = object$knots, degree = object$bs.dim / 2, Boundary.knots = boundary, intercept = TRUE )) return(as.matrix(Xp)) } # Prediction function for the `mod' smooth class #' @rdname monotonic #' @importFrom mgcv Predict.matrix #' @export Predict.matrix.mod.smooth <- function(object, data) { insight::check_if_installed("splines2") # Ensure extrapolation is flat (1st degree penalty behaviour) x <- data[[object$term]] boundary <- object$boundary x[x > max(boundary)] <- max(boundary) x[x < min(boundary)] <- min(boundary) Xp <- suppressWarnings(splines2::iSpline( x, knots = object$knots, degree = object$bs.dim / 2, Boundary.knots = boundary, intercept = TRUE )) return(as.matrix(Xp)) } add_mono_model_file = function(model_file, model_data, mgcv_model) { # Which smooths are monotonic? smooth_labs <- do.call( rbind, lapply(seq_along(mgcv_model$smooth), function(x) { data.frame( label = mgcv_model$smooth[[x]]$label, class = class(mgcv_model$smooth[[x]])[1] ) }) ) # Clean labels for inclusion in Stan code mono_names <- smooth_labs$label[which( smooth_labs$class %in% c('moi.smooth', 'mod.smooth') )] mono_names_clean <- clean_gpnames(mono_names) # What directions are the constraints? 
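# 'moi' smooths are constrained to be monotonically increasing, so their
# basis coefficients take a direction of +1; 'mod' smooths are monotonically
# decreasing and take -1. The direction is later multiplied by abs(b_raw)
# in the Stan code edits below to enforce the sign constraint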
mono_directions <- smooth_labs %>% dplyr::filter(class %in% c('moi.smooth', 'mod.smooth')) %>% dplyr::mutate( direction = dplyr::case_when( class == 'moi.smooth' ~ 1, class == 'mod.smooth' ~ -1, TRUE ~ -1 ) ) %>% dplyr::pull(direction) # Update model file to include constrained coefficients b_line <- max(grep('b[', model_file, fixed = TRUE)) b_edits <- paste0( 'b[b_idx_', mono_names_clean, '] = abs(b_raw[b_idx_', mono_names_clean, ']) * ', mono_directions, ';', collapse = '\n' ) model_file[b_line] <- paste0(model_file[b_line], '\n', b_edits) model_file <- readLines(textConnection(model_file), n = -1) # Add the necessary indices to the model_data and data block mono_stan_lines <- '' for (covariate in seq_along(mono_names)) { coef_indices <- which( grepl(mono_names[covariate], names(coef(mgcv_model)), fixed = TRUE) & !grepl( paste0(mono_names[covariate], ':'), names(coef(mgcv_model)), fixed = TRUE ) == TRUE ) mono_stan_lines <- paste0( mono_stan_lines, paste0( 'array[', length(coef_indices), '] int b_idx_', mono_names_clean[covariate], '; // monotonic basis coefficient indices\n' ) ) mono_idx_data <- list(coef_indices) names(mono_idx_data) <- paste0('b_idx_', mono_names_clean[covariate]) model_data <- append(model_data, mono_idx_data) } model_file[grep( 'int ytimes[n, n_series];', model_file, fixed = TRUE )] <- paste0( model_file[grep( 'int ytimes[n, n_series];', model_file, fixed = TRUE )], '\n', mono_stan_lines ) model_file <- readLines(textConnection(model_file), n = -1) return(list(model_file = model_file, model_data = model_data)) } ================================================ FILE: R/mvgam-class.R ================================================ #' Fitted `mvgam` object description #' #' A fitted \code{mvgam} object returned by function \code{\link{mvgam}}. #' Run `methods(class = "mvgam")` to see an overview of available methods. #' #' @details A `mvgam` object contains the following elements: #' #' - `call` The original observation model formula #' #' - `trend_call` If a `trend_formula` was supplied, the original trend model #' formula is returned. Otherwise `NULL` #' #' - `family` \code{character} description of the observation distribution #' #' - `trend_model` \code{character} description of the latent trend model #' #' - `trend_map` \code{data.frame} describing the mapping of trend states to #' observations, if supplied in the original model. Otherwise `NULL` #' #' - `drift` Logical specifying whether a drift term was used in the trend #' model #' #' - `priors` If the model priors were updated from their defaults, the prior #' `dataframe` will be returned. Otherwise `NULL` #' #' - `model_output` The `MCMC` object returned by the fitting engine. If the #' model was fitted using `Stan`, this will be an object of class `stanfit` #' (see \code{\link[rstan]{stanfit-class}} for details). If `JAGS` was used #' as the backend, this will be an object of class `runjags` (see #' \code{\link[runjags]{runjags-class}} for details) #' #' - `model_file` The `character` string model file used to describe the model #' in either `Stan` or `JAGS` syntax #' #' - `model_data` If `return_model_data` was set to `TRUE` when fitting the #' model, the `list` object containing all data objects needed to condition #' the model is returned. Each item in the `list` is described in detail at #' the top of the `model_file`. Otherwise `NULL` #' #' - `inits` If `return_model_data` was set to `TRUE` when fitting the model, #' the initial value functions used to initialise the MCMC chains will be #' returned. 
Otherwise `NULL` #' #' - `monitor_pars` The parameters that were monitored during MCMC sampling #' are returned as a `character vector` #' #' - `sp_names` A `character vector` specifying the names for each smoothing #' parameter #' #' - `mgcv_model` An object of class `gam` containing the `mgcv` version of #' the observation model. This object is used for generating the linear #' predictor matrix when making predictions for new data. The coefficients #' in this model object will contain the posterior median coefficients from #' the GAM linear predictor, but these are only used if generating plots of #' smooth functions that `mvgam` currently cannot handle (such as plots for #' three-dimensional smooths). This model therefore should not be used for #' inference. See \code{\link[mgcv]{gamObject}} for details #' #' - `trend_mgcv_model` If a `trend_formula` was supplied, an object of class #' `gam` containing the `mgcv` version of the trend model. Otherwise `NULL` #' #' - `ytimes` The `matrix` object used in model fitting for indexing which #' series and timepoints were observed in each row of the supplied data. #' Used internally by some downstream plotting and prediction functions #' #' - `resids` A named `list` object containing posterior draws of Dunn-Smyth #' randomized quantile residuals #' #' - `use_lv` Logical flag indicating whether latent dynamic factors were used #' in the model #' #' - `n_lv` If `use_lv == TRUE`, the number of latent dynamic factors used in #' the model #' #' - `upper_bounds` If bounds were supplied in the original model fit, they #' will be returned. Otherwise `NULL` #' #' - `obs_data` The original data object (either a `list` or `dataframe`) #' supplied in model fitting. #' #' - `test_data` If test data were supplied (as argument `newdata` in the #' original model), it will be returned. Otherwise `NULL` #' #' - `fit_engine` `Character` describing the fit engine, either as `stan` or #' `jags` #' #' - `backend` `Character` describing the backend used for modelling, either #' as `rstan`, `cmdstanr` or `rjags` #' #' - `algorithm` `Character` describing the algorithm used for finding the #' posterior, either as `sampling`, `laplace`, `pathfinder`, `meanfield` or #' `fullrank` #' #' - `max_treedepth` If the model was fitted using `Stan`, the value supplied #' for the maximum treedepth tuning parameter is returned (see #' \code{\link[rstan]{stan}} for details). Otherwise `NULL` #' #' - `adapt_delta` If the model was fitted using `Stan`, the value supplied #' for the adapt_delta tuning parameter is returned (see #' \code{\link[rstan]{stan}} for details). Otherwise `NULL` #' #' @seealso [mvgam] #' #' @author Nicholas J Clark #' #' @name mvgam-class NULL ================================================ FILE: R/mvgam-package.R ================================================ #' @keywords internal "_PACKAGE" ## usethis namespace: start ## usethis namespace: end NULL ================================================ FILE: R/mvgam.R ================================================ #' Fit a Bayesian Dynamic GAM to Univariate or Multivariate Time Series #' #' @description #' This function estimates the posterior distribution for Generalised Additive #' Models (GAMs) that can include smooth spline functions, specified in the GAM #' formula, as well as latent temporal processes, specified by `trend_model`. 
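#'
#' As a minimal sketch of how these two components are combined (here using
#' data simulated by [sim_mvgam()]; see the Examples below for complete
#' workflows):
#' ```r
#' dat <- sim_mvgam(T = 80, n_series = 3)
#' mod <- mvgam(
#'   y ~ s(season, bs = "cc", k = 6),
#'   trend_model = AR(p = 1),
#'   data = dat$data_train
#' )
#' ```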
#' #' Further modelling options include State-Space representations to allow covariates #' and dynamic processes to occur on the latent 'State' level while also capturing #' observation-level effects. Prior specifications are flexible and explicitly #' encourage users to apply prior distributions that actually reflect their beliefs. #' #' In addition, model fits can easily be assessed and compared with posterior #' predictive checks, forecast comparisons and leave-one-out / leave-future-out #' cross-validation. #' #' @importFrom parallel clusterExport stopCluster setDefaultCluster #' @importFrom stats formula terms rnorm update.formula predict #' @importFrom rlang missing_arg #' #' @param formula A `formula` object specifying the GAM observation model formula. #' These are exactly like the formula for a GLM except that smooth terms, `s()`, #' `te()`, `ti()`, `t2()`, as well as time-varying `dynamic()` terms, #' nonparametric `gp()` terms and offsets using `offset()`, can be added to the #' right hand side to specify that the linear predictor depends on smooth #' functions of predictors (or linear functionals of these). #' #' In `nmix()` family models, the `formula` is used to set up a linear predictor #' for the detection probability. Details of the formula syntax used by #' \pkg{mvgam} can be found in \code{\link{mvgam_formulae}} #' #' @param trend_formula An optional `formula` object specifying the GAM process #' model formula. If supplied, a linear predictor will be modelled for the #' latent trends to capture process model evolution separately from the #' observation model. #' #' **Important notes:** #' - Should not have a response variable specified on the left-hand side #' (e.g., `~ season + s(year)`) #' - Use `trend` instead of `series` for effects that vary across time series #' - Only available for `RW()`, `AR()` and `VAR()` trend models #' - In `nmix()` family models, sets up linear predictor for latent abundance #' - Consider dropping one intercept using `- 1` convention to avoid #' estimation challenges #' #' @param knots An optional `list` containing user specified knot values for #' basis construction. For most bases the user simply supplies the knots to be #' used, which must match up with the `k` value supplied. Different terms can #' use different numbers of knots, unless they share a covariate. #' #' @param trend_knots As for `knots` above, this is an optional `list` of knot #' values for smooth functions within the `trend_formula`. #' #' @param data A `dataframe` or `list` containing the model response variable #' and covariates required by the GAM `formula` and optional `trend_formula`. #' #' **Required columns for most models:** #' - `series`: A `factor` index of the series IDs (number of levels should equal #' number of unique series labels) #' - `time`: `numeric` or `integer` index of time points. For most dynamic trend #' types, time should be measured in discrete, regularly spaced intervals #' (i.e., `c(1, 2, 3, ...)`). Irregular spacing is allowed for `trend_model = CAR(1)`, #' but zero intervals are adjusted to `1e-12` to prevent sampling errors. #' #' **Special cases:** #' - Models with hierarchical temporal correlation (e.g., `AR(gr = region, subgr = species)`) #' should NOT include a `series` identifier #' - Models without temporal dynamics (`trend_model = 'None'` or `trend_model = ZMVN()`) #' don't require a `time` variable #' #' @param newdata Optional `dataframe` or `list` of test data containing the same #' variables as in `data`. 
If included, observations in variable `y` will be #' set to `NA` when fitting the model so that posterior simulations can be obtained. #' #' @param run_model `logical`. If `FALSE`, the model is not fitted but instead #' the function returns the model file and the data/initial values needed to #' fit the model outside of `mvgam`. #' #' @param prior_simulation `logical`. If `TRUE`, no observations are fed to the #' model, and instead simulations from prior distributions are returned. #' #' @param return_model_data `logical`. If `TRUE`, the list of data needed to fit #' the model is returned, along with initial values for smooth and AR parameters, #' once the model is fitted. Helpful for users who wish to modify the model file #' to add other stochastic elements. Default is `FALSE` unless `run_model == FALSE`. #' #' @param family `family` specifying the exponential observation family for the series. #' #' **Supported families:** #' - `gaussian()`: Real-valued data #' - `betar()`: Proportional data on `(0,1)` #' - `lognormal()`: Non-negative real-valued data #' - `student_t()`: Real-valued data #' - `Gamma()`: Non-negative real-valued data #' - `bernoulli()`: Binary data #' - `poisson()`: Count data (default) #' - `nb()`: Overdispersed count data #' - `binomial()`: Count data with imperfect detection when number of trials is known #' (use `cbind()` to bind observations and trials) #' - `beta_binomial()`: As `binomial()` but allows for overdispersion #' - `nmix()`: Count data with imperfect detection when number of trials is unknown #' (State-Space N-Mixture model with Poisson latent states and Binomial observations) #' #' See \code{\link{mvgam_families}} for more details. #' #' @param share_obs_params `logical`. If `TRUE` and the `family` has additional #' family-specific observation parameters (e.g., variance components, dispersion #' parameters), these will be shared across all outcome variables. Useful when #' multiple outcomes share properties. Default is `FALSE`. #' #' @param use_lv `logical`. If `TRUE`, use dynamic factors to estimate series' #' latent trends in a reduced dimension format. Only available for `RW()`, #' `AR()` and `GP()` trend models. Default is `FALSE`. #' See \code{\link{lv_correlations}} for examples. #' #' @param n_lv `integer` specifying the number of latent dynamic factors to use #' if `use_lv == TRUE`. Cannot exceed `n_series`. Default is #' `min(2, floor(n_series / 2))`. #' #' @param trend_model `character` or `function` specifying the time series dynamics #' for the latent trend. #' #' **Available options:** #' - `None`: No latent trend component (GAM component only, like \code{\link[mgcv]{gam}}) #' - `ZMVN` or `ZMVN()`: Zero-Mean Multivariate Normal (Stan only) #' - `'RW'` or `RW()`: Random Walk #' - `'AR1'`, `'AR2'`, `'AR3'` or `AR(p = 1, 2, 3)`: Autoregressive models #' - `'CAR1'` or `CAR(p = 1)`: Continuous-time AR (Ornstein–Uhlenbeck process) #' - `'VAR1'` or `VAR()`: Vector Autoregressive (Stan only) #' - `'PWlogistic'`, `'PWlinear'` or `PW()`: Piecewise trends (Stan only) #' - `'GP'` or `GP()`: Gaussian Process with squared exponential kernel (Stan only) #' #' **Additional features:** #' - Moving average and/or correlated process error terms available for most types #' (e.g., `RW(cor = TRUE)` for multivariate Random Walk) #' - Hierarchical correlations possible for structured data #' - See [mvgam_trends] for details and [ZMVN()] for examples #' #' @param trend_map Optional `data.frame` specifying which series should depend on #' which latent trends. 
Enables multiple series to depend on the same latent #' trend process with different observation processes. #' #' **Required structure:** #' - Column `series`: Single unique entry for each series (matching factor levels in data) #' - Column `trend`: Integer values indicating which trend each series depends on #' #' **Notes:** #' - Sets up latent factor model by enabling `use_lv = TRUE` #' - Process model intercept is NOT automatically suppressed #' - Not yet supported for continuous time models (`CAR()`) #' #' @param noncentred `logical`. Use non-centred parameterisation for autoregressive #' trend models? Can improve efficiency by avoiding degeneracies in latent dynamic #' random effects estimation. Benefits vary by model - highly informative data #' may perform worse with this option. Available for `RW()`, `AR()`, `CAR()`, #' or `trend = 'None'` with `trend_formula`. Not available for moving average #' or correlated error models. #' #' @param chains `integer` specifying the number of parallel chains for the model. #' Ignored for variational inference algorithms. #' #' @param burnin `integer` specifying the number of warmup iterations to tune #' sampling algorithms. Ignored for variational inference algorithms. #' #' @param samples `integer` specifying the number of post-warmup iterations for #' sampling the posterior distribution. #' #' @param thin Thinning interval for monitors. Ignored for variational inference #' algorithms. #' #' @param parallel `logical` specifying whether to use multiple cores for parallel #' MCMC simulation. If `TRUE`, uses `min(c(chains, parallel::detectCores() - 1))` cores. #' #' @param threads `integer`. Experimental option for within-chain parallelisation #' in Stan using `reduce_sum`. Recommended only for experienced Stan users with #' slow models. Currently works for all families except `nmix()` and when using #' Cmdstan backend. #' #' @param priors An optional `data.frame` with prior definitions or, preferably, #' a vector of `brmsprior` objects (see \code{\link[brms]{prior}()}). #' See [get_mvgam_priors()] and Details for more information. #' #' @param refit `logical`. Indicates whether this is a refit called using #' [update.mvgam()]. Users should leave as `FALSE`. #' #' @param lfo `logical`. Indicates whether this is part of [lfo_cv.mvgam] call. #' Returns lighter model version for speed. Users should leave as `FALSE`. #' #' @param residuals `logical`. Whether to compute series-level randomized quantile #' residuals. Default is `TRUE`. Set to `FALSE` to save time and reduce object #' size (can add later using [add_residuals]). #' #' @param backend Character string naming the package for Stan model fitting. #' Options are `"cmdstanr"` (default) or `"rstan"`. Can be set globally via #' `"brms.backend"` option. See https://mc-stan.org/rstan/ and #' https://mc-stan.org/cmdstanr/ for details. #' #' @param algorithm Character string naming the estimation approach: #' - `"sampling"`: MCMC (default) #' - `"meanfield"`: Variational inference with factorized normal distributions #' - `"fullrank"`: Variational inference with multivariate normal distribution #' - `"laplace"`: Laplace approximation (cmdstanr only) #' - `"pathfinder"`: Pathfinder algorithm (cmdstanr only) #' #' Can be set globally via `"brms.algorithm"` option. Limited testing suggests #' `"meanfield"` performs best among non-MCMC approximations for dynamic GAMs. #' #' @param autoformat `logical`. Use `stanc` parser to automatically format Stan #' code and check for deprecations. 
For development purposes - leave as `TRUE`. #' #' @param save_all_pars `logical`. Save draws from all variables defined in Stan's #' `parameters` block. Default is `FALSE`. #' #' @param control Named `list` for controlling sampler behaviour. Valid elements #' include `max_treedepth`, `adapt_delta` and `init`. #' #' @param silent Verbosity level between `0` and `2`. If `1` (default), most #' informational messages are suppressed. If `2`, even more messages are #' suppressed. Sampling progress is still printed - set `refresh = 0` to #' disable. For `backend = "rstan"`, also set `open_progress = FALSE` to #' prevent additional progress bars. #' #' @param ... Further arguments passed to Stan: #' - For `backend = "rstan"`: passed to \code{\link[rstan]{sampling}()} or #' \code{\link[rstan]{vb}()} #' - For `backend = "cmdstanr"`: passed to `cmdstanr::sample`, #' `cmdstanr::variational`, `cmdstanr::laplace` or `cmdstanr::pathfinder` methods #' #' @details #' Dynamic GAMs are useful when we wish to predict future values from time series #' that show temporal dependence but we do not want to rely on extrapolating from #' a smooth term (which can sometimes lead to unpredictable and unrealistic behaviours). #' In addition, smooths can often try to wiggle excessively to capture any #' autocorrelation that is present in a time series, which exacerbates the problem #' of forecasting ahead. #' #' As GAMs are very naturally viewed through a Bayesian lens, and we often must #' model time series that show complex distributional features and missing data, #' parameters for \pkg{mvgam} models are estimated in a Bayesian framework using #' Markov Chain Monte Carlo by default. #' #' **Getting Started Resources:** #' - General overview: `vignette("mvgam_overview")` and `vignette("data_in_mvgam")` #' - Full list of vignettes: `vignette(package = "mvgam")` #' - Real-world examples: \code{\link{mvgam_use_cases}} #' - Quick reference: [mvgam cheatsheet](https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.pdf) #' #' @section Model Specification Details: #' #' **Formula Syntax:** Details of the formula syntax used by \pkg{mvgam} can be #' found in \code{\link{mvgam_formulae}}. Note that it is possible to supply an #' empty formula where there are no predictors or intercepts in the observation #' model (i.e. `y ~ 0` or `y ~ -1`). In this case, an intercept-only observation #' model will be set up but the intercept coefficient will be fixed at zero. This #' can be handy if you wish to fit pure State-Space models where the variation in #' the dynamic trend controls the average expectation, and/or where intercepts are #' non-identifiable (as in piecewise trends). #' #' **Families and Link Functions:** Details of families supported by \pkg{mvgam} #' can be found in \code{\link{mvgam_families}}. #' #' **Trend Models:** Details of latent error process models supported by \pkg{mvgam} #' can be found in \code{\link{mvgam_trends}}. #' #' @section Prior Specifications: #' Default priors for intercepts and any variance parameters are chosen to be #' vaguely informative, but these should always be checked by the user. Prior #' distributions for most important model parameters can be altered (see #' [get_mvgam_priors()] for details). Note that latent trends are estimated on #' the link scale so choose priors accordingly. 
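#'
#' As a brief sketch (parameter names vary by model, so the available priors
#' should first be inspected with [get_mvgam_priors()]; `dat` is a placeholder
#' data object here), an updated prior for a latent process error could be
#' passed through the `priors` argument:
#' ```r
#' # Inspect the default priors for a candidate model
#' get_mvgam_priors(
#'   y ~ s(season, bs = "cc", k = 6),
#'   trend_model = AR(),
#'   data = dat
#' )
#'
#' # Fit with a more informative prior on the process error
#' mod <- mvgam(
#'   y ~ s(season, bs = "cc", k = 6),
#'   trend_model = AR(),
#'   data = dat,
#'   priors = brms::prior(exponential(2), class = sigma)
#' )
#' ```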
#' #' However more control over the model specification can be accomplished by setting #' \code{run_model = FALSE} and then editing the model code (found in the #' `model_file` slot in the returned object) before running the model using either #' \pkg{rstan} or \pkg{cmdstanr}. This is encouraged for complex modelling tasks. #' #' **Important:** No priors are formally checked to ensure they are in the right #' syntax so it is up to the user to ensure these are correct. #' #' @section Model Components: #' #' **Random Effects:** For any smooth terms using the random effect basis #' (\code{\link[mgcv]{smooth.construct.re.smooth.spec}}), a non-centred #' parameterisation is automatically employed to avoid degeneracies that are common #' in hierarchical models. Note however that centred versions may perform better #' for series that are particularly informative, so as with any foray into Bayesian #' modelling, it is worth building an understanding of the model's assumptions and #' limitations by following a principled workflow. Also note that models are #' parameterised using `drop.unused.levels = FALSE` in \code{\link[mgcv]{jagam}} #' to ensure predictions can be made for all levels of the supplied factor variable. #' #' **Observation Level Parameters:** When more than one series is included in #' \code{data} and an observation family that contains more than one parameter is #' used, additional observation family parameters (i.e. `phi` for `nb()` or `sigma` #' for `gaussian()`) are by default estimated independently for each series. But if #' you wish for the series to share the same observation parameters, set #' `share_obs_params = TRUE`. #' #' @section Model Diagnostics: #' #' **Residuals:** For each series, randomized quantile (i.e. Dunn-Smyth) residuals #' are calculated for inspecting model diagnostics. If the fitted model is #' appropriate then Dunn-Smyth residuals will be standard normal in distribution #' and no autocorrelation will be evident. When a particular observation is missing, #' the residual is calculated by comparing independent draws from the model's #' posterior distribution. #' #' @section Computational Backend: #' #' **Using Stan:** \pkg{mvgam} is primarily designed to use Hamiltonian Monte Carlo #' for parameter estimation via the software `Stan` (using either the `cmdstanr` #' or `rstan` interface). There are great advantages when using `Stan` over Gibbs / #' Metropolis Hastings samplers, which includes the option to estimate nonlinear #' effects via [Hilbert space approximate Gaussian Processes](https://arxiv.org/abs/2004.11408), #' the availability of a variety of inference algorithms (i.e. variational inference, #' laplacian inference etc...) and [capabilities to enforce stationarity for complex Vector Autoregressions](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648). #' #' Because of the many advantages of `Stan` over `JAGS`, **further development of #' the package will only be applied to `Stan`**. This includes the planned addition #' of more response distributions, plans to handle zero-inflation, and plans to #' incorporate a greater variety of trend models. Users are strongly encouraged to #' opt for `Stan` over `JAGS` in any proceeding workflows. #' #' @section Recommended Workflow: #' #' **How to Start:** The [`mvgam` cheatsheet](https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.pdf) #' is a good starting place if you are just learning to use the package. 
It gives #' an overview of the package's key functions and objects, as well as providing a #' reasonable workflow that new users can follow. #' #' **Recommended Steps:** #' #' 1. **Data Preparation:** Check that your data are in a suitable tidy format for #' \pkg{mvgam} modeling (see the [data formatting vignette](https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html) #' for guidance) #' #' 2. **Data Exploration:** Inspect features of the data using \code{\link{plot_mvgam_series}}. #' Now is also a good time to familiarise yourself with the package's example #' workflows that are detailed in the vignettes: #' - [Getting started vignette](https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html) #' - [Shared latent states vignette](https://nicholasjclark.github.io/mvgam/articles/shared_states.html) #' - [Time-varying effects vignette](https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html) #' - [State-Space models vignette](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html) #' - ["Fitting N-mixture models in `mvgam`"](https://nicholasjclark.github.io/mvgam/articles/nmixtures.html) #' - ["Joint Species Distribution Models in `mvgam`"](https://nicholasjclark.github.io/mvgam/reference/jsdgam.html) #' - ["Incorporating time-varying seasonality in forecast models"](https://ecogambler.netlify.app/blog/time-varying-seasonality/) #' - ["Temporal autocorrelation in GAMs and the `mvgam` package"](https://ecogambler.netlify.app/blog/autocorrelated-gams/) #' #' 3. **Model Structure:** Carefully think about how to structure linear predictor #' effects (i.e. smooth terms using \code{\link[mgcv]{s}()}, \code{\link[mgcv]{te}()} #' or \code{\link[mgcv]{ti}()}, GPs using \code{\link[brms]{gp}()}, dynamic #' time-varying effects using [dynamic()], and parametric terms), latent temporal #' trend components (see \code{\link{mvgam_trends}}) and the appropriate #' observation family (see \code{\link{mvgam_families}}). Use [get_mvgam_priors()] #' to see default prior distributions for stochastic parameters. #' #' 4. **Prior Specification:** Change default priors using appropriate prior knowledge #' (see \code{\link[brms]{prior}()}). When using State-Space models with a #' `trend_formula`, pay particular attention to priors for any variance parameters #' such as process errors and observation errors. Default priors on these parameters #' are chosen to be vaguely informative and to avoid zero (using Inverse Gamma #' priors), but more informative priors will often help with model efficiency #' and convergence. #' #' 5. **Model Fitting:** Fit the model using either Hamiltonian Monte Carlo or an #' approximation algorithm (i.e. change the `backend` argument) and use #' [summary.mvgam()], [conditional_effects.mvgam()], [mcmc_plot.mvgam()], #' [pp_check.mvgam()], [pairs.mvgam()] and [plot.mvgam()] to inspect / #' interrogate the model. #' #' 6. **Model Comparison:** Update the model as needed and use [loo_compare.mvgam()] #' for in-sample model comparisons, or alternatively use [forecast.mvgam()], #' [lfo_cv.mvgam()] and [score.mvgam_forecast()] to compare models based on #' out-of-sample forecasts (see the [forecast evaluation vignette](https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html) #' for guidance). #' #' 7. 
**Inference and Prediction:** When satisfied with the model structure, use #' [predict.mvgam()], \code{\link[marginaleffects]{plot_predictions}()} and/or #' \code{\link[marginaleffects]{plot_slopes}()} for more targeted simulation-based #' inferences (see ["How to interpret and report nonlinear effects from Generalized Additive Models"](https://ecogambler.netlify.app/blog/interpreting-gams/) #' for some guidance on interpreting GAMs). For time series models, use #' [hindcast.mvgam()], [fitted.mvgam()], [augment.mvgam()] and [forecast.mvgam()] #' to inspect posterior hindcast / forecast distributions. #' #' 8. **Documentation:** Use [how_to_cite()] to obtain a scaffold methods section #' (with full references) to begin describing this model in scientific publications. #' #' @author Nicholas J Clark #' #' @references #' Nicholas J Clark & Konstans Wells (2023). Dynamic generalised additive models #' (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and #' Evolution. 14:3, 771-784. #' #' Nicholas J Clark, SK Morgan Ernest, Henry Senyondo, Juniper Simonis, Ethan P White, #' Glenda M Yenni, KANK Karunarathna (2025). Beyond single-species models: leveraging #' multispecies forecasts to navigate the dynamics of ecological predictability. #' PeerJ. 13:e18929 https://doi.org/10.7717/peerj.18929 #' #' @seealso \code{\link[mgcv]{jagam}()}, \code{\link[mgcv]{gam}()}, #' \code{\link[mgcv]{gam.models}}, [get_mvgam_priors()], [jsdgam()], #' [hindcast.mvgam()], [forecast.mvgam()], [predict.mvgam()] #' #' @return A `list` object of class `mvgam` containing model output, the text #' representation of the model file, the mgcv model output (for easily generating #' simulations at unsampled covariate values), Dunn-Smyth residuals for each #' series and key information needed for other functions in the package. See #' \code{\link{mvgam-class}} for details. Use `methods(class = "mvgam")` for an #' overview on available methods. #' #' @examples #' \dontrun{ #' # ============================================================================= #' # Basic Multi-Series Time Series Modeling #' # ============================================================================= #' #' # Simulate three time series that have shared seasonal dynamics, #' # independent AR(1) trends, and Poisson observations #' set.seed(0) #' dat <- sim_mvgam( #' T = 80, #' n_series = 3, #' mu = 2, #' trend_model = AR(p = 1), #' prop_missing = 0.1, #' prop_trend = 0.6 #' ) #' #' # Plot key summary statistics for a single series #' plot_mvgam_series(data = dat$data_train, series = 1) #' #' # Plot all series together #' plot_mvgam_series(data = dat$data_train, series = "all") #' #' # Formulate a model using Stan where series share a cyclic smooth for #' # seasonality and each series has an independent AR1 temporal process. #' # Note that 'noncentred = TRUE' will likely give performance gains. 
#' # Set run_model = FALSE to inspect the returned objects #' mod1 <- mvgam( #' formula = y ~ s(season, bs = "cc", k = 6), #' data = dat$data_train, #' trend_model = AR(), #' family = poisson(), #' noncentred = TRUE, #' run_model = FALSE #' ) #' #' # View the model code in Stan language #' stancode(mod1) #' #' # View the data objects needed to fit the model in Stan #' sdata1 <- standata(mod1) #' str(sdata1) #' #' # Now fit the model #' mod1 <- mvgam( #' formula = y ~ s(season, bs = "cc", k = 6), #' data = dat$data_train, #' trend_model = AR(), #' family = poisson(), #' noncentred = TRUE, #' chains = 2, #' silent = 2 #' ) #' #' # Extract the model summary #' summary(mod1) #' #' # Plot the historical trend and hindcast distributions for one series #' hc_trend <- hindcast(mod1, type = "trend") #' plot(hc_trend) #' #' hc_predicted <- hindcast(mod1, type = "response") #' plot(hc_predicted) #' #' # Residual diagnostics #' plot(mod1, type = "residuals", series = 1) #' resids <- residuals(mod1) #' str(resids) #' #' # Fitted values and residuals can be added directly to the training data #' augment(mod1) #' #' # Compute the forecast using covariate information in data_test #' fc <- forecast(mod1, newdata = dat$data_test) #' str(fc) #' fc_summary <- summary(fc) #' head(fc_summary, 12) #' plot(fc) #' #' # Plot the estimated seasonal smooth function #' plot(mod1, type = "smooths") #' #' # Plot estimated first derivatives of the smooth #' plot(mod1, type = "smooths", derivatives = TRUE) #' #' # Plot partial residuals of the smooth #' plot(mod1, type = "smooths", residuals = TRUE) #' #' # Plot posterior realisations for the smooth #' plot(mod1, type = "smooths", realisations = TRUE) #' #' # Plot conditional response predictions using marginaleffects #' conditional_effects(mod1) #' plot_predictions(mod1, condition = "season", points = 0.5) #' #' # Generate posterior predictive checks using bayesplot #' pp_check(mod1) #' #' # Extract observation model beta coefficient draws as a data.frame #' beta_draws_df <- as.data.frame(mod1, variable = "betas") #' head(beta_draws_df) #' str(beta_draws_df) #' #' # Investigate model fit #' mc.cores.def <- getOption("mc.cores") #' options(mc.cores = 1) #' loo(mod1) #' options(mc.cores = mc.cores.def) #' #' #' # ============================================================================= #' # Vector Autoregressive (VAR) Models #' # ============================================================================= #' #' # Fit a model to the portal time series that uses a latent #' # Vector Autoregression of order 1 #' mod <- mvgam( #' formula = captures ~ -1, #' trend_formula = ~ trend, #' trend_model = VAR(cor = TRUE), #' family = poisson(), #' data = portal_data, #' chains = 2, #' silent = 2 #' ) #' #' # Plot the autoregressive coefficient distributions; #' # use 'dir = "v"' to arrange the order of facets correctly #' mcmc_plot( #' mod, #' variable = 'A', #' regex = TRUE, #' type = 'hist', #' facet_args = list(dir = 'v') #' ) #' #' # Plot the process error variance-covariance matrix in the same way #' mcmc_plot( #' mod, #' variable = 'Sigma', #' regex = TRUE, #' type = 'hist', #' facet_args = list(dir = 'v') #' ) #' #' # Calculate Generalized Impulse Response Functions for each series #' irfs <- irf( #' mod, #' h = 12, #' cumulative = FALSE #' ) #' #' # Plot some of them #' plot(irfs, series = 1) #' plot(irfs, series = 2) #' #' # Calculate forecast error variance decompositions for each series #' fevds <- fevd(mod, h = 12) #' #' # Plot median contributions to forecast error 
variance #' plot(fevds) #' #' #' # ============================================================================= #' # Dynamic Factor Models #' # ============================================================================= #' #' # Now fit a model that uses two RW dynamic factors to model #' # the temporal dynamics of the four rodent species #' mod <- mvgam( #' captures ~ series, #' trend_model = RW(), #' use_lv = TRUE, #' n_lv = 2, #' data = portal_data, #' chains = 2, #' silent = 2 #' ) #' #' # Plot the factors #' plot(mod, type = 'factors') #' #' # Plot the hindcast distributions #' hcs <- hindcast(mod) #' plot(hcs, series = 1) #' plot(hcs, series = 2) #' plot(hcs, series = 3) #' plot(hcs, series = 4) #' #' # Use residual_cor() to calculate temporal correlations among the series #' # based on the factor loadings #' lvcors <- residual_cor(mod) #' names(lvcors) #' lvcors$cor #' #' # For those correlations whose credible intervals did not include #' # zero, plot them as a correlation matrix (all other correlations #' # are shown as zero on this plot) #' plot(lvcors, cluster = TRUE) #' #' #' # ============================================================================= #' # Shared Latent Trends with Custom Trend Mapping #' # ============================================================================= #' #' # Example of supplying a trend_map so that some series can share #' # latent trend processes #' sim <- sim_mvgam(n_series = 3) #' mod_data <- sim$data_train #' #' # Here, we specify only two latent trends; series 1 and 2 share a trend, #' # while series 3 has its own unique latent trend #' trend_map <- data.frame( #' series = unique(mod_data$series), #' trend = c(1, 1, 2) #' ) #' #' # Fit the model using AR1 trends #' mod <- mvgam( #' formula = y ~ s(season, bs = "cc", k = 6), #' trend_map = trend_map, #' trend_model = AR(), #' data = mod_data, #' return_model_data = TRUE, #' chains = 2, #' silent = 2 #' ) #' #' # The mapping matrix is now supplied as data to the model in the 'Z' element #' mod$model_data$Z #' #' # The first two series share an identical latent trend; the third is different #' plot(residual_cor(mod)) #' plot(mod, type = "trend", series = 1) #' plot(mod, type = "trend", series = 2) #' plot(mod, type = "trend", series = 3) #' #' #' # ============================================================================= #' # Time-Varying (Dynamic) Coefficients #' # ============================================================================= #' #' # Example of how to use dynamic coefficients #' # Simulate a time-varying coefficient for the effect of temperature #' set.seed(123) #' N <- 200 #' beta_temp <- vector(length = N) #' beta_temp[1] <- 0.4 #' for (i in 2:N) { #' beta_temp[i] <- rnorm(1, mean = beta_temp[i - 1] - 0.0025, sd = 0.05) #' } #' plot(beta_temp) #' #' # Simulate a covariate called 'temp' #' temp <- rnorm(N, sd = 1) #' #' # Simulate some noisy Gaussian observations #' out <- rnorm(N, #' mean = 4 + beta_temp * temp, #' sd = 0.5 #' ) #' #' # Gather necessary data into a data.frame; split into training / testing #' data <- data.frame(out, temp, time = seq_along(temp)) #' data_train <- data[1:180, ] #' data_test <- data[181:200, ] #' #' # Fit the model using the dynamic() function #' mod <- mvgam( #' formula = out ~ dynamic( #' temp, #' scale = FALSE, #' k = 40 #' ), #' family = gaussian(), #' data = data_train, #' newdata = data_test, #' chains = 2, #' silent = 2 #' ) #' #' # Inspect the model summary, forecast and time-varying coefficient distribution #' summary(mod) #' 
plot(mod, type = "smooths") #' fc <- forecast(mod, newdata = data_test) #' plot(fc) #' #' # Propagating the smooth term shows how the coefficient is expected to evolve #' plot_mvgam_smooth(mod, smooth = 1, newdata = data) #' abline(v = 180, lty = "dashed", lwd = 2) #' points(beta_temp, pch = 16) #' #' #' # ============================================================================= #' # Working with Offset Terms #' # ============================================================================= #' #' # Example showing how to incorporate an offset; simulate some count data #' # with different means per series #' set.seed(100) #' dat <- sim_mvgam( #' prop_trend = 0, #' mu = c(0, 2, 2), #' seasonality = "hierarchical" #' ) #' #' # Add offset terms to the training and testing data #' dat$data_train$offset <- 0.5 * as.numeric(dat$data_train$series) #' dat$data_test$offset <- 0.5 * as.numeric(dat$data_test$series) #' #' # Fit a model that includes the offset in the linear predictor as well as #' # hierarchical seasonal smooths #' mod <- mvgam( #' formula = y ~ offset(offset) + #' s(series, bs = "re") + #' s(season, bs = "cc") + #' s(season, by = series, m = 1, k = 5), #' data = dat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' # Inspect the model file to see the modification to the linear predictor (eta) #' stancode(mod) #' #' # Forecasts for the first two series will differ in magnitude #' fc <- forecast(mod, newdata = dat$data_test) #' plot(fc, series = 1, ylim = c(0, 75)) #' plot(fc, series = 2, ylim = c(0, 75)) #' #' # Changing the offset for the testing data should lead to changes in #' # the forecast #' dat$data_test$offset <- dat$data_test$offset - 2 #' fc <- forecast(mod, newdata = dat$data_test) #' plot(fc) #' #' # Relative Risks can be computed by fixing the offset to the same value #' # for each series #' dat$data_test$offset <- rep(1, NROW(dat$data_test)) #' preds_rr <- predict(mod, #' type = "link", #' newdata = dat$data_test, #' summary = FALSE #' ) #' series1_inds <- which(dat$data_test$series == "series_1") #' series2_inds <- which(dat$data_test$series == "series_2") #' #' # Relative Risks are now more comparable among series #' layout(matrix(1:2, ncol = 2)) #' plot(preds_rr[1, series1_inds], #' type = "l", col = "grey75", #' ylim = range(preds_rr), #' ylab = "Series1 Relative Risk", xlab = "Time" #' ) #' for (i in 2:50) { #' lines(preds_rr[i, series1_inds], col = "grey75") #' } #' #' plot(preds_rr[1, series2_inds], #' type = "l", col = "darkred", #' ylim = range(preds_rr), #' ylab = "Series2 Relative Risk", xlab = "Time" #' ) #' for (i in 2:50) { #' lines(preds_rr[i, series2_inds], col = "darkred") #' } #' layout(1) #' #' #' # ============================================================================= #' # Binomial Family Models #' # ============================================================================= #' #' # Example showcasing how cbind() is needed for Binomial observations #' # Simulate two time series of Binomial trials #' trials <- sample(c(20:25), 50, replace = TRUE) #' x <- rnorm(50) #' detprob1 <- plogis(-0.5 + 0.9 * x) #' detprob2 <- plogis(-0.1 - 0.7 * x) #' dat <- rbind( #' data.frame( #' y = rbinom(n = 50, size = trials, prob = detprob1), #' time = 1:50, #' series = "series1", #' x = x, #' ntrials = trials #' ), #' data.frame( #' y = rbinom(n = 50, size = trials, prob = detprob2), #' time = 1:50, #' series = "series2", #' x = x, #' ntrials = trials #' ) #' ) #' dat <- dplyr::mutate(dat, series = as.factor(series)) #' dat <- dplyr::arrange(dat, time, 
series)
#' plot_mvgam_series(data = dat, series = "all")
#'
#' # Fit a model using the binomial() family; must specify observations
#' # and number of trials in the cbind() wrapper
#' mod <- mvgam(
#'   formula = cbind(y, ntrials) ~ series + s(x, by = series),
#'   family = binomial(),
#'   data = dat,
#'   chains = 2,
#'   silent = 2
#' )
#' summary(mod)
#' pp_check(mod,
#'   type = "bars_grouped",
#'   group = "series", ndraws = 50
#' )
#' pp_check(mod,
#'   type = "ecdf_overlay_grouped",
#'   group = "series", ndraws = 50
#' )
#' conditional_effects(mod, type = "link")
#'
#' # To view predictions on the probability scale,
#' # use ntrials = 1 in datagrid()
#' plot_predictions(
#'   mod,
#'   by = c('x', 'series'),
#'   newdata = datagrid(
#'     x = runif(100, -2, 2),
#'     series = unique,
#'     ntrials = 1
#'   ),
#'   type = 'expected'
#' )
#'
#' # Not needed for general use; cleans up connections for automated testing
#' closeAllConnections()
#' }
#' @export
mvgam <- function(
  formula,
  trend_formula,
  knots,
  trend_knots,
  trend_model = "None",
  noncentred = FALSE,
  family = poisson(),
  share_obs_params = FALSE,
  data,
  newdata,
  use_lv = FALSE,
  n_lv,
  trend_map,
  priors,
  run_model = TRUE,
  prior_simulation = FALSE,
  residuals = TRUE,
  return_model_data = FALSE,
  backend = getOption("brms.backend", "cmdstanr"),
  algorithm = getOption("brms.algorithm", "sampling"),
  control = list(max_treedepth = 10, adapt_delta = 0.8),
  chains = 4,
  burnin = 500,
  samples = 500,
  thin = 1,
  parallel = TRUE,
  threads = 1,
  save_all_pars = FALSE,
  silent = 1,
  autoformat = TRUE,
  refit = FALSE,
  lfo = FALSE,
  ...
) {
  # Check data arguments
  dots <- list(...)
  if (missing("data")) {
    if ("data_train" %in% names(dots)) {
      message('argument "data_train" is deprecated; supply as "data" instead')
      data <- dots$data_train
      dots$data_train <- NULL
    } else {
      stop('Argument "data" is missing with no default', call. = FALSE)
    }
  }
  if (missing("newdata")) {
    if ("data_test" %in% names(dots)) {
      message('argument "data_test" is deprecated; supply as "newdata" instead')
      data_test <- dots$data_test
      dots$data_test <- NULL
    } else {
      data_test <- rlang::missing_arg()
    }
  }
  if (!missing("data")) {
    data_train <- data
  }
  if (!missing("newdata")) {
    data_test <- newdata
  }
  orig_data <- data_train

  # Check sampler arguments
  use_stan <- TRUE
  if ("adapt_delta" %in% names(dots)) {
    message(
      'argument "adapt_delta" should be supplied as an element in "control"'
    )
    adapt_delta <- dots$adapt_delta
    dots$adapt_delta <- NULL
  } else {
    adapt_delta <- control$adapt_delta
    if (is.null(adapt_delta)) adapt_delta <- 0.8
  }
  if ("max_treedepth" %in% names(dots)) {
    message(
      'argument "max_treedepth" should be supplied as an element in "control"'
    )
    max_treedepth <- dots$max_treedepth
    dots$max_treedepth <- NULL
  } else {
    max_treedepth <- control$max_treedepth
    if (is.null(max_treedepth)) max_treedepth <- 10
  }

  # Validate trend_model
  if ("drift" %in% names(dots) & silent < 2L) {
    message(
      'The "drift" argument is deprecated; use fixed effects of "time" instead'
    )
    dots$drift <- NULL
  }
  drift <- FALSE
  orig_trend_model <- trend_model
  trend_model <- validate_trend_model(
    orig_trend_model,
    drift = drift,
    noncentred = noncentred
  )

  # Cannot yet map observations to trends that evolve as CAR1
  if (trend_model == "CAR1" & !missing(trend_map)) {
    stop("cannot yet use trend mapping for CAR1 dynamics", call.
= FALSE) } # Ensure series and time variables are present data_train <- validate_series_time( data_train, name = "data", trend_model = orig_trend_model ) # Validate the formula to convert any dynamic() terms formula <- interpret_mvgam( formula, N = max(data_train$index..time..index), family = family ) # Check sampler arguments validate_pos_integer(chains) validate_pos_integer(threads) validate_pos_integer(burnin) validate_pos_integer(samples) validate_pos_integer(thin) validate_silent(silent) # Upper bounds no longer supported as they are fairly useless upper_bounds <- rlang::missing_arg() # Check for gp terms in the validated formula list2env( check_gp_terms(formula, data_train, family = family), envir = environment() ) # Check for missing rhs in formula list2env(check_obs_intercept(formula, orig_formula), envir = environment()) # Check for brmspriors if (!missing(priors)) { if (inherits(priors, "brmsprior") & !lfo & use_stan) { priors <- adapt_brms_priors( priors = priors, formula = orig_formula, trend_formula = trend_formula, data = data_train, family = family, use_lv = use_lv, n_lv = n_lv, trend_model = orig_trend_model, trend_map = trend_map, drift = drift, warnings = TRUE, knots = knots ) } } # Ensure series and time variables are present data_train <- validate_series_time( data_train, name = "data", trend_model = orig_trend_model ) if (!missing(data_test)) { data_test <- validate_series_time( data_test, name = "newdata", trend_model = orig_trend_model ) if (trend_model == "CAR1") { data_test$index..time..index <- data_test$index..time..index + max(data_train$index..time..index) } } # Lighten the final object if this is an lfo run if (lfo) { return_model_data <- FALSE } # Validate observation formula formula <- interpret_mvgam(formula, N = max(data_train$index..time..index)) data_train <- validate_obs_formula(formula, data = data_train, refit = refit) if (!missing(data_test)) { data_test <- validate_obs_formula(formula, data = data_test, refit = refit) } if (is.null(attr(terms(formula(formula)), "offset"))) { offset <- FALSE } else { offset <- TRUE } # Ensure fitting software can be located if (!use_stan & run_model) { find_jags() } if (use_stan & run_model) { find_stan() } # Validate the family and threads arguments family <- validate_family(family, use_stan = use_stan) family_char <- match.arg(arg = family$family, choices = family_char_choices()) threads <- validate_threads(family_char, threads) # Nmixture additions? 
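  # (check_nmix() returns N-mixture bookkeeping objects, including the
  # add_nmix and nmix_trendmap flags used further below; list2env()
  # promotes them directly into this call frame)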
list2env(
    check_nmix(
      family,
      family_char,
      trend_formula,
      trend_model,
      trend_map,
      data_train
    ),
    envir = environment()
  )

  # Validate remaining trend arguments
  trend_val <- validate_trend_restrictions(
    trend_model = trend_model,
    formula = formula,
    trend_formula = trend_formula,
    trend_map = trend_map,
    drift = drift,
    drop_obs_intercept = drop_obs_intercept,
    use_lv = use_lv,
    n_lv = n_lv,
    data_train = data_train,
    use_stan = use_stan
  )
  list2env(trend_val, envir = environment())
  if (is.null(trend_map)) {
    trend_map <- rlang::missing_arg()
  }
  if (is.null(n_lv)) {
    n_lv <- rlang::missing_arg()
  }

  # Some general family-level restrictions can now be checked
  orig_y <- data_train$y
  if (any(!is.na(orig_y))) {
    validate_family_restrictions(response = orig_y, family = family)
  }

  # Fill in missing observations in data_train so the size of the dataset is
  # correct when building the initial JAGS model
  resp_terms <- as.character(terms(formula(formula))[[2]])
  if (length(resp_terms) == 1) {
    out_name <- as.character(terms(formula(formula))[[2]])
  } else {
    if (any(grepl("cbind", resp_terms))) {
      # Drop the cbind() wrapper itself and take the first response term
      resp_terms <- resp_terms[!grepl("cbind", resp_terms)]
      out_name <- resp_terms[1]
    }
  }
  data_train[[out_name]] <- replace_nas(data_train[[out_name]])

  # Compute default priors
  if (use_stan) {
    def_priors <- adapt_brms_priors(
      c(
        make_default_scales(orig_y, family),
        make_default_int(
          orig_y,
          family = if (add_nmix) {
            nmix()
          } else {
            family
          }
        )
      ),
      formula = orig_formula,
      trend_formula = trend_formula,
      data = orig_data,
      family = family,
      use_lv = use_lv,
      n_lv = n_lv,
      trend_model = orig_trend_model,
      trend_map = trend_map,
      drift = drift,
      knots = knots
    )
  }

  # Initiate the GAM model using mgcv so that the linear predictor matrix can
  # be easily calculated when simulating from the Bayesian model later on
  ss_gam <- try(
    mvgam_setup(
      formula = formula,
      knots = knots,
      family = family_to_mgcvfam(family),
      dat = data_train
    ),
    silent = TRUE
  )
  if (inherits(ss_gam, "try-error")) {
    if (grepl("missing values", ss_gam[1])) {
      stop(
        paste(
          "Missing values found in data predictors:\n",
          attr(ss_gam, "condition")
        ),
        call. = FALSE
      )
    } else {
      stop(paste(ss_gam[1]), call. = FALSE)
    }
  }

  # Check the test data for NAs as well using predict.gam
  testdat_pred <- try(
    predict(ss_gam, newdata = data_test, na.action = na.fail),
    silent = TRUE
  )
  if (inherits(testdat_pred, "try-error")) {
    if (grepl("missing values", testdat_pred[1])) {
      stop(
        paste(
          "Missing values found in newdata predictors:\n",
          attr(testdat_pred, "condition")
        ),
        call.
= FALSE ) } } # Make JAGS file and appropriate data structures list2env( jagam_setup( ss_gam = ss_gam, formula = formula, data_train = data_train, family = family, family_char = family_char, knots = knots ), envir = environment() ) # Update initial values of lambdas using the full estimates from the # fitted gam model to speed convergence; remove initial betas so that the # chains can start in very different regions of the parameter space ss_jagam$jags.ini$b <- NULL if (length(ss_gam$sp) == length(ss_jagam$jags.ini$lambda)) { ss_jagam$jags.ini$lambda <- ss_gam$sp ss_jagam$jags.ini$lambda[log(ss_jagam$jags.ini$lambda) > 10] <- exp(10) } if (length(ss_gam$smooth) == 0) { ss_jagam$jags.ini$lambda <- NULL } # Fill y with NAs if this is a simulation from the priors; # otherwise replace with the original supplied values data_train <- check_priorsim(prior_simulation, data_train, orig_y, formula) # Read in the base (unmodified) jags model file base_model <- suppressWarnings(readLines(file_name)) # Remove lines from the linear predictor section lines_remove <- c(1:grep("## response", base_model)) base_model <- base_model[-lines_remove] if (any(grepl("scale <- 1/tau", base_model, fixed = TRUE))) { base_model <- base_model[-grep("scale <- 1/tau", base_model, fixed = TRUE)] } if (any(grepl("tau ~ dgamma(.05,.005)", base_model, fixed = TRUE))) { base_model <- base_model[ -grep("tau ~ dgamma(.05,.005)", base_model, fixed = TRUE) ] } # Any parametric effects in the gam (particularly the intercept) need sensible priors to ensure they # do not directly compete with the latent trends if (any(grepl("Parametric effect priors", base_model))) { in_parenth <- regmatches( base_model[grep("Parametric effect priors", base_model) + 1], gregexpr( "(?<=\\().+?(?=\\))", base_model[grep("Parametric effect priors", base_model) + 1], perl = T ) )[[1]][1] n_terms <- as.numeric(sub(".*:", "", in_parenth)) ss_jagam$jags.data$p_coefs <- coef(ss_gam)[1:n_terms] # Use the initialised GAM's estimates for parametric effects, but widen them # substantially to allow for better exploration beta_sims <- rmvn(100, coef(ss_gam), ss_gam$Vp) ss_jagam$jags.data$p_taus <- apply( as.matrix(beta_sims[, 1:n_terms]), 2, function(x) 1 / (sd(x)^2) ) base_model[grep("Parametric effect priors", base_model) + 1] <- paste0( " for (i in 1:", n_terms, ") { b[i] ~ dnorm(p_coefs[i], p_taus[i]) }" ) base_model[grep("Parametric effect priors", base_model)] <- c( " ## parametric effect priors (regularised for identifiability)" ) } # For any random effect smooths, use non-centred parameterisation to avoid degeneracies # For monotonic smooths, need to determine which direction to place # coefficient constraints smooth_labs <- do.call( rbind, lapply(seq_along(ss_gam$smooth), function(x) { data.frame( label = ss_gam$smooth[[x]]$label, class = class(ss_gam$smooth[[x]])[1], id = ifelse(is.null(ss_gam$smooth[[x]]$id), NA, ss_gam$smooth[[x]]$id) ) }) ) # Check for 'id' arguments, which are not yet supported if (any(!is.na(smooth_labs$id))) { stop( 'smooth terms with the "id" argument not yet supported by mvgam', call. 
= FALSE ) } if (any(smooth_labs$class == "random.effect")) { re_smooths <- smooth_labs %>% dplyr::filter(class == "random.effect") %>% dplyr::pull(label) for (i in 1:length(re_smooths)) { # If there are multiple smooths with this label, find out where the random effect # smooth sits smooth_labs %>% dplyr::filter(label == re_smooths[i]) %>% dplyr::mutate(smooth_number = dplyr::row_number()) %>% dplyr::filter(class == "random.effect") %>% dplyr::pull(smooth_number) -> smooth_number in_parenth <- regmatches( base_model[ grep(re_smooths[i], base_model, fixed = T)[smooth_number] + 1 ], gregexpr( "(?<=\\().+?(?=\\))", base_model[ grep(re_smooths[i], base_model, fixed = T)[smooth_number] + 1 ], perl = T ) )[[1]][1] n_terms <- as.numeric(sub(".*:", "", in_parenth)) n_start <- as.numeric(strsplit(sub(".*\\(", "", in_parenth), ":")[[1]][1]) base_model[ grep(re_smooths[i], base_model, fixed = T)[smooth_number] + 1 ] <- paste0( " for (i in ", n_start, ":", n_terms, ") {\n b_raw[i] ~ dnorm(0, 1)\n", "b[i] <- ", paste0("mu_raw", i), " + b_raw[i] * ", paste0("sigma_raw", i), "\n }\n ", paste0("sigma_raw", i), " ~ dexp(0.5)\n", paste0("mu_raw", i), " ~ dnorm(0, 1)" ) base_model[grep(re_smooths[i], base_model, fixed = T)[ smooth_number ]] <- paste0(" ## prior (non-centred) for ", re_smooths[i], "...") } } base_model[grep("smoothing parameter priors", base_model)] <- c( " ## smoothing parameter priors..." ) # Remove the fakery lines if they were added if (!smooths_included) { base_model <- base_model[ -c( grep("## prior for s(fakery)", trimws(base_model), fixed = TRUE):(grep( "## prior for s(fakery)", trimws(base_model), fixed = TRUE ) + 7) ) ] } # Add replacement lines for priors, trends and the linear predictor fil <- tempfile(fileext = ".xt") modification <- add_base_dgam_lines(use_lv) cat( c(readLines(textConnection(modification)), base_model), file = fil, sep = "\n" ) model_file <- trimws(readLines(fil, n = -1)) # Modify observation distribution lines if (family_char == "tweedie") { model_file <- add_tweedie_lines(model_file, upper_bounds = upper_bounds) } else if (family_char == "poisson") { model_file <- add_poisson_lines(model_file, upper_bounds = upper_bounds) } else { if (missing(upper_bounds)) { model_file[grep( "y\\[i, s\\] ~", model_file )] <- " y[i, s] ~ dnegbin(rate[i, s], phi[s])" model_file[grep( "ypred\\[i, s\\] ~", model_file )] <- " ypred[i, s] ~ dnegbin(rate[i, s], phi[s])" } } # Modify lines needed for the specified trend model model_file <- add_trend_lines( model_file, stan = FALSE, use_lv = use_lv, trend_model = if ( trend_model %in% c("RW", "VAR1", "PWlinear", "PWlogistic", "ZMVN") ) { "RW" } else { trend_model }, drift = drift ) # Use informative priors based on the fitted mgcv model to speed convergence # and eliminate searching over strange parameter spaces if (length(ss_gam$sp) == length(ss_jagam$jags.ini$lambda)) { model_file[grep( "lambda\\[i\\] ~", model_file )] <- " lambda[i] ~ dexp(1/sp[i])" } else { model_file[grep( "lambda\\[i\\] ~", model_file )] <- " lambda[i] ~ dexp(0.05)" } # Final tidying of the JAGS model for readability clean_up <- vector() for (x in 1:length(model_file)) { clean_up[x] <- model_file[x - 1] == "" & model_file[x] == "" } clean_up[is.na(clean_up)] <- FALSE model_file <- model_file[!clean_up] # Add in the offset if needed if (offset) { model_file[grep("eta <- X %*% b", model_file, fixed = TRUE)] <- "eta <- X %*% b + offset" model_file[grep("eta <- X * b", model_file, fixed = TRUE)] <- "eta <- X * b + offset" if (!missing(data_test) & !prior_simulation) 
{ ss_jagam$jags.data$offset <- c( ss_jagam$jags.data$offset, data_test[[get_offset(ss_gam)]] ) } } model_file_jags <- textConnection(model_file) # Covariate dataframe including training and testing observations if (!missing(data_test) & !prior_simulation) { suppressWarnings( lp_test <- try( predict(ss_gam, newdata = data_test, type = "lpmatrix"), silent = TRUE ) ) if (inherits(lp_test, "try-error")) { testdat <- data.frame(time = data_test$index..time..index) terms_include <- names(ss_gam$coefficients)[which( !names(ss_gam$coefficients) %in% "(Intercept)" )] if (length(terms_include) > 0) { newnames <- vector() newnames[1] <- "time" for (i in 1:length(terms_include)) { testdat <- cbind(testdat, data.frame(data_test[[terms_include[i]]])) newnames[i + 1] <- terms_include[i] } colnames(testdat) <- newnames } suppressWarnings( lp_test <- predict(ss_gam, newdata = testdat, type = "lpmatrix") ) } # Remove fakery columns from design matrix if no smooth terms were included if (!smooths_included) { ss_jagam$jags.data$X <- as.matrix( ss_jagam$jags.data$X[, -c(xcols_drop)], ncol = NCOL(lp_test) ) } X <- data.frame(rbind(ss_jagam$jags.data$X, lp_test)) # Add a time variable if (inherits(data_train, "list")) { temp_dat_train <- data.frame( index..time..index = data_train$index..time..index, series = data_train$series ) temp_dat_test <- data.frame( index..time..index = data_test$index..time..index, series = data_test$series ) X$index..time..index <- rbind(temp_dat_train, temp_dat_test) %>% dplyr::left_join( rbind(temp_dat_train, temp_dat_test) %>% dplyr::select(index..time..index) %>% dplyr::distinct() %>% dplyr::arrange(index..time..index) %>% dplyr::mutate(index..time..index = dplyr::row_number()), by = c("index..time..index") ) %>% dplyr::pull(index..time..index) # Add a series identifier variable X$series <- as.numeric(rbind(temp_dat_train, temp_dat_test)$series) # Add an outcome variable X$outcome <- c(orig_y, rep(NA, NROW(temp_dat_test))) } else { if (NCOL(data_train) != NCOL(data_test)) { stop( '"data" and "newdata" have different numbers of columns', call. 
= FALSE ) } X$index..time..index <- dplyr::bind_rows(data_train, data_test) %>% dplyr::left_join( dplyr::bind_rows(data_train, data_test) %>% dplyr::select(index..time..index) %>% dplyr::distinct() %>% dplyr::arrange(index..time..index) %>% dplyr::mutate(index..time..index = dplyr::row_number()), by = c("index..time..index") ) %>% dplyr::pull(index..time..index) # Add a series identifier variable X$series <- as.numeric(dplyr::bind_rows(data_train, data_test)$series) # Add an outcome variable X$outcome <- c(data_train$y, rep(NA, NROW(data_test))) } } else { X <- data.frame(ss_jagam$jags.data$X) # Remove fakery columns from design matrix if no smooth terms were included if (!smooths_included) { X[, xcols_drop] <- NULL } if (inherits(data_train, "list")) { temp_dat <- data.frame(index..time..index = data_train$index..time..index) X$index..time..index <- temp_dat %>% dplyr::left_join( temp_dat %>% dplyr::select(index..time..index) %>% dplyr::distinct() %>% dplyr::arrange(index..time..index) %>% dplyr::mutate(index..time..index = dplyr::row_number()), by = c("index..time..index") ) %>% dplyr::pull(index..time..index) } else { X$index..time..index <- data_train %>% dplyr::left_join( data_train %>% dplyr::select(index..time..index) %>% dplyr::distinct() %>% dplyr::arrange(index..time..index) %>% dplyr::mutate(index..time..index = dplyr::row_number()), by = c("index..time..index") ) %>% dplyr::pull(index..time..index) } X$outcome <- c(data_train$y) X$series <- as.numeric(data_train$series) } # Arrange by time then by series X %>% dplyr::arrange(index..time..index, series) -> X # Matrix of indices in X that correspond to timepoints for each series ytimes <- matrix( NA, nrow = length(unique(X$index..time..index)), ncol = length(unique(X$series)) ) for (i in 1:length(unique(X$series))) { ytimes[, i] <- which(X$series == i) } ss_jagam$jags.data$ytimes <- ytimes # Matrix of outcomes in X that correspond to each series at each timepoint ys_mat <- matrix(NA, nrow = NROW(ytimes), ncol = NCOL(ytimes)) for (i in 1:length(unique(X$series))) { ys_mat[, i] <- X$outcome[which(X$series == i)] } ss_jagam$jags.data$y <- ys_mat # Other necessary variables ss_jagam$jags.data$n <- NROW(ytimes) ss_jagam$jags.data$n_series <- NCOL(ytimes) ss_jagam$jags.data$X <- as.matrix( X %>% dplyr::select(-index..time..index, -series, -outcome) ) if (NCOL(ss_jagam$jags.data$X) == 1) { if (offset) { model_file[grep( "eta <-", model_file, fixed = TRUE )] <- "eta <- X * b + offset" } else { model_file[grep("eta <-", model_file, fixed = TRUE)] <- "eta <- X * b" } } if (!missing(upper_bounds)) { ss_jagam$jags.data$upper_bound <- upper_bounds } if (length(ss_gam$sp) == length(ss_jagam$jags.ini$lambda)) { ss_jagam$jags.data$sp <- ss_gam$sp } # Machine epsilon for minimum allowable non-zero rate if (family_char == "negative binomial") { ss_jagam$jags.data$min_eps <- .Machine$double.eps } # Number of latent variables to use if (use_lv) { if (missing(n_lv)) { ss_jagam$jags.data$n_lv <- min(2, floor(ss_jagam$jags.data$n_series / 2)) } else { ss_jagam$jags.data$n_lv <- n_lv ss_jagam$jags.ini$X1 <- rep(1, n_lv) ss_jagam$jags.ini$X2 <- 1 } if (ss_jagam$jags.data$n_lv > ss_jagam$jags.data$n_series) { stop("Number of latent variables cannot be greater than number of series") } } if (missing(upper_bounds)) { upper_bounds <- NULL } if (use_lv) { n_lv <- ss_jagam$jags.data$n_lv } else { if (missing(trend_map)) { n_lv <- NULL } } if (missing(data_test)) { data_test <- NULL } # Remove Smooth penalty matrix if no smooths were used in the formula if 
(!smooths_included) { if (any(grepl("S.*", names(ss_jagam$jags.data)))) { ss_jagam$jags.data[[grep("S.*", names(ss_jagam$jags.data))]] <- NULL ss_jagam$jags.data$sp <- NULL } ss_jagam$jags.data$zero <- NULL } # Add information about the call and necessary data structures to the model file # Get dimensions and numbers of smooth terms snames <- names(ss_jagam$jags.data)[grep("S.*", names(ss_jagam$jags.data))] if (length(snames) == 0) { smooth_penalty_data <- NULL } else { smooth_dims <- matrix(NA, ncol = 2, nrow = length(snames)) for (i in 1:length(snames)) { smooth_dims[i, ] <- dim(ss_jagam$jags.data[[snames[i]]]) } smooth_penalty_data <- vector() for (i in 1:length(snames)) { smooth_penalty_data[i] <- paste0( "matrix ", snames[i], "; mgcv smooth penalty matrix ", snames[i] ) } } if ("sp" %in% names(ss_jagam$jags.data)) { if (length(ss_jagam$jags.data$sp) == 1) { sp_data <- c( paste0( "real sp; inverse exponential location prior for smoothing parameter ", paste0(names(ss_jagam$jags.data$sp)) ), "___________values ranging 5 - 50 are a good start" ) } else { sp_data <- c( paste0( "vector sp; inverse exponential location priors for smoothing parameters: ", paste0(names(ss_jagam$jags.data$sp), collapse = "; ") ), "___________values ranging 5 - 50 are a good start" ) } } else { sp_data <- NULL } if ("p_coefs" %in% names(ss_jagam$jags.data)) { parametric_ldata <- paste0( "vector p_coefs; vector (length = ", length(ss_jagam$jags.data$p_coefs), ") of prior Gaussian means for parametric effects" ) } else { parametric_ldata <- NULL } if ("p_taus" %in% names(ss_jagam$jags.data)) { parametric_tdata <- paste0( "vector p_taus; vector (length = ", length(ss_jagam$jags.data$p_coefs), ") of prior Gaussian precisions for parametric effects" ) } else { parametric_tdata <- NULL } # A second check for any smooth parameters if (any(grep("lambda", model_file, fixed = TRUE))) { smooths_included <- TRUE } else { smooths_included <- smooths_included } if (any(grep("K.* <- ", model_file))) { smooths_included <- TRUE } else { smooths_included <- smooths_included } # Add in additional data structure information for the model file heading if (family_char == "negative binomial") { min_eps <- paste0( "min_eps; .Machine$double.eps (smallest floating-point number x such that 1 + x != 1)\n" ) } else { min_eps <- NULL } if (smooths_included) { zeros <- paste0( "vector zero; prior basis coefficient locations vector of length ncol(X)\n" ) } else { zeros <- NULL } if (offset) { offset_line <- paste0("offset; offset vector of length (n x n_series)\n") } else { offset_line <- NULL } model_file <- c( "JAGS model code generated by package mvgam", "\n", "GAM formula:", gsub('\"', "", paste(formula[2], formula[3], sep = " ~ ")), "\n", "Trend model:", trend_model, "\n", "Required data:", "integer n; number of timepoints per series\n", "integer n_series; number of series\n", "matrix y; time-ordered observations of dimension n x n_series (missing values allowed)\n", "matrix ytimes; time-ordered n x n_series matrix (which row in X belongs to each [time, series] observation?)\n", "matrix X; mgcv GAM design matrix of dimension (n x n_series) x basis dimension\n", paste0(smooth_penalty_data), offset_line, zeros, paste0(parametric_ldata), paste0(parametric_tdata), sp_data, min_eps, "\n", model_file ) # Get names of smoothing parameters if (smooths_included) { ss_gam$off <- ss_jagam$pregam$off ss_gam$S <- ss_jagam$pregam$S name_starts <- unlist(purrr::map(ss_jagam$pregam$smooth, "first.sp")) name_ends <- 
unlist(purrr::map(ss_jagam$pregam$smooth, "last.sp")) rho_names <- unlist(lapply(seq(1:length(ss_gam$smooth)), function(i) { number_seq <- seq(1:(1 + name_ends[i] - name_starts[i])) number_seq[1] <- "" paste0(rep(ss_gam$smooth[[i]]$label, length(number_seq)), number_seq) })) } else { rho_names <- NA } #### Set up model file and modelling data #### if (use_stan) { algorithm <- match.arg( algorithm, c("sampling", "meanfield", "fullrank", "pathfinder", "laplace") ) backend <- match.arg(backend, c("rstan", "cmdstanr")) cmdstan_avail <- insight::check_if_installed( "cmdstanr", stop = FALSE, quietly = TRUE ) if (isTRUE(cmdstan_avail)) { if (is.null(cmdstanr::cmdstan_version(error_on_NA = FALSE))) { backend <- "rstan" } } fit_engine <- "stan" use_cmdstan <- ifelse(backend == "cmdstanr", TRUE, FALSE) # Import the base Stan model file modification <- add_base_dgam_lines(stan = TRUE, use_lv = use_lv) unlink("base_gam_stan.txt") stanfile_name <- tempfile(pattern = "base_gam_stan", fileext = ".txt") cat(modification, file = stanfile_name, sep = "\n", append = T) base_stan_model <- trimws(suppressWarnings(readLines(stanfile_name))) unlink(stanfile_name) # Add necessary trend structure base_stan_model <- add_trend_lines( model_file = base_stan_model, stan = TRUE, trend_model = if ( trend_model %in% c("RW", "VAR1", "PWlinear", "PWlogistic", "ZMVN") ) { "RW" } else { trend_model }, use_lv = use_lv, drift = drift ) # Add remaining data, model and parameters blocks to the Stan model file; # gather Stan data structure stan_objects <- add_stan_data( jags_file = trimws(model_file), stan_file = base_stan_model, ss_gam = ss_gam, use_lv = use_lv, n_lv = n_lv, jags_data = ss_jagam$jags.data, family = ifelse( family_char %in% c("binomial", "bernoulli", "beta_binomial"), "poisson", family_char ), upper_bounds = upper_bounds ) if (use_lv || !missing(trend_map)) { stan_objects$model_data$n_lv <- n_lv } # Set monitor parameters param <- get_monitor_pars( family = family_char, use_lv = use_lv, trend_model = trend_model, smooths_included = stan_objects$smooths_included, drift = drift ) if (any(smooth_labs$class == "random.effect")) { param <- c(param, "mu_raw", "sigma_raw") } # Don't place inits when using Stan; may add options later for users # to supply them though inits <- NULL # Include any GP term updates if (!is.null(gp_terms)) { gp_additions <- make_gp_additions( gp_details = gp_details, orig_formula = orig_formula, data = data_train, newdata = data_test, model_data = stan_objects$model_data, mgcv_model = ss_gam, gp_terms = gp_terms, family = family, rho_names = rho_names ) stan_objects$model_data <- gp_additions$model_data ss_gam <- gp_additions$mgcv_model rho_names <- gp_additions$rho_names } # Vectorise likelihoods vectorised <- vectorise_stan_lik( model_file = stan_objects$stan_file, model_data = stan_objects$model_data, family = family_char, threads = threads, trend_model = trend_model, use_lv = use_lv, offset = offset, drift = drift ) # If a VAR model is used, enforce stationarity using methods described by # Heaps 2022 (Enforcing stationarity through the prior in vector autoregressions) if (use_var1) { vectorised$model_file <- stationarise_VAR(vectorised$model_file) } if (use_var1cor) { param <- c(param, "L_Omega") vectorised$model_file <- stationarise_VARcor(vectorised$model_file) } # Add modifications for trend mapping and trend predictors, if # supplied trend_sp_names <- NA if (!missing(trend_map)) { trend_map_setup <- trend_map_mods( model_file = vectorised$model_file, model_data = 
vectorised$model_data, trend_map = trend_map, data_train = data_train, ytimes = ytimes, n_lv = n_lv, trend_model = trend_model ) vectorised$model_file <- trend_map_setup$model_file vectorised$model_data <- trend_map_setup$model_data if ( trend_model %in% c("None", "RW", "AR1", "AR2", "AR3", "CAR1", "ZMVN") ) { param <- unique(c(param, "trend", "sigma")) } # If trend formula specified, add the predictors for the trend models if (!missing(trend_formula)) { if (missing(trend_knots)) { trend_knots <- missing_arg() } trend_pred_setup <- add_trend_predictors( trend_formula = trend_formula, trend_knots = trend_knots, trend_map = trend_map, trend_model = trend_model, data_train = data_train, data_test = if (missing(data_test)) { NULL } else { data_test }, model_file = vectorised$model_file, model_data = vectorised$model_data, drop_trend_int = FALSE, drift = drift ) vectorised$model_file <- trend_pred_setup$model_file vectorised$model_data <- trend_pred_setup$model_data trend_mgcv_model <- trend_pred_setup$trend_mgcv_model param <- unique(c(param, "trend", "b_trend", "trend_mus")) if (trend_pred_setup$trend_smooths_included) { param <- c(param, "rho_trend", "lambda_trend") } if (trend_pred_setup$trend_random_included) { param <- c(param, "mu_raw_trend", "sigma_raw_trend") } trend_sp_names <- trend_pred_setup$trend_sp_names } else {} if (trend_model == "VAR1") { param <- c(param, "lv_coefs", "LV") use_lv <- TRUE } } # Update default priors vectorised$model_file <- suppressWarnings(update_priors( vectorised$model_file, def_priors, use_stan = TRUE )) # Drop observation intercept if specified if (drop_obs_intercept) { if ( any(grepl( "// observation model basis coefficients", vectorised$model_file, fixed = TRUE )) ) { vectorised$model_file[ grep( "// observation model basis coefficients", vectorised$model_file, fixed = TRUE ) + 1 ] <- paste0( vectorised$model_file[ grep( "// observation model basis coefficients", vectorised$model_file, fixed = TRUE ) + 1 ], "\n", "// (Intercept) fixed at zero\n", "b[1] = 0;" ) } else { vectorised$model_file[grep( "b[1:num_basis] = b_raw[1:num_basis]", vectorised$model_file, fixed = TRUE )] <- paste0( "b[1:num_basis] = b_raw[1:num_basis];\n", "// (Intercept) fixed at zero\n", "b[1] = 0;" ) } vectorised$model_file <- readLines( textConnection(vectorised$model_file), n = -1 ) attr(ss_gam, "drop_obs_intercept") <- TRUE } else { attr(ss_gam, "drop_obs_intercept") <- FALSE } # Remaining model file updates for any GP terms if (!is.null(gp_terms)) { final_gp_updates <- add_gp_model_file( model_file = vectorised$model_file, model_data = vectorised$model_data, mgcv_model = ss_gam, gp_additions = gp_additions ) vectorised$model_file <- final_gp_updates$model_file vectorised$model_data <- final_gp_updates$model_data } # Update monitor pars for any GP terms if ( any(grepl( "real alpha_gp", vectorised$model_file, fixed = TRUE )) & !lfo ) { alpha_params <- trimws(gsub( ";", "", gsub( "real ", "", grep( "real alpha_gp", vectorised$model_file, fixed = TRUE, value = TRUE ), fixed = TRUE ) )) rho_params <- trimws(gsub( ";", "", gsub( "real ", "", grep( "real rho_gp", vectorised$model_file, fixed = TRUE, value = TRUE ), fixed = TRUE ) )) param <- c(param, alpha_params, rho_params) } # Update for any monotonic term updates if (any(smooth_labs$class %in% c("moi.smooth", "mod.smooth"))) { final_mono_updates <- add_mono_model_file( model_file = vectorised$model_file, model_data = vectorised$model_data, mgcv_model = ss_gam ) vectorised$model_file <- final_mono_updates$model_file 
vectorised$model_data <- final_mono_updates$model_data } # Update for any piecewise trends if (trend_model %in% c("PWlinear", "PWlogistic")) { pw_additions <- add_piecewise( vectorised$model_file, vectorised$model_data, data_train, data_test, orig_trend_model, family ) vectorised$model_file <- pw_additions$model_file vectorised$model_data <- pw_additions$model_data orig_trend_model$changepoints <- pw_additions$model_data$t_change orig_trend_model$change_freq <- pw_additions$model_data$change_freq orig_trend_model$cap <- pw_additions$model_data$cap } # Update for CAR1 trends if (trend_model == "CAR1") { vectorised$model_data <- add_corcar( vectorised$model_data, data_train, data_test ) } # Updates for Binomial and Bernoulli families if (family_char %in% c("binomial", "bernoulli", "beta_binomial")) { bin_additions <- add_binomial( formula, vectorised$model_file, vectorised$model_data, data_train, data_test, family_char ) vectorised$model_file <- bin_additions$model_file vectorised$model_data <- bin_additions$model_data attr(ss_gam, "trials") <- bin_additions$trials } # Add in any user-specified priors if (!missing(priors)) { vectorised$model_file <- update_priors( vectorised$model_file, priors, use_stan = TRUE ) } else { priors <- NULL } # Check if non-centering can be used nc_check <- check_noncent( model_file = vectorised$model_file, noncentred = noncentred, use_lv = use_lv, trend_map = trend_map, add_ma = add_ma, add_cor = add_cor, trend_model = trend_model, drift = drift, silent = silent, nmix = add_nmix ) vectorised$model_file <- nc_check$model_file noncentred <- nc_check$noncentred # Add any correlated error or moving average processes; this comes after # priors as currently there is no option to change priors on these parameters if (add_ma | add_cor) { MaCor_additions <- add_MaCor( model_file = vectorised$model_file, model_data = vectorised$model_data, data_train = data_train, data_test = data_test, add_ma = add_ma, add_cor = add_cor, trend_model = orig_trend_model ) vectorised$model_file <- MaCor_additions$model_file vectorised$model_data <- MaCor_additions$model_data } # Add updates for an N-mixture model if (add_nmix) { nmix_additions <- add_nmixture( vectorised$model_file, vectorised$model_data, orig_trend_model = orig_trend_model, data_train = data_train, data_test = data_test, trend_map = trend_map, nmix_trendmap = nmix_trendmap ) vectorised$model_file <- nmix_additions$model_file vectorised$model_data <- nmix_additions$model_data family <- nmix() family_char <- "nmix" # Nmixtures don't use generated quantities because it is faster # to produce these in R after sampling has finished param <- c(param, "p") param <- param[ !param %in% c( "ypred", "mus", "theta", "detprob", "latent_ypred", "lv_coefs", "error" ) ] } # Updates for sharing of observation params if (share_obs_params) { vectorised$model_file <- shared_obs_params( vectorised$model_file, family_char ) } # Tidy the representation vectorised$model_file <- sanitise_modelfile(vectorised$model_file) if (requireNamespace("cmdstanr", quietly = TRUE) & backend == "cmdstanr") { # Replace new syntax if this is an older version of Stan if (cmdstanr::cmdstan_version() < "2.26") { warning( "Your version of CmdStan is out of date. 
Some features of mvgam may not work" ) vectorised$model_file <- gsub( "array[n, n_series] int ypred;", "int ypred[n, n_series];", vectorised$model_file, fixed = TRUE ) vectorised$model_file <- gsub( "array[n, n_series] real ypred;", "real ypred[n, n_series];", vectorised$model_file, fixed = TRUE ) } # Auto-format the model file if (autoformat) { if ( requireNamespace("cmdstanr") & cmdstanr::cmdstan_version() >= "2.29.0" ) { vectorised$model_file <- .autoformat( vectorised$model_file, overwrite_file = FALSE, backend = "cmdstanr", silent = silent >= 1L ) } vectorised$model_file <- readLines( textConnection(vectorised$model_file), n = -1 ) } } else { if (autoformat) { vectorised$model_file <- .autoformat( vectorised$model_file, overwrite_file = FALSE, backend = "rstan", silent = silent >= 1L ) vectorised$model_file <- readLines( textConnection(vectorised$model_file), n = -1 ) } # Replace new syntax if this is an older version of Stan if (rstan::stan_version() < "2.26") { warning( "Your version of rstan is out of date. Some features of mvgam may not work" ) vectorised$model_file <- gsub( "array[n, n_series] int ypred;", "int ypred[n, n_series];", vectorised$model_file, fixed = TRUE ) } } attr(vectorised$model_data, "trend_model") <- trend_model # Remove data likelihood if this is a prior sampling run if (prior_simulation) { vectorised$model_file <- remove_likelihood(vectorised$model_file) } } else { # Set up data and model file for JAGS attr(ss_jagam$jags.data, "trend_model") <- trend_model trend_sp_names <- NA if (!smooths_included) { inits <- NULL } else { inits <- ss_jagam$jags.ini } initlist <- replicate(chains, inits, simplify = FALSE) inits <- initlist if (!missing(priors)) { model_file <- update_priors(model_file, priors, use_stan = FALSE) } else { priors <- NULL } # Set monitor parameters and initial values param <- get_monitor_pars( family_char, smooths_included = smooths_included, use_lv, trend_model, drift ) # Add random effect parameters for monitoring if (any(smooth_labs$class == "random.effect")) { param <- c(param, paste0("mu_raw", 1:length(re_smooths))) param <- c(param, paste0("sigma_raw", 1:length(re_smooths))) } } # Remove lp__ from monitor params if VB is to be used if (algorithm %in% c("meanfield", "fullrank", "pathfinder", "laplace")) { param <- param[!param %in% "lp__"] } # Lighten up the mgcv model(s) to reduce size of the returned object ss_gam <- trim_mgcv(ss_gam) if (!missing(trend_formula)) { trend_mgcv_model <- trim_mgcv(trend_mgcv_model) } #### Return only the model file and all data / inits needed to run the model # outside of mvgam #### if (!run_model) { unlink("base_gam.txt") output <- structure( list( call = orig_formula, trend_call = if (!missing(trend_formula)) { trend_formula } else { NULL }, family = family_char, share_obs_params = share_obs_params, trend_model = orig_trend_model, trend_map = if (!missing(trend_map)) { trend_map } else { NULL }, drift = FALSE, priors = priors, model_file = if (use_stan) { vectorised$model_file } else { trimws(model_file) }, model_data = if (use_stan) { vectorised$model_data } else { ss_jagam$jags.data }, inits = inits, monitor_pars = param, mgcv_model = ss_gam, trend_mgcv_model = if (!missing(trend_formula)) { trend_mgcv_model } else { NULL }, sp_names = rho_names, trend_sp_names = trend_sp_names, ytimes = ytimes, use_lv = use_lv, n_lv = n_lv, upper_bounds = upper_bounds, obs_data = data_train, test_data = data_test, fit_engine = if (use_stan) { "stan" } else { "jags" }, backend = if (use_stan) { backend } else { "rjags" 
}, algorithm = if (use_stan) { algorithm } else { "sampling" }, max_treedepth = NULL, adapt_delta = NULL ), class = "mvgam_prefit" ) #### Else if running the model, complete the setup for fitting #### } else { # If this is a lfo_cv run, trim down parameters to monitor so post-processing # is faster if (lfo) { to_remove <- c( "trend", "b_trend", "b", "b_raw", "rho", "sigma", "alpha_gp", "rho_gp", "ar1", "ar2", "ar3", "LV", "lv_coefs", "penalty", "Sigma", "theta", "error" ) param <- param[!param %in% to_remove] } if (use_stan) { model_data <- vectorised$model_data # Check if cmdstan is accessible; if not, use rstan if (backend == "cmdstanr") { if (!requireNamespace("cmdstanr", quietly = TRUE)) { if (silent < 2) { message("cmdstanr library not found; defaulting to rstan") } use_cmdstan <- FALSE } else { use_cmdstan <- TRUE if (is.null(cmdstanr::cmdstan_version(error_on_NA = FALSE))) { warning( "cmdstanr library found but Cmdstan not found. Defaulting to rstan" ) use_cmdstan <- FALSE } } } if (use_cmdstan) { # Prepare threading and generate the model cmd_mod <- .model_cmdstanr( vectorised$model_file, threads = threads, silent = silent ) # Condition the model using Cmdstan out_gam_mod <- .sample_model_cmdstanr( model = cmd_mod, algorithm = algorithm, prior_simulation = prior_simulation, data = model_data, chains = chains, parallel = parallel, silent = silent, max_treedepth = max_treedepth, adapt_delta = adapt_delta, threads = threads, burnin = burnin, samples = samples, param = param, save_all_pars = save_all_pars, dots ) } else { # Condition the model using rstan requireNamespace("rstan", quietly = TRUE) out_gam_mod <- .sample_model_rstan( model = vectorised$model_file, algorithm = algorithm, prior_simulation = prior_simulation, data = model_data, chains = chains, parallel = parallel, silent = silent, max_treedepth = max_treedepth, adapt_delta = adapt_delta, threads = threads, burnin = burnin, samples = samples, thin = thin, dots ) } } if (!use_stan) { requireNamespace("runjags", quietly = TRUE) fit_engine <- "jags" model_data <- ss_jagam$jags.data runjags::runjags.options(silent.jags = TRUE, silent.runjags = TRUE) # Initiate adaptation of the model for the full burnin period. 
This is necessary as JAGS # will take a while to optimise the samplers, so long adaptation with little 'burnin' # is more crucial than little adaptation but long 'burnin' https://mmeredith.net/blog/2016/Adapt_or_burn.htm unlink("base_gam.txt") cat(model_file, file = "base_gam.txt", sep = "\n", append = T) message("Compiling the JAGS program...") message() if (prior_simulation) { n_adapt <- 500 n_burn <- 0 samples <- 1000 thin <- 1 } else { n_burn <- burnin # Rely on long adaptation to tune samplers appropriately n_adapt <- max(1000, n_burn - 1000) } if (parallel) { cl <- parallel::makePSOCKcluster(min(c( chains, parallel::detectCores() - 1 ))) setDefaultCluster(cl) gam_mod <- runjags::run.jags( model = "base_gam.txt", data = ss_jagam$jags.data, modules = "glm", inits = initlist, n.chains = chains, adapt = n_adapt, burnin = n_burn, sample = samples, jags = jags_path, thin = thin, method = "rjparallel", monitor = param, silent.jags = TRUE, cl = cl ) stopCluster(cl) } else { gam_mod <- runjags::run.jags( model = "base_gam.txt", data = ss_jagam$jags.data, modules = "glm", inits = initlist, n.chains = chains, adapt = n_adapt, burnin = n_burn, sample = samples, jags = jags_path, thin = thin, method = "rjags", monitor = param, silent.jags = TRUE ) } out_gam_mod <- coda::as.mcmc.list(gam_mod) } unlink(file_name) unlink(fil) # Add generated quantities for N-mixture models if (family_char == "nmix") { out_gam_mod <- add_nmix_posterior( model_output = out_gam_mod, obs_data = data_train, test_data = data_test, mgcv_model = trend_mgcv_model, Z = model_data$Z, n_lv = n_lv, K_inds = model_data$K_inds_all ) } # Get Dunn-Smyth Residual distributions for each series if this # is not a prior simulation or an lfo fit if (prior_simulation || lfo || !residuals) { series_resids <- NULL } else { object <- list( model_output = out_gam_mod, call = orig_formula, mgcv_model = ss_gam, model_data = if (use_stan) { vectorised$model_data } else { ss_jagam$jags.data }, fit_engine = fit_engine, family = family_char, share_obs_params = share_obs_params, obs_data = data_train, test_data = data_test, ytimes = ytimes ) class(object) <- "mvgam" # Use the much faster vectorized residual # calculation function now series_resids <- dsresids_vec(object) } if (prior_simulation) { data_train$y <- orig_y } # Add Bayesian coefficients to the mgcv model to help with plotting of # smooths that aren't yet supported by mvgam plotting functions; this is # also necessary for computing EDFs and approximate p-values of smooths if (!lfo) { V <- cov(mcmc_chains(out_gam_mod, "b")) ss_gam$Vp <- ss_gam$Vc <- V # Add the posterior median coefficients p <- mcmc_summary( out_gam_mod, "b", variational = algorithm %in% c("meanfield", "fullrank", "pathfinder", "laplace") )[, c(4)] names(p) <- names(ss_gam$coefficients) ss_gam$coefficients <- p # Repeat for any trend-specific mgcv model if (!missing(trend_formula)) { V <- cov(mcmc_chains(out_gam_mod, "b_trend")) trend_mgcv_model$Vp <- trend_mgcv_model$Vc <- V p <- mcmc_summary( out_gam_mod, "b_trend", variational = algorithm %in% c("meanfield", "fullrank", "pathfinder", "laplace") )[, c(4)] names(p) <- names(trend_mgcv_model$coefficients) trend_mgcv_model$coefficients <- p } } #### Return the output as class mvgam #### trim_data <- list() attr(model_data, "trend_model") <- trend_model attr(trim_data, "trend_model") <- trend_model attr(model_data, "noncentred") <- if (noncentred) TRUE else NULL attr(trim_data, "noncentred") <- if (noncentred) TRUE else NULL attr(model_data, "threads") <- threads 
attr(trim_data, "threads") <- threads output <- structure( list( call = orig_formula, trend_call = if (!missing(trend_formula)) { trend_formula } else { NULL }, family = family_char, share_obs_params = share_obs_params, trend_model = orig_trend_model, trend_map = if (!missing(trend_map)) { trend_map } else { NULL }, drift = FALSE, priors = priors, model_output = out_gam_mod, model_file = if (use_stan) { vectorised$model_file } else { trimws(model_file) }, model_data = if (return_model_data) { model_data } else { trim_data }, inits = if (return_model_data) { inits } else { NULL }, monitor_pars = param, sp_names = rho_names, trend_sp_names = trend_sp_names, mgcv_model = ss_gam, trend_mgcv_model = if (!missing(trend_formula)) { trend_mgcv_model } else { NULL }, ytimes = ytimes, resids = series_resids, use_lv = use_lv, n_lv = n_lv, upper_bounds = upper_bounds, obs_data = data_train, test_data = data_test, fit_engine = fit_engine, backend = if (use_stan) { backend } else { "rjags" }, algorithm = if (use_stan) { algorithm } else { "sampling" }, max_treedepth = if (use_stan & algorithm == "sampling") { max_treedepth } else { NULL }, adapt_delta = if (use_stan & algorithm == "sampling") { adapt_delta } else { NULL } ), class = "mvgam" ) } return(output) } ================================================ FILE: R/mvgam_diagnostics.R ================================================ #' Extract diagnostic quantities of \pkg{mvgam} models #' #' Extract quantities that can be used to diagnose sampling behavior #' of the algorithms applied by \pkg{Stan} at the back-end of \pkg{mvgam}. #' #' @name mvgam_diagnostics #' @aliases nuts_params rhat neff_ratio #' #' @param object,x A \code{mvgam} or \code{jsdgam} object. #' @param pars An optional character vector of parameter names. #' For \code{nuts_params} these will be NUTS sampler parameter #' names rather than model parameters. If pars is omitted #' all parameters are included. #' @param ... Arguments passed to individual methods. #' #' @return The exact form of the output depends on the method. #' @examples #' \dontrun{ #' simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') #' mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), #' trend_model = AR(), #' noncentred = TRUE, #' data = simdat$data_train, #' chains = 2) #' np <- nuts_params(mod) #' head(np) #' #' # extract the number of divergence transitions #' sum(subset(np, Parameter == "divergent__")$Value) #' #' head(neff_ratio(mod)) #' } #' @details For more details see #' \code{\link[bayesplot:bayesplot-extractors]{bayesplot-extractors}}. #' NULL #' @rdname mvgam_diagnostics #' @importFrom bayesplot nuts_params #' @export nuts_params #' @export nuts_params.mvgam <- function(object, pars = NULL, ...) { bayesplot::nuts_params(object$model_output, pars = pars, ...) } #' @rdname mvgam_diagnostics #' @importFrom bayesplot log_posterior #' @export log_posterior.mvgam <- function(object, ...) { bayesplot::log_posterior(object$model_output, ...) } #' @rdname mvgam_diagnostics #' @importFrom posterior rhat #' @export rhat #' @export rhat.mvgam <- function(x, pars = NULL, ...) { # bayesplot uses outdated rhat code from rstan # bayesplot::rhat(object$fit, pars = pars, ...) 
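  # Compute split-Rhat via the posterior package instead: assemble a draws
  # array (using the original Stan variable names) and summarise each variable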
if (is.null(pars)) {
    vars_extract <- variables(x)
    draws <- as_draws_array(
      x,
      variable = unlist(purrr::map(vars_extract, 'orig_name')),
      use_alias = FALSE
    )
  } else {
    draws <- as_draws_array(x, variable = pars)
  }
  tmp <- posterior::summarise_draws(draws, rhat = posterior::rhat)
  rhat <- tmp$rhat
  names(rhat) <- tmp$variable
  rhat
}

#' @rdname mvgam_diagnostics
#' @importFrom bayesplot neff_ratio
#' @importFrom brms ndraws
#' @export neff_ratio
#' @export
neff_ratio.mvgam <- function(object, pars = NULL, ...) {
  insight::check_if_installed(
    "matrixStats",
    reason = 'to calculate effective sample sizes'
  )
  if (is.null(pars)) {
    vars_extract <- unlist(purrr::map(variables(object), 'orig_name'))

    # Use !grepl() rather than -grep() so that an absence of 'ypred'
    # variables does not accidentally drop everything
    vars_extract <- vars_extract[!grepl('ypred', vars_extract)]
    draws <- as_draws_array(object, variable = vars_extract, use_alias = FALSE)
  } else {
    draws <- as_draws_array(object, variable = pars)
  }
  tmp <- posterior::summarise_draws(
    draws,
    ess_bulk = posterior::ess_bulk,
    ess_tail = posterior::ess_tail
  )

  # min of ess_bulk and ess_tail mimics definition of posterior::rhat.default
  ess <- matrixStats::rowMins(cbind(tmp$ess_bulk, tmp$ess_tail))
  names(ess) <- tmp$variable
  ess / brms::ndraws(draws)
}

================================================
FILE: R/mvgam_fevd-class.R
================================================

#' `mvgam_fevd` object description
#'
#' A \code{mvgam_fevd} object returned by function [fevd()]. Run
#' `methods(class = "mvgam_fevd")` to see an overview of available methods.
#'
#' @details A forecast error variance decomposition is useful for quantifying
#' the amount of information each series in a Vector Autoregression
#' contributes to the forecast distributions of the other series in the
#' autoregression. This object contains the forecast error variance
#' decomposition using the orthogonalised impulse response coefficient
#' matrices \eqn{\Psi_h}, which can be used to quantify the contribution of
#' series \eqn{j} to the h-step forecast error variance of series \eqn{k}:
#' \deqn{
#' \sigma_k^2(h) = \sum_{j=1}^K(\psi_{kj, 0}^2 + \ldots + \psi_{kj,
#' h-1}^2) \quad
#' }
#' If the orthogonalised impulse responses \eqn{(\psi_{kj, 0}^2 + \ldots +
#' \psi_{kj, h-1}^2)} are divided by the variance of the forecast error
#' \eqn{\sigma_k^2(h)}, this yields an interpretable percentage representing
#' how much of the forecast error variance for \eqn{k} can be explained by an
#' exogenous shock to \eqn{j}. This percentage is what is calculated and
#' returned in objects of class `mvgam_fevd`, where the posterior
#' distribution of variance decompositions for each variable in the original
#' model is contained in a separate slot within the returned `list` object.
#'
#' @seealso [mvgam()], [VAR()]
#'
#' @references Lütkepohl, H (2006). New Introduction to Multiple Time Series
#' Analysis. Springer, New York.
#'
#' @author Nicholas J Clark
#'
#' @name mvgam_fevd-class
NULL

#' @title Posterior summary of forecast error variance decompositions
#'
#' @description This function takes an \code{mvgam_fevd} object and calculates
#' a posterior summary of the error variance decompositions of each series,
#' at all horizons
#'
#' @param object an object of class `mvgam_fevd` obtained using the
#' \code{fevd()} function. This object will contain draws from the posterior
#' distribution of the forecast error variance decompositions.
#'
#' @param probs The upper and lower percentiles to be computed by the
#' `quantile` function, in addition to the median
#'
#' @param ...
ignored #' #' @return A long-format `tibble` / `data.frame` reporting the posterior median, #' upper and lower percentiles of the error variance decompositions of each #' series at all horizons. #' #' @method summary mvgam_fevd #' #' @seealso \code{\link{fevd}}, \code{\link{plot.mvgam_fevd}} #' #' @author Nicholas J Clark #' #' @export summary.mvgam_fevd = function(object, probs = c(0.025, 0.975), ...) { if (length(probs) != 2L) { stop("argument 'probs' must be a vector of length 2", call. = FALSE) } validate_proportional(min(probs)) validate_proportional(max(probs)) # Calculate posterior quantiles of error variance contributions ynames <- names(object[[1]]) out <- do.call( rbind, lapply(seq_len(length(object)), function(draw) { fevd_df(object[[draw]], ynames = ynames) %>% dplyr::mutate(draw = draw) }) ) %>% dplyr::group_by(horizon, target, draw) %>% dplyr::mutate(total_evd = sum(evd)) %>% dplyr::ungroup() %>% dplyr::mutate(evd = evd / total_evd) %>% dplyr::group_by(horizon, target, Series) %>% dplyr::mutate( fevdQ50 = median(evd), fevd_Qlower = quantile(evd, min(probs), na.rm = TRUE), fevd_Qupper = quantile(evd, max(probs), na.rm = TRUE) ) %>% dplyr::ungroup() %>% dplyr::mutate( shock = gsub('process', 'Process', paste0(Series, ' -> ', target)) ) %>% dplyr::select(shock, horizon, fevdQ50, fevd_Qlower, fevd_Qupper) %>% dplyr::distinct() colnames(out) <- c( 'shock', 'horizon', 'fevdQ50', paste0('fevdQ', 100 * min(probs)), paste0('fevdQ', 100 * max(probs)) ) return(out) } #'Plot forecast error variance decompositions from an `mvgam_fevd` object #' #'This function takes an \code{mvgam_fevd} object and produces #'a plot of the posterior median contributions to forecast variance for each series #'in the fitted Vector Autoregression #' #'@importFrom ggplot2 ggplot aes geom_bar facet_wrap labs #' #'@param x \code{list} object of class \code{mvgam_fevd}. See [fevd()] #' #'@param ... ignored #' #'@return A \code{\link[ggplot2]{ggplot}} object, #' which can be further customized using the \pkg{ggplot2} package #' #'@author Nicholas J Clark #' #'@export plot.mvgam_fevd = function(x, ...) 
{
  # Calculate posterior median error variance contributions
  ynames <- names(x[[1]])
  do.call(
    rbind,
    lapply(seq_len(length(x)), function(draw) {
      fevd_df(x[[draw]], ynames = ynames)
    })
  ) %>%
    dplyr::group_by(horizon, target, Series) %>%
    dplyr::summarise(mean_evd = median(evd)) %>%
    dplyr::ungroup() %>%
    dplyr::group_by(horizon, target) %>%
    dplyr::mutate(
      total_evd = sum(mean_evd),
      mean_evd = mean_evd / total_evd
    ) %>%
    dplyr::ungroup() %>%
    dplyr::mutate(
      Series = gsub('process', 'Process', Series),
      target = gsub('process', 'Process', target)
    ) -> mean_evds

  # Plot as a ggplot object
  ggplot2::ggplot(
    mean_evds,
    ggplot2::aes(fill = Series, y = mean_evd, x = horizon)
  ) +
    ggplot2::geom_bar(position = "stack", stat = "identity") +
    ggplot2::facet_wrap(~target) +
    ggplot2::theme_bw() +
    ggplot2::labs(
      x = 'Forecast horizon',
      y = 'Median contribution to forecast variance'
    )
}

#'@noRd
fevd_df = function(x, ynames) {
  do.call(
    rbind,
    lapply(seq_len(length(x)), function(process) {
      data.frame(
        horizon = 1:NROW(x[[process]]),
        evd = as.vector(x[[process]]),
        Series = paste0(
          'process_',
          sort(rep(
            1:length(ynames),
            NROW(x[[process]])
          ))
        ),
        target = ynames[process]
      )
    })
  )
}

================================================
FILE: R/mvgam_forecast-class.R
================================================

#' `mvgam_forecast` object description
#'
#' A \code{mvgam_forecast} object returned by function \code{\link{hindcast}}
#' or \code{\link{forecast}}. Run `methods(class = "mvgam_forecast")` to see
#' an overview of available methods.
#'
#' @details A `mvgam_forecast` object contains the following elements:
#'
#' \itemize{
#' \item `call` the original observation model formula
#'
#' \item `trend_call` If a `trend_formula` was supplied, the original trend
#' model formula is returned. Otherwise `NULL`
#'
#' \item `family` \code{character} description of the observation distribution
#'
#' \item `family_pars` \code{list} containing draws of family-specific
#' parameters (i.e. shape, scale or overdispersion parameters). Only
#' returned if `type = link`. Otherwise `NULL`
#'
#' \item `trend_model` \code{character} description of the latent trend model
#'
#' \item `drift` Logical specifying whether a drift term was used in the
#' trend model
#'
#' \item `use_lv` Logical flag indicating whether latent dynamic factors were
#' used in the model
#'
#' \item `fit_engine` `Character` describing the fit engine, either `stan`
#' or `jags`
#'
#' \item `type` The type of predictions included (either `link`, `response`
#' or `trend`)
#'
#' \item `series_names` Names of the time series, taken from
#' `levels(data$series)` in the original model fit
#'
#' \item `train_observations` A `list` of training observation vectors of
#' length `n_series`
#'
#' \item `train_times` A `list` of the unique training times of length
#' `n_series`
#'
#' \item `test_observations` If the \code{\link{forecast}} function was used,
#' a `list` of test observation vectors of length `n_series`. Otherwise
#' `NULL`
#'
#' \item `test_times` If the \code{\link{forecast}} function was used, a
#' `list` of the unique testing (validation) times of length `n_series`.
#' Otherwise `NULL`
#'
#' \item `hindcasts` A `list` of posterior hindcast distributions of length
#' `n_series`.
#'
#' \item `forecasts` If the \code{\link{forecast}} function was used, a
#' `list` of posterior forecast distributions of length `n_series`.
#' Otherwise `NULL`
#' }
#'
#' @seealso [mvgam], [hindcast.mvgam], [forecast.mvgam]
#'
#' @author Nicholas J Clark
#'
#' @name mvgam_forecast-class
NULL

#' @title Posterior summary of hindcast and forecast objects
#'
#' @description This function takes an \code{mvgam_forecast} object and
#' calculates a posterior summary of the hindcast and forecast distributions
#' of each series, along with any true values that were included in `data`
#' and `newdata` if `type = 'response'` was used in the call to
#' \code{hindcast()} or \code{forecast()}
#'
#' @param object an object of class `mvgam_forecast` obtained using either the
#' \code{hindcast()} or \code{forecast()} function. This object will contain
#' draws from the posterior distribution of hindcasts and forecasts.
#'
#' @param probs The upper and lower percentiles to be computed by the
#' `quantile` function, in addition to the median
#'
#' @param ... ignored
#'
#' @return A long-format `tibble` / `data.frame` reporting the posterior median,
#' upper and lower percentiles of the predictions for each series at each of
#' the timepoints that were originally supplied in `data` and, optionally,
#' in `newdata`.
#'
#' @method summary mvgam_forecast
#'
#' @seealso \code{\link{forecast.mvgam}}, \code{\link{plot.mvgam_forecast}}
#'
#' @author Nicholas J Clark
#'
#' @export
summary.mvgam_forecast = function(object, probs = c(0.025, 0.975), ...) {
  if (length(probs) != 2L) {
    stop("argument 'probs' must be a vector of length 2", call. = FALSE)
  }
  validate_proportional(min(probs))
  validate_proportional(max(probs))

  n_series <- length(object$series_names)
  type <- object$type

  # Extract predictions and truths (if type = 'response')
  fc_preds <- do.call(
    rbind,
    lapply(1:n_series, function(x) {
      s_name <- object$series_names[x]
      preds <- cbind(
        object$hindcasts[[which(names(object$hindcasts) == s_name)]],
        object$forecasts[[which(names(object$forecasts) == s_name)]]
      )

      # Calculate quantiles of the forecast distribution
      cred <- sapply(
        1:NCOL(preds),
        function(n) quantile(preds[, n], probs = probs, na.rm = TRUE)
      )
      meds <- apply(preds, 2, median)

      # Put into a long "tidy" dataframe
      if (type == 'response') {
        df <- data.frame(
          series = s_name,
          time = c(
            object$train_times[[which(names(object$hindcasts) == s_name)]],
            object$test_times[[which(names(object$hindcasts) == s_name)]]
          ),
          pred_median = meds,
          pred_Qlower = cred[1, ],
          pred_Qupper = cred[2, ],
          truth = c(
            object$train_observations[[s_name]],
            object$test_observations[[s_name]]
          ),
          type = 'response'
        )
        colnames(df) <- c(
          'series',
          'time',
          'predQ50',
          paste0('predQ', 100 * min(probs)),
          paste0('predQ', 100 * max(probs)),
          'truth',
          'type'
        )
        rownames(df) <- NULL
      } else {
        df <- data.frame(
          series = s_name,
          time = c(
            object$train_times[[which(names(object$hindcasts) == s_name)]],
            object$test_times[[which(names(object$hindcasts) == s_name)]]
          ),
          predQ50 = meds,
          predQlower = cred[1, ],
          predQupper = cred[2, ],
          type = type
        )
        colnames(df) <- c(
          'series',
          'time',
          'predQ50',
          paste0('predQ', 100 * min(probs)),
          paste0('predQ', 100 * max(probs)),
          'type'
        )
        rownames(df) <- NULL
      }
      df
    })
  ) %>%
    dplyr::mutate(
      series = factor(series, levels = object$series_names)
    )
  class(fc_preds) <- c("tbl_df", "tbl", "data.frame")
  return(fc_preds)
}

================================================
FILE: R/mvgam_formulae.R
================================================

#' Details of formula specifications in \pkg{mvgam} models
#' @details \code{\link{mvgam}} will accept an observation model formula and an optional
#' process model formula (via the argument
`trend_formula`). Neither of these formulae can #' be specified as lists, contrary to the accepted behaviour in some `mgcv` or `brms` models. #' \cr #' \cr #' Note that it is possible to supply an empty formula where #' there are no predictors or intercepts in the observation model (i.e. `y ~ 0` or `y ~ -1`). #' In this case, an intercept-only observation model will be set up but the intercept coefficient #' will be fixed at zero. This can be handy if you wish to fit pure State-Space models where #' the variation in the dynamic trend controls the average expectation, and/or where intercepts #' are non-identifiable. #' \cr #' \cr #' The formulae supplied to \code{\link{mvgam}} and \code{\link{jsdgam}} #' are exactly like those supplied to #' \code{\link{glm}} except that smooth terms, #' \code{\link[mgcv]{s}}, #' \code{\link[mgcv]{te}}, #' \code{\link[mgcv]{ti}} and #' \code{\link[mgcv]{t2}}, #' time-varying effects using \code{\link{dynamic}}, #' monotonically increasing (using `s(x, bs = 'moi')`) #' or decreasing splines (using `s(x, bs = 'mod')`; #' see \code{\link{smooth.construct.moi.smooth.spec}} for #' details), as well as #' Gaussian Process functions using \code{\link[brms]{gp}} and offsets using #' \code{\link[stats]{offset}} #' can be added to the right hand side (and \code{.} is not supported in \code{mvgam} formulae). #' \cr #' \cr #' Further details on specifying different kinds of smooth functions, and how to control their behaviours #' by modifying their potential complexities and / or how the penalties behave, can be found in the #' extensive documentation for the `mgcv` package. #' @seealso \code{\link{mvgam}}, #' \code{\link[mgcv]{formula.gam}}, #' \code{\link[mgcv]{gam.models}}, #' \code{\link[mgcv]{jagam}}, #' \code{\link[mgcv]{gam}}, #' \code{\link[mgcv]{s}}, #' \code{\link[brms]{gp}}, #' \code{\link[stats]{formula}} #' @author Nicholas J Clark #' @name mvgam_formulae NULL #' @export #' @importFrom brms gp brms::gp #' @export #' @importFrom mgcv s mgcv::s #' @export #' @importFrom mgcv te mgcv::te #' @export #' @importFrom mgcv ti mgcv::ti #' @export #' @importFrom mgcv t2 mgcv::t2 ================================================ FILE: R/mvgam_irf-class.R ================================================ #' `mvgam_irf` object description #' #' A \code{mvgam_irf} object returned by function \code{\link{irf}}. #' Run `methods(class = "mvgam_irf")` to see an overview of available methods. #' #' @details Generalized or Orthogonalized Impulse Response Functions can be #' computed using the posterior estimates of Vector Autoregressive parameters. #' This function generates a positive "shock" for a target process at time #' `t = 0` and then calculates how each of the remaining processes in the #' latent VAR are expected to respond over the forecast horizon `h`. The #' function computes IRFs for all processes in the object and returns them in #' an array that can be plotted using the S3 `plot` function. To inspect #' community-level metrics of stability using latent VAR processes, you can #' use the related [stability()] function. #' #' A `mvgam_irf` object contains a `list` of posterior impulse response #' functions, each stored as its own `list` #' #' @seealso [mvgam], [VAR] #' #' @references PH Pesaran & Shin Yongcheol (1998). #' Generalized impulse response analysis in linear multivariate models. #' Economics Letters 58: 17–29. 
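#'
#' @examples
#' \dontrun{
#' # A minimal sketch of the intended workflow; the simulation settings and
#' # argument values here are illustrative only, following the patterns used
#' # in this package's other examples. Fit a model with a latent VAR trend,
#' # then compute and inspect impulse response functions
#' simdat <- sim_mvgam(n_series = 3, trend_model = VAR())
#' mod <- mvgam(
#'   y ~ 1,
#'   trend_model = VAR(cor = TRUE),
#'   data = simdat$data_train,
#'   chains = 2
#' )
#' irfs <- irf(mod, h = 12)
#' plot(irfs, series = 1)
#' summary(irfs)
#' }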
#' #' @author Nicholas J Clark #' #' @name mvgam_irf-class NULL #' @title Posterior summary of impulse responses #' #' @description This function takes an \code{mvgam_irf} object and #' calculates a posterior summary of the impulse responses of each #' series to shocks from each of the other series, at all horizons #' #' @param object an object of class `mvgam_irf` obtained using the #' \code{irf()} function. This object will contain draws from the posterior #' distribution of the impulse responses. #' #' @param probs The upper and lower percentiles to be computed by the #' `quantile` function, in addition to the median #' #' @param ... ignored #' #' @return A long-format `tibble` / `data.frame` reporting the posterior median, #' upper and lower percentiles of the impulse responses of each series to #' shocks from each of the other series at all horizons. #' #' @method summary mvgam_irf #' #' @seealso \code{\link{irf}}, \code{\link{plot.mvgam_irf}} #' #' @author Nicholas J Clark #' #' @export summary.mvgam_irf = function(object, probs = c(0.025, 0.975), ...) { if (length(probs) != 2L) { stop("argument 'probs' must be a vector of length 2", call. = FALSE) } validate_proportional(min(probs)) validate_proportional(max(probs)) n_processes <- dim(object[[1]][[1]])[2] h <- dim(object[[1]][[1]])[1] n_draws <- length(object) out <- do.call( rbind, lapply(1:n_processes, function(series) { # Extract IRFs for the specific series impulse_responses <- lapply(seq_along(object), function(j) { object[[j]][series] }) responses <- do.call( rbind, lapply(seq_along(impulse_responses), function(j) { data.frame( horizon = 1:h, imp_resp = as.vector(impulse_responses[[j]][[1]]), resp_var = paste0( 'Process_', sort(rep( 1:n_processes, NROW(impulse_responses[[j]][[1]]) )) ) ) }) ) %>% dplyr::mutate(shock = paste0('Process_', series, ' -> ', resp_var)) %>% # Calculate posterior empirical quantiles of impulse responses dplyr::group_by(shock, horizon) %>% dplyr::summarise( irfQ50 = median(imp_resp), irfQlower = quantile(imp_resp, min(probs), na.rm = TRUE), irfQupper = quantile(imp_resp, max(probs), na.rm = TRUE), .groups = 'keep' ) %>% dplyr::ungroup() colnames(responses) <- c( 'shock', 'horizon', 'irfQ50', paste0('irfQ', 100 * min(probs)), paste0('irfQ', 100 * max(probs)) ) responses }) ) return(out) } #' Plot impulse responses from an `mvgam_irf` object #' #' This function takes an \code{mvgam_irf} object and produces plots of #' Impulse Response Functions #' #' @param x \code{list} object of class \code{mvgam_irf}. See [irf()] #' #' @param series \code{integer} specifying which process series should be #' given the shock #' #' @param ... ignored #' #' @return A `ggplot` object showing the expected response of each latent time #' series to a shock of the focal `series` #' #' @author Nicholas J Clark #' #' @export plot.mvgam_irf = function(x, series = 1, ...) { all_irfs <- x validate_pos_integer(series) n_processes <- dim(all_irfs[[1]][[1]])[2] if (series > n_processes) { stop(paste0("argument 'series' must be <= ", n_processes), call. 
= FALSE) } h <- dim(all_irfs[[1]][[1]])[1] # Extract IRFs for the specific series impulse_responses <- lapply(seq_along(all_irfs), function(j) { all_irfs[[j]][series] }) # Extract impulse responses to a shock in the focal series # in tidy format for ggploting responses <- do.call( rbind, lapply(seq_along(impulse_responses), function(j) { data.frame( horizon = 1:h, imp_resp = as.vector(impulse_responses[[j]][[1]]), resp_var = sort(rep( paste0('Process_~', 1:n_processes), NROW(impulse_responses[[j]][[1]]) )) ) }) ) %>% dplyr::mutate( resp_var = paste0('Process_~', series, ' %->% ', resp_var) ) %>% # Calculate posterior empirical quantiles of impulse responses dplyr::group_by(resp_var, horizon) %>% dplyr::summarise( med = median(imp_resp), lower1 = quantile(imp_resp, 0.1, na.rm = TRUE), lower2 = quantile(imp_resp, 0.2, na.rm = TRUE), lower3 = quantile(imp_resp, 0.3, na.rm = TRUE), lower4 = quantile(imp_resp, 0.4, na.rm = TRUE), upper1 = quantile(imp_resp, 0.9, na.rm = TRUE), upper2 = quantile(imp_resp, 0.8, na.rm = TRUE), upper3 = quantile(imp_resp, 0.7, na.rm = TRUE), upper4 = quantile(imp_resp, 0.6, na.rm = TRUE), .groups = 'keep' ) %>% dplyr::ungroup() # Plot the IRFs ggplot2::ggplot(data = responses, ggplot2::aes(x = horizon, y = med)) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower1, ymax = upper1), fill = "#DCBCBC" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower2, ymax = upper2), fill = "#C79999" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower3, ymax = upper3), fill = "#B97C7C" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower4, ymax = upper4), fill = "#A25050" ) + ggplot2::geom_line(col = "#8F2727", linewidth = 1) + ggplot2::geom_hline(yintercept = 0, linetype = "dashed", colour = "black") + ggplot2::facet_wrap( ~resp_var, scales = 'free_y', labeller = ggplot2::label_parsed ) + ggplot2::labs( x = "Horizon", y = paste0(attr(x, 'irf_type'), " impulse response") ) + ggplot2::theme_bw() } ================================================ FILE: R/mvgam_residcor-class.R ================================================ #' `mvgam_residcor` object description #' #' A \code{mvgam_residcor} object returned by function [residual_cor()]. #' Run `methods(class = "mvgam_residcor")` to see an overview of available methods. #' @return Objects of this class are structured as a `list` with the following components: #' #' \item{cor, cor_lower, cor_upper}{A set of \eqn{p \times p} correlation matrices, #' containing either the posterior median or mean estimate, plus lower and upper limits #' of the corresponding credible intervals supplied to `probs`} #' \item{sig_cor}{A \eqn{p \times p} correlation matrix containing only those correlations whose credible #' interval does not contain zero. All other correlations are set to zero} #' \item{prec, prec_lower, prec_upper}{A set of \eqn{p \times p} precision matrices, #' containing either the posterior median or mean estimate, plus lower and upper limits #' of the corresponding credible intervals supplied to `probs`} #' \item{sig_prec}{A \eqn{p \times p} precision matrix containing only those precisions whose credible #' interval does not contain zero. 
All other precisions are set to zero}
#' \item{cov}{A \eqn{p \times p} posterior median or mean covariance matrix}
#' \item{trace}{The median/mean point estimator of the trace (sum of the diagonal elements)
#' of the residual covariance matrix `cov`}
#'
#' @details
#' Hui (2016) provides an excellent description of the quantities that this function calculates, so this passage
#' is heavily paraphrased from his associated \pkg{boral} package.
#'
#' In latent factor models, the residual covariance matrix is calculated
#' from the matrix of latent factor loadings \eqn{\Theta}, where the residual covariance
#' matrix \eqn{\Sigma = \Theta\Theta'}. A strong residual covariance/correlation
#' between two species can be interpreted as evidence of species interactions (e.g.,
#' facilitation or competition),
#' missing covariates, as well as any additional species correlation not accounted for by the shared
#' environmental responses captured in `formula`.
#'
#' The residual precision matrix (also known as the partial correlation matrix; Ovaskainen et al., 2016)
#' is defined as the inverse of the residual correlation matrix. The precision matrix is often used to
#' identify direct or causal relationships between two species, e.g., two species can have a zero
#' precision but still be correlated, which can be interpreted as saying that the two species are not
#' directly associated, but they are still correlated *through* other species. In other words, they
#' are conditionally independent given the other species. Note that the precision matrix
#' does not exhibit the exact same properties as the correlation matrix, e.g., the diagonal elements are
#' not equal to 1. Nevertheless, relatively larger values of precision may imply stronger
#' direct relationships between two species.
#'
#' In addition to the residual correlation and precision matrices, the median or mean point estimator
#' of the trace of the residual covariance matrix is returned,
#' \eqn{\sum\limits_{j=1}^p [\Theta\Theta']_{jj}}. Often used in other areas of multivariate
#' statistics, the trace may be interpreted as the amount of covariation explained by the latent factors.
#' One situation where the trace may be useful is when comparing a pure latent factor model
#' (where no terms are supplied to `formula`) versus a model with latent
#' factors and some additional predictors in `formula` -- the proportional difference in trace
#' between these two models may be interpreted as the proportion of covariation between species explained
#' by the predictors in `formula`. Of course, the trace itself is random due to the MCMC sampling, and so it
#' is not always guaranteed to produce sensible answers.
#' @author Nicholas J Clark
#' @references
#' Francis KC Hui (2016). BORAL - Bayesian ordination and regression analysis of
#' multivariate abundance data in R. Methods in Ecology and Evolution. 7, 744-750.
#' \cr
#' \cr
#' Otso Ovaskainen et al. (2016). Using latent variable models to identify large networks of
#' species-to-species associations at different spatial scales. Methods in Ecology and Evolution,
#' 7, 549-555.
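#'
#' @examples
#' \dontrun{
#' # A minimal sketch of the intended workflow; the simulation settings and
#' # argument values here are illustrative only. Fit a latent factor JSDM,
#' # then summarise and plot the residual correlations described above
#' simdat <- sim_mvgam(n_series = 4, trend_model = 'RW')
#' mod <- jsdgam(
#'   formula = y ~ s(season, bs = 'cc', k = 6),
#'   data = simdat$data_train,
#'   n_lv = 2,
#'   chains = 2
#' )
#' rescor <- residual_cor(mod, summary = TRUE)
#' plot(rescor)
#' }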
#' @seealso [jsdgam()], [residual_cor()]
#' @name mvgam_residcor-class
NULL

#' Plot residual correlations based on latent factors
#'
#' Plot residual correlation estimates from Joint Species Distribution
#' (\code{jsdgam}) or dynamic factor (\code{mvgam}) models
#' @param x \code{list} object of class \code{mvgam_residcor} resulting from a
#' call to `residual_cor(..., summary = TRUE)`
#' @param cluster Logical. Should the variables be re-arranged within the plot
#' to group the correlation matrix into clusters of positive and negative correlations?
#' Defaults to `FALSE`
#' @param ... ignored
#' @method plot mvgam_residcor
#' @details This function plots the significant residual correlations from a
#' \code{mvgam_residcor} object, whereby the posterior mean (if `robust = FALSE`)
#' or posterior median (if `robust = TRUE`) correlations are shown for
#' only those correlations whose credible interval does not contain zero. All other
#' correlations are set to zero in the returned plot
#' @return A `ggplot` object
#' @seealso [jsdgam()], [lv_correlations()], [residual_cor()]
#'
#' @author Nicholas J Clark
#'
#' @export
plot.mvgam_residcor = function(x, cluster = FALSE, ...) {
  # Extract significant correlations
  corrmat <- x$sig_cor

  # Re-order into clusters, if specified
  if (cluster) {
    idx <- cluster_cormat(corrmat)
  } else {
    idx <- 1:NROW(corrmat)
  }

  # Plot the correlation matrix
  ggplot2::ggplot(
    data = gather_matrix(corrmat[idx, idx]),
    mapping = ggplot2::aes(
      x = Var1,
      y = Var2,
      fill = correlation
    )
  ) +
    ggplot2::geom_tile(colour = 'grey50') +
    ggplot2::scale_fill_gradient2(
      breaks = seq(-1, 1, by = 0.5),
      limits = c(-1, 1)
    ) +
    ggplot2::labs(x = '', y = '') +
    ggplot2::scale_x_discrete(guide = ggplot2::guide_axis(angle = 45)) +
    ggplot2::theme_minimal()
}

#' Melt a symmetric matrix into a long data.frame
#' @noRd
gather_matrix <- function(mat) {
  mat[upper.tri(mat)] <- NA
  if (is.null(dimnames(mat))) {
    grid <- expand.grid(seq.int(NROW(mat)), seq.int(NCOL(mat)))
  } else {
    grid <- expand.grid(dimnames(mat))
  }
  out <- as.data.frame(cbind(grid, value = as.vector(mat)))
  colnames(out) <- c('Var1', 'Var2', 'correlation')
  return(out)
}

#' Order a symmetric correlation matrix using approximate Robinson
#' ordering, for better visualisation of "clusters"
#' Credit for these functions goes to the maintainers of the gclus R package
#' @importFrom stats hclust as.dist
#' @noRd
cluster_cormat <- function(cormat, ...) {
  dis <- -cormat
  dis_d <- as.dist(dis)
  n <- NROW(dis)
  if (n <= 2) {
    idx <- 1:n
  } else {
    clusters <- stats::hclust(dis_d, ...)
    clusters <- reorder_clusters(clusters, dis)
    idx <- clusters$order
  }
  return(idx)
}

#' @noRd
reorder_clusters <- function(x, dis, ...)
{ if (!is.matrix(dis)) { dis <- as.matrix(dis) } merges <- x$merge n <- NROW(merges) endpoints <- matrix(0, n, 2) dir <- matrix(1L, n, 2) for (i in 1L:n) { j <- merges[i, 1] k <- merges[i, 2] if ((j < 0) && (k < 0)) { endpoints[i, 1] <- -j endpoints[i, 2] <- -k } else if (j < 0) { j <- -j endpoints[i, 1] <- j e1 <- endpoints[k, 1] e2 <- endpoints[k, 2] if (dis[j, e1] < dis[j, e2]) { endpoints[i, 2] <- e2 } else { endpoints[i, 2] <- e1 dir[i, 2] <- -1 } } else if (k < 0) { k <- -k endpoints[i, 2] <- k e1 <- endpoints[j, 1] e2 <- endpoints[j, 2] if (dis[k, e1] < dis[k, e2]) { endpoints[i, 1] <- e2 dir[i, 1] <- -1 } else { endpoints[i, 1] <- e1 } } else { ek1 <- endpoints[k, 1] ek2 <- endpoints[k, 2] ej1 <- endpoints[j, 1] ej2 <- endpoints[j, 2] d11 <- dis[ej1, ek1] d12 <- dis[ej1, ek2] d21 <- dis[ej2, ek1] d22 <- dis[ej2, ek2] dmin <- min(d11, d12, d21, d22) if (dmin == d21) { endpoints[i, 1] <- ej1 endpoints[i, 2] <- ek2 } else if (dmin == d11) { endpoints[i, 1] <- ej2 endpoints[i, 2] <- ek2 dir[i, 1] <- -1 } else if (dmin == d12) { endpoints[i, 1] <- ej2 endpoints[i, 2] <- ek1 dir[i, 1] <- -1 dir[i, 2] <- -1 } else { endpoints[i, 1] <- ej1 endpoints[i, 2] <- ek1 dir[i, 2] <- -1 } } } for (i in n:2L) { if (dir[i, 1] == -1) { m <- merges[i, 1] if (m > 0) { m1 <- merges[m, 1] merges[m, 1] <- merges[m, 2] merges[m, 2] <- m1 if (dir[m, 1] == dir[m, 2]) { dir[m, ] <- -dir[m, ] } } } if (dir[i, 2] == -1) { m <- merges[i, 2] if (m > 0) { m1 <- merges[m, 1] merges[m, 1] <- merges[m, 2] merges[m, 2] <- m1 if (dir[m, 1] == dir[m, 2]) { dir[m, ] <- -dir[m, ] } } } } clusters <- as.list(1:n) for (i in 1:n) { j <- merges[[i, 1]] k <- merges[[i, 2]] if ((j < 0) && (k < 0)) { clusters[[i]] <- c(-j, -k) } else if (j < 0) { clusters[[i]] <- c(-j, clusters[[k]]) } else if (k < 0) { clusters[[i]] <- c(clusters[[j]], -k) } else { clusters[[i]] <- c(clusters[[j]], clusters[[k]]) } } x1 <- x x1$merge <- merges x1$order <- clusters[[n]] return(x1) } ================================================ FILE: R/mvgam_setup.R ================================================ #' Generic GAM setup function #' @importFrom stats na.fail #' @noRd mvgam_setup <- function( formula, knots, family = gaussian(), dat = list(), na.action, drop.unused.levels = FALSE, maxit = 5 ) { if (missing(knots)) { out <- init_gam(formula(formula), data = dat, family = family) attr(out, 'knots') <- NULL } else { if (!is.list(knots)) { stop('all "knot" arguments must be supplied as lists', call. = FALSE) } out <- init_gam( formula(formula), data = dat, family = family, knots = knots ) attr(out, 'knots') <- knots } out } #' Generic JAGAM setup function #' @noRd #' jagam_setup <- function( ss_gam, formula, data_train, family, family_char, knots ) { # Change the formula to a Poisson-like formula if this is a cbind Binomial, # as jagam will fail if it sees that if (family$family %in% c('binomial', 'beta_binomial')) { resp_terms <- as.character(terms(formula(formula))[[2]]) if (any(grepl('cbind', resp_terms))) { resp_terms <- resp_terms[-grepl('cbind', resp_terms)] out_name <- resp_terms[1] } else { stop( 'Binomial family requires the cbind() left-hand side formula syntax', call. 
= FALSE ) } formula <- update(formula, paste(out_name, '~ .')) family <- poisson() } # Set file save location in tempdir file_name <- tempfile(pattern = 'base_gam', fileext = '.txt') if (length(ss_gam$smooth) == 0) { smooths_included <- FALSE # If no smooth terms are included, jagam will fail; so add a fake one and remove # it from the model and data structures later data_train$fakery <- rnorm(length(data_train$y)) form_fake <- update.formula(formula, ~ . + s(fakery, k = 3)) fakery_names <- names( mvgam_setup( formula = form_fake, family = family_to_mgcvfam(family), dat = data_train, drop.unused.levels = FALSE )$coefficients ) xcols_drop <- grep('s(fakery', fakery_names, fixed = TRUE) if (!missing(knots)) { ss_jagam <- mgcv::jagam( form_fake, data = data_train, family = family_to_jagamfam(family_char), file = file_name, sp.prior = 'gamma', diagonalize = FALSE, knots = knots, drop.unused.levels = FALSE ) } else { ss_jagam <- mgcv::jagam( form_fake, data = data_train, family = family_to_jagamfam(family_char), file = file_name, sp.prior = 'gamma', diagonalize = FALSE, drop.unused.levels = FALSE ) } data_train$fakery <- NULL } else { smooths_included <- TRUE xcols_drop <- NULL # If smooth terms included, use the original formula if (!missing(knots)) { ss_jagam <- mgcv::jagam( formula, data = data_train, family = family_to_jagamfam(family_char), file = file_name, sp.prior = 'gamma', diagonalize = FALSE, knots = knots, drop.unused.levels = FALSE ) } else { ss_jagam <- mgcv::jagam( formula, data = data_train, family = family_to_jagamfam(family_char), file = file_name, sp.prior = 'gamma', diagonalize = FALSE, drop.unused.levels = FALSE ) } } return(list( file_name = file_name, ss_jagam = ss_jagam, smooths_included = smooths_included, xcols_drop = xcols_drop )) } #' @noRd get_offset <- function(model) { nm1 <- names(attributes(model$terms)$dataClasses) if ('(offset)' %in% nm1) { deparse(as.list(model$call)$offset) } else { sub("offset\\((.*)\\)$", "\\1", grep('offset', nm1, value = TRUE)) } } #' @noRd trim_mgcv <- function(mgcv_model) { mgcv_model$fitted.values <- mgcv_model$residuals <- mgcv_model$linear.predictors <- mgcv_model$working.weights <- mgcv_model$z <- NULL mgcv_model } #' Fill in missing observations in data_train so the size of the dataset is correct when #' building the initial JAGS model #' @noRd replace_nas = function(var) { if (all(is.na(var))) { # Sampling from uniform[0.1,0.99] will allow all the gam models # to work, even though the Poisson / Negative Binomial will issue # warnings. This is ok as we just need to produce the linear predictor matrix # and store the coefficient names var <- runif(length(var), 0.1, 0.99) } else { # If there are some non-missing observations, # sample from the observed values to ensure # distributional assumptions are met without warnings var[which(is.na(var))] <- sample(var[which(!is.na(var))], length(which(is.na(var))), replace = TRUE) } var } #' The below functions are mostly perfect copies of functions #' written originally by Prof Simon Wood #' All credit goes to Prof Wood and the mgcv development team. 
#' They only exist in mvgam because of CRAN restrictions on #' calling internal functions from other packages #' @noRd rmvn <- function(n, mu, sig) { L <- mgcv::mroot(sig) m <- ncol(L) t(mu + L %*% matrix(rnorm(m * n), m, n)) } #' @noRd init_gam <- function( formula, family = gaussian(), data = list(), na.action = na.omit, knots = NULL, drop.unused.levels = FALSE, control = mgcv::gam.control(), centred = TRUE, diagonalize = FALSE, sp = NULL ) { if (is.character(family)) { family <- eval(parse(text = family)) } if (is.function(family)) { family <- family() } if (is.null(family$family)) { stop("family not recognized") } gp <- mgcv::interpret.gam(formula) # interpret the formula cl <- match.call() # call needed in gam object for update to work mf <- match.call(expand.dots = FALSE) mf$formula <- gp$fake.formula mf$family <- mf$knots <- mf$sp <- mf$file <- mf$control <- mf$centred <- mf$sp.prior <- mf$diagonalize <- NULL mf$drop.unused.levels <- drop.unused.levels mf[[1]] <- quote(stats::model.frame) ##as.name("model.frame") pmf <- mf pmf$formula <- gp$pf pmf <- eval(pmf, parent.frame()) pterms <- attr(pmf, "terms") rm(pmf) mf <- eval(mf, parent.frame()) if (nrow(mf) < 2) { stop("Not enough (non-NA) data to do anything meaningful") } terms <- attr(mf, "terms") ## summarize the *raw* input variables ## note can't use get_all_vars here -- buggy with matrices vars <- all.vars(gp$fake.formula[-2]) ## drop response here inp <- parse(text = paste("list(", paste(vars, collapse = ","), ")")) if (!is.list(data) && !is.data.frame(data)) { data <- as.data.frame(data) } dl <- eval(inp, data, parent.frame()) if (!control$keepData) { rm(data) } ## save space names(dl) <- vars ## list of all variables needed var.summary <- variable_summary(gp$pf, dl, nrow(mf)) ## summarize the input data rm(dl) G <- gam_setup( gp, pterms = pterms, data = mf, knots = knots, sp = sp, H = NULL, absorb.cons = centred, sparse.cons = FALSE, select = TRUE, idLinksBases = TRUE, scale.penalty = control$scalePenalty, diagonal.penalty = diagonalize ) G$model <- mf G$terms <- terms G$family <- family G$call <- cl G$var.summary <- var.summary lambda <- initial_spg( G$X, G$y, G$w, family, G$S, G$rank, G$off, offset = G$offset, L = G$L ) jags.ini <- list() lam <- if (is.null(G$L)) lambda else G$L %*% lambda #jin <- mgcv:::jini(G,lam) G$formula <- formula G$coefficients <- rep(0, length(G$term.names)) names(G$coefficients) <- G$term.names G$residuals <- rnorm(NROW(G$X)) G$edf <- rep(1, length(G$coefficients)) names(G$edf) <- G$term.names G$edf1 <- rep(1, length(G$coefficients)) names(G$edf1) <- G$term.names G$sig2 <- 1 G$rank <- ncol(G$X) G$Vp <- G$Ve <- diag(rep(1, length(G$coefficients))) G$sp <- exp(G$sp) G$scale.estimated <- FALSE G$method <- 'UBRE' G$pred.formula <- gp$pred.formula class(G) <- c('gam', 'glm', 'lm') G$R <- model.matrix(G) return(G) } #'@importFrom mgcv gam.side smoothCon get.var Rrank interpret.gam initial.sp #'@importFrom stats .getXlevels model.matrix model.offset na.omit #'@importFrom methods cbind2 #'@noRd gam_setup <- function( formula, pterms, data = stop("No data supplied to gam_setup"), knots = NULL, sp = NULL, min.sp = NULL, H = NULL, absorb.cons = TRUE, sparse.cons = 0, select = FALSE, idLinksBases = TRUE, scale.penalty = TRUE, paraPen = NULL, gamm.call = FALSE, drop.intercept = FALSE, diagonal.penalty = FALSE, apply.by = TRUE, list.call = FALSE, modCon = 0 ) { if (inherits(formula, "split.gam.formula")) { split <- formula } else if (inherits(formula, "formula")) { split <- mgcv::interpret.gam(formula) } else { 
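    # formula is neither a pre-split 'split.gam.formula' nor a formula object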
stop("First argument is no sort of formula!") } if (length(split$smooth.spec) == 0) { if (split$pfok == 0) { stop("You've got no model....") } m <- 0 } else { m <- length(split$smooth.spec) } G <- list( m = m, min.sp = min.sp, H = H, pearson.extra = 0, dev.extra = 0, n.true = -1, pterms = pterms ) if (is.null(attr(data, "terms"))) { mf <- model.frame(split$pf, data, drop.unused.levels = FALSE) } else { mf <- data } G$intercept <- attr(attr(mf, "terms"), "intercept") > 0 if (list.call) { offi <- attr(pterms, "offset") if (!is.null(offi)) { G$offset <- mf[[names(attr(pterms, "dataClasses"))[offi]]] } } else { G$offset <- model.offset(mf) } if (!is.null(G$offset)) { G$offset <- as.numeric(G$offset) } if (drop.intercept) { attr(pterms, "intercept") <- 1 } X <- model.matrix(pterms, mf) if (drop.intercept) { xat <- attributes(X) ind <- xat$assign > 0 X <- X[, ind, drop = FALSE] xat$assign <- xat$assign[ind] xat$dimnames[[2]] <- xat$dimnames[[2]][ind] xat$dim[2] <- xat$dim[2] - 1 attributes(X) <- xat G$intercept <- FALSE } rownames(X) <- NULL G$nsdf <- ncol(X) G$contrasts <- attr(X, "contrasts") G$xlevels <- .getXlevels(pterms, mf) G$assign <- attr(X, "assign") PP <- parametric_penalty(pterms, G$assign, paraPen, sp) if (!is.null(PP)) { ind <- 1:length(PP$sp) if (!is.null(sp)) { sp <- sp[-ind] } if (!is.null(min.sp)) { PP$min.sp <- min.sp[ind] min.sp <- min.sp[-ind] } } G$smooth <- list() G$S <- list() if (gamm.call) { if (m > 0) { for (i in 1:m) { attr(split$smooth.spec[[i]], "gamm") <- TRUE } } } if (m > 0 && idLinksBases) { id.list <- list() for (i in 1:m) { if (!is.null(split$smooth.spec[[i]]$id)) { id <- as.character(split$smooth.spec[[i]]$id) if (length(id.list) && id %in% names(id.list)) { ni <- length(id.list[[id]]$sm.i) id.list[[id]]$sm.i[ni + 1] <- i base.i <- id.list[[id]]$sm.i[1] split$smooth.spec[[i]] <- clone_smooth_spec( split$smooth.spec[[base.i]], split$smooth.spec[[i]] ) temp.term <- split$smooth.spec[[i]]$term for (j in 1:length(temp.term)) { id.list[[id]]$data[[j]] <- cbind( id.list[[id]]$data[[j]], mgcv::get.var(temp.term[j], data, vecMat = FALSE) ) } } else { id.list[[id]] <- list(sm.i = i) id.list[[id]]$data <- list() term <- split$smooth.spec[[i]]$term for (j in 1:length(term)) { id.list[[id]]$data[[j]] <- mgcv::get.var( term[j], data, vecMat = FALSE ) } } } } } G$off <- array(0, 0) first.para <- G$nsdf + 1 sm <- list() newm <- 0 if (m > 0) { for (i in 1:m) { id <- split$smooth.spec[[i]]$id if (is.null(id) || !idLinksBases) { sml <- mgcv::smoothCon( split$smooth.spec[[i]], data, knots, absorb.cons, scale.penalty = scale.penalty, null.space.penalty = select, sparse.cons = sparse.cons, diagonal.penalty = diagonal.penalty, apply.by = apply.by, modCon = modCon ) } else { names(id.list[[id]]$data) <- split$smooth.spec[[i]]$term sml <- mgcv::smoothCon( split$smooth.spec[[i]], id.list[[id]]$data, knots, absorb.cons, n = nrow(data), dataX = data, scale.penalty = scale.penalty, null.space.penalty = select, sparse.cons = sparse.cons, diagonal.penalty = diagonal.penalty, apply.by = apply.by, modCon = modCon ) } ind <- 1:length(sml) sm[ind + newm] <- sml[ind] newm <- newm + length(sml) } } G$m <- m <- newm if (m > 0) { sm <- mgcv::gam.side(sm, X, tol = .Machine$double.eps^0.5) if (!apply.by) { for (i in 1:length(sm)) { if (!is.null(sm[[i]]$X0)) { ind <- attr(sm[[i]], "del.index") sm[[i]]$X <- if (is.null(ind)) { sm[[i]]$X0 } else { sm[[i]]$X0[, -ind, drop = FALSE] } } } } } idx <- list() L <- matrix(0, 0, 0) lsp.names <- sp.names <- rep("", 0) if (m > 0) { for (i in 1:m) { id <- 
sm[[i]]$id length.S <- if (is.null(sm[[i]]$updateS)) { length(sm[[i]]$S) } else { sm[[i]]$n.sp } Li <- if (is.null(sm[[i]]$L)) diag(length.S) else sm[[i]]$L if (length.S > 0) { if (length.S == 1) { lspn <- sm[[i]]$label } else { Sname <- names(sm[[i]]$S) lspn <- if (is.null(Sname)) { paste(sm[[i]]$label, 1:length.S, sep = "") } else { paste(sm[[i]]$label, Sname, sep = "") } } spn <- lspn[1:ncol(Li)] } if (is.null(id) || is.null(idx[[id]])) { if (!is.null(id)) { idx[[id]]$c <- ncol(L) + 1 idx[[id]]$nc <- ncol(Li) } L <- rbind( cbind(L, matrix(0, nrow(L), ncol(Li))), cbind(matrix(0, nrow(Li), ncol(L)), Li) ) if (length.S > 0) { sp.names <- c(sp.names, spn) lsp.names <- c(lsp.names, lspn) } } else { L0 <- matrix(0, nrow(Li), ncol(L)) if (ncol(Li) > idx[[id]]$nc) { stop( "Later terms sharing an `id' can not have more smoothing parameters than the first such term" ) } L0[, idx[[id]]$c:(idx[[id]]$c + ncol(Li) - 1)] <- Li L <- rbind(L, L0) if (length.S > 0) { lsp.names <- c(lsp.names, lspn) } } } } Xp <- NULL if (m > 0) { for (i in 1:m) { n.para <- ncol(sm[[i]]$X) sm[[i]]$first.para <- first.para first.para <- first.para + n.para sm[[i]]$last.para <- first.para - 1 Xoff <- attr(sm[[i]]$X, "offset") if (!is.null(Xoff)) { if (is.null(G$offset)) G$offset <- Xoff else G$offset <- G$offset + Xoff } if (is.null(sm[[i]]$Xp)) { if (!is.null(Xp)) Xp <- cbind2(Xp, sm[[i]]$X) } else { if (is.null(Xp)) { Xp <- X } Xp <- cbind2(Xp, sm[[i]]$Xp) sm[[i]]$Xp <- NULL } X <- cbind2(X, sm[[i]]$X) sm[[i]]$X <- NULL G$smooth[[i]] <- sm[[i]] } } if (is.null(Xp)) { G$cmX <- colMeans(X) } else { G$cmX <- colMeans(Xp) qrx <- qr(Xp, LAPACK = TRUE) R <- qr.R(qrx) p <- ncol(R) rank <- mgcv::Rrank(R) QtX <- qr.qty(qrx, X)[1:rank, ] if (rank < p) { R <- R[1:rank, ] qrr <- qr(t(R), tol = 0) R <- qr.R(qrr) G$P <- forwardsolve(t(R), QtX) } else { G$P <- backsolve(R, QtX) } if (rank < p) { G$P <- qr.qy(qrr, rbind(G$P, matrix(0, p - rank, p))) } G$P[qrx$pivot, ] <- G$P } G$X <- X rm(X) n.p <- ncol(G$X) if (!is.null(sp)) { ok <- TRUE if (length(sp) < ncol(L)) { warning("Supplied smoothing parameter vector is too short - ignored.") ok <- FALSE } if (sum(is.na(sp))) { warning("NA's in supplied smoothing parameter vector - ignoring.") ok <- FALSE } } else { ok <- FALSE } G$sp <- if (ok) sp[1:ncol(L)] else rep(-1, ncol(L)) names(G$sp) <- sp.names k <- 1 if (m > 0) { for (i in 1:m) { id <- sm[[i]]$id if (is.null(sm[[i]]$L)) { Li <- diag(length(sm[[i]]$S)) } else { Li <- sm[[i]]$L } if (is.null(id)) { spi <- sm[[i]]$sp if (!is.null(spi)) { if (length(spi) != ncol(Li)) { stop( "incorrect number of smoothing parameters supplied for a smooth term" ) } G$sp[k:(k + ncol(Li) - 1)] <- spi } k <- k + ncol(Li) } else { spi <- sm[[i]]$sp if (is.null(idx[[id]]$sp.done)) { if (!is.null(spi)) { if (length(spi) != ncol(Li)) { stop( "incorrect number of smoothing parameters supplied for a smooth term" ) } G$sp[idx[[id]]$c:(idx[[id]]$c + idx[[id]]$nc - 1)] <- spi } idx[[id]]$sp.done <- TRUE k <- k + idx[[id]]$nc } } } } k <- 1 if (length(idx)) { for (i in 1:length(idx)) { idx[[i]]$sp.done <- FALSE } } if (m > 0) { for (i in 1:m) { id <- sm[[i]]$id if (!is.null(id)) { if (idx[[id]]$nc > 0) { G$smooth[[i]]$sp <- G$sp[ idx[[id]]$c:(idx[[id]]$c + idx[[id]]$nc - 1) ] } if (!idx[[id]]$sp.done) { idx[[id]]$sp.done <- TRUE k <- k + idx[[id]]$nc } } else { if (is.null(sm[[i]]$L)) { nc <- length(sm[[i]]$S) } else { nc <- ncol(sm[[i]]$L) } if (nc > 0) { G$smooth[[i]]$sp <- G$sp[k:(k + nc - 1)] } k <- k + nc } } } if (!is.null(min.sp)) { if (length(min.sp) < 
nrow(L)) { stop("length of min.sp is wrong.") } if (nrow(L) > 0) { min.sp <- min.sp[1:nrow(L)] } if (sum(is.na(min.sp))) { stop("NA's in min.sp.") } if (sum(min.sp < 0)) stop("elements of min.sp must be non negative.") } k.sp <- 0 G$rank <- array(0, 0) if (m > 0) { for (i in 1:m) { sm <- G$smooth[[i]] if (length(sm$S) > 0) { for (j in 1:length(sm$S)) { k.sp <- k.sp + 1 G$off[k.sp] <- sm$first.para G$S[[k.sp]] <- sm$S[[j]] G$rank[k.sp] <- sm$rank[j] if (!is.null(min.sp)) { if (is.null(H)) { H <- matrix(0, n.p, n.p) } H[sm$first.para:sm$last.para, sm$first.para:sm$last.para] <- H[ sm$first.para:sm$last.para, sm$first.para:sm$last.para ] + min.sp[k.sp] * sm$S[[j]] } } } } } if (!is.null(PP)) { L <- rbind( cbind(L, matrix(0, nrow(L), ncol(PP$L))), cbind(matrix(0, nrow(PP$L), ncol(L)), PP$L) ) G$off <- c(PP$off, G$off) G$S <- c(PP$S, G$S) G$rank <- c(PP$rank, G$rank) G$sp <- c(PP$sp, G$sp) lsp.names <- c(PP$full.sp.names, lsp.names) G$n.paraPen <- length(PP$off) if (!is.null(PP$min.sp)) { if (is.null(H)) { H <- matrix(0, n.p, n.p) } for (i in 1:length(PP$S)) { ind <- PP$off[i]:(PP$off[i] + ncol(PP$S[[i]]) - 1) H[ind, ind] <- H[ind, ind] + PP$min.sp[i] * PP$S[[i]] } } } else { G$n.paraPen <- 0 } fix.ind <- G$sp >= 0 if (sum(fix.ind)) { lsp0 <- G$sp[fix.ind] ind <- lsp0 == 0 ef0 <- indi <- (1:length(ind))[ind] if (length(indi) > 0) { for (i in 1:length(indi)) { ii <- G$off[i]:(G$off[i] + ncol(G$S[[i]]) - 1) ef0[i] <- norm(G$X[, ii], type = "F")^2 / norm(G$S[[i]], type = "F") * .Machine$double.eps * 0.1 } } lsp0[!ind] <- log(lsp0[!ind]) lsp0[ind] <- log(ef0) lsp0 <- as.numeric(L[, fix.ind, drop = FALSE] %*% lsp0) L <- L[, !fix.ind, drop = FALSE] G$sp <- G$sp[!fix.ind] } else { lsp0 <- rep(0, nrow(L)) } G$H <- H if (ncol(L) == nrow(L) && !sum(L != diag(ncol(L)))) { L <- NULL } G$L <- L G$lsp0 <- lsp0 names(G$lsp0) <- lsp.names if (absorb.cons == FALSE) { G$C <- matrix(0, 0, n.p) if (m > 0) { for (i in 1:m) { if (is.null(G$smooth[[i]]$C)) { n.con <- 0 } else { n.con <- nrow(G$smooth[[i]]$C) } C <- matrix(0, n.con, n.p) C[, G$smooth[[i]]$first.para:G$smooth[[i]]$last.para] <- G$smooth[[i]]$C G$C <- rbind(G$C, C) G$smooth[[i]]$C <- NULL } rm(C) } } G$y <- drop(data[[split$response]]) ydim <- dim(G$y) if (!is.null(ydim) && length(ydim) < 2) { dim(G$y) <- NULL } G$n <- nrow(data) if (is.null(data$"(weights)")) { G$w <- rep(1, G$n) } else { G$w <- data$"(weights)" } if (G$nsdf > 0) { term.names <- colnames(G$X)[1:G$nsdf] } else { term.names <- array("", 0) } n.smooth <- length(G$smooth) n.sp0 <- 0 if (n.smooth) { for (i in 1:n.smooth) { k <- 1 jj <- G$smooth[[i]]$first.para:G$smooth[[i]]$last.para if (G$smooth[[i]]$df > 0) { for (j in jj) { term.names[j] <- paste( G$smooth[[i]]$label, ".", as.character(k), sep = "" ) k <- k + 1 } } n.sp <- length(G$smooth[[i]]$S) if (n.sp) { G$smooth[[i]]$first.sp <- n.sp0 + 1 n.sp0 <- G$smooth[[i]]$last.sp <- n.sp0 + n.sp } if (!is.null(G$smooth[[i]]$g.index)) { if (is.null(G$g.index)) { G$g.index <- rep(FALSE, n.p) } G$g.index[jj] <- G$smooth[[i]]$g.index } } } G$term.names <- term.names G$pP <- PP G } #' @noRd parametric_penalty <- function(pterms, assign, paraPen, sp0) { S <- list() off <- rep(0, 0) rank <- rep(0, 0) sp <- rep(0, 0) full.sp.names <- rep("", 0) L <- matrix(0, 0, 0) k <- 0 tind <- unique(assign) n.t <- length(tind) if (n.t > 0) { for (j in 1:n.t) { if (tind[j] > 0) { term.label <- attr(pterms[tind[j]], "term.label") P <- paraPen[[term.label]] if (!is.null(P)) { ind <- (1:length(assign))[assign == tind[j]] Li <- P$L P$L <- NULL spi <- P$sp P$sp <- NULL 
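          # Strip the remaining rank metadata; the entries then left in P are
          # the user-supplied penalty matrices themselves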
ranki <- P$rank P$rank <- NULL np <- length(P) if (!is.null(ranki) && length(ranki) != np) { stop("`rank' has wrong length in `paraPen'") } if (np) { for (i in 1:np) { k <- k + 1 S[[k]] <- P[[i]] off[k] <- min(ind) if (ncol(P[[i]]) != nrow(P[[i]]) || nrow(P[[i]]) != length(ind)) { stop(" a parametric penalty has wrong dimension") } if (is.null(ranki)) { ev <- eigen(S[[k]], symmetric = TRUE, only.values = TRUE)$values rank[k] <- sum(ev > max(ev) * .Machine$double.eps * 10) } else { rank[k] <- ranki[i] } } } if (np) { if (is.null(Li)) { Li <- diag(np) } if (nrow(Li) != np) { stop("L has wrong dimension in `paraPen'") } L <- rbind( cbind(L, matrix(0, nrow(L), ncol(Li))), cbind(matrix(0, nrow(Li), ncol(L)), Li) ) ind <- (length(sp) + 1):(length(sp) + ncol(Li)) ind2 <- (length(sp) + 1):(length(sp) + nrow(Li)) if (is.null(spi)) { sp[ind] <- -1 } else { if (length(spi) != ncol(Li)) { stop("`sp' dimension wrong in `paraPen'") } sp[ind] <- spi } if (length(ind) > 1) { names(sp)[ind] <- paste( term.label, ind - ind[1] + 1, sep = "" ) } else { names(sp)[ind] <- term.label } if (length(ind2) > 1) { full.sp.names[ind2] <- paste( term.label, ind2 - ind2[1] + 1, sep = "" ) } else { full.sp.names[ind2] <- term.label } } } } } } if (k == 0) { return(NULL) } if (!is.null(sp0)) { if (length(sp0) < length(sp)) { stop("`sp' too short") } sp0 <- sp0[1:length(sp)] sp[sp < 0] <- sp0[sp < 0] } list( S = S, off = off, sp = sp, L = L, rank = rank, full.sp.names = full.sp.names ) } #' @noRd clone_smooth_spec <- function(specb, spec) { if (specb$dim != spec$dim) { stop("`id' linked smooths must have same number of arguments") } if (inherits(specb, c("tensor.smooth.spec", "t2.smooth.spec"))) { specb$term <- spec$term specb$label <- spec$label specb$by <- spec$by k <- 1 for (i in 1:length(specb$margin)) { if (is.null(spec$margin)) { for (j in 1:length(specb$margin[[i]]$term)) { specb$margin[[i]]$term[j] <- spec$term[k] k <- k + 1 } specb$margin[[i]]$label <- "" } else { specb$margin[[i]]$term <- spec$margin[[i]]$term specb$margin[[i]]$label <- spec$margin[[i]]$label specb$margin[[i]]$xt <- spec$margin[[i]]$xt } } } else { specb$term <- spec$term specb$label <- spec$label specb$by <- spec$by specb$xt <- spec$xt } specb } #' Summarize all the variables in a list of variables #' #' This function is derived from \code{mgcv:::variable.summary} #' #' @author Simon N Wood with modifications by Nicholas Clark #' @noRd variable_summary <- function(pf, dl, n) { v.n <- length(dl) v.name <- v.name1 <- names(dl) if (v.n) { k <- 0 for (i in 1:v.n) { if (length(dl[[i]]) >= n) { k <- k + 1 v.name[k] <- v.name1[i] } } if (k > 0) v.name <- v.name[1:k] else v.name <- rep("", k) } p.name <- all.vars(pf[-2]) vs <- list() v.n <- length(v.name) if (v.n > 0) { for (i in 1:v.n) { if (v.name[i] %in% p.name) { para <- TRUE } else { para <- FALSE } if (para && is.matrix(dl[[v.name[i]]]) && ncol(dl[[v.name[i]]]) > 1) { x <- matrix( apply( dl[[v.name[i]]], 2, quantile, probs = 0.5, type = 3, na.rm = TRUE ), 1, ncol(dl[[v.name[i]]]) ) } else { x <- dl[[v.name[i]]] if (is.character(x)) { x <- as.factor(x) } if (is.factor(x)) { x <- x[!is.na(x)] lx <- levels(x) freq <- tabulate(x) ii <- min((1:length(lx))[freq == max(freq)]) x <- factor(lx[ii], levels = lx) } else { x <- as.numeric(x) x <- c( min(x, na.rm = TRUE), as.numeric(quantile(x, probs = 0.5, type = 3, na.rm = TRUE)), max(x, na.rm = TRUE) ) } } vs[[v.name[i]]] <- x } } vs } #' @importFrom stats lm #' @noRd initial_spg <- function( x, y, weights, family, S, rank, off, offset = NULL, L = NULL, 
lsp0 = NULL, type = 1, start = NULL, mustart = NULL, etastart = NULL, E = NULL, ... ) { if (length(S) == 0) { return(rep(0, 0)) } nobs <- nrow(x) if (is.null(mustart)) { mukeep <- NULL } else { mukeep <- mustart } eval(family$initialize) if (inherits(family, "general.family")) { lbb <- family$ll( y, x, start, weights, family, offset = offset, deriv = 1 )$lbb pcount <- rep(0, ncol(lbb)) for (i in 1:length(S)) { ind <- off[i]:(off[i] + ncol(S[[i]]) - 1) dlb <- -diag(lbb[ind, ind, drop = FALSE]) indp <- rowSums(abs(S[[i]])) > max(S[[i]]) * .Machine$double.eps^0.75 & dlb != 0 ind <- ind[indp] pcount[ind] <- pcount[ind] + 1 } lambda <- rep(0, length(S)) for (i in 1:length(S)) { ind <- off[i]:(off[i] + ncol(S[[i]]) - 1) lami <- 1 dlb <- abs(diag(lbb[ind, ind, drop = FALSE])) dS <- diag(S[[i]]) pc <- pcount[ind] ind <- rowSums(abs(S[[i]])) > max(S[[i]]) * .Machine$double.eps^0.75 & dlb != 0 dlb <- dlb[ind] / pc[ind] dS <- dS[ind] rm <- max(length(dS) / rank[i], 1) while ( sqrt( mean(dlb / (dlb + lami * dS * rm)) * mean(dlb) / mean( dlb + lami * dS * rm ) ) > 0.4 ) { lami <- lami * 5 } while ( sqrt( mean(dlb / (dlb + lami * dS * rm)) * mean(dlb) / mean( dlb + lami * dS * rm ) ) < 0.4 ) { lami <- lami / 5 } lambda[i] <- lami } } else { if (is.null(mukeep)) { if (!is.null(start)) { etastart <- drop(x %*% start) } if (!is.null(etastart)) mustart <- family$linkinv(etastart) } else { mustart <- mukeep } if (inherits(family, "extended.family")) { theta <- family$getTheta() Ddo <- family$Dd(y, mustart, theta, weights) mu.eta2 <- family$mu.eta(family$linkfun(mustart))^2 w <- 0.5 * as.numeric(Ddo$Dmu2 * mu.eta2) if (any(w < 0)) w <- 0.5 * as.numeric(Ddo$EDmu2 * mu.eta2) } else { w <- as.numeric( weights * family$mu.eta(family$linkfun(mustart))^2 / family$variance(mustart) ) } w <- sqrt(w) if (type == 1) { lambda <- mgcv::initial.sp(w * x, S, off) } else { csX <- colSums((w * x)^2) lambda <- rep(0, length(S)) for (i in 1:length(S)) { ind <- off[i]:(off[i] + ncol(S[[i]]) - 1) lambda[i] <- sum(csX[ind]) / sqrt(sum(S[[i]]^2)) } } } if (!is.null(L)) { lsp <- log(lambda) if (is.null(lsp0)) { lsp0 <- rep(0, nrow(L)) } lsp <- as.numeric(coef(lm(lsp ~ L - 1 + offset(lsp0)))) lambda <- exp(lsp) } lambda } #' Set up JAGS data and model file for fitting GAMs #' #' This function is derived from \code{mgcv:::jagam} #' #' @author Simon N Wood with modifications by Nicholas Clark #' @importFrom mgcv gam.control interpret.gam #' @noRd jagam <- function( formula, family = gaussian, data = list(), file, weights = NULL, na.action, offset = NULL, knots = NULL, sp = NULL, drop.unused.levels = FALSE, control = mgcv::gam.control(), centred = TRUE, diagonalize = FALSE ) { ## Start the model specification cat("model {\n", file = file) sp.prior <- 'gamma' # Evaluate family if (is.character(family)) { family <- eval(parse(text = family)) } if (is.function(family)) { family <- family() } if (is.null(family$family)) { stop("family not recognized") } # Interpret the formula and initialize the model.frame object gp <- mgcv::interpret.gam(formula) gp$pfok <- 1 cl <- match.call() mf <- match.call(expand.dots = FALSE) mf$formula <- gp$fake.formula mf$family <- mf$knots <- mf$sp <- mf$file <- mf$control <- mf$centred <- mf$sp.prior <- mf$diagonalize <- NULL mf$drop.unused.levels <- drop.unused.levels mf[[1]] <- quote(stats::model.frame) pmf <- mf # Extract fixed effect terms # Multiple formula objects if (is.list(formula)) { environment(formula) <- environment(formula[[1]]) pterms <- list() tlab <- rep("", 0) for (i in 1:length(formula)) { 
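      # Evaluate the parametric model frame for each linear predictor,
      # suffixing later predictors' term labels with an index so they stay unique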
pmf$formula <- gp[[i]]$pf pterms[[i]] <- attr(eval(pmf, parent.frame()), "terms") tlabi <- attr(pterms[[i]], "term.labels") if (i > 1 && length(tlabi) > 0) { tlabi <- paste(tlabi, i - 1, sep = ".") } tlab <- c(tlab, tlabi) } attr(pterms, "term.labels") <- tlab # Single linear predictor case } else { pmf$formula <- gp$pf pmf <- eval(pmf, parent.frame()) pterms <- attr(pmf, "terms") } mf <- eval(mf, parent.frame()) if (nrow(mf) < 2) { stop("Not enough (non-NA) data to do anything meaningful") } terms <- attr(mf, "terms") # Summarize the *raw* input variables vars <- all.vars(gp$fake.formula[-2]) inp <- parse(text = paste("list(", paste(vars, collapse = ","), ")")) if (!is.list(data) && !is.data.frame(data)) { data <- as.data.frame(data) } dl <- eval(inp, data, parent.frame()) rm(data) names(dl) <- vars var.summary <- variable_summary(gp$pf, dl, nrow(mf)) rm(dl) gsname <- if (is.list(formula)) "gam_setup.list" else "gam_setup" G <- do.call( gsname, list( formula = gp, pterms = pterms, data = mf, knots = knots, sp = sp, H = NULL, absorb.cons = TRUE, sparse.cons = FALSE, select = TRUE, idLinksBases = TRUE, scale.penalty = control$scalePenalty ) ) G$model <- mf G$terms <- terms G$family <- family G$call <- cl G$var.summary <- var.summary ## write JAGS code producing linear predictor and linking linear predictor to ## response.... use.weights <- if (is.null(weights)) FALSE else TRUE use.weights <- write_jagslp( "y", family = poisson(), file, use.weights, !is.null(G$offset) ) if (is.null(weights) && use.weights) { weights <- rep(1, nrow(G$X)) } ## start the JAGS data list... jags.stuff <- list(y = G$y, n = length(G$y), X = G$X) if (!is.null(G$offset)) { jags.stuff$offset <- G$offset } if (use.weights) { jags.stuff$w <- weights } if (family$family == "binomial") { jags.stuff$y <- G$y * weights } ## set the fixed effect priors... lambda <- rep(1, length(G$S)) jags.ini <- list() jags.ini$b <- rep(0, NCOL(G$X)) prior.tau <- 10 ptau <- 10 if (is.list(formula)) { for (i in 1:length(G$nsdf)) { if (G$nsdf[i] > 0) { if (i == 1) { cat( " ## Parametric effect priors CHECK tau=1/", signif(1 / sqrt(ptau), 2), "^2 is appropriate!\n", file = file, append = TRUE, sep = "" ) cat( " for (i in 1:", G$nsdf[i], ") { b[i] ~ dnorm(0,", ptau, ") }\n", file = file, append = TRUE, sep = "" ) } else { cat( " ## Parametric effect priors CHECK tau=1/", signif(1 / sqrt(ptau), 2), "^2 is appropriate!\n", file = file, append = TRUE, sep = "" ) cat( " for (i in ", attr(G$nsdf, 'pstart')[i], ':', attr(G$nsdf, 'pstart')[i] + G$nsdf[i], ") { b[i] ~ dnorm(0,", ptau, ") }\n", file = file, append = TRUE, sep = "" ) } } } } else { if (sum(G$nsdf) > 0) { cat( " ## Parametric effect priors CHECK tau=1/", signif(1 / sqrt(ptau), 2), "^2 is appropriate!\n", file = file, append = TRUE, sep = "" ) cat( " for (i in 1:", sum(G$nsdf), ") { b[i] ~ dnorm(0,", ptau, ") }\n", file = file, append = TRUE, sep = "" ) } } ## Work through smooths. n.sp <- 0 ## count the smoothing parameters.... for (i in 1:length(G$smooth)) { ## Are penalties seperable... 
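  ## (a smooth's penalty set is treated as separable when its component
  ## penalty matrices do not overlap and are all diagonal, in which case
  ## independent dnorm() priors can be written element-wise; otherwise the
  ## penalty matrices are passed to JAGS and a joint dmnorm() prior is used)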
    separable <- FALSE
    M <- length(G$smooth[[i]]$S)
    p <- G$smooth[[i]]$last.para - G$smooth[[i]]$first.para + 1 ## number of params
    if (M <= 1) {
      separable <- TRUE
    } else {
      overlap <- rowSums(G$smooth[[i]]$S[[1]])
      for (j in 2:M) {
        overlap <- overlap & rowSums(G$smooth[[i]]$S[[j]])
      }
      if (!sum(overlap)) separable <- TRUE
    }
    if (separable) {
      ## double check that the penalties are diagonal
      if (M > 0) {
        for (j in 1:M) {
          if (
            max(abs(
              G$smooth[[i]]$S[[j]] - diag(diag(G$smooth[[i]]$S[[j]]), nrow = p)
            )) > 0
          ) {
            separable <- FALSE
          }
        }
      }
    }
    cat(
      " ## prior for ",
      G$smooth[[i]]$label,
      "... \n",
      file = file,
      append = TRUE,
      sep = ""
    )
    if (separable) {
      b0 <- G$smooth[[i]]$first.para
      b1 <- G$smooth[[i]]$last.para
      if (M == 0) {
        cat(
          " ## Note fixed vague prior, CHECK tau = 1/",
          signif(1 / sqrt(ptau), 2),
          "^2...\n",
          file = file,
          append = TRUE,
          sep = ""
        )
        #b1 <- G$smooth[[i]]$last.para
        # prior.tau is a scalar here (unlike the per-coefficient vector used
        # by mgcv::jagam), so indexing it by b0:b1 would yield NAs
        ptau <- min(prior.tau)
        cat(
          " for (i in ", b0, ":", b1, ") { b[i] ~ dnorm(0,", ptau, ") }\n",
          file = file,
          append = TRUE,
          sep = ""
        )
      } else {
        for (j in 1:M) {
          D <- diag(G$smooth[[i]]$S[[j]]) > 0
          #b1 <- sum(as.numeric(D)) + b0 - 1
          n.sp <- n.sp + 1
          #cat(" for (i in ",b0,":",b1,") { b[i] ~ dnorm(0, lambda[",n.sp,"]) }\n",file=file,append=TRUE,sep="")
          #b0 <- b1 + 1
          cat(
            " for (i in ",
            compress_iseq((b0:b1)[D]),
            ") { b[i] ~ dnorm(0, lambda[",
            n.sp,
            "]) }\n",
            file = file,
            append = TRUE,
            sep = ""
          )
        }
      }
    } else {
      ## inseparable - requires the penalty matrices to be supplied to JAGS...
      b0 <- G$smooth[[i]]$first.para
      b1 <- G$smooth[[i]]$last.para
      Kname <- paste("K", i, sep = "") ## total penalty matrix in JAGS
      Sname <- paste("S", i, sep = "") ## components of total penalty in R & JAGS
      cat(
        " ", Kname, " <- ", Sname, "[1:", p, ",1:", p, "] * lambda[",
        n.sp + 1, "] ",
        file = file,
        append = TRUE,
        sep = ""
      )
      if (M > 1) {
        ## code to form the total precision matrix...
        for (j in 2:M) {
          cat(
            " + ", Sname, "[1:", p, ",", (j - 1) * p + 1, ":", j * p,
            "] * lambda[", n.sp + j, "]",
            file = file,
            append = TRUE,
            sep = ""
          )
        }
      }
      cat(
        "\n b[", b0, ":", b1, "] ~ dmnorm(zero[", b0, ":", b1, "],",
        Kname, ") \n",
        file = file,
        append = TRUE,
        sep = ""
      )
      n.sp <- n.sp + M
      Sc <- G$smooth[[i]]$S[[1]]
      if (M > 1) {
        for (j in 2:M) {
          Sc <- cbind(Sc, G$smooth[[i]]$S[[j]])
        }
      }
      jags.stuff[[Sname]] <- Sc
      jags.stuff$zero <- rep(0, ncol(G$X))
    }
  } ## smoothing penalties finished

  ## Write the smoothing parameter prior code, using L if it exists.
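  # As a sketch of what the following block emits under the default gamma
  # prior (assuming two smoothing parameters and no reparameterising L
  # matrix), the JAGS model file gains lines of the form:
  #   for (i in 1:2) {
  #     lambda[i] ~ dgamma(.05,.005)
  #     rho[i] <- log(lambda[i])
  #   }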
cat( " ## smoothing parameter priors CHECK...\n", file = file, append = TRUE, sep = "" ) if (is.null(G$L)) { if (sp.prior == "log.uniform") { cat(" for (i in 1:", n.sp, ") {\n", file = file, append = TRUE, sep = "") cat(" rho[i] ~ dunif(-12,12)\n", file = file, append = TRUE, sep = "") cat( " lambda[i] <- exp(rho[i])\n", file = file, append = TRUE, sep = "" ) cat(" }\n", file = file, append = TRUE, sep = "") jags.ini$rho <- log(lambda) } else { ## gamma priors cat(" for (i in 1:", n.sp, ") {\n", file = file, append = TRUE, sep = "") cat( " lambda[i] ~ dgamma(.05,.005)\n", file = file, append = TRUE, sep = "" ) cat( " rho[i] <- log(lambda[i])\n", file = file, append = TRUE, sep = "" ) cat(" }\n", file = file, append = TRUE, sep = "") jags.ini$lambda <- lambda } } else { jags.stuff$L <- G$L rho.lo <- FALSE if (any(G$lsp0 != 0)) { jags.stuff$rho.lo <- G$lsp0 rho.lo <- TRUE } nr <- ncol(G$L) if (sp.prior == "log.uniform") { cat( " for (i in 1:", nr, ") { rho0[i] ~ dunif(-12,12) }\n", file = file, append = TRUE, sep = "" ) if (rho.lo) { cat( " rho <- rho.lo + L %*% rho0\n", file = file, append = TRUE, sep = "" ) } else { cat(" rho <- L %*% rho0\n", file = file, append = TRUE, sep = "") } cat( " for (i in 1:", n.sp, ") { lambda[i] <- exp(rho[i]) }\n", file = file, append = TRUE, sep = "" ) jags.ini$rho0 <- log(lambda) } else { ## gamma prior cat(" for (i in 1:", nr, ") {\n", file = file, append = TRUE, sep = "") cat( " lambda0[i] ~ dgamma(.05,.005)\n", file = file, append = TRUE, sep = "" ) cat( " rho0[i] <- log(lambda0[i])\n", file = file, append = TRUE, sep = "" ) cat(" }\n", file = file, append = TRUE, sep = "") if (rho.lo) { cat( " rho <- rho.lo + L %*% rho0\n", file = file, append = TRUE, sep = "" ) } else { cat(" rho <- L %*% rho0\n", file = file, append = TRUE, sep = "") } cat( " for (i in 1:", n.sp, ") { lambda[i] <- exp(rho[i]) }\n", file = file, append = TRUE, sep = "" ) jags.ini$lambda0 <- lambda } } cat("}", file = file, append = TRUE) G$formula = formula G$rank = ncol(G$X) ## to Gibbs sample we force full rank! list(pregam = G, jags.data = jags.stuff, jags.ini = jags.ini) } ## new_jagam #' Initialize a gam object using a list of formulae #' #' This function is derived from \code{mgcv:::gam.setup.list} #' #' @author Simon N Wood with modifications by Nicholas Clark #' @noRd gam_setup.list <- function( formula, pterms, data = stop("No data supplied to gam.setup"), knots = NULL, sp = NULL, min.sp = NULL, H = NULL, absorb.cons = TRUE, sparse.cons = 0, select = FALSE, idLinksBases = TRUE, scale.penalty = TRUE, paraPen = NULL, gamm.call = FALSE, drop.intercept = NULL, apply.by = TRUE, modCon = 0 ) { # version of gam.setup for when gam is called with a list of formulae, # specifying several linear predictors... 
  # key difference to gam.setup is an attribute of the model matrix, "lpi",
  # which is a list of column indices for each linear predictor
  d <- length(pterms)
  if (is.null(drop.intercept)) {
    drop.intercept <- rep(FALSE, d)
  }
  if (length(drop.intercept) != d) {
    stop("length(drop.intercept) should be equal to number of model formulas")
  }
  lp.overlap <- formula$nlp < d
  G <- gam_setup(
    formula[[1]],
    pterms[[1]],
    data,
    knots,
    sp,
    min.sp,
    H,
    absorb.cons,
    sparse.cons,
    select,
    idLinksBases,
    scale.penalty,
    paraPen,
    gamm.call,
    drop.intercept[1],
    apply.by = apply.by,
    list.call = TRUE,
    modCon = modCon
  )
  G$pterms <- pterms
  G$offset <- list(G$offset)
  G$xlevels <- list(G$xlevels)
  G$assign <- list(G$assign)
  used.sp <- length(G$lsp0)
  if (!is.null(sp) && used.sp > 0) {
    sp <- sp[-(1:used.sp)]
  }
  if (!is.null(min.sp) && nrow(G$L) > 0) {
    min.sp <- min.sp[-(1:nrow(G$L))]
  }
  flpi <- lpi <- list()
  for (i in 1:formula$nlp) {
    lpi[[i]] <- rep(0, 0)
  }
  lpi[[1]] <- 1:ncol(G$X)
  flpi[[1]] <- formula[[1]]$lpi
  pof <- ncol(G$X)
  pstart <- rep(0, d)
  pstart[1] <- 1
  if (d > 1) {
    for (i in 2:d) {
      if (is.null(formula[[i]]$response)) {
        formula[[i]]$response <- formula$response
        mv.response <- FALSE
      } else {
        mv.response <- TRUE
      }
      formula[[i]]$pfok <- 1
      um <- gam_setup(
        formula[[i]],
        pterms[[i]],
        data,
        knots,
        sp,
        min.sp,
        H,
        absorb.cons,
        sparse.cons,
        select,
        idLinksBases,
        scale.penalty,
        paraPen,
        gamm.call,
        drop.intercept[i],
        apply.by = apply.by,
        list.call = TRUE,
        modCon = modCon
      )
      used.sp <- length(um$lsp0)
      if (!is.null(sp) && used.sp > 0) {
        sp <- sp[-(1:used.sp)]
      }
      if (!is.null(min.sp) && nrow(um$L) > 0) {
        min.sp <- min.sp[-(1:nrow(um$L))]
      }
      flpi[[i]] <- formula[[i]]$lpi
      for (j in formula[[i]]$lpi) {
        lpi[[j]] <- c(lpi[[j]], pof + 1:ncol(um$X))
      }
      if (mv.response) {
        G$y <- cbind(G$y, um$y)
      }
      if (i > formula$nlp && !is.null(um$offset)) {
        stop("shared offsets not allowed")
      }
      G$offset[[i]] <- um$offset
      if (!is.null(um$contrasts)) {
        G$contrasts <- c(G$contrasts, um$contrasts)
      }
      G$xlevels[[i]] <- um$xlevels
      G$assign[[i]] <- um$assign
      G$rank <- c(G$rank, um$rank)
      pstart[i] <- pof + 1
      G$X <- cbind(G$X, um$X)
      k <- G$m
      if (um$m) {
        for (j in 1:um$m) {
          um$smooth[[j]]$first.para <- um$smooth[[j]]$first.para + pof
          um$smooth[[j]]$last.para <- um$smooth[[j]]$last.para + pof
          k <- k + 1
          G$smooth[[k]] <- um$smooth[[j]]
        }
      }
      ks <- length(G$S)
      M <- length(um$S)
      if (!is.null(um$L) || !is.null(G$L)) {
        if (is.null(G$L)) {
          G$L <- diag(1, nrow = ks)
        }
        if (is.null(um$L)) {
          um$L <- diag(1, nrow = M)
        }
        G$L <- rbind(
          cbind(G$L, matrix(0, nrow(G$L), ncol(um$L))),
          cbind(matrix(0, nrow(um$L), ncol(G$L)), um$L)
        )
      }
      G$off <- c(G$off, um$off + pof)
      if (M) {
        for (j in 1:M) {
          ks <- ks + 1
          G$S[[ks]] <- um$S[[j]]
        }
      }
      G$m <- G$m + um$m
      G$nsdf[i] <- um$nsdf
      if (!is.null(um$P) || !is.null(G$P)) {
        if (is.null(G$P)) {
          G$P <- diag(1, nrow = pof)
        }
        k <- ncol(um$X)
        if (is.null(um$P)) {
          um$P <- diag(1, nrow = k)
        }
        G$P <- rbind(
          cbind(G$P, matrix(0, pof, k)),
          cbind(matrix(0, k, pof), um$P)
        )
      }
      G$cmX <- c(G$cmX, um$cmX)
      if (um$nsdf > 0) {
        um$term.names[1:um$nsdf] <- paste(
          um$term.names[1:um$nsdf],
          i - 1,
          sep = "."
        )
      }
      G$term.names <- c(G$term.names, um$term.names)
      G$lsp0 <- c(G$lsp0, um$lsp0)
      G$sp <- c(G$sp, um$sp)
      pof <- ncol(G$X)
    }
  }

  ## If there is overlap then there is a danger of lack of identifiability of
  ## the parametric terms, especially if there are factors present in shared
  ## components. The following code deals with this possibility...
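  # A toy illustration (not run) of the identifiability problem handled
  # below: if two linear predictors share an intercept, the stacked
  # parametric design matrix is rank deficient and one column must drop,
  # e.g.
  #   Xp <- cbind(rep(1, 5), rep(1, 5))
  #   qr(Xp)$rank # 1, not 2, so one of the two columns is unidentifiable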
  if (lp.overlap) {
    rt <- olid(G$X, G$nsdf, pstart, flpi, lpi)
    if (length(rt$dind) > 0) {
      warning(
        "dropping unidentifiable parametric terms from model",
        call. = FALSE
      )
      G$X <- G$X[, -rt$dind]
      G$cmX <- G$cmX[-rt$dind]
      G$term.names <- G$term.names[-rt$dind]
      for (i in 1:length(G$smooth)) {
        k <- sum(rt$dind < G$smooth[[i]]$first.para)
        G$smooth[[i]]$first.para <- G$smooth[[i]]$first.para - k
        G$smooth[[i]]$last.para <- G$smooth[[i]]$last.para - k
      }
      for (i in 1:length(G$off)) {
        G$off[i] <- G$off[i] - sum(rt$dind < G$off[i])
      }
      attr(G$nsdf, "drop.ind") <- rt$dind ## store drop index
    }
  }
  attr(lpi, "overlap") <- lp.overlap
  attr(G$X, "lpi") <- lpi
  attr(G$nsdf, "pstart") <- pstart
  G$g.index <- rep(FALSE, ncol(G$X))
  n.sp0 <- 0
  if (length(G$smooth)) {
    for (i in 1:length(G$smooth)) {
      if (!is.null(G$smooth[[i]]$g.index)) {
        G$g.index[
          G$smooth[[i]]$first.para:G$smooth[[i]]$last.para
        ] <- G$smooth[[i]]$g.index
      }
      n.sp <- length(G$smooth[[i]]$S)
      if (n.sp) {
        G$smooth[[i]]$first.sp <- n.sp0 + 1
        n.sp0 <- G$smooth[[i]]$last.sp <- n.sp0 + n.sp
      }
    }
  }
  if (!any(G$g.index)) {
    G$g.index <- NULL
  }
  G
}

#' Takes a set of non-negative integers and returns minimal code for generating it
#'
#' This function is derived from \code{mgcv:::compress.iseq}
#'
#' @author Simon N Wood with modifications by Nicholas Clark
#' @noRd
compress_iseq <- function(x) {
  x1 <- sort(x)
  br <- diff(x1) != 1 ## TRUE at sequence breaks
  txt <- paste(x1[c(TRUE, br)], x1[c(br, TRUE)], sep = ":") ## subsequences
  txt1 <- paste(x1[c(TRUE, br)]) ## subseq starts
  ii <- x1[c(TRUE, br)] == x1[c(br, TRUE)] ## index start and end equal
  txt[ii] <- txt1[ii] ## replace length-one sequences with single integers
  paste("c(", paste(txt, collapse = ","), ")", sep = "")
}

#' Returns a vector dind of columns of X to drop for identifiability
#'
#' This function is derived from \code{mgcv:::olid}
#'
#' @author Simon N Wood with modifications by Nicholas Clark
#' @noRd
olid <- function(X, nsdf, pstart, flpi, lpi) {
  nlp <- length(lpi) ## number of linear predictors
  n <- nrow(X)
  nf <- length(nsdf) ## number of formulae blocks
  Xp <- matrix(0, n * nlp, sum(nsdf))
  start <- 1
  ii <- 1:n
  tind <- rep(0, 0) ## complete index of all parametric columns in X
  ## create a block matrix, Xp, with the same identifiability properties as
  ## the unpenalized part of the model...
  for (i in 1:nf) {
    stop <- start - 1 + nsdf[i]
    if (stop >= start) {
      ind <- pstart[i] + 1:nsdf[i] - 1
      for (k in flpi[[i]]) {
        Xp[ii + (k - 1) * n, start:stop] <- X[, ind]
      }
      tind <- c(tind, ind)
      start <- start + nsdf[i]
    }
  }
  ## rank deficiency of Xp will reveal the number of redundant parametric
  ## terms, and a pivoted QR will reveal which to drop to restore
  ## full rank...
  qrx <- qr(Xp, LAPACK = TRUE, tol = 0.0)
  ## unidentifiable columns get pivoted to the final columns
  r <- mgcv::Rrank(qr.R(qrx)) ## get rank from R factor of pivoted QR
  if (r == ncol(Xp)) {
    ## full rank, all fine, drop nothing
    dind <- rep(0, 0)
  } else {
    ## reduced rank, drop some columns; the pivot indexes the columns of Xp,
    ## so the dropped range is bounded by ncol(Xp)
    dind <- tind[sort(qrx$pivot[(r + 1):ncol(Xp)], decreasing = TRUE)] ## columns to drop
    ## now we need to adjust nsdf, pstart and lpi
    for (d in dind) { ## working down through the drop indices
      ## the following commented out code is useful should it ever prove
      ## necessary to adjust pstart and nsdf, but at present these are only
      ## used in prediction, and it is cleaner to leave them unchanged, and
      ## simply drop using dind during prediction.
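      ## For each dropped column d, the loop below deletes d from every
      ## lpi entry that contains it and shifts all larger indices down by
      ## one, keeping lpi consistent with the reduced model matrix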
      #k <- if (d >= pstart[nf]) nlp else which(d >= pstart[1:(nf - 1)] & d < pstart[2:nf])
      #nsdf[k] <- nsdf[k] - 1 ## one less unpenalized column in this block
      #if (k < nf) pstart[(k + 1):nf] <- pstart[(k + 1):nf] - 1
      for (i in 1:nlp) {
        k <- which(lpi[[i]] == d)
        if (length(k) > 0) {
          lpi[[i]] <- lpi[[i]][-k] ## drop row
        }
        k <- which(lpi[[i]] > d)
        if (length(k) > 0) lpi[[i]][k] <- lpi[[i]][k] - 1 ## close up
      }
    } ## end of drop index loop
  }
  list(dind = dind, lpi = lpi) ##,pstart=pstart,nsdf=nsdf)
}

#' Write linear predictor section of a jagam file
#'
#' This function is derived from \code{mgcv:::write.jagslp}
#'
#' @author Simon N Wood with modifications by Nicholas Clark
#' @noRd
write_jagslp <- function(resp, family, file, use.weights, offset = FALSE) {
  ## write the JAGS code for the linear predictor
  ## and response distribution.
  iltab <- ## table of inverse link functions
    c(
      "eta[i]",
      "exp(eta[i])",
      "ilogit(eta[i])",
      "phi(eta[i])",
      "1/eta[i]",
      "eta[i]^2"
    )
  names(iltab) <- c("identity", "log", "logit", "probit", "inverse", "sqrt")
  if (!family$link %in% names(iltab)) {
    stop("link not yet handled")
  }

  ## code the linear predictor and expected response...
  if (family$link == "identity") {
    if (offset) {
      cat(
        " mu <- X %*% b + offset ## expected response\n",
        file = file,
        append = TRUE
      )
    } else {
      cat(" mu <- X %*% b ## expected response\n", file = file, append = TRUE)
    }
  } else {
    if (offset) {
      cat(
        " eta <- X %*% b + offset ## linear predictor\n",
        file = file,
        append = TRUE
      )
    } else {
      cat(" eta <- X %*% b ## linear predictor\n", file = file, append = TRUE)
    }
    cat(
      " for (i in 1:n) { mu[i] <- ",
      iltab[family$link],
      "} ## expected response\n",
      file = file,
      append = TRUE
    )
  }

  ## code the response given mu and any scale parameter prior...
  #scale <- TRUE ## is scale parameter free?
  cat(" for (i in 1:n) { ", file = file, append = TRUE)
  if (family$family == "gaussian") {
    if (use.weights) {
      cat(
        resp,
        "[i] ~ dnorm(mu[i],tau*w[i]) } ## response \n",
        sep = "",
        file = file,
        append = TRUE
      )
    } else {
      cat(
        resp,
        "[i] ~ dnorm(mu[i],tau) } ## response \n",
        sep = "",
        file = file,
        append = TRUE
      )
    }
    cat(
      " scale <- 1/tau ## convert tau to standard GLM scale\n",
      file = file,
      append = TRUE
    )
    cat(
      " tau ~ dgamma(.05,.005) ## precision parameter prior \n",
      file = file,
      append = TRUE
    )
  } else if (family$family == "poisson") {
    # scale <- FALSE
    cat(
      resp,
      "[i] ~ dpois(mu[i]) } ## response \n",
      sep = "",
      file = file,
      append = TRUE
    )
    if (use.weights) {
      warning("weights ignored")
    }
    use.weights <- FALSE
  } else if (family$family == "binomial") {
    # scale <- FALSE
    cat(
      resp,
      "[i] ~ dbin(mu[i],w[i]) } ## response \n",
      sep = "",
      file = file,
      append = TRUE
    )
    use.weights <- TRUE
  } else if (family$family == "Gamma") {
    if (use.weights) {
      cat(
        resp,
        "[i] ~ dgamma(r*w[i],r*w[i]/mu[i]) } ## response \n",
        sep = "",
        file = file,
        append = TRUE
      )
    } else {
      cat(
        resp,
        "[i] ~ dgamma(r,r/mu[i]) } ## response \n",
        sep = "",
        file = file,
        append = TRUE
      )
    }
    cat(
      " r ~ dgamma(.05,.005) ## scale parameter prior \n",
      file = file,
      append = TRUE
    )
    cat(
      " scale <- 1/r ## convert r to standard GLM scale\n",
      file = file,
      append = TRUE
    )
  } else {
    stop("family not implemented yet")
  }
  use.weights
}


================================================
FILE: R/mvgam_trend_types.R
================================================
#' Specify autoregressive dynamic processes in \pkg{mvgam}
#'
#' Set up autoregressive or autoregressive moving average trend models in
#' \pkg{mvgam}. These functions do not evaluate their arguments – they exist
#' purely to help set up a model with particular autoregressive trend models.
#'
#' @param ma \code{Logical}.
Include moving average terms of order \code{1}? #' Default is \code{FALSE}. #' #' @param cor \code{Logical}. Include correlated process errors as part of a #' multivariate normal process model? If \code{TRUE} and if #' \code{n_series > 1} in the supplied data, a fully structured covariance #' matrix will be estimated for the process errors. Default is \code{FALSE}. #' #' @param p A non-negative integer specifying the autoregressive (AR) order. #' Default is \code{1}. Cannot currently be larger than \code{3} for `AR` #' terms, and cannot be anything other than `1` for continuous time AR #' (`CAR`) terms. #' #' @param gr An optional grouping variable, which must be a `factor` in the #' supplied `data`, for setting up hierarchical residual correlation #' structures. If specified, this will automatically set `cor = TRUE` and set #' up a model where the residual correlations for a specific level of `gr` #' are modelled hierarchically: #' #' \eqn{\Omega_{group} = \alpha_{cor}\Omega_{global} + #' (1 - \alpha_{cor})\Omega_{group, local}}, #' #' where \eqn{\Omega_{global}} is a *global* correlation matrix, #' \eqn{\Omega_{group, local}} is a *local deviation* correlation matrix and #' \eqn{\alpha_{cor}} is a weighting parameter controlling how strongly the #' local correlation matrix \eqn{\Omega_{group}} is shrunk towards the global #' correlation matrix \eqn{\Omega_{global}} (larger values of #' \eqn{\alpha_{cor}} indicate a greater degree of shrinkage, i.e. a greater #' degree of partial pooling). #' #' When used within a `VAR()` model, this essentially sets up a hierarchical #' panel vector autoregression where both the autoregressive and correlation #' matrices are learned hierarchically. If `gr` is supplied then `subgr` #' *must* also be supplied. #' #' @param subgr A subgrouping `factor` variable specifying which element in #' `data` represents the different time series. Defaults to `series`, but #' note that models that use the hierarchical correlations, where the #' `subgr` time series are measured in each level of `gr`, *should not* #' include a `series` element in `data`. Rather, this element will be created #' internally based on the supplied variables for `gr` and `subgr`. #' #' For example, if you are modelling temporal counts for a group of species #' (labelled as `species` in `data`) across three different geographical #' regions (labelled as `region`), and you would like the residuals to be #' correlated within regions, then you should specify `gr = region` and #' `subgr = species`. Internally, `mvgam()` will create the `series` element #' for the data using: #' #' `series = interaction(group, subgroup, drop = TRUE)` #' #' @return An object of class \code{mvgam_trend}, which contains a list of #' arguments to be interpreted by the parsing functions in \pkg{mvgam}. 
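#'
#' As a minimal usage sketch (with `dat` standing in for a suitably
#' structured hypothetical data.frame containing `y`, `season`, `time` and
#' `series` columns), an AR(2) trend with correlated process errors could be
#' requested via:
#'
#' `mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(p = 2, cor = TRUE), data = dat)`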
#' #' @rdname RW #' #' @details Use `vignette("mvgam_overview")` to see the full details of #' available stochastic trend types in \pkg{mvgam}, or view the rendered #' version on the package website at: #' https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html #' #' @author Nicholas J Clark #' #' @examples #' \dontrun{ #' # A short example to illustrate CAR(1) models #' # Function to simulate CAR1 data with seasonality #' sim_corcar1 = function(n = 125, #' phi = 0.5, #' sigma = 2, #' sigma_obs = 0.75) { #' # Sample irregularly spaced time intervals #' time_dis <- c(1, runif(n - 1, 0, 5)) #' #' # Set up the latent dynamic process #' x <- vector(length = n); x[1] <- -0.3 #' for (i in 2:n) { #' # zero-distances will cause problems in sampling, so mvgam uses a #' # minimum threshold; this simulation function emulates that process #' if (time_dis[i] == 0) { #' x[i] <- rnorm( #' 1, #' mean = (phi^1e-3) * x[i - 1], #' sd = sigma * (1 - phi^(2 * 1e-3)) / (1 - phi^2) #' ) #' } else { #' x[i] <- rnorm( #' 1, #' mean = (phi^time_dis[i]) * x[i - 1], #' sd = sigma * (1 - phi^(2 * time_dis[i])) / (1 - phi^2) #' ) #' } #' } #' #' # Add 12-month seasonality #' cov1 <- sin(2 * pi * (1:n) / 12) #' cov2 <- cos(2 * pi * (1:n) / 12) #' beta1 <- runif(1, 0.3, 0.7) #' beta2 <- runif(1, 0.2, 0.5) #' seasonality <- beta1 * cov1 + beta2 * cov2 #' #' # Take Gaussian observations with error and return #' data.frame( #' y = rnorm(n, mean = x + seasonality, sd = sigma_obs), #' season = rep(1:12, 20)[1:n], #' time = cumsum(time_dis) #' ) #' } #' #' # Sample two time series #' dat <- rbind( #' dplyr::bind_cols( #' sim_corcar1(phi = 0.65, sigma_obs = 0.55), #' data.frame(series = 'series1') #' ), #' dplyr::bind_cols( #' sim_corcar1(phi = 0.8, sigma_obs = 0.35), #' data.frame(series = 'series2') #' ) #' ) %>% #' dplyr::mutate(series = as.factor(series)) #' #' # mvgam with CAR(1) trends and series-level seasonal smooths #' mod <- mvgam( #' formula = y ~ -1, #' trend_formula = ~ s(season, bs = 'cc', k = 5, by = trend), #' trend_model = CAR(), #' priors = c( #' prior(exponential(3), class = sigma), #' prior(beta(4, 4), class = sigma_obs) #' ), #' data = dat, #' family = gaussian(), #' chains = 2, #' silent = 2 #' ) #' #' # View usual summaries and plots #' summary(mod) #' conditional_effects(mod, type = 'expected') #' plot(mod, type = 'trend', series = 1) #' plot(mod, type = 'trend', series = 2) #' plot(mod, type = 'residuals', series = 1) #' plot(mod, type = 'residuals', series = 2) #' mcmc_plot( #' mod, #' variable = 'ar1', #' regex = TRUE, #' type = 'hist' #' ) #' #' # Now an example illustrating hierarchical dynamics #' set.seed(123) #' #' # Simulate three species monitored in three different regions #' simdat1 <- sim_mvgam( #' trend_model = VAR(cor = TRUE), #' prop_trend = 0.95, #' n_series = 3, #' mu = c(1, 2, 3) #' ) #' simdat2 <- sim_mvgam( #' trend_model = VAR(cor = TRUE), #' prop_trend = 0.95, #' n_series = 3, #' mu = c(1, 2, 3) #' ) #' simdat3 <- sim_mvgam( #' trend_model = VAR(cor = TRUE), #' prop_trend = 0.95, #' n_series = 3, #' mu = c(1, 2, 3) #' ) #' #' # Set up the data but DO NOT include 'series' #' all_dat <- rbind( #' simdat1$data_train %>% #' dplyr::mutate(region = 'qld'), #' simdat2$data_train %>% #' dplyr::mutate(region = 'nsw'), #' simdat3$data_train %>% #' dplyr::mutate(region = 'vic') #' ) %>% #' dplyr::mutate( #' species = gsub('series', 'species', series), #' species = as.factor(species), #' region = as.factor(region) #' ) %>% #' dplyr::arrange(series, time) %>% #' 
dplyr::select(-series) #' #' # Check priors for a hierarchical AR1 model #' get_mvgam_priors( #' formula = y ~ species, #' trend_model = AR(gr = region, subgr = species), #' data = all_dat #' ) #' #' # Fit the model #' mod <- mvgam( #' formula = y ~ species, #' trend_model = AR(gr = region, subgr = species), #' data = all_dat, #' chains = 2, #' silent = 2 #' ) #' #' # Check standard outputs #' summary(mod) #' #' # Inspect posterior estimates for the correlation weighting parameter #' mcmc_plot(mod, variable = 'alpha_cor', type = 'hist') #' } #' @export RW = function(ma = FALSE, cor = FALSE, gr = NA, subgr = NA) { # Validate the supplied groupings and correlation argument gr <- deparse0(substitute(gr)) subgr <- deparse0(substitute(subgr)) if (gr == 'NA') { subgr <- 'series' } if (gr != 'NA') { if (subgr == 'NA') { stop( 'argument "subgr" must be supplied if "gr" is also supplied', call. = FALSE ) } else if (subgr == 'series') { stop( 'argument "subgr" cannot be set to "series" if "gr" is also supplied', call. = FALSE ) } else { cor <- TRUE } } out <- structure( list( trend_model = 'RW', ma = ma, cor = cor, unit = 'time', gr = gr, subgr = subgr, label = match.call() ), class = 'mvgam_trend', param_info = list( param_names = c( 'trend', 'tau', 'sigma', 'theta', 'Sigma', 'error', 'drift' ), labels = c( 'trend_estimates', 'precision_parameter', 'standard_deviation', 'moving_average_coef', 'covariance_matrix', 'process_errors', 'drift_parameter' ) ) ) } #' @rdname RW #' @export AR = function(p = 1, ma = FALSE, cor = FALSE, gr = NA, subgr = NA) { validate_pos_integer(p) if (p > 3) { stop("Argument 'p' must be <= 3", call. = FALSE) } # Validate the supplied groupings and correlation argument gr <- deparse0(substitute(gr)) subgr <- deparse0(substitute(subgr)) if (gr == 'NA') { subgr <- 'series' } if (gr != 'NA') { if (subgr == 'NA') { stop( 'argument "subgr" must be supplied if "gr" is also supplied', call. = FALSE ) } else if (subgr == 'series') { stop( 'argument "subgr" cannot be set to "series" if "gr" is also supplied', call. = FALSE ) } else { cor <- TRUE } } # Determine parameter names based on AR order ar_params <- paste0('ar', 1:p) param_names <- c( 'trend', 'tau', 'sigma', ar_params, 'theta', 'Sigma', 'error', 'drift' ) param_labels <- c( 'trend_estimates', 'precision_parameter', 'standard_deviation', paste0('autoregressive_coef_', 1:p), 'moving_average_coef', 'covariance_matrix', 'process_errors', 'drift_parameter' ) out <- structure( list( trend_model = paste0('AR', p), ma = ma, cor = cor, unit = 'time', gr = gr, subgr = subgr, label = match.call() ), class = 'mvgam_trend', param_info = list( param_names = param_names, labels = param_labels ) ) } #' @rdname RW #' @export CAR = function(p = 1) { validate_pos_integer(p) if (p > 1) { stop("Argument 'p' must be = 1", call. 
= FALSE)
  }
  out <- structure(
    list(
      trend_model = paste0('CAR', p),
      ma = FALSE,
      cor = FALSE,
      unit = 'time',
      gr = 'NA',
      subgr = 'series',
      label = match.call()
    ),
    class = 'mvgam_trend',
    param_info = list(
      param_names = c('trend', 'tau', 'sigma', 'ar1', 'Sigma'),
      labels = c(
        'trend_estimates',
        'precision_parameter',
        'standard_deviation',
        'autoregressive_coef',
        'covariance_matrix'
      )
    )
  )
}

#' @rdname RW
#' @export
VAR = function(ma = FALSE, cor = FALSE, gr = NA, subgr = NA) {
  # Validate the supplied groupings and correlation argument
  gr <- deparse0(substitute(gr))
  subgr <- deparse0(substitute(subgr))
  if (gr == 'NA') {
    subgr <- 'series'
  }
  if (gr != 'NA') {
    if (subgr == 'NA') {
      stop(
        'argument "subgr" must be supplied if "gr" is also supplied',
        call. = FALSE
      )
    } else if (subgr == 'series') {
      stop(
        'argument "subgr" cannot be set to "series" if "gr" is also supplied',
        call. = FALSE
      )
    } else {
      cor <- TRUE
    }
  }
  out <- structure(
    list(
      trend_model = 'VAR',
      ma = ma,
      cor = cor,
      unit = 'time',
      gr = gr,
      subgr = subgr,
      label = match.call()
    ),
    class = 'mvgam_trend',
    param_info = list(
      param_names = c(
        'trend', 'A', 'Sigma', 'P_real', 'sigma', 'theta', 'error', 'drift'
      ),
      labels = c(
        'trend_estimates',
        'var_coefficient_matrix',
        'covariance_matrix',
        'stationary_precision',
        'standard_deviation',
        'moving_average_matrix',
        'process_errors',
        'drift_parameter'
      )
    )
  )
}

#' Specify dynamic Gaussian process trends in \pkg{mvgam} models
#'
#' Set up low-rank approximate Gaussian Process trend models using Hilbert
#' basis expansions in \pkg{mvgam}. This function does not evaluate its
#' arguments – it exists purely to help set up a model with particular GP
#' trend models.
#'
#' @param ... unused
#'
#' @return An object of class \code{mvgam_trend}, which contains a list of
#' arguments to be interpreted by the parsing functions in \pkg{mvgam}.
#'
#' @details A GP trend is estimated for each series using Hilbert space
#' approximate Gaussian Processes. In `mvgam`, latent squared exponential GP
#' trends are approximated by default using \code{20} basis functions and a
#' multiplicative factor of `c = 5/4`, which saves computational cost
#' compared to fitting full GPs while still adequately estimating the GP
#' \code{alpha} and \code{rho} parameters.
#'
#' @rdname GP
#'
#' @author Nicholas J Clark
#'
#' @references Riutort-Mayol G, Burkner PC, Andersen MR, Solin A and Vehtari A
#' (2023). Practical Hilbert space approximate Bayesian Gaussian processes for
#' probabilistic programming. Statistics and Computing 33, 1.
#' https://doi.org/10.1007/s11222-022-10167-2
#'
#' @seealso \code{\link[brms]{gp}}
#'
#' @export
GP = function(...) {
  out <- structure(
    list(
      trend_model = 'GP',
      ma = FALSE,
      cor = FALSE,
      unit = 'time',
      gr = 'NA',
      subgr = 'series',
      label = match.call()
    ),
    class = 'mvgam_trend',
    param_info = list(
      param_names = c('trend', 'alpha_gp', 'rho_gp', 'b_gp'),
      labels = c(
        'trend_estimates',
        'marginal_deviation',
        'length_scale',
        'basis_coefficients'
      )
    )
  )
}

#' Specify piecewise linear or logistic trends in \pkg{mvgam} models
#'
#' Set up piecewise linear or logistic trend models in \code{mvgam}. These
#' functions do not evaluate their arguments – they exist purely to help set up
#' a model with particular piecewise trend models.
#'
#' @param n_changepoints A non-negative integer specifying the number of
#' potential changepoints. Potential changepoints are selected uniformly from
#' the first `changepoint_range` proportion of timepoints in \code{data}.
#' Default is `10`.
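#'
#' As a rough sketch of the internal placement logic (mirroring the parsing
#' code in this package), with `100` timepoints, `changepoint_range = 0.8`
#' and `n_changepoints = 10`, the potential changepoint times are spread
#' evenly over the first `80` timepoints:
#'
#' `t_change <- unique(round(seq.int(1, 80, length.out = 11)[-1]))`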
#' #' @param changepoint_range Proportion of history in \code{data} in which trend #' changepoints will be estimated. Defaults to `0.8` for the first 80%. #' #' @param changepoint_scale Parameter modulating the flexibility of the #' automatic changepoint selection by altering the scale parameter of a #' Laplace distribution. The resulting prior will be #' `double_exponential(0, changepoint_scale)`. Large values will allow many #' changepoints and a more flexible trend, while small values will allow few #' changepoints. Default is `0.05`. #' #' @param growth Character string specifying either `'linear'` or `'logistic'` #' growth of the trend. If `'logistic'`, a variable labelled `cap` MUST be in #' \code{data} to specify the maximum saturation point for the trend (see #' details and examples in \code{\link{mvgam}} for more information). Default #' is `'linear'`. #' #' @author Nicholas J Clark #' #' @references Taylor, Sean J., and Benjamin Letham. "Forecasting at scale." #' The American Statistician 72.1 (2018): 37–45. #' #' @return An object of class \code{mvgam_trend}, which contains a list of #' arguments to be interpreted by the parsing functions in \code{mvgam}. #' #' @details #' *Offsets and intercepts*: #' For each of these trend models, an offset parameter is included in the trend #' estimation process. This parameter will be incredibly difficult to identify #' if you also include an intercept in the observation formula. For that #' reason, it is highly recommended that you drop the intercept from the #' formula (i.e. `y ~ x + 0` or `y ~ x - 1`, where `x` are your optional #' predictor terms). #' #' *Logistic growth and the cap variable*: #' When forecasting growth, there is often some maximum achievable point that a #' time series can reach. For example, total market size, total population size #' or carrying capacity in population dynamics. It can be advantageous for the #' forecast to saturate at or near this point so that predictions are more #' sensible. #' #' This function allows you to make forecasts using a logistic growth trend #' model, with a specified carrying capacity. Note that this capacity does not #' need to be static over time; it can vary with each series × timepoint #' combination if necessary. But you must supply a `cap` value for each #' observation in the data when using `growth = 'logistic'`. #' #' For observation families that use a non-identity link function, the `cap` #' value will be internally transformed to the link scale (i.e. your specified #' `cap` will be log-transformed if you are using a `poisson()` or `nb()` #' family). It is therefore important that you specify the `cap` values on the #' scale of your outcome. Note also that no missing values are allowed in #' `cap`. 
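#'
#' As a concrete example of the link-scale transformation described above,
#' with `family = poisson()` a supplied `cap` of `35` is used internally as
#' `log(35)`, which is why `cap` must always be given on the scale of the
#' outcome.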
#' #' @rdname piecewise_trends #' #' @examples #' \dontrun{ #' # Example of logistic growth with possible changepoints #' dNt = function(r, N, k) { #' r * N * (k - N) #' } #' #' Nt = function(r, N, t, k) { #' for (i in 1:(t - 1)) { #' if (i %in% c(5, 15, 25, 41, 45, 60, 80)) { #' N[i + 1] <- max( #' 1, #' N[i] + dNt(r + runif(1, -0.1, 0.1), N[i], k) #' ) #' } else { #' N[i + 1] <- max(1, N[i] + dNt(r, N[i], k)) #' } #' } #' N #' } #' #' set.seed(11) #' expected <- Nt(0.004, 2, 100, 30) #' plot(expected, xlab = 'Time') #' #' y <- rpois(100, expected) #' plot(y, xlab = 'Time') #' #' mod_data <- data.frame( #' y = y, #' time = 1:100, #' cap = 35, #' series = as.factor('series_1') #' ) #' plot_mvgam_series(data = mod_data) #' #' mod <- mvgam( #' y ~ 0, #' trend_model = PW(growth = 'logistic'), #' family = poisson(), #' data = mod_data, #' chains = 2, #' silent = 2 #' ) #' summary(mod) #' #' hc <- hindcast(mod) #' plot(hc) #' #' library(ggplot2) #' mcmc_plot(mod, variable = 'delta_trend', regex = TRUE) + #' scale_y_discrete(labels = mod$trend_model$changepoints) + #' labs( #' y = 'Potential changepoint', #' x = 'Rate change' #' ) #' #' how_to_cite(mod) #' } #' #' @export PW = function( n_changepoints = 10, changepoint_range = 0.8, changepoint_scale = 0.05, growth = 'linear' ) { growth <- match.arg(growth, choices = c('linear', 'logistic')) validate_proportional(changepoint_range) validate_pos_integer(n_changepoints) validate_pos_real(changepoint_scale) trend_model <- 'PWlinear' if (growth == 'logistic') { trend_model = 'PWlogistic' } out <- structure( list( trend_model = trend_model, n_changepoints = n_changepoints, changepoint_range = changepoint_range, changepoint_scale = changepoint_scale, ma = FALSE, cor = FALSE, unit = 'time', gr = 'NA', subgr = 'series', label = match.call() ), class = 'mvgam_trend', param_info = list( param_names = c('trend', 'delta_trend', 'k_trend', 'm_trend'), labels = c( 'trend_estimates', 'rate_changes', 'growth_rate', 'offset_parameter' ) ) ) } #' Specify correlated residual processes in \pkg{mvgam} #' #' Set up latent correlated multivariate Gaussian residual processes in #' \pkg{mvgam}. This function does not evaluate its arguments – it exists #' purely to help set up a model with particular error processes #' #' @param unit The unquoted name of the variable that represents the unit of #' analysis in `data` over which latent residuals should be correlated. This #' variable should be either a `numeric` or `integer` variable in the #' supplied `data`. Defaults to `time` to be consistent with other #' functionalities in \pkg{mvgam}, though note that the data need not be time #' series in this case. See examples below for further details and #' explanations #' #' @param gr An optional grouping variable, which must be a `factor` in the #' supplied `data`, for setting up hierarchical residual correlation #' structures. If specified, this will automatically set up a model where the #' residual correlations for a specific level of `gr` are modelled #' hierarchically: #' #' \eqn{\Omega_{group} = p\Omega_{global} + (1 - p)\Omega_{group, local}}, #' #' where \eqn{\Omega_{global}} is a *global* correlation matrix, #' \eqn{\Omega_{group, local}} is a *local deviation* correlation matrix, and #' \eqn{p} is a weighting parameter controlling how strongly the local #' correlation matrix \eqn{\Omega_{group}} is shrunk towards the global #' correlation matrix \eqn{\Omega_{global}}. 
If `gr` is supplied then `subgr` #' *must* also be supplied #' #' @param subgr A subgrouping `factor` variable specifying which element in #' `data` represents the different observational units. Defaults to `series` #' to be consistent with other functionalities in \pkg{mvgam}, though note #' that the data need not be time series in this case #' #' Models that use the hierarchical correlations (by supplying a value for #' `gr`) *should not* include a `series` element in `data`. Rather, this #' element will be created internally based on the supplied variables for `gr` #' and `subgr` #' #' For example, if you are modelling counts for a group of species (labelled #' as `species` in the data) across sampling sites (labelled as `site` in the #' data) in three different geographical regions (labelled as `region`), and #' you would like the residuals to be correlated within regions, then you #' should specify `unit = site`, `gr = region`, and `subgr = species` #' #' Internally, `mvgam()` will appropriately order the data by `unit` (in this #' case, by `site`) and create the `series` element for the data using #' something like: #' #' `series = as.factor(paste0(group, '_', subgroup))` #' #' @return An object of class \code{mvgam_trend}, which contains a list of #' arguments to be interpreted by the parsing functions in \pkg{mvgam} #' #' @examples #' \dontrun{ #' # Simulate counts of four species over ten sampling locations #' site_dat <- data.frame( #' site = rep(1:10, 4), #' species = as.factor(sort(rep(letters[1:4], 10))), #' y = c(NA, rpois(39, 3)) #' ) #' head(site_dat) #' #' # Set up a correlated residual (i.e. Joint Species Distribution) model #' trend_model <- ZMVN(unit = site, subgr = species) #' mod <- mvgam( #' y ~ species, #' trend_model = ZMVN(unit = site, subgr = species), #' data = site_dat, #' chains = 2, #' silent = 2 #' ) #' #' # Inspect the estimated species-species residual covariances #' mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist') #' #' # A hierarchical correlation example #' Sigma <- matrix( #' c(1, -0.4, 0.5, #' -0.4, 1, 0.3, #' 0.5, 0.3, 1), #' byrow = TRUE, #' nrow = 3 #' ) #' #' make_site_dat <- function(...) { #' errors <- mgcv::rmvn( #' n = 30, #' mu = c(0.6, 0.8, 1.8), #' V = Sigma #' ) #' site_dat <- do.call(rbind, lapply(1:3, function(spec) { #' data.frame( #' y = rpois(30, lambda = exp(errors[, spec])), #' species = paste0('species', spec), #' site = 1:30 #' ) #' })) #' site_dat #' } #' #' site_dat <- rbind( #' make_site_dat() %>% #' dplyr::mutate(group = 'group1'), #' make_site_dat() %>% #' dplyr::mutate(group = 'group2') #' ) %>% #' dplyr::mutate( #' species = as.factor(species), #' group = as.factor(group) #' ) #' #' # Fit the hierarchical correlated residual model #' mod <- mvgam( #' y ~ species, #' trend_model = ZMVN(unit = site, gr = group, subgr = species), #' data = site_dat, #' chains = 2, #' silent = 2 #' ) #' #' # Inspect the estimated species-species residual covariances #' mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist') #' } #' #' @export ZMVN = function(unit = time, gr = NA, subgr = series) { # Validate the supplied groupings and correlation argument unit <- deparse0(substitute(unit)) gr <- deparse0(substitute(gr)) subgr <- deparse0(substitute(subgr)) if (subgr == 'NA') { stop('argument "subgr" cannot be NA', call. = FALSE) } if (unit == 'NA') { stop('argument "unit" cannot be NA', call. 
= FALSE)
  }
  out <- structure(
    list(
      trend_model = 'ZMVN',
      ma = FALSE,
      cor = TRUE,
      unit = unit,
      gr = gr,
      subgr = subgr,
      label = match.call()
    ),
    class = 'mvgam_trend',
    param_info = list(
      param_names = c('trend', 'tau', 'sigma', 'theta', 'Sigma', 'error'),
      labels = c(
        'trend_estimates',
        'precision_parameter',
        'standard_deviation',
        'correlation_parameter',
        'covariance_matrix',
        'process_errors'
      )
    )
  )
}


================================================
FILE: R/noncent_trend.R
================================================
#' Internal functions to change dynamic AR or RW trends
#' to a non-centred parameterisation for potentially big speed gains
#' @noRd
noncent_trend = function(model_file, trend_model, drift) {
  # Replace trend with trend_raw in params
  model_file[grep("matrix[n, n_series] trend;", model_file, fixed = TRUE)] <-
    "matrix[n, n_series] trend_raw;"
  model_file[grep("// latent trends", model_file, fixed = TRUE)] <-
    "// raw latent trends"

  # Add trend to transformed params
  if (drift) {
    drift_text <- ' drift[s] +'
  } else {
    drift_text <- NULL
  }
  if (trend_model == 'RW') {
    model_file[grep("vector[num_basis] b;", model_file, fixed = TRUE)] <-
      paste0(
        "vector[num_basis] b;\n\n",
        "// latent trends\n",
        "matrix[n, n_series] trend;\n",
        "trend = trend_raw .* rep_matrix(sigma', rows(trend_raw));\n",
        "for (s in 1 : n_series) {\n",
        "trend[2 : n, s] +=",
        drift_text,
        " trend[1 : (n - 1), s];\n",
        "}\n"
      )
    model_file <- readLines(textConnection(model_file), n = -1)
  }
  if (trend_model == 'AR1') {
    model_file[grep("vector[num_basis] b;", model_file, fixed = TRUE)] <-
      paste0(
        "vector[num_basis] b;\n\n",
        "// latent trends\n",
        "matrix[n, n_series] trend;\n",
        "trend = trend_raw .* rep_matrix(sigma', rows(trend_raw));\n",
        "for (s in 1 : n_series) {\n",
        "trend[2 : n, s] +=",
        drift_text,
        " ar1[s] * trend[1 : (n - 1), s];\n",
        "}\n"
      )
    model_file <- readLines(textConnection(model_file), n = -1)
  }
  if (trend_model == 'CAR1') {
    model_file[grep("vector[num_basis] b;", model_file, fixed = TRUE)] <-
      paste0(
        "vector[num_basis] b;\n\n",
        "// latent trends\n",
        "matrix[n, n_series] trend;\n",
        "trend = trend_raw .* rep_matrix(sigma', n) .*",
        "(1 - rep_matrix(ar1', n) .^ (2 * to_matrix(time_dis))) ./",
        "(1 - rep_matrix(ar1', n) .^ 2);\n",
        "for (s in 1 : n_series) {\n",
        "trend[2 : n, s] +=",
        drift_text,
        " pow(ar1[s], to_vector(time_dis[2 : n, s])) .* trend[1 : (n - 1), s];\n",
        "}\n"
      )
    model_file <- readLines(textConnection(model_file), n = -1)
  }
  if (trend_model == 'AR2') {
    model_file[grep("vector[num_basis] b;", model_file, fixed = TRUE)] <-
      paste0(
        "vector[num_basis] b;\n\n",
        "// latent trends\n",
        "matrix[n, n_series] trend;\n",
        "trend = trend_raw .* rep_matrix(sigma', rows(trend_raw));\n",
        "for (s in 1 : n_series) {\n",
        "trend[2, s] +=",
        drift_text,
        " ar1[s] * trend[1, s];\n",
        "trend[3 : n, s] +=",
        drift_text,
        " ar1[s] * trend[2 : (n - 1), s] + ar2[s] * trend[1:(n - 2), s];\n",
        "}\n"
      )
    model_file <- readLines(textConnection(model_file), n = -1)
  }
  if (trend_model == 'AR3') {
    model_file[grep("vector[num_basis] b;", model_file, fixed = TRUE)] <-
      paste0(
        "vector[num_basis] b;\n\n",
        "// latent trends\n",
        "matrix[n, n_series] trend;\n",
        "trend = trend_raw .* rep_matrix(sigma', rows(trend_raw));\n",
        "for (s in 1 : n_series) {\n",
        "trend[2, s] +=",
        drift_text,
        " ar1[s] * trend[1, s];\n",
        "trend[3, s] +=",
        drift_text,
        " ar1[s] * trend[2, s] + ar2[s] * trend[1, s] ;\n",
        "trend[4 : n, s] +=",
        drift_text,
        " ar1[s] * trend[3 : (n - 1), s] + ar2[s] * trend[2:(n - 2), s] + ar3[s] * trend[1:(n - 3), s];\n",
        "}\n"
      )
    model_file <-
readLines(textConnection(model_file), n = -1) } # Remove trend statements from model block and replace with the # z scores trend_start <- grep("// trend estimates", model_file, fixed = TRUE) end_braces <- grep("}", model_file, fixed = TRUE) p <- function(f, b) function(a) f(a, b) trend_end <- end_braces[Position(p(`==`, 1), sign(end_braces - trend_start))] model_file <- model_file[-(trend_start:trend_end)] model_file[ grep( "// priors for latent trend variance parameters", model_file, fixed = TRUE ) + 1 ] <- paste0( model_file[ grep( "// priors for latent trend variance parameters", model_file, fixed = TRUE ) + 1 ], '\n', "to_vector(trend_raw) ~ std_normal();" ) model_file <- readLines(textConnection(model_file), n = -1) model_file } #' @noRd noncent_lv = function(model_file, trend_model, drift) { # Replace LV with LV_raw in params model_file[grep("matrix[n, n_lv] LV;", model_file, fixed = TRUE)] <- "matrix[n, n_lv] LV_raw;" model_file[grep("// latent states", model_file, fixed = TRUE)] <- "// raw latent states" # Add LV to transformed params model_file[grep("vector[num_basis] b;", model_file, fixed = TRUE)] <- paste0( "vector[num_basis] b;\n\n", "// latent states\n", "matrix[n, n_lv] LV;\n" ) model_file <- readLines(textConnection(model_file), n = -1) # Add LV calculations in transformed params if (trend_model == 'None') { model_file[grep( "trend_mus = X_trend * b_trend;", model_file, fixed = TRUE )] <- paste0( "trend_mus = X_trend * b_trend;\n\n", "LV = LV_raw .* rep_matrix(sigma', rows(LV_raw));\n", "for(j in 1:n_lv){\n", "for(i in 1:n){\n", "LV[i, j] += trend_mus[ytimes_trend[i, j]];\n", "}\n", "}" ) model_file <- readLines(textConnection(model_file), n = -1) } if (trend_model == 'RW') { model_file[grep( "trend_mus = X_trend * b_trend;", model_file, fixed = TRUE )] <- paste0( "trend_mus = X_trend * b_trend;\n\n", "LV = LV_raw .* rep_matrix(sigma', rows(LV_raw));\n", "for(j in 1:n_lv){\n", "LV[1, j] += trend_mus[ytimes_trend[1, j]];\n", "for(i in 2:n){\n", "LV[i, j] += trend_mus[ytimes_trend[i, j]] + 1 * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]);\n", "}\n", "}" ) model_file <- readLines(textConnection(model_file), n = -1) } if (trend_model == 'AR1') { model_file[grep( "trend_mus = X_trend * b_trend;", model_file, fixed = TRUE )] <- paste0( "trend_mus = X_trend * b_trend;\n\n", "LV = LV_raw .* rep_matrix(sigma', rows(LV_raw));\n", "for(j in 1:n_lv){\n", "LV[1, j] += trend_mus[ytimes_trend[1, j]];\n", "for(i in 2:n){\n", "LV[i, j] += trend_mus[ytimes_trend[i, j]] + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]);\n", "}\n", "}" ) model_file <- readLines(textConnection(model_file), n = -1) } if (trend_model == 'CAR1') { model_file[grep( "trend_mus = X_trend * b_trend;", model_file, fixed = TRUE )] <- paste0( "trend_mus = X_trend * b_trend;\n\n", "LV = LV_raw .* rep_matrix(sigma', rows(LV_raw));\n", "for(j in 1:n_lv){\n", "LV[1, j] += trend_mus[ytimes_trend[1, j]];\n", "for(i in 2:n){\n", "LV[i, j] += trend_mus[ytimes_trend[i, j]] + pow(ar1[j], time_dis[i, j]) * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]);\n", "}\n", "}" ) model_file <- readLines(textConnection(model_file), n = -1) } if (trend_model == 'AR2') { model_file[grep( "trend_mus = X_trend * b_trend;", model_file, fixed = TRUE )] <- paste0( "trend_mus = X_trend * b_trend;\n\n", "LV = LV_raw .* rep_matrix(sigma', rows(LV_raw));\n", "for(j in 1:n_lv){\n", "LV[1, j] += trend_mus[ytimes_trend[1, j]];\n", "LV[2, j] += trend_mus[ytimes_trend[2, j]] + ar1[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]);\n", 
"for(i in 3:n){\n", "LV[i, j] += trend_mus[ytimes_trend[i, j]] + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]) + ar2[j] * (LV[i - 2, j] - trend_mus[ytimes_trend[i - 2, j]]);\n", "}\n", "}" ) model_file <- readLines(textConnection(model_file), n = -1) } if (trend_model == 'AR3') { model_file[grep( "trend_mus = X_trend * b_trend;", model_file, fixed = TRUE )] <- paste0( "trend_mus = X_trend * b_trend;\n\n", "LV = LV_raw .* rep_matrix(sigma', rows(LV_raw));\n", "for(j in 1:n_lv){\n", "LV[1, j] += trend_mus[ytimes_trend[1, j]];\n", "LV[2, j] += trend_mus[ytimes_trend[2, j]] + ar1[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]);\n", "LV[3, j] += trend_mus[ytimes_trend[2, j]] + ar1[j] * (LV[2, j] - trend_mus[ytimes_trend[2, j]]) + ar2[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]);\n", "for(i in 4:n){\n", "LV[i, j] += trend_mus[ytimes_trend[i, j]] + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]) + ar2[j] * (LV[i - 2, j] - trend_mus[ytimes_trend[i - 2, j]]) + ar3[j] * (LV[i - 3, j] - trend_mus[ytimes_trend[i - 3, j]]);\n", "}\n", "}" ) model_file <- readLines(textConnection(model_file), n = -1) } # Remove LV statements from model block and replace with the # z scores if (trend_model == 'None') { trend_start <- grep( "LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]], sigma[j]);", model_file, fixed = TRUE ) - 2 trend_end <- grep( "LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]], sigma[j]);", model_file, fixed = TRUE ) + 2 } else { if ( any(grepl( "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);", model_file, fixed = TRUE )) ) { trend_start <- grep( "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);", model_file, fixed = TRUE ) - 1 } else { trend_start <- grep( "LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE ) - 1 } end_braces <- grep("}", model_file, fixed = TRUE) p <- function(f, b) function(a) f(a, b) trend_end <- end_braces[Position( p(`==`, 1), sign(end_braces - trend_start) )] + 1 } model_file <- model_file[-(trend_start:trend_end)] if ( any(grepl( "// priors for latent state SD parameters", model_file, fixed = TRUE )) ) { model_file[ grep( "// priors for latent state SD parameters", model_file, fixed = TRUE ) + 1 ] <- paste0( model_file[ grep( "// priors for latent state SD parameters", model_file, fixed = TRUE ) + 1 ], '\n', "to_vector(LV_raw) ~ std_normal();" ) } else { model_file[ grep("// priors for factor SD parameters", model_file, fixed = TRUE) + 1 ] <- paste0( model_file[ grep("// priors for factor SD parameters", model_file, fixed = TRUE) + 1 ], '\n', "to_vector(LV_raw) ~ std_normal();" ) } model_file <- readLines(textConnection(model_file), n = -1) model_file } #' @noRd check_noncent = function( model_file, noncentred, use_lv, trend_map, add_ma, add_cor, trend_model, drift, silent, nmix ) { if (!missing(trend_map)) { trendmap <- TRUE } else { trendmap <- FALSE } # Haven't yet implemented noncentering for trend_map models that don't # use the trend_formula if ( trendmap & !any(grepl('trend_mus', model_file, fixed = TRUE)) & use_lv ) { trendmap <- FALSE noncentred <- FALSE } if (!noncentred & use_lv & trendmap & trend_model == 'None' & !nmix) { if (silent <= 1L) { message('Your model may benefit from using "noncentred = TRUE"') } } if ( !noncentred & !add_ma & !add_cor & !nmix & trend_model %in% c('RW', 'AR1', 'AR2', 'AR3', 'CAR1') ) { if (use_lv & trendmap) { if (silent <= 1L) { message('Your model may benefit from using "noncentred = TRUE"') } } if (!use_lv) { if (silent <= 1L) { message('Your model may benefit from 
using "noncentred = TRUE"') } } } if ( noncentred & !add_ma & !add_cor & trend_model %in% c('RW', 'AR1', 'AR2', 'AR3', 'CAR1', 'None') ) { if (use_lv & trendmap) { model_file <- noncent_lv( model_file = model_file, trend_model = trend_model, drift = FALSE ) } else { model_file <- noncent_trend( model_file = model_file, trend_model = trend_model, drift = drift ) } } return(list(model_file = model_file, noncentred = noncentred)) } ================================================ FILE: R/onAttach.R ================================================ .onAttach = function(libname, pkgname) { options("marginaleffects_model_classes" = "mvgam") version <- utils::packageVersion("mvgam") packageStartupMessage( insight::format_message( paste0( "Loading 'mvgam' (version ", version, "). Useful instructions can be found by typing help('mvgam'). A more detailed introduction to the package is available through vignette('mvgam_overview')." ) ) ) } ================================================ FILE: R/ordinate.jsdgam.R ================================================ #' Latent variable ordination plots from jsdgam objects #' #' Plot an ordination of latent variables and their factor loadings from #' \code{jsdgam} models #' #' @name ordinate.jsdgam #' #' @param object \code{list} object of class \code{jsdgam} resulting from a #' call to [jsdgam()] #' #' @param which_lvs A `vector` of indices indicating the two latent variables #' to be plotted (if number of the latent variables specified in the model #' was more than 2). Defaults to \code{c(1, 2)} #' #' @param biplot `Logical`. If `TRUE`, both the site and the species scores #' will be plotted, with names for the taxa interpreted based on the #' `species` argument in the original call to [jsdgam()]. If `FALSE`, only #' the site scores will be plotted #' #' @param alpha A proportional numeric scalar between `0` and `1` that #' controls the relative scaling of the latent variables and their loading #' coefficients #' #' @param label_sites \code{Logical} flag. If `TRUE`, site scores will be #' plotted as labels using names based on the `unit` argument in the #' original call to [jsdgam()]. If `FALSE`, site scores will be shown as #' points only #' #' @param ... ignored #' #' @details #' This function constructs a two-dimensional scatterplot in ordination space. #' The chosen latent variables are first re-rotated using singular value #' decomposition, so that the first plotted latent variable does not have to #' be the first latent variable that was estimated in the original model. #' Posterior median estimates of the variables and the species' loadings on #' these variables are then used to construct the resulting plot. Some attempt #' at de-cluttering the resulting plot is made by using `geom_label_repel()` #' and `geom_text_repel` from the \pkg{ggrepel} package, but if there are many #' sites and/or species then some labels may be removed automatically. 
Note
#' that you can typically get better, more readable plot layouts if you also
#' have the \pkg{ggarrow} and \pkg{ggpp} packages installed
#'
#' @return A `ggplot` object
#'
#' @author Nicholas J Clark
#'
#' @seealso [jsdgam()], [residual_cor()]
#'
#' @examples
#' \dontrun{
#' # Fit a JSDGAM to the portal_data captures
#' mod <- jsdgam(
#'   formula = captures ~
#'     # Fixed effects of NDVI and mintemp, row effect as a GP of time
#'     ndvi_ma12:series + mintemp:series + gp(time, k = 15),
#'   factor_formula = ~ -1,
#'   data = portal_data,
#'   unit = time,
#'   species = series,
#'   family = poisson(),
#'   n_lv = 2,
#'   silent = 2,
#'   chains = 2
#' )
#'
#' # Plot a residual ordination biplot
#' ordinate(
#'   mod,
#'   alpha = 0.7
#' )
#'
#' # Compare to a residual correlation plot
#' plot(
#'   residual_cor(mod)
#' )
#' }
#'
#' @export
ordinate <- function(object, ...) {
  UseMethod("ordinate", object)
}

#' @rdname ordinate.jsdgam
#' @method ordinate jsdgam
#' @importFrom grid arrow unit
#' @export
ordinate.jsdgam <- function(
  object,
  which_lvs = c(1, 2),
  biplot = TRUE,
  alpha = 0.5,
  label_sites = TRUE,
  ...
) {
  insight::check_if_installed(
    'ggrepel',
    reason = 'to adequately plot ordination scores'
  )
  if (!requireNamespace('ggpp', quietly = TRUE)) {
    rlang::inform(
      message = paste0(
        'Package "ggpp" can enable more readable ordination plots\n',
        'Please consider installing it'
      ),
      .frequency = 'once',
      .frequency_id = 'ggpp_ordinate'
    )
  }
  if (!requireNamespace('ggarrow', quietly = TRUE)) {
    rlang::inform(
      message = paste0(
        'Package "ggarrow" can enable more readable ordination plots\n',
        'Please consider installing it'
      ),
      .frequency = 'once',
      .frequency_id = 'ggarrow_ordinate'
    )
  }

  # Check arguments
  if (length(which_lvs) != 2L) {
    stop("argument 'which_lvs' must be a vector of length 2", call. = FALSE)
  }
  if (object$n_lv > 2 & any(which_lvs > object$n_lv)) {
    stop(
      "Fewer latent variables available than those chosen by which_lvs",
      call.
= FALSE ) } validate_proportional(alpha) # Get indices of LV estimates ends <- seq( 0, dim(mcmc_chains(object$model_output, 'LV'))[2], length.out = object$n_lv + 1 ) starts <- ends + 1 starts <- c(1, starts[-c(1, object$n_lv + 1)]) ends <- ends[-1] # Loop across each lv and calculate median estimates lv_estimates <- do.call( cbind, lapply(1:object$n_lv, function(x) { inds_lv <- seq( x, dim(mcmc_chains(object$model_output, 'LV'))[2], by = object$n_lv ) preds <- mcmc_chains(object$model_output, 'LV')[, inds_lv] # Keep only the in-sample observations of the factors preds <- preds[, 1:(length(object$obs_data$y) / NCOL(object$ytimes))] # Calculate posterior medians apply(preds, 2, median) }) ) # Extract loadings, compute the SVD to re-rotate the variables and loadings # Credit for much of this code goes to Francis Hui, original author of the BORAL # R package (https://github.com/emitanaka/boral) lv_estimates <- as.matrix(lv_estimates) lv_coefs <- apply(mcmc_chains(object$model_output, 'lv_coefs'), 2, median) lv_coefs <- t(matrix(lv_coefs, nrow = object$n_lv)) testcov <- tcrossprod(lv_estimates, lv_coefs) do_svd <- svd(testcov, object$n_lv, object$n_lv) choose_lvs <- scale( do_svd$u * matrix( do_svd$d[1:object$n_lv]^alpha, nrow = NROW(lv_estimates), ncol = object$n_lv, byrow = TRUE ), center = TRUE, scale = FALSE ) choose_lv_coefs <- scale( do_svd$v * matrix( do_svd$d[1:object$n_lv]^(1 - alpha), nrow = NROW(lv_coefs), ncol = object$n_lv, byrow = TRUE ), center = TRUE, scale = FALSE ) largest_lnorms <- order( rowSums(choose_lv_coefs^2), decreasing = TRUE )[1:NROW(lv_coefs)] # Extract site and species loadings into dataframes for plotting sp_dat <- data.frame(choose_lv_coefs)[, which_lvs] colnames(sp_dat) <- c('x', 'y') site_dat <- data.frame(choose_lvs)[, which_lvs] colnames(site_dat) <- c('x', 'y') plot_dat <- rbind( sp_dat, site_dat ) # Get taxa names sp_names <- object$trend_map$series # Get site names unit_name <- attr(object$model_data, 'prepped_trend_model')$unit site_names <- unique( object$obs_data[[unit_name]] ) # Create the base ggplot base_plot <- ggplot2::ggplot(plot_dat, ggplot2::aes(x, y)) + ggplot2::labs( x = paste("Latent variable", which_lvs[1]), y = paste("Latent variable", which_lvs[2]) ) # Add layers accordingly if (label_sites) { p <- base_plot + ggrepel::geom_text_repel( data = site_dat, aes(label = site_names), alpha = 0.75, size = 3, max.overlaps = 20, colour = 'grey40', segment.color = NA ) + ggplot2::geom_point( data = site_dat, pch = 21, fill = 'grey20', colour = 'white' ) } else { p <- base_plot + ggplot2::geom_point( data = site_dat, pch = 21, fill = 'grey20', colour = 'white' ) } if (biplot) { if ( requireNamespace('ggarrow', quietly = TRUE) & requireNamespace('ggpp', quietly = TRUE) ) { sp_dat$group <- paste('gr', 1:NROW(sp_dat)) sp_arrow_dat <- do.call( rbind, lapply(1:nlevels(sp_names), function(x) { data.frame( x = seq(0, sp_dat$x[x], length.out = 20), y = seq(0, sp_dat$y[x], length.out = 20), group = sp_dat$group[x] ) }) ) %>% dplyr::mutate(lw = abs(x) + abs(y)) p <- p + ggarrow::geom_arrow( data = sp_arrow_dat, ggplot2::aes( x = x, y = y, group = group, linewidth = lw ), colour = 'darkred', stroke_colour = 'white', stroke_width = 0.1, alpha = 0.5, show.legend = FALSE ) + ggplot2::scale_linewidth(range = c(0.45, 1.75)) + ggrepel::geom_label_repel( data = sp_dat, ggplot2::aes(label = sp_names), color = 'darkred', box.padding = 0.1, label.size = 0.1, alpha = 0.75, max.overlaps = 20, segment.color = NA, position = ggpp::position_nudge_center(0.025, 0.025, 0, 0) ) } 
else { p <- p + ggplot2::geom_segment( data = sp_dat, ggplot2::aes( x = 0, y = 0, xend = x, yend = y ), arrow = arrow( length = unit(0.1, "cm"), type = 'closed' ), alpha = 0.5, color = 'darkred' ) + ggrepel::geom_label_repel( data = sp_dat, ggplot2::aes(label = sp_names), color = 'darkred', box.padding = 0.1, label.size = 0.1, alpha = 0.75, max.overlaps = 20 ) } } # Return the plot p <- p + ggplot2::theme_classic() return(p) } ================================================ FILE: R/pairs.mvgam.R ================================================ #' Create a matrix of output plots from a \code{mvgam} object #' #' A \code{\link[graphics:pairs]{pairs}} #' method that is customized for MCMC output. #' #' @param x An object of class \code{mvgam} or \code{jsdgam} #' @inheritParams mcmc_plot.mvgam #' @param ... Further arguments to be passed to #' \code{\link[bayesplot:MCMC-scatterplots]{mcmc_pairs}}. #' #' @return Plottable objects whose classes depend on the arguments supplied. #' See \code{\link[bayesplot:MCMC-scatterplots]{mcmc_pairs}} for details. #' @details For a detailed description see #' \code{\link[bayesplot:MCMC-scatterplots]{mcmc_pairs}}. #' #' @examples #' \dontrun{ #' simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') #' mod <- mvgam(y ~ s(season, bs = 'cc'), #' trend_model = AR(), #' noncentred = TRUE, #' data = simdat$data_train, #' chains = 2) #' pairs(mod) #' pairs(mod, variable = c('ar1', 'sigma'), regex = TRUE) #' } #' #' @export pairs.mvgam <- function( x, variable = NULL, regex = FALSE, use_alias = TRUE, ... ) { # Set red colour scheme col_scheme <- attr(color_scheme_get(), 'scheme_name') color_scheme_set('red') # Set default params to plot # By default, don't plot the Betas as there can be hundreds # of them in spline models if (is.null(variable)) { all_pars <- variables(x) variable <- c( all_pars$observation_pars[, 1], all_pars$observation_smoothpars[, 1], all_pars$observation_re_params[, 1], all_pars$trend_pars[, 1], all_pars$trend_smoothpars[, 1], all_pars$trend_re_params[, 1] ) regex <- FALSE } draws <- as.array( x, variable = variable, regex = regex, use_alias = use_alias ) # Generate plot and reset colour scheme out_plot <- bayesplot::mcmc_pairs(draws, ...) color_scheme_set(col_scheme) # Return the plot return(out_plot) } ================================================ FILE: R/piecewise_trends.R ================================================ #' Updates for adding piecewise trends #' @noRd add_piecewise = function( model_file, model_data, data_train, data_test = NULL, orig_trend_model, family ) { trend_model <- orig_trend_model$trend_model n_changepoints <- orig_trend_model$n_changepoints changepoint_range <- orig_trend_model$changepoint_range changepoint_scale <- orig_trend_model$changepoint_scale if (family$family == 'Gamma') { family <- Gamma(link = 'log') } if (trend_model == 'PWlogistic') { if (!(exists('cap', where = data_train))) { stop( 'Capacities must be supplied as a variable named "cap" for logistic growth', call. = FALSE ) } if (any(is.na(data_train$cap))) { stop('Missing values found for some "cap" terms', call. = FALSE) } if (any(is.infinite(data_train$cap))) { stop('Infinite values found for some "cap" terms', call. 
= FALSE) } # Matrix of capacities per series (these must operate on the link scale) all_caps <- data.frame( series = as.numeric(data_train$series), time = data_train$time, cap = suppressWarnings(linkfun(data_train$cap, link = family$link)) ) %>% dplyr::arrange(time, series) if (any(is.na(all_caps$cap)) | any(is.infinite(all_caps$cap))) { stop( paste0( 'Missing or infinite values found for some "cap" terms\n', 'after transforming to the ', family$link, ' link scale' ), call. = FALSE ) } if (!is.null(data_test)) { if (!(exists('cap', where = data_test))) { stop( 'Capacities must also be supplied in "newdata" for logistic growth predictions', call. = FALSE ) } all_caps <- rbind( all_caps, data.frame( series = as.numeric(data_test$series), time = data_test$time, cap = suppressWarnings(linkfun(data_test$cap, link = family$link)) ) ) %>% dplyr::arrange(time, series) if (any(is.na(all_caps$cap)) | any(is.infinite(all_caps$cap))) { stop( paste0( 'Missing or infinite values found for some "cap" terms\n', 'after transforming to the ', family$link, ' link scale' ), call. = FALSE ) } } cap <- matrix( NA, nrow = length(unique(all_caps$time)), ncol = length(unique(all_caps$series)) ) for (i in 1:length(unique(all_caps$series))) { cap[, i] <- all_caps$cap[which(all_caps$series == i)] } } else { cap <- NULL } #### Distribute possible changepoints #### scaled_time <- unique(data_train$time - min(data_train$time) + 1) max_time <- max(scaled_time) hist_size <- floor(max_time * changepoint_range) t_change <- unique(round(seq.int( 1, hist_size, length.out = (n_changepoints + 1) )[-1])) n_changepoints <- length(t_change) change_freq <- n_changepoints / hist_size if (!is.null(data_test)) { # Get forecast horizon changepoints # This can go in with the data if newdata is supplied; else it needs # to be used when extrapolating the trend forward n_new_changes <- stats::rpois( 1, (change_freq * (max(data_test$time) - min(data_test$time))) ) # Spread the forecast changepoints evenly across the forecast # horizon scaled_test_time <- unique(data_test$time - min(data_train$time) + 1) t_change_new <- unique(floor(seq.int( min(scaled_test_time), max(scaled_test_time), length.out = n_new_changes ))) t_change <- c(t_change, t_change_new) n_changepoints <- n_changepoints + n_new_changes scaled_time <- c(scaled_time, scaled_test_time) } # Add changepoint info to the data model_data$n_changepoints <- n_changepoints model_data$change_freq <- change_freq model_data$t_change <- t_change model_data$time <- scaled_time model_data$changepoint_scale <- changepoint_scale model_data$cap <- cap #### Update the model file appropriately #### # Add the piecewise functions if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', 'matrix get_changepoint_matrix(vector t, vector t_change, int T, int S) {\n', '/* Function to sort changepoints */\n', '/* credit goes to the Prophet development team at Meta (https://github.com/facebook/prophet/tree/main)*/\n', 'matrix[T, S] A;\n', 'row_vector[S] a_row;\n', 'int cp_idx;\n', 'A = rep_matrix(0, T, S);\n', 'a_row = rep_row_vector(0, S);\n', 'cp_idx = 1;\n', 'for (i in 1:T) {\n', 'while ((cp_idx <= S) && (t[i] >= t_change[cp_idx])) {\n', 'a_row[cp_idx] = 1;\n', 'cp_idx = cp_idx + 1;\n', '}\n', 'A[i] = a_row;\n', '}\n', 'return A;\n', '}\n', '// logistic trend functions\n', 'vector logistic_gamma(real k, real m, vector delta, vector t_change, int S) {\n', '/* Function to compute a logistic trend with changepoints 
*/\n', '/* credit goes to the Prophet development team at Meta (https://github.com/facebook/prophet/tree/main)*/\n', 'vector[S] gamma; // adjusted offsets, for piecewise continuity\n', 'vector[S + 1] k_s; // actual rate in each segment\n', 'real m_pr;\n', 'k_s = append_row(k, k + cumulative_sum(delta));\n', 'm_pr = m; // The offset in the previous segment\n', 'for (i in 1:S) {\n', 'gamma[i] = (t_change[i] - m_pr) * (1 - k_s[i] / k_s[i + 1]);\n', 'm_pr = m_pr + gamma[i]; // update for the next segment\n', '}\n', 'return gamma;\n', '}\n', 'vector logistic_trend(\n', 'real k,\n', 'real m,\n', 'vector delta,\n', 'vector t,\n', 'vector cap,\n', 'matrix A,\n', 'vector t_change,\n', 'int S\n', ') {\n', '/* Function to adjust a logistic trend using a carrying capacity */\n', '/* credit goes to the Prophet development team at Meta (https://github.com/facebook/prophet/tree/main)*/\n', 'vector[S] gamma;\n', 'gamma = logistic_gamma(k, m, delta, t_change, S);\n', 'return cap .* inv_logit((k + A * delta) .* (t - (m + A * gamma)));\n', '}\n', '// linear trend function\n', '/* Function to compute a linear trend with changepoints */\n', '/* credit goes to the Prophet development team at Meta (https://github.com/facebook/prophet/tree/main)*/\n', 'vector linear_trend(\n', 'real k,\n', 'real m,\n', 'vector delta,\n', 'vector t,\n', 'matrix A,\n', 'vector t_change\n', ') {\n', 'return (k + A * delta) .* t + (m + A * (-t_change .* delta));\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', 'matrix get_changepoint_matrix(vector t, vector t_change, int T, int S) {\n', '/* Function to sort changepoints */\n', '/* credit goes to the Prophet development team at Meta (https://github.com/facebook/prophet/tree/main)*/\n', 'matrix[T, S] A;\n', 'row_vector[S] a_row;\n', 'int cp_idx;\n', 'A = rep_matrix(0, T, S);\n', 'a_row = rep_row_vector(0, S);\n', 'cp_idx = 1;\n', 'for (i in 1:T) {\n', 'while ((cp_idx <= S) && (t[i] >= t_change[cp_idx])) {\n', 'a_row[cp_idx] = 1;\n', 'cp_idx = cp_idx + 1;\n', '}\n', 'A[i] = a_row;\n', '}\n', 'return A;\n', '}\n', '// logistic trend functions\n', 'vector logistic_gamma(real k, real m, vector delta, vector t_change, int S) {\n', '/* Function to compute a logistic trend with changepoints */\n', '/* credit goes to the Prophet development team at Meta (https://github.com/facebook/prophet/tree/main)*/\n', 'vector[S] gamma; // adjusted offsets, for piecewise continuity\n', 'vector[S + 1] k_s; // actual rate in each segment\n', 'real m_pr;\n', 'k_s = append_row(k, k + cumulative_sum(delta));\n', 'm_pr = m; // The offset in the previous segment\n', 'for (i in 1:S) {\n', 'gamma[i] = (t_change[i] - m_pr) * (1 - k_s[i] / k_s[i + 1]);\n', 'm_pr = m_pr + gamma[i]; // update for the next segment\n', '}\n', 'return gamma;\n', '}\n', 'vector logistic_trend(\n', 'real k,\n', 'real m,\n', 'vector delta,\n', 'vector t,\n', 'vector cap,\n', 'matrix A,\n', 'vector t_change,\n', 'int S\n', ') {\n', '/* Function to adjust a logistic trend using a carrying capacity */\n', '/* credit goes to the Prophet development team at Meta (https://github.com/facebook/prophet/tree/main)*/\n', 'vector[S] gamma;\n', 'gamma = logistic_gamma(k, m, delta, t_change, S);\n', 'return cap .* inv_logit((k + A * delta) .* (t - (m + A * gamma)));\n', '}\n', '// linear trend function\n', '/* Function to compute a linear trend with changepoints */\n', '/* credit goes to the Prophet development team at Meta 
(https://github.com/facebook/prophet/tree/main)*/\n', 'vector linear_trend(\n', 'real k,\n', 'real m,\n', 'vector delta,\n', 'vector t,\n', 'matrix A,\n', 'vector t_change\n', ') {\n', 'return (k + A * delta) .* t + (m + A * (-t_change .* delta));\n', '}\n}\n' ) } # Update the data block model_file[grep('int num_basis;', model_file, fixed = TRUE)] <- paste0( "int num_basis; // total number of basis coefficients\n", "vector[n] time; // index of time for changepoint model\n", "int n_changepoints; // number of potential trend changepoints\n", "vector[n_changepoints] t_change; // times of potential changepoints\n", if (trend_model == 'PWlogistic') { "matrix[n, n_series] cap; // carrying capacities for logistic trends\n" } else { NULL }, 'real changepoint_scale; // scale of changepoint shock prior\n' ) model_file <- readLines(textConnection(model_file), n = -1) # Update the transformed data block if (any(grepl('transformed data {', model_file, fixed = TRUE))) { model_file[grep('transformed data {', model_file, fixed = TRUE)] <- paste0( 'transformed data {\n', '// sorted changepoint matrix\n', 'matrix[n, n_changepoints] A = get_changepoint_matrix(time, t_change, n, n_changepoints);\n' ) } else { model_file[grep('parameters {', model_file, fixed = TRUE)[1]] <- paste0( 'transformed data {\n', '// sorted changepoint matrix\n', 'matrix[n, n_changepoints] A = get_changepoint_matrix(time, t_change, n, n_changepoints);\n', '}\nparameters {' ) } model_file <- readLines(textConnection(model_file), n = -1) # Update the parameters block model_file <- model_file[ -c( grep( '// latent trend variance parameters', model_file, fixed = TRUE ):(grep('// latent trend variance parameters', model_file, fixed = TRUE) + 1) ) ] model_file <- model_file[ -c( grep('// latent trends', model_file, fixed = TRUE):(grep( '// latent trends', model_file, fixed = TRUE ) + 1) ) ] model_file[grep("vector[num_basis] b_raw;", model_file, fixed = TRUE)] <- paste0( "vector[num_basis] b_raw;\n", "// base trend growth rates\n", "vector[n_series] k_trend;\n\n", "// trend offset parameters\n", "vector[n_series] m_trend;\n\n", "// trend rate adjustments per series\n", "matrix[n_changepoints, n_series] delta_trend;\n" ) model_file <- readLines(textConnection(model_file), n = -1) # Update the transformed parameters block model_file[grep("transformed parameters {", model_file, fixed = TRUE)] <- paste0( "transformed parameters {\n", "// latent trends\n", "matrix[n, n_series] trend;\n" ) max_rawline <- max(grep('= b_raw', model_file)) model_file[max_rawline] <- paste0( model_file[max_rawline], '\n\n', '// trend estimates\n', 'for (s in 1 : n_series) {\n', if (trend_model == 'PWlogistic') { 'trend[1 : n, s] = logistic_trend(k_trend[s], m_trend[s], to_vector(delta_trend[,s]), time, to_vector(cap[,s]), A, t_change, n_changepoints);\n' } else { 'trend[1 : n, s] = linear_trend(k_trend[s], m_trend[s], to_vector(delta_trend[,s]), time, A, t_change);\n' }, '}\n' ) model_file <- readLines(textConnection(model_file), n = -1) # Update the model block model_file <- model_file[ -c( grep( '// priors for latent trend variance parameters', model_file, fixed = TRUE ):(grep( '// priors for latent trend variance parameters', model_file, fixed = TRUE ) + 1) ) ] rw_start <- grep( "trend[1, 1:n_series] ~ normal(0, sigma);", model_file, fixed = TRUE ) rw_lines <- (rw_start - 1):(rw_start + 3) model_file <- model_file[-rw_lines] model_file[grep("// likelihood functions", model_file, fixed = TRUE) - 1] <- paste0( '// trend parameter priors\n', 'm_trend ~ student_t(3, 
0, 2.5);\n', 'k_trend ~ std_normal();\n', 'to_vector(delta_trend) ~ double_exponential(0, changepoint_scale);\n', model_file[grep("// likelihood functions", model_file, fixed = TRUE) - 1] ) model_file <- readLines(textConnection(model_file), n = -1) # Update the generated quantities block model_file <- model_file[ -grep("vector[n_series] tau;", model_file, fixed = TRUE) ] tau_start <- grep("tau[s] = pow(sigma[s], -2.0);", model_file, fixed = TRUE) - 1 model_file <- model_file[-c(tau_start:(tau_start + 2))] model_file <- readLines(textConnection(model_file), n = -1) #### Return #### return(list(model_file = model_file, model_data = model_data)) } ================================================ FILE: R/plot.mvgam.R ================================================ #' Default plots for \pkg{mvgam} models #' #' This function takes a fitted \code{mvgam} object and produces plots of #' smooth functions, forecasts, trends and uncertainty components #' #' @param x \code{list} object returned from \code{mvgam}. See [mvgam()] #' #' @param type \code{character} specifying which type of plot to return. #' Options are: `"series"`, `"residuals"`, `"smooths"`, `"re"` (random effect smooths), #' `"pterms"` (parametric effects), `"forecast"`, `"trend"`, `"uncertainty"`, #' `"factors"` #' #' @param residuals \code{logical}. If \code{TRUE} and `type = 'smooths'`, #' posterior quantiles of partial residuals are added to plots of 1-D #' smooths as a series of ribbon rectangles. Partial residuals for a #' smooth term are the median Dunn-Smyth residuals that would be obtained #' by dropping the term concerned from the model, while leaving all other #' estimates fixed (i.e. the estimates for the term plus the original #' median Dunn-Smyth residuals). Note that because \code{mvgam} works with #' Dunn-Smyth residuals and not working residuals, which are used by #' \code{mgcv}, the magnitudes of partial residuals will be different to #' what you would expect from \code{\link[mgcv]{plot.gam}}. Interpretation #' is similar though, as these partial residuals should be evenly scattered #' around the smooth function if the function is well estimated #' #' @param series \code{integer} specifying which series in the set is to be #' plotted. This is ignored if \code{type == 're'} #' #' @param newdata Optional \code{dataframe} or \code{list} of test data #' containing at least 'series' and 'time' in addition to any other #' variables included in the linear predictor of the original #' \code{formula}. This argument is optional when plotting out of sample #' forecast period observations (when \code{type = forecast}) and required #' when plotting uncertainty components (\code{type = uncertainty}). #' #' @param trend_effects logical. If `TRUE` and a `trend_formula` was used in #' model fitting, terms from the trend (i.e. process) model will be plotted #' #' @param data_test Deprecated. Still works in place of \code{newdata} but #' users are recommended to use \code{newdata} instead for more seamless #' integration into `R` workflows #' #' @param ... Additional arguments for each individual plotting function. #' #' @details These plots are useful for getting an overview of the fitted #' model and its estimated random effects or smooth functions, but the #' individual plotting functions and the functions from the `marginaleffects` #' and `gratia` packages offer far more customisation. 
#' #' @seealso \code{\link{plot_mvgam_resids}}, #' \code{\link{plot_mvgam_smooth}}, #' \code{\link{plot_mvgam_fc}}, #' \code{\link{plot_mvgam_trend}}, #' \code{\link{plot_mvgam_uncertainty}}, #' \code{\link{plot_mvgam_factors}}, #' \code{\link{plot_mvgam_randomeffects}}, #' \code{\link{conditional_effects.mvgam}}, #' \code{\link[marginaleffects]{plot_predictions}}, #' \code{\link[marginaleffects]{plot_slopes}}, #' \code{\link{gratia_mvgam_enhancements}} #' #' @author Nicholas J Clark #' #' @return A base R plot or set of plots #' #' @examples #' \dontrun{ #' # Simulate some time series #' dat <- sim_mvgam( #' T = 80, #' n_series = 3 #' ) #' #' # Fit a basic model #' mod <- mvgam( #' y ~ s(season, bs = 'cc') + s(series, bs = 're'), #' data = dat$data_train, #' trend_model = RW(), #' chains = 2, #' silent = 2 #' ) #' #' # Plot predictions and residuals for each series #' plot(mod, type = 'forecast', series = 1) #' plot(mod, type = 'forecast', series = 2) #' plot(mod, type = 'forecast', series = 3) #' plot(mod, type = 'residuals', series = 1) #' plot(mod, type = 'residuals', series = 2) #' plot(mod, type = 'residuals', series = 3) #' #' # Plot model effects #' plot(mod, type = 'smooths') #' plot(mod, type = 're') #' #' # More flexible plots with 'marginaleffects' utilities #' library(marginaleffects) #' #' plot_predictions( #' mod, #' condition = 'season', #' type = 'link' #' ) #' #' plot_predictions( #' mod, #' condition = c('season', 'series', 'series'), #' type = 'link' #' ) #' #' plot_predictions( #' mod, #' condition = 'series', #' type = 'link' #' ) #' #' # When using a State-Space model with predictors on the process #' # model, set trend_effects = TRUE to visualise process effects #' mod <- mvgam( #' y ~ -1, #' trend_formula = ~ s(season, bs = 'cc'), #' data = dat$data_train, #' trend_model = RW(), #' chains = 2, #' silent = 2 #' ) #' #' plot(mod, type = 'smooths', trend_effects = TRUE) #' #' # But 'marginaleffects' functions work without any modification #' plot_predictions( #' mod, #' condition = 'season', #' type = 'link' #' ) #' } #' #' @export plot.mvgam = function( x, type = 'residuals', series = 1, residuals = FALSE, newdata, data_test, trend_effects = FALSE, ... ) { object <- x # Argument checks type <- match.arg( arg = type, choices = c( "residuals", "smooths", "re", "pterms", "forecast", "trend", "uncertainty", "factors", "series" ) ) if (!(inherits(object, "mvgam"))) { stop('argument "object" must be of class "mvgam"') } if (!missing("newdata")) { data_test <- newdata } # Other errors and warnings will propagate from individual functions below if (type == 'series') { print(plot_mvgam_series(object, series = series, newdata = data_test, ...)) } if (type == 're') { plot_mvgam_randomeffects(object, trend_effects = trend_effects, ...) } if (type == 'pterms') { plot_mvgam_pterms(object, trend_effects = trend_effects, ...) } if (type == 'residuals') { return(plot_mvgam_resids(object, series = series, ...)) } if (type == 'factors') { if (!object$use_lv) { stop('no latent variables were fitted in the model') } else { return(plot_mvgam_factors(object)) } } if (type == 'forecast') { if (missing(data_test)) { plot_mvgam_fc(object, series = series, ...) } else { plot_mvgam_fc(object, series = series, data_test = data_test, ...) } } if (type == 'trend') { if (missing(data_test)) { return(plot_mvgam_trend(object, series = series, ...)) } else { return(plot_mvgam_trend( object, series = series, data_test = data_test, ... 
)) } } if (type == 'uncertainty') { if (missing(data_test)) { stop('data_test is required for plotting uncertainty contributions') } else { plot_mvgam_uncertainty( object, series = series, data_test = data_test, ... ) } } if (type == 'smooths') { object2 <- object if (trend_effects) { if (is.null(object$trend_call)) { stop('no trend_formula exists so there are no trend-level smooths to plot') } object2$mgcv_model <- object2$trend_mgcv_model } # Get labels of all included smooths from the object2 smooth_labs <- do.call( rbind, lapply(seq_along(object2$mgcv_model$smooth), function(x) { data.frame( label = object2$mgcv_model$smooth[[x]]$label, class = class(object2$mgcv_model$smooth[[x]])[1], mgcv_plottable = object2$mgcv_model$smooth[[x]]$plot.me ) }) ) n_smooths <- NROW(smooth_labs) if (n_smooths == 0) { stop( "No smooth terms to plot. Use plot_predictions() to visualise other effects", call. = FALSE ) } smooth_labs$smooth_index <- 1:NROW(smooth_labs) # Leave out random effects and MRF smooths, and any others that are not # considered plottable by mgcv smooth_labs %>% dplyr::filter(class != 'random.effect') %>% dplyr::filter(class != 'mrf.smooth') %>% dplyr::filter(mgcv_plottable) -> smooth_labs if (length(smooth_labs$label) == 0) { stop("No terms to plot - nothing for plot.mvgam() to do.") } # Check which ones plot_mvgam_smooth can handle (no more than 3 dimensions) plottable = function(x) { length(unlist(strsplit(x, ','))) <= 3 & length(unlist(strsplit(x, ':'))) <= 3 } which_to_plot <- (smooth_labs$smooth_index)[sapply( as.character(smooth_labs$label), plottable )] n_smooths <- length(which_to_plot) # For remaining plots, get the needed page numbers n_plots <- n_smooths if (n_plots == 0) { stop( "No suitable terms to plot - plot.mvgam() only handles smooths of 3 or fewer dimensions." ) } pages <- 1 if (n_plots > 4) { pages <- 2 } if (n_plots > 8) { pages <- 3 } if (n_plots > 12) { pages <- 4 } if (pages != 0) { ppp <- n_plots %/% pages if (n_plots %% pages != 0) { ppp <- ppp + 1 while (ppp * (pages - 1) >= n_plots) { pages <- pages - 1 } } # Configure layout matrix c <- r <- trunc(sqrt(ppp)) if (c < 1) { r <- c <- 1 } if (c * r < ppp) { c <- c + 1 } if (c * r < ppp) { r <- r + 1 } .pardefault <- par(no.readonly = TRUE) on.exit(par(.pardefault)) oldpar <- par(mfrow = c(r, c)) } else { ppp <- 1 oldpar <- par() } # Plot the smooths for (i in which_to_plot) { plot_mvgam_smooth( object = object2, smooth = i, series = series, residuals = residuals, trend_effects = trend_effects, ... ) } layout(1) } } ================================================ FILE: R/plot_mvgam_factors.R ================================================ #' Latent factor summaries for a fitted \pkg{mvgam} object #' #' This function takes a fitted \code{mvgam} object and returns plots and #' summary statistics for the latent dynamic factors #' #' @param object \code{list} object returned from \code{mvgam}. See [mvgam()] #' #' @param plot \code{logical} specifying whether factors should be plotted #' #' @author Nicholas J Clark #' #' @details If the model in \code{object} was estimated using dynamic factors, #' it is possible that not all factors contributed to the estimated trends. #' This is due to the regularisation penalty that acts independently on each #' factor's Gaussian precision, which will squeeze un-needed factors to a #' white noise process (effectively dropping that factor from the model).
In #' this function, each factor is tested against a null hypothesis of white #' noise by calculating the sum of the factor's 2nd derivatives. A factor #' that has a larger contribution will have a larger sum due to the weaker #' penalty on the factor's precision. If \code{plot == TRUE}, the factors #' are also plotted. #' #' @return A \code{data.frame} of factor contributions #' #' @examples #' \dontrun{ #' simdat <- sim_mvgam() #' #' mod <- mvgam( #' y ~ s(season, bs = 'cc', k = 6), #' trend_model = AR(), #' use_lv = TRUE, #' n_lv = 2, #' data = simdat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' plot_mvgam_factors(mod) #' } #' #' @export plot_mvgam_factors = function(object, plot = TRUE) { # Check arguments if (!(inherits(object, "mvgam"))) { stop('argument "object" must be of class "mvgam"') } # Check object has latent dynamic factors if (!object$use_lv) { stop('No latent factors used in object') } # Get indices of LV estimates ends <- seq( 0, dim(mcmc_chains(object$model_output, 'LV'))[2], length.out = object$n_lv + 1 ) starts <- ends + 1 starts <- c(1, starts[-c(1, object$n_lv + 1)]) ends <- ends[-1] probs <- c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95) # Loop across each lv and calculate probability that the lv was dropped lv_estimates <- do.call( rbind, lapply(1:object$n_lv, function(x) { if (object$fit_engine == 'stan') { inds_lv <- seq( x, dim(mcmc_chains(object$model_output, 'LV'))[2], by = object$n_lv ) preds <- mcmc_chains(object$model_output, 'LV')[, inds_lv] } else { preds <- mcmc_chains(object$model_output, 'LV')[, starts[x]:ends[x]] } # Keep only the in-sample observations for testing against the null of white noise preds <- preds[, 1:(length(object$obs_data$y) / NCOL(object$ytimes))] cred <- as.data.frame(t(sapply( 1:NCOL(preds), function(n) quantile(preds[, n], probs = probs, na.rm = TRUE) ))) %>% dplyr::mutate(lv = paste0('Factor ', x), time = 1:NCOL(preds)) colnames(cred) <- c( paste0('lower', 1:4), 'med', paste0('upper', 4:1), 'lv', 'time' ) cred }) ) # If plot = TRUE, plot the LVs if (plot) { p <- ggplot2::ggplot( data = lv_estimates, mapping = ggplot2::aes(x = time, y = med) ) + ggplot2::facet_wrap(~lv) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower1, ymax = upper1), fill = "#DCBCBC" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower2, ymax = upper2), fill = "#C79999" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower3, ymax = upper3), fill = "#B97C7C" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower4, ymax = upper4), fill = "#A25050" ) + ggplot2::geom_line( mapping = ggplot2::aes(x = time, y = med), col = "#8F2727", linewidth = 1 ) + ggplot2::theme_bw() + ggplot2::labs(x = 'Time', y = 'Posterior prediction') } # Calculate second derivatives of empirical medians and upper / lower intervals; # factors with small second derivatives are moving in roughly a straight line and not # likely contributing much (or at all) to the latent trend estimates lv_contributions <- lv_estimates %>% dplyr::group_by(lv) %>% dplyr::reframe( med_deriv = abs(diff(diff(med))), upper_deriv = abs(diff(diff(upper2))), lower_deriv = abs(diff(diff(lower2))) ) %>% dplyr::rowwise() %>% dplyr::mutate(contribution = sum(med_deriv, upper_deriv, lower_deriv)) %>% dplyr::group_by(lv) %>% dplyr::summarise(sum_contribution = sum(contribution)) %>% dplyr::ungroup() %>% dplyr::mutate( total = sum(sum_contribution), Contribution = sum_contribution / total, Factor = lv ) %>% dplyr::select(Factor, Contribution) if (plot) { print(p) } lv_contributions } 
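# A minimal standalone sketch (not package code) of the second-derivative
# heuristic used above: the posterior median path of a factor that was
# squeezed out of the model is close to a straight line, so the sum of its
# absolute second differences is near zero, while a factor that tracks
# genuine dynamics accumulates a much larger sum. Both vectors below are
# hypothetical stand-ins for posterior median factor estimates:
#
#   wiggliness <- function(x) sum(abs(diff(diff(x))))
#   dropped_factor <- seq(0, 0.5, length.out = 100)          # ~straight line
#   active_factor <- sin(seq(0, 4 * pi, length.out = 100))   # genuine trend
#   contribs <- c(
#     dropped = wiggliness(dropped_factor),
#     active = wiggliness(active_factor)
#   )
#   # Normalise to proportions, mirroring the Contribution column returned
#   # by plot_mvgam_factors(); here ~0 for the dropped factor, ~1 for the
#   # active one
#   round(contribs / sum(contribs), 3)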
================================================ FILE: R/plot_mvgam_fc.R ================================================ #' Plot posterior forecast predictions from \pkg{mvgam} models #' #' @importFrom stats formula terms #' #' @param object \code{list} object of class \code{mvgam}. See [mvgam()] #' #' @param series \code{integer} specifying which series in the set is to be #' plotted #' #' @param newdata Optional \code{dataframe} or \code{list} of test data #' containing at least 'series' and 'time' in addition to any other #' variables included in the linear predictor of the original #' \code{formula}. If included, the covariate information in \code{newdata} #' will be used to generate forecasts from the fitted model equations. If #' this same \code{newdata} was originally included in the call to #' \code{mvgam}, then forecasts have already been produced by the generative #' model and these will simply be extracted and plotted. However if no #' \code{newdata} was supplied to the original model call, an assumption is #' made that the \code{newdata} supplied here comes sequentially after the #' data supplied as \code{data} in the original model (i.e. we assume there #' is no time gap between the last observation of series 1 in \code{data} #' and the first observation for series 1 in \code{newdata}). If #' \code{newdata} contains observations in column \code{y}, these #' observations will be used to compute a Discrete Rank Probability Score #' for the forecast distribution #' #' @param data_test Deprecated. Still works in place of \code{newdata} but #' users are recommended to use \code{newdata} instead for more seamless #' integration into `R` workflows #' #' @param realisations \code{logical}. If \code{TRUE}, forecast realisations #' are shown as a spaghetti plot, making it easier to visualise the #' diversity of possible forecasts. If \code{FALSE}, the default, empirical #' quantiles of the forecast distribution are shown #' #' @param n_realisations \code{integer} specifying the number of posterior #' realisations to plot, if \code{realisations = TRUE}. Ignored otherwise #' #' @param n_cores \code{integer} specifying number of cores for generating #' forecasts in parallel #' #' @param hide_xlabels \code{logical}. If \code{TRUE}, no xlabels are printed #' to allow the user to add custom labels using \code{axis} from base #' \code{R} #' #' @param xlab Label for x axis #' #' @param ylab Label for y axis #' #' @param ylim Optional \code{vector} of y-axis limits (min, max) #' #' @param ... Further \code{\link[graphics]{par}} graphical parameters #' #' @param return_forecasts \code{logical}. If \code{TRUE}, the function will #' plot the forecast as well as returning the forecast object (as a #' \code{matrix} of dimension \code{n_samples} x \code{horizon}) #' #' @param return_score \code{logical}. If \code{TRUE} and out of sample test #' data is provided as \code{newdata}, a probabilistic score will be #' calculated and returned. The score used will depend on the observation #' family from the fitted model. Discrete families (\code{poisson}, #' \code{negative binomial}, \code{tweedie}) use the Discrete Rank #' Probability Score. Other families use the Continuous Rank Probability #' Score. 
The value returned is the \code{sum} of all scores within the out #' of sample forecast horizon #' #' @author Nicholas J Clark #' #' @details `plot_mvgam_fc` generates posterior predictions from an object of #' class \code{mvgam}, calculates posterior empirical quantiles and plots #' them against the observed data. If `realisations = FALSE`, the returned #' plot shows 90, 60, 40 and 20 percent posterior quantiles (as ribbons of #' increasingly darker shades of red) as well as the posterior median (as a #' dark red line). If `realisations = TRUE`, a set of `n_realisations` #' posterior draws are shown. This function produces an older style base #' \code{R} plot, as opposed to `plot.mvgam_forecast` #' #' `plot.mvgam_forecast` takes an object of class `mvgam_forecast`, in which #' forecasts have already been computed, and plots the resulting forecast #' distribution as a `ggplot` object. This function is therefore more #' versatile and is recommended over the older and clunkier #' `plot_mvgam_fc` version #' #' If \code{realisations = FALSE}, these posterior quantiles are plotted #' along with the true observed data that was used to train the model. #' Otherwise, a spaghetti plot is returned to show possible forecast paths. #' #' @return A base \code{R} graphics plot (for `plot_mvgam_fc`) or a `ggplot` #' object (for `plot.mvgam_forecast`) and an optional \code{list} containing #' the forecast distribution and the out of sample probabilistic forecast #' score #' #' @examples #' \dontrun{ #' simdat <- sim_mvgam( #' n_series = 3, #' trend_model = AR() #' ) #' #' mod <- mvgam( #' y ~ s(season, bs = 'cc', k = 6), #' trend_model = AR(), #' noncentred = TRUE, #' data = simdat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' # Hindcasts on response scale #' hc <- hindcast(mod) #' str(hc) #' plot(hc, series = 1) #' plot(hc, series = 2) #' plot(hc, series = 3) #' #' # Forecasts on response scale #' fc <- forecast( #' mod, #' newdata = simdat$data_test #' ) #' str(fc) #' plot(fc, series = 1) #' plot(fc, series = 2) #' plot(fc, series = 3) #' #' # Forecasts as expectations #' fc <- forecast( #' mod, #' newdata = simdat$data_test, #' type = 'expected' #' ) #' plot(fc, series = 1) #' plot(fc, series = 2) #' plot(fc, series = 3) #' #' # Dynamic trend extrapolations #' fc <- forecast( #' mod, #' newdata = simdat$data_test, #' type = 'trend' #' ) #' plot(fc, series = 1) #' plot(fc, series = 2) #' plot(fc, series = 3) #' } #' #' @name plot_mvgam_forecasts NULL #' @rdname plot_mvgam_forecasts #' @export plot_mvgam_fc = function( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, hide_xlabels = FALSE, xlab, ylab, ylim, n_cores = 1, return_forecasts = FALSE, return_score = FALSE, ... ) { # Check arguments if (!(inherits(object, "mvgam"))) { stop('argument "object" must be of class "mvgam"') } if (sign(series) != 1) { stop('argument "series" must be a positive integer', call. = FALSE) } else { if (series %% 1 != 0) { stop('argument "series" must be a positive integer', call. = FALSE) } } if (series > NCOL(object$ytimes)) { stop( paste0( 'object only contains data / predictions for ', NCOL(object$ytimes), ' series' ), call. = FALSE ) } if (sign(n_realisations) != 1) { stop('argument "n_realisations" must be a positive integer', call. = FALSE) } else { if (n_realisations %% 1 != 0) { stop( 'argument "n_realisations" must be a positive integer', call. 
= FALSE ) } } if (return_score) { return_forecasts <- TRUE } if (missing(data_test) & missing("newdata")) { # Check if newdata already included in the model if (!is.null(object$test_data)) { data_test <- object$test_data } } if (!missing("newdata")) { data_test <- newdata # Ensure outcome is labelled 'y' when feeding data to the model for simplicity if (terms(formula(object$call))[[2]] != 'y') { data_test$y <- data_test[[terms(formula(object$call))[[2]]]] } } # Prediction indices for the particular series data_train <- object$obs_data ends <- seq( 0, dim(mcmc_chains(object$model_output, 'ypred'))[2], length.out = NCOL(object$ytimes) + 1 ) starts <- ends + 1 starts <- c(1, starts[-c(1, (NCOL(object$ytimes) + 1))]) ends <- ends[-1] if (object$fit_engine == 'stan') { # For stan objects, ypred is stored as a vector in column-major order preds <- mcmc_chains(object$model_output, 'ypred')[, seq( series, dim(mcmc_chains(object$model_output, 'ypred'))[2], by = NCOL(object$ytimes) ), drop = FALSE ] } else { preds <- mcmc_chains(object$model_output, 'ypred')[, starts[series]:ends[series], drop = FALSE ] } # Add variables to data_test if missing s_name <- levels(data_train$series)[series] if (!missing(data_test)) { # Ensure outcome is labelled 'y' when feeding data to the model for simplicity if (terms(formula(object$call))[[2]] != 'y') { if (object$family %in% c('binomial', 'beta_binomial')) { resp_terms <- as.character(terms(formula(object$call))[[2]]) resp_terms <- resp_terms[-grepl('cbind', resp_terms)] trial_name <- resp_terms[2] data_test$y <- data_test[[resp_terms[1]]] if (!exists(trial_name, data_test)) { stop( paste0('Variable ', trial_name, ' not found in newdata'), call. = FALSE ) } } else { data_test$y <- data_test[[terms(formula(object$call))[[2]]]] } } if (!'y' %in% names(data_test)) { data_test$y <- rep(NA, NROW(data_test)) } if (inherits(data_test, 'list')) { if (!'time' %in% names(data_test)) { stop('data_test does not contain a "time" column') } if (!'series' %in% names(data_test)) { data_test$series <- factor('series1') } } else { if (!'time' %in% colnames(data_test)) { stop('data_test does not contain a "time" column') } if (!'series' %in% colnames(data_test)) { data_test$series <- factor('series1') } } # If the posterior predictions do not already cover the data_test period, the forecast needs to be # generated using the latent trend dynamics; note, this assumes that there is no gap between the training and # testing datasets if (inherits(data_test, 'list')) { all_obs <- c( data.frame( y = data_train$y, series = data_train$series, time = data_train$time ) %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y), data.frame( y = data_test$y, series = data_test$series, time = data_test$time ) %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y) ) } else { all_obs <- c( data_train %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y), data_test %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y) ) } if (dim(preds)[2] != length(all_obs)) { s_name <- levels(object$obs_data$series)[series] if (attr(object$model_data, 'trend_model') == 'None') { if (class(object$obs_data)[1] == 'list') { series_obs <- which(data_test$series == s_name) series_test <- lapply(data_test, function(x) { if 
(is.matrix(x)) { matrix(x[series_obs, ], ncol = NCOL(x)) } else { x[series_obs] } }) } else { series_test = data_test %>% dplyr::filter(series == s_name) } fc_preds <- predict.mvgam( object, newdata = series_test, type = 'response', n_cores = n_cores ) } else { fc_preds <- forecast.mvgam( object, data_test = data_test, n_cores = n_cores )$forecasts[[series]] } preds <- cbind(preds, fc_preds) } } # Plot quantiles of the forecast distribution preds_last <- preds[1, ] probs = c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95) cred <- sapply( 1:NCOL(preds), function(n) quantile(preds[, n], probs = probs, na.rm = TRUE) ) c_light <- c("#DCBCBC") c_light_highlight <- c("#C79999") c_mid <- c("#B97C7C") c_mid_highlight <- c("#A25050") c_dark <- c("#8F2727") c_dark_highlight <- c("#7C0000") if (missing(ylim)) { ytrain <- data.frame( series = data_train$series, time = data_train$time, y = data_train$y ) %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y) if (tolower(object$family) %in% c('beta', 'bernoulli')) { ylim <- c( min(cred, min(ytrain, na.rm = TRUE)), max(cred, max(ytrain, na.rm = TRUE)) ) ymin <- max(0, ylim[1]) ymax <- min(1, ylim[2]) ylim <- c(ymin, ymax) } else if (tolower(object$family) %in% c('lognormal', 'gamma')) { ylim <- c( min(cred, min(ytrain, na.rm = TRUE)), max(cred, max(ytrain, na.rm = TRUE)) ) ymin <- max(0, ylim[1]) ymax <- max(ylim) ylim <- c(ymin, ymax) } else { ylim <- c( min(cred, min(ytrain, na.rm = TRUE)), max(cred, max(ytrain, na.rm = TRUE)) ) } } if (missing(ylab)) { ylab <- paste0('Predictions for ', levels(data_train$series)[series]) } if (missing(xlab)) { xlab <- 'Time' } pred_vals <- seq_len(length(preds_last)) if (hide_xlabels) { plot( 1, type = "n", bty = 'L', xlab = '', xaxt = 'n', ylab = ylab, xlim = c(0, length(preds_last)), ylim = ylim, ... ) } else { plot( 1, type = "n", bty = 'L', xlab = xlab, ylab = ylab, xaxt = 'n', xlim = c(0, length(preds_last)), ylim = ylim, ...
) if (!missing(data_test)) { axis( side = 1, at = floor(seq( 0, max(data_test$time) - (min(object$obs_data$time) - 1), length.out = 6 )), labels = floor(seq( min(object$obs_data$time), max(data_test$time), length.out = 6 )) ) } else { axis( side = 1, at = floor(seq( 0, max(object$obs_data$time) - (min(object$obs_data$time) - 1), length.out = 6 )), labels = floor(seq( min(object$obs_data$time), max(object$obs_data$time), length.out = 6 )) ) } } if (realisations) { for (i in 1:n_realisations) { lines(x = pred_vals, y = preds[i, ], col = 'white', lwd = 2.5) lines( x = pred_vals, y = preds[i, ], col = sample( c("#DCBCBC", "#C79999", "#B97C7C", "#A25050", "#7C0000"), 1 ), lwd = 2.25 ) } } else { polygon( c(pred_vals, rev(pred_vals)), c(cred[1, ], rev(cred[9, ])), col = c_light, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[2, ], rev(cred[8, ])), col = c_light_highlight, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[3, ], rev(cred[7, ])), col = c_mid, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[4, ], rev(cred[6, ])), col = c_mid_highlight, border = NA ) lines(pred_vals, cred[5, ], col = c_dark, lwd = 2.5) } box(bty = 'L', lwd = 2) if (!missing(data_test)) { if (class(data_train)[1] == 'list') { data_train <- data.frame( series = data_train$series, y = data_train$y, time = data_train$time ) data_test <- data.frame( series = data_test$series, y = data_test$y, time = data_test$time ) } last_train <- (NROW(data_train) / NCOL(object$ytimes)) # Show historical (hindcast) distribution in grey if (!realisations) { polygon( c( pred_vals[1:(NROW(data_train) / NCOL(object$ytimes))], rev(pred_vals[1:(NROW(data_train) / NCOL(object$ytimes))]) ), c( cred[1, 1:(NROW(data_train) / NCOL(object$ytimes))], rev(cred[9, 1:(NROW(data_train) / NCOL(object$ytimes))]) ), col = 'grey70', border = NA ) lines( pred_vals[1:(NROW(data_train) / NCOL(object$ytimes))], cred[5, 1:(NROW(data_train) / NCOL(object$ytimes))], col = 'grey70', lwd = 2.5 ) } # Plot training and testing points points( dplyr::bind_rows(data_train, data_test) %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y), pch = 16, col = "white", cex = 0.8 ) points( dplyr::bind_rows(data_train, data_test) %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y), pch = 16, col = "black", cex = 0.65 ) abline(v = last_train, col = '#FFFFFF60', lwd = 2.85) abline(v = last_train, col = 'black', lwd = 2.5, lty = 'dashed') # Calculate out of sample probabilistic score truth <- as.matrix( data_test %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y) ) last_train <- length( data_train %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y) ) fc <- preds[, (last_train + 1):NCOL(preds)] if (all(is.na(truth))) { score <- NULL message( 'No non-missing values in data_test$y; cannot calculate forecast score' ) } else { if ( object$family %in% c( 'poisson', 'negative binomial', 'tweedie', 'binomial', 'beta_binomial' ) ) { if (max(fc, na.rm = TRUE) > 50000) { score <- sum( crps_mcmc_object(as.vector(truth), fc)[, 1], na.rm = TRUE ) message(paste0('Out of sample CRPS:\n', score)) } else { score <- sum( drps_mcmc_object(as.vector(truth), fc)[, 1], na.rm = TRUE ) message(paste0('Out of sample DRPS:\n', score)) } } else { score <- 
sum(crps_mcmc_object(as.vector(truth), fc)[, 1], na.rm = TRUE) message(paste0('Out of sample CRPS:\n', score)) } } } else { if (class(data_train)[1] == 'list') { data_train <- data.frame( series = data_train$series, y = data_train$y, time = data_train$time ) } points( data_train %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y), pch = 16, col = "white", cex = 0.8 ) points( data_train %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y), pch = 16, col = "black", cex = 0.65 ) } if (return_forecasts) { if (return_score) { if (!missing(data_test)) { return(list( forecast = preds[, (last_train + 1):NCOL(preds)], score = score )) } else { return(list(forecast = preds, score = NULL)) } } else { if (!missing(data_test)) { return(preds[, (last_train + 1):NCOL(preds)]) } else { return(preds) } } } } #' @rdname plot_mvgam_forecasts #' #' @param x Object of class `mvgam_forecast` #' #' @method plot mvgam_forecast #' #' @export plot.mvgam_forecast = function( x, series = 1, realisations = FALSE, n_realisations = 15, xlab, ylab, ylim, ... ) { object <- x validate_pos_integer(series) validate_pos_integer(n_realisations) if (series > length(object$series_names)) { stop( paste0( 'object only contains data / predictions for ', length(object$series_names), ' series' ), call. = FALSE ) } s_name <- object$series_names[series] if (!s_name %in% names(object$hindcasts)) { stop( paste0('forecasts for ', s_name, ' have not yet been computed'), call. = FALSE ) } # Extract hindcast and forecast predictions type <- object$type preds <- cbind( object$hindcasts[[which(names(object$hindcasts) == s_name)]], object$forecasts[[which(names(object$forecasts) == s_name)]] ) # Plot quantiles of the forecast distribution probs <- c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95) cred <- sapply( 1:NCOL(preds), function(n) quantile(preds[, n], probs = probs, na.rm = TRUE) ) if (type == 'trend') { if (missing(ylab)) { ylab <- paste0('Estimated trend for ', s_name) } } if (type == 'link') { if (missing(ylab)) { ylab <- paste0('Linear predictions for ', s_name) } } if (type == 'expected') { if (missing(ylab)) { ylab <- paste0('Expectations for ', s_name) } } if (type == 'detection') { if (missing(ylab)) { ylab <- paste0('Pr(detection) for ', s_name) } } if (type == 'latent_N') { if (missing(ylab)) { ylab <- paste0('Latent abundance for ', s_name) } } if (type == 'response') { if (missing(ylab)) { ylab <- paste0('Predictions for ', s_name) } } if (missing(xlab)) { xlab <- 'Time' } # Create a base plot using posterior credible intervals and observations # for the specified series plot_dat <- data.frame( time = c( object$train_times[[which(names(object$hindcasts) == s_name)]], object$test_times[[which(names(object$hindcasts) == s_name)]] ), med = cred[5, ], lower1 = cred[1, ], lower2 = cred[2, ], lower3 = cred[3, ], lower4 = cred[4, ], upper1 = cred[9, ], upper2 = cred[8, ], upper3 = cred[7, ], upper4 = cred[6, ], truth = c( object$train_observations[[s_name]], object$test_observations[[s_name]] ) ) base_plot <- ggplot2::ggplot( data = plot_dat, mapping = ggplot2::aes(x = time, y = truth) ) + ggplot2::theme_classic() + ggplot2::labs(x = xlab, y = ylab) # Add to the base plot accordingly if (realisations) { for (i in 1:n_realisations) { base_plot <- base_plot + ggplot2::geom_line( data = data.frame( y = preds[i, ], time = c( object$train_times[[which(names(object$hindcasts) == 
s_name)]], object$test_times[[which(names(object$hindcasts) == s_name)]] ) ), mapping = ggplot2::aes(x = time, y = y), col = "white", linewidth = 1 ) + ggplot2::geom_line( data = data.frame( y = preds[i, ], time = c( object$train_times[[which(names(object$hindcasts) == s_name)]], object$test_times[[which(names(object$hindcasts) == s_name)]] ) ), mapping = ggplot2::aes(x = time, y = y), col = sample( c("#DCBCBC", "#C79999", "#B97C7C", "#A25050", "#7C0000"), 1 ), linewidth = 0.75 ) } } else { base_plot <- base_plot + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower1, ymax = upper1), fill = "#DCBCBC" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower2, ymax = upper2), fill = "#C79999" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower3, ymax = upper3), fill = "#B97C7C" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower4, ymax = upper4), fill = "#A25050" ) + ggplot2::geom_line( mapping = ggplot2::aes(x = time, y = med), col = "#8F2727", linewidth = 1 ) } # Show historical (hindcast) distribution in grey if this object # contains forecasts train_times <- object$train_times[[s_name]] last_train <- length(object$train_observations[[s_name]]) if (type == 'response' & !is.null(object$forecasts)) { if (!realisations) { base_plot <- base_plot + ggplot2::geom_line( data = data.frame( time = train_times, lower1 = cred[1, 1:last_train], upper1 = cred[9, 1:last_train], med = cred[5, 1:last_train], truth = 0 ), mapping = ggplot2::aes(x = time, y = med), col = "white", linewidth = 1 ) + ggplot2::geom_ribbon( data = data.frame( time = train_times, lower1 = cred[1, 1:last_train], upper1 = cred[9, 1:last_train], truth = 0 ), mapping = ggplot2::aes(ymin = lower1, ymax = upper1), fill = "grey70" ) } } if ( type == 'response' || c(type == 'expected' & object$family == 'bernoulli') ) { # Plot training and testing points base_plot <- base_plot + ggplot2::geom_point(pch = 21, col = 'white', fill = 'black') # Calculate out of sample probabilistic score; # need to ensure fc is a matrix, even if only a single # out of sample observation was forecasted (#111) if (!is.null(object$forecasts)) { fc <- as.matrix(object$forecasts[[s_name]]) } else { fc <- NULL } truth <- object$test_observations[[s_name]] if (all(is.na(truth))) { score <- NULL message(paste0( 'No non-missing values in test_observations; cannot calculate forecast score\n' )) } else { if ( object$family %in% c( 'poisson', 'negative binomial', 'tweedie', 'binomial', 'beta_binomial' ) ) { if (max(fc, na.rm = TRUE) > 50000) { score <- sum( crps_mcmc_object(as.vector(truth), fc)[, 1], na.rm = TRUE ) message(paste0('Out of sample CRPS:\n', score)) } else { score <- sum( drps_mcmc_object(as.vector(truth), fc)[, 1], na.rm = TRUE ) message(paste0('Out of sample DRPS:\n', score)) } } else if (object$family == 'bernoulli') { score <- sum(brier_mcmc_object(as.vector(truth), fc)[, 1], na.rm = TRUE) message(paste0('Out of sample Brier:\n', score)) } else { score <- sum(crps_mcmc_object(as.vector(truth), fc)[, 1], na.rm = TRUE) message(paste0('Out of sample CRPS:\n', score)) } } } if (!is.null(object$forecasts)) { base_plot <- base_plot + ggplot2::geom_vline( xintercept = max(train_times), linetype = 'dashed' ) } if (!missing(ylim)) { base_plot <- base_plot + ggplot2::scale_y_continuous(limits = ylim) } base_plot } ================================================ FILE: R/plot_mvgam_pterms.R ================================================ #' Plot parametric term partial effects for \pkg{mvgam} models #' #' This function plots 
posterior empirical quantiles for partial effects of #' parametric terms #' #' @importFrom graphics layout title rug bxp #' #' @importFrom stats coef predict #' #' @inheritParams plot.mvgam #' #' @param object \code{list} object of class \code{mvgam}. See [mvgam()] #' #' @details Posterior empirical quantiles of each parametric term's partial #' effect estimates (on the link scale) are calculated and visualised as #' ribbon plots. These effects can be interpreted as the partial effect that #' a parametric term contributes when all other terms in the model have been #' set to \code{0} #' #' @return A base \code{R} graphics plot #' #' @author Nicholas J Clark #' #' @export plot_mvgam_pterms = function(object, trend_effects = FALSE) { # General plotting colours and empirical quantile probabilities c_light <- c("#DCBCBC") c_light_trans <- c("#DCBCBC70") c_light_highlight <- c("#C79999") c_mid <- c("#B97C7C") c_mid_highlight <- c("#A25050") c_mid_highlight_trans <- c("#A2505095") c_dark <- c("#8F2727") c_dark_highlight <- c("#7C0000") probs = c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95) # Check arguments if (!(inherits(object, "mvgam"))) { stop('argument "object" must be of class "mvgam"') } object2 <- object if (trend_effects) { if (is.null(object$trend_call)) { stop('no trend_formula exists so there are no trend-level smooths to plot') } object2$mgcv_model <- object2$trend_mgcv_model } # Look for parametric terms in the model pterms <- attr(object2$mgcv_model$pterms, 'term.labels') if (length(pterms) > 0) { # Graphical parameters .pardefault <- par(no.readonly = TRUE) on.exit(par(.pardefault)) if (length(pterms) == 1) { par(mfrow = c(1, 1), mar = c(4, 4.5, 3, 4)) } if (length(pterms) == 2) { par( mfrow = c(2, 1), mar = c(2.5, 2.3, 2, 2), oma = c(1, 1, 0, 0), mgp = c(1.5, 0.5, 0) ) } if (length(pterms) %in% c(3, 4)) { par( mfrow = c(2, 2), mar = c(2.5, 2.3, 2, 2), oma = c(1, 1, 0, 0), mgp = c(1.5, 0.5, 0) ) } for (i in 1:length(pterms)) { # Find out which beta corresponds to the associated parametric term betas_keep <- grepl( paste0('^(?=.*', pterms[i], ')(?!.*s\\()'), colnames(predict(object2$mgcv_model, type = 'lpmatrix')), perl = TRUE ) if (trend_effects) { betas <- mcmc_chains(object2$model_output, 'b_trend')[, betas_keep] } else { betas <- mcmc_chains(object2$model_output, 'b')[, betas_keep] } # Generate linear predictor matrix from fitted mgcv model Xp <- obs_Xp_matrix( newdata = object2$obs_data, mgcv_model = object2$mgcv_model ) # Zero out all other columns in Xp Xp[, !betas_keep] <- 0 # X-axis values if (inherits(object2$obs_data, 'list')) { pred_vals_orig <- sort(object2$obs_data[[pterms[i]]]) } else { pred_vals_orig <- sort( object2$obs_data %>% dplyr::pull(pterms[i]) ) } if (inherits(object2$obs_data[[pterms[i]]], 'factor')) { # Use a simple boxplot for factor terms for now if (is.matrix(betas)) { beta_creds <- apply( betas, 2, function(x) { quantile( x, probs = c(0, 0.05, 0.5, 0.95, 1), na.rm = TRUE ) } ) } else { beta_creds <- matrix(quantile( betas, probs = c(0, 0.05, 0.5, 0.95, 1), na.rm = TRUE )) } colnames(beta_creds) <- substr( names(coef(object2$mgcv_model))[grepl( paste0('^(?=.*', pterms[i], ')(?!.*s\\()'), colnames(predict(object2$mgcv_model, type = 'lpmatrix')), perl = TRUE )], nchar(pterms[i]) + 1, 1000000L ) bp <- boxplot(beta_creds, range = 0, plot = FALSE) bxp( bp, whisklty = 0, staplelty = 0, boxfill = c_light, boxcol = c_light, medcol = c_dark, frame.plot = FALSE, ylab = paste0('Partial effect') ) if (is.matrix(betas)) { bp$stats <- apply( betas, 2, function(x) {
quantile( x, probs = c(0, 0.3, 0.5, 0.7, 1), na.rm = TRUE ) } ) } else { bp$stats <- matrix(quantile( betas, probs = c(0, 0.3, 0.5, 0.7, 1), na.rm = TRUE )) } bxp( bp, whisklty = 0, staplelty = 0, add = TRUE, frame.plot = FALSE, boxcol = c_light_highlight, medcol = c_dark, boxfill = c_light_highlight ) if (is.matrix(betas)) { bp$stats <- apply( betas, 2, function(x) { quantile( x, probs = c(0, 0.2, 0.5, 0.8, 1), na.rm = TRUE ) } ) } else { bp$stats <- matrix(quantile( betas, probs = c(0, 0.2, 0.5, 0.8, 1), na.rm = TRUE )) } bxp( bp, whisklty = 0, staplelty = 0, add = TRUE, frame.plot = FALSE, boxcol = c_mid, medcol = c_dark, boxfill = c_mid ) if (is.matrix(betas)) { bp$stats <- apply( betas, 2, function(x) { quantile( x, probs = c(0, 0.4, 0.5, 0.6, 1), na.rm = TRUE ) } ) } else { bp$stats <- matrix(quantile( betas, probs = c(0, 0.4, 0.5, 0.6, 1), na.rm = TRUE )) } bxp( bp, whisklty = 0, staplelty = 0, add = TRUE, frame.plot = FALSE, boxcol = c_mid_highlight, medcol = c_dark, boxfill = c_mid_highlight ) box(bty = 'L', lwd = 2) title(pterms[i], adj = 0) } else { beta_creds <- quantile(betas, probs = probs, na.rm = TRUE) pred_vals <- seq( min(pred_vals_orig), max(pred_vals_orig), length.out = 500 ) cred <- as.matrix(beta_creds) %*% pred_vals # Plot plot( 1, type = "n", bty = 'L', xlab = pterms[i], ylab = paste0('Partial effect'), xlim = c(min(pred_vals), max(pred_vals)), ylim = c(min(cred), max(cred)) ) title(pterms[i], adj = 0) polygon( c(pred_vals, rev(pred_vals)), c(cred[1, ], rev(cred[9, ])), col = c_light, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[2, ], rev(cred[8, ])), col = c_light_highlight, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[3, ], rev(cred[7, ])), col = c_mid, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[4, ], rev(cred[6, ])), col = c_mid_highlight, border = NA ) lines(pred_vals, cred[5, ], col = c_dark, lwd = 2.5) rug(pred_vals_orig, lwd = 1.75, ticksize = 0.025, col = c_mid_highlight) box(bty = 'L', lwd = 2) } } layout(1) } else { message('No parametric terms in model formula') } } ================================================ FILE: R/plot_mvgam_randomeffects.R ================================================ #' Plot random effect terms from \pkg{mvgam} models #' #' This function plots posterior empirical quantiles for random effect #' smooths (bs = re) #' #' @importFrom graphics layout title #' #' @inheritParams plot.mvgam #' #' @param object \code{list} object of class \code{mvgam}. See [mvgam()] #' #' @details Posterior empirical quantiles of random effect coefficient #' estimates (on the link scale) are calculated and visualised as ribbon #' plots. 
Labels for coefficients are taken from the levels of the original #' factor variable that was used to specify the smooth in the model's #' formula #' #' @return A base \code{R} graphics plot #' #' @author Nicholas J Clark #' #' @export #' plot_mvgam_randomeffects = function(object, trend_effects = FALSE) { # General plotting colours and empirical quantile probabilities c_light <- c("#DCBCBC") c_light_trans <- c("#DCBCBC70") c_light_highlight <- c("#C79999") c_mid <- c("#B97C7C") c_mid_highlight <- c("#A25050") c_mid_highlight_trans <- c("#A2505095") c_dark <- c("#8F2727") c_dark_highlight <- c("#7C0000") probs = c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95) object2 <- object if (trend_effects) { if (is.null(object$trend_call)) { stop('no trend_formula exists so there are no trend-level smooths to plot') } object2$mgcv_model <- object2$trend_mgcv_model } # Labels of smooths in formula smooth_labs <- do.call( rbind, lapply(seq_along(object2$mgcv_model$smooth), function(x) { data.frame( label = object2$mgcv_model$smooth[[x]]$label, class = class(object2$mgcv_model$smooth[[x]])[1] ) }) ) # Check if any smooths were bs = "re"; if not, return a message if (any(smooth_labs$class == 'random.effect')) { re_smooths <- smooth_labs %>% dplyr::mutate(smooth_num = dplyr::row_number()) %>% dplyr::filter(class == 'random.effect') %>% dplyr::pull(label) if (trend_effects) { re_smooths <- gsub('series', 'trend', re_smooths, fixed = TRUE) } .pardefault <- par(no.readonly = TRUE) on.exit(par(.pardefault)) if (length(re_smooths) == 1) { par(mfrow = c(1, 1)) } if (length(re_smooths) == 2) { par(mfrow = c(2, 1)) } if (length(re_smooths) %in% c(3, 4)) { par(mfrow = c(2, 2)) } for (i in 1:length(re_smooths)) { # Find out which betas correspond to the associated random effect estimates (smooth_labs %>% dplyr::mutate(smooth_num = dplyr::row_number()) %>% dplyr::filter(class == 'random.effect') %>% dplyr::pull(smooth_num))[i] -> smooth_number betas_keep <- object2$mgcv_model$smooth[[ smooth_number ]]$first.para:object2$mgcv_model$smooth[[smooth_number]]$last.para if (trend_effects) { betas <- mcmc_chains(object2$model_output, 'b_trend')[, betas_keep] } else { betas <- mcmc_chains(object2$model_output, 'b')[, betas_keep] } # Plot the random effect estimates beta_creds <- sapply( 1:NCOL(betas), function(n) quantile(betas[, n], probs = probs, na.rm = TRUE) ) N <- NCOL(betas) x <- 1:N idx <- rep(1:N, each = 2) repped_x <- rep(x, each = 2) x <- sapply( 1:length(idx), function(k) { if (k %% 2 == 0) { repped_x[k] + min(diff(x)) / 2 } else { repped_x[k] - min(diff(x)) / 2 } } ) plot( 1, type = "n", bty = 'L', ylab = 'Partial effect', xlab = '', xlim = range(x), xaxt = 'n', ylim = range(c(as.vector(beta_creds))) ) title(re_smooths[i], adj = 0) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = as.numeric(c(beta_creds[9, ])), ybottom = as.numeric(c(beta_creds[1, ])), col = c_light, border = 'transparent' ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = as.numeric(c(beta_creds[8, ])), ybottom = as.numeric(c(beta_creds[2, ])), col = c_light_highlight, border = 'transparent' ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = as.numeric(c(beta_creds[7, ])), ybottom = as.numeric(c(beta_creds[3, ])), col = c_mid, border = 'transparent' ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = as.numeric(c(beta_creds[6, ])), ybottom = as.numeric(c(beta_creds[4, ])), col = c_mid_highlight, border = 'transparent' )
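# Overlay the posterior median (the 0.5 quantile in row 5 of beta_creds)
# of each random effect coefficient as a horizontal segment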
      for (k in 1:N) {
        lines(
          x = c(x[seq(1, N * 2, by = 2)][k], x[seq(2, N * 2, by = 2)][k]),
          y = c(beta_creds[5, k], beta_creds[5, k]),
          col = c_dark,
          lwd = 2
        )
      }
      box(bty = 'L', lwd = 2)

      # Label x-axis with the factor variable levels
      factor_var_name <- tail(
        strsplit(gsub('\\)', '', gsub('s\\(', '', re_smooths[i])), ',')[[1]],
        1
      )
      if (trend_effects && factor_var_name == 'trend') {
        # Just use trend labels
        axis(side = 1, at = 1:N, labels = paste0('trend_', 1:N))
      } else {
        if (inherits(object2$obs_data, 'list')) {
          axis(
            side = 1,
            at = 1:N,
            labels = levels(object2$obs_data[[factor_var_name]])
          )
        } else {
          axis(
            side = 1,
            at = 1:N,
            labels = levels(
              object2$obs_data %>%
                dplyr::pull(factor_var_name)
            )
          )
        }
      }
    }
    layout(1)
  } else {
    message('No random effect smooths (bs = "re") in model formula')
  }
}

================================================ FILE: R/plot_mvgam_resids.R ================================================
#' Residual diagnostics for a fitted \pkg{mvgam} object
#'
#' This function takes a fitted \code{mvgam} object and returns various
#' residual diagnostic plots
#'
#' @importFrom graphics layout title
#'
#' @importFrom stats complete.cases qqnorm qqline acf pacf na.pass
#'
#' @importFrom mgcv bam
#'
#' @param object \code{list} object returned from \code{mvgam}. See [mvgam()]
#'
#' @param series \code{integer} specifying which series in the set is to be
#' plotted
#'
#' @param n_draws \code{integer} specifying the number of posterior residual
#' draws to use for calculating uncertainty in the `"ACF"` and `"pACF"` frames.
#' Default is `100`
#'
#' @param n_points \code{integer} specifying the maximum number of points to
#' show in the "Resids vs Fitted" and "Normal Q-Q Plot" frames. Default is
#' `1000`
#'
#' @details Four ggplot panels are generated to examine posterior
#' Dunn-Smyth residuals for the specified series. Plots include a residuals
#' vs fitted values plot, a Q-Q plot, and two plots to check for any
#' remaining temporal autocorrelation in the residuals. Note that all plots
#' only report statistics from a sample of up to `n_draws` posterior draws
#' (to save computational time), so uncertainty in these relationships may
#' not be adequately represented.
#'
#' @return A facetted `ggplot` object
#'
#' @author Nicholas J Clark and Matthijs Hollanders
#'
#' @examples
#' \dontrun{
#' simdat <- sim_mvgam(
#'   n_series = 3,
#'   trend_model = AR()
#' )
#'
#' mod <- mvgam(
#'   y ~ s(season, bs = 'cc', k = 6),
#'   trend_model = AR(),
#'   noncentred = TRUE,
#'   data = simdat$data_train,
#'   chains = 2,
#'   silent = 2
#' )
#'
#' # Plot Dunn Smyth residuals for some series
#' plot_mvgam_resids(mod)
#' plot_mvgam_resids(mod, series = 2)
#' }
#'
#' @export
plot_mvgam_resids = function(
  object,
  series = 1,
  n_draws = 100L,
  n_points = 1000L
) {
  # Check arguments
  if (!(inherits(object, "mvgam"))) {
    stop('argument "object" must be of class "mvgam"')
  }
  validate_pos_integer(series)
  validate_pos_integer(n_draws)
  validate_pos_integer(n_points)
  if (series > NCOL(object$ytimes)) {
    stop(
      paste0(
        'object only contains data / predictions for ',
        NCOL(object$ytimes),
        ' series'
      ),
      call.
= FALSE ) } # Take a sample of posterior draws to compute autocorrelation statistics # This is because acf(posterior_median_residual) can induce spurious patterns # due to the randomness of DS residuals; # rather, we want median(acf(residual_i)), where i indexes all possible draws # But this is computationally expensive for some models so we compromise # by only taking a few draws n_total_draws <- NROW(object$resids[[series]]) n_samps <- min(n_draws, n_total_draws) hcs <- hindcast(object, type = 'expected')$hindcasts[[series]] resids <- object$resids[[series]] resid_df <- do.call( rbind, lapply(seq_len(n_samps), function(x) { data.frame(preds = hcs[x, ], resids = resids[x, ], .draw = x) %>% dplyr::filter(!is.na(resids)) }) ) # Plot predictions and residuals (but limit number of points to n_points to # speed up plotting) if (NROW(resid_df) > n_points) { resid_df <- resid_df[sample(1:NROW(resid_df), n_points, replace = FALSE), ] } fvr_plot <- ggplot2::ggplot(resid_df, ggplot2::aes(preds, resids)) + ggplot2::geom_point(shape = 16, col = 'white', size = 1.25, alpha = 0.4) + ggplot2::geom_point(shape = 16, col = 'black', size = 1, alpha = 0.4) + ggplot2::geom_smooth( method = "gam", formula = y ~ s(x, bs = "cs"), colour = "#7C000060", fill = "#7C000040" ) + ggplot2::labs( title = "Resids vs Fitted", x = "Fitted values", y = "DS residuals" ) + ggplot2::theme_bw() # Q-Q plot qq_plot <- ggplot2::ggplot(resid_df, ggplot2::aes(sample = resids)) + ggplot2::stat_qq_line(colour = "#8F2727", linewidth = 1) + ggplot2::stat_qq(shape = 16, col = 'white', size = 1.25, alpha = 0.4) + ggplot2::stat_qq(shape = 16, col = 'black', size = 1, alpha = 0.4) + ggplot2::labs( title = "Normal Q-Q Plot", x = "Theoretical Quantiles", y = "Sample Quantiles" ) + ggplot2::theme_bw() # ACF plot acf_stats <- do.call( rbind, lapply(seq_len(n_samps), function(x) { acf_calc <- acf(resids[x, ], plot = FALSE, na.action = na.pass) data.frame( acf = acf_calc$acf[,, 1], lag = acf_calc$lag[, 1, 1], denom = sqrt(acf_calc$n.used) ) %>% dplyr::filter(lag > 0) }) ) %>% dplyr::group_by(lag) %>% dplyr::mutate( ylow = quantile(acf, probs = 0.05, na.rm = TRUE), yqlow = quantile(acf, probs = 0.2, na.rm = TRUE), ymidlow = quantile(acf, probs = 0.25, na.rm = TRUE), ymidhigh = quantile(acf, probs = 0.75, na.rm = TRUE), yqhigh = quantile(acf, probs = 0.8, na.rm = TRUE), yhigh = quantile(acf, probs = 0.95, na.rm = TRUE) ) %>% dplyr::select(-acf) %>% dplyr::distinct() acf_plot <- ggplot2::ggplot(acf_stats, ggplot2::aes(x = lag)) + ggplot2::geom_hline( yintercept = c(-1, 1) * qnorm((1 + 0.95) / 2) / acf_stats$denom[1], linetype = "dashed" ) + ggplot2::geom_hline(yintercept = 0, colour = "#7C0000", linewidth = 0.25) + ggplot2::geom_segment( colour = "#DCBCBC", linewidth = 1.5, ggplot2::aes(y = ylow, yend = yhigh) ) + ggplot2::geom_segment( colour = "#B97C7C", linewidth = 1.5, ggplot2::aes(y = yqlow, yend = yqhigh) ) + ggplot2::geom_segment( colour = "#7C0000", linewidth = 1.5, ggplot2::aes(y = ymidlow, yend = ymidhigh) ) + ggplot2::labs(title = "ACF", x = "Lag", y = "Autocorrelation") + ggplot2::theme_bw() # PACF plot pacf_stats <- do.call( rbind, lapply(seq_len(n_samps), function(x) { acf_calc <- pacf(resids[x, ], plot = FALSE, na.action = na.pass) data.frame( pacf = acf_calc$acf[,, 1], lag = acf_calc$lag[, 1, 1], denom = sqrt(acf_calc$n.used) ) %>% dplyr::filter(lag > 0) }) ) %>% dplyr::group_by(lag) %>% dplyr::mutate( ylow = quantile(pacf, probs = 0.05, na.rm = TRUE), yqlow = quantile(pacf, probs = 0.2, na.rm = TRUE), ymidlow = quantile(pacf, probs 
= 0.25, na.rm = TRUE), ymidhigh = quantile(pacf, probs = 0.75, na.rm = TRUE), yqhigh = quantile(pacf, probs = 0.8, na.rm = TRUE), yhigh = quantile(pacf, probs = 0.95, na.rm = TRUE) ) %>% dplyr::select(-pacf) %>% dplyr::distinct() pacf_plot <- ggplot2::ggplot(pacf_stats, ggplot2::aes(x = lag)) + ggplot2::geom_hline( yintercept = c(-1, 1) * qnorm((1 + 0.95) / 2) / pacf_stats$denom[1], linetype = "dashed" ) + ggplot2::geom_hline(yintercept = 0, colour = "#7C0000", linewidth = 0.25) + ggplot2::geom_segment( colour = "#DCBCBC", linewidth = 1.5, ggplot2::aes(y = ylow, yend = yhigh) ) + ggplot2::geom_segment( colour = "#B97C7C", linewidth = 1.5, ggplot2::aes(y = yqlow, yend = yqhigh) ) + ggplot2::geom_segment( colour = "#7C0000", linewidth = 1.5, ggplot2::aes(y = ymidlow, yend = ymidhigh) ) + ggplot2::labs(title = "pACF", x = "Lag", y = "Partial autocorrelation") + ggplot2::theme_bw() # return patchwork::wrap_plots( fvr_plot, qq_plot, acf_plot, pacf_plot, ncol = 2, nrow = 2, byrow = TRUE ) } ================================================ FILE: R/plot_mvgam_series.R ================================================ #' Plot observed time series used for \pkg{mvgam} modelling #' #' This function takes either a fitted \code{mvgam} object or a #' \code{data.frame} object and produces plots of observed time series, ACF, #' CDF and histograms for exploratory data analysis #' #' @importFrom stats lag #' #' @param object Optional \code{list} object returned from \code{mvgam}. Either #' \code{object} or \code{data} must be supplied #' #' @param data Optional \code{data.frame} or \code{list} of training data #' containing at least 'series' and 'time'. Use this argument if training #' data have been gathered in the correct format for \code{mvgam} modelling #' but no model has yet been fitted. #' #' @param newdata Optional \code{data.frame} or \code{list} of test data #' containing at least 'series' and 'time' for the forecast horizon, in #' addition to any other variables included in the linear predictor of #' \code{formula}. If included, the observed values in the test data are #' compared to the model's forecast distribution for exploring biases in #' model predictions #' #' @param y Character. What is the name of the outcome variable in the supplied #' data? Defaults to \code{'y'} #' #' @param lines Logical. If \code{TRUE}, line plots are used for visualizing #' time series. If \code{FALSE}, points are used. #' #' @param series Either an \code{integer} specifying which series in the set is #' to be plotted or the string 'all', which plots all series available in the #' supplied data #' #' @param n_bins \code{integer} specifying the number of bins to use for #' binning observed values when plotting a histogram. Default is to use the #' number of bins returned by a call to `hist` in base `R` #' #' @param log_scale \code{logical}. If \code{series == 'all'}, this flag is #' used to control whether the time series plot is shown on the log scale #' (using `log(Y + 1)`). This can be useful when visualizing many series that #' may have different observed ranges. Default is \code{FALSE} #' #' @author Nicholas J Clark and Matthijs Hollanders #' #' @return A set of ggplot objects. If \code{series} is an integer, the plots #' will show observed time series, autocorrelation and cumulative #' distribution functions, and a histogram for the series. 
#' If \code{series == 'all'}, a set of observed time series plots is returned
#' in which all series are shown on each plot but only a single focal series
#' is highlighted, with all remaining series shown as faint gray lines.
#'
#' @examples
#' # Simulate and plot series with observations bounded at 0 and 1 (Beta responses)
#' sim_data <- sim_mvgam(
#'   family = betar(),
#'   trend_model = RW(),
#'   prop_trend = 0.6
#' )
#'
#' plot_mvgam_series(
#'   data = sim_data$data_train,
#'   series = 'all'
#' )
#'
#' plot_mvgam_series(
#'   data = sim_data$data_train,
#'   newdata = sim_data$data_test,
#'   series = 1
#' )
#'
#' # Now simulate series with overdispersed discrete observations
#' sim_data <- sim_mvgam(
#'   family = nb(),
#'   trend_model = RW(),
#'   prop_trend = 0.6,
#'   phi = 10
#' )
#'
#' plot_mvgam_series(
#'   data = sim_data$data_train,
#'   series = 'all'
#' )
#'
#' @export
plot_mvgam_series <- function(
  object,
  data,
  newdata,
  y = 'y',
  lines = TRUE,
  series = 1,
  n_bins = NULL,
  log_scale = FALSE
) {
  # Validate series
  if (is.character(series)) {
    if (series != 'all') {
      stop(
        'argument "series" must be either a positive integer or "all"',
        call. = FALSE
      )
    }
  } else {
    if (sign(series) != 1) {
      stop(
        'argument "series" must be either a positive integer or "all"',
        call. = FALSE
      )
    } else {
      if (series %% 1 != 0) {
        stop(
          'argument "series" must be either a positive integer or "all"',
          call. = FALSE
        )
      }
    }
  }

  # Extract training data
  if (!missing(object)) {
    if (!(inherits(object, "mvgam"))) {
      stop('argument "object" must be of class "mvgam"')
    }
    if (!missing("data")) {
      warning('both "object" and "data" were supplied; only using "object"')
    }
    data_train <- object$obs_data

    # What is the response variable?
    resp_terms <- as.character(terms(formula(object$call))[[2]])
    if (length(resp_terms) == 1) {
      y <- as.character(terms(object$call)[[2]])
    } else {
      if (any(grepl('cbind', resp_terms))) {
        resp_terms <- resp_terms[!grepl('cbind', resp_terms)]
        y <- resp_terms[1]
      }
    }
  } else {
    data_train <- data
  }

  # Validate data
  data_train <- validate_plot_data(data_train, y)
  if (!missing(newdata)) {
    data_test <- validate_plot_data(newdata, y)
  }

  # Determine what to plot
  if (is.character(series) && series == 'all') {
    # Only return a plot of the time series
    dat <- dplyr::as_tibble(data_train) %>%
      dplyr::distinct(time, y, series)

    # Create time series plot
    plot_ts <- plot_time_series(dat, lines, log_scale, y, series)

    # Return
    return(plot_ts)
  } else {
    # Return multiple plots for one time series
    s_name <- levels(data_train$series)[series]

    # Bind test data if supplied
    dat <- dplyr::as_tibble(data_train) %>%
      dplyr::filter(series == s_name) %>%
      dplyr::distinct(time, y) %>%
      dplyr::mutate(data = "train")
    if (!missing(newdata)) {
      dat <- dplyr::bind_rows(
        dat,
        dplyr::as_tibble(data_test) %>%
          dplyr::filter(series == s_name) %>%
          dplyr::distinct(time, y) %>%
          dplyr::mutate(data = "validate")
      )
    }

    # Create each plot component
    plot_ts <- plot_time_series(dat, lines, log_scale, y, series)
    plot_hist <- plot_histogram(dat, y, n_bins)
    plot_acf_obj <- plot_acf(dat)
    plot_ecdf_obj <- plot_ecdf(dat, y)

    # Wrap plots using patchwork
    return(
      patchwork::wrap_plots(
        plot_ts,
        plot_hist,
        plot_acf_obj,
        plot_ecdf_obj,
        ncol = 2,
        nrow = 2,
        byrow = TRUE
      )
    )
  }
}

#' Helper function to validate and format input plotting data
#' @noRd
validate_plot_data <- function(data, y) {
  # Check if data is not a list
  if (!inherits(data, 'list')) {
    # If 'series' column is missing, create a default factor
    if (!'series' %in% colnames(data)) {
      data$series <- factor('series1')
    }
    # If 'time' column is missing, stop with error
    if (!'time' %in% colnames(data)) {
      stop('data does not contain a "time" column', call. = FALSE)
    }
  } else {
    # If data is a list, check for 'series' and 'time' in names
    if (!'series' %in% names(data)) {
      data$series <- factor('series1')
    }
    if (!'time' %in% names(data)) {
      stop('data does not contain a "time" column')
    }
  }

  # Check if the outcome variable 'y' exists in data
  if (!y %in% names(data)) {
    stop(paste0('variable "', y, '" not found in data'), call. = FALSE)
  } else {
    # Assign the outcome variable to a standard column 'y'
    data$y <- data[[y]]
  }

  # Drop unused factor levels in 'series'
  data$series <- droplevels(data$series)

  # Return the validated and formatted data
  return(data)
}

#' Function to generate time series plots
#' @noRd
plot_time_series <- function(
  dat,
  lines = TRUE,
  log_scale = FALSE,
  ylab = 'y',
  series = 'all'
) {
  # Determine scale and y label
  if (log_scale) {
    dat$y <- log(dat$y + 1)
    ylab <- paste0('log(', ylab, ' + 1)')
  }

  # Create time series plot
  if (series == 'all') {
    p <- ggplot2::ggplot(dat, ggplot2::aes(time, y)) +
      ggplot2::facet_wrap(~series) +
      ggplot2::labs(x = "Time", y = ylab) +
      ggplot2::theme_bw()
    if (lines) {
      p <- p + ggplot2::geom_line(colour = "#8F2727", linewidth = 0.75)
    } else {
      p <- p + ggplot2::geom_point(colour = "#8F2727")
    }
  } else {
    p <- ggplot2::ggplot(dat, ggplot2::aes(time, y, colour = data)) +
      ggplot2::labs(title = "Time series", x = "Time", y = ylab) +
      ggplot2::geom_vline(
        xintercept = dat %>%
          dplyr::filter(data == "validate") %>%
          dplyr::pull(time) %>%
          min(c(., Inf)),
        linetype = "dashed",
        colour = "black"
      ) +
      ggplot2::scale_colour_manual(values = c("#8F2727", "black")) +
      ggplot2::theme_bw()
    if (lines) {
      p <- p + ggplot2::geom_line(show.legend = FALSE, linewidth = 0.75)
    } else {
      p <- p + ggplot2::geom_point(show.legend = FALSE)
    }
  }
  return(p)
}

#' Function to create histogram of observed values
#' @noRd
plot_histogram <- function(dat, ylab = 'y', n_bins = NULL) {
  # Determine bins
  if (is.null(n_bins)) {
    n_bins <- max(c(length(hist(c(dat$y), plot = FALSE)$breaks), 20))
  }

  # Plot the histogram
  ggplot2::ggplot(dat, ggplot2::aes(y)) +
    ggplot2::geom_histogram(bins = n_bins, fill = "#8F2727", col = 'white') +
    ggplot2::labs(title = "Histogram", x = ylab, y = "Count") +
    ggplot2::theme_bw()
}

#' Function to compute and plot autocorrelation
#' @noRd
plot_acf <- function(dat) {
  # Compute empirical ACF
  acf_y <- acf(dat$y, plot = FALSE, na.action = na.pass)

  # Plot
  data.frame(acf = acf_y$acf[,, 1], lag = acf_y$lag[, 1, 1]) %>%
    ggplot2::ggplot(ggplot2::aes(x = lag, y = 0, yend = acf)) +
    ggplot2::geom_hline(
      yintercept = c(-1, 1) * qnorm((1 + 0.95) / 2) / sqrt(acf_y$n.used),
      linetype = "dashed"
    ) +
    ggplot2::geom_hline(
      yintercept = 0,
      colour = "#8F2727",
      linewidth = 0.25
    ) +
    ggplot2::geom_segment(colour = "#8F2727", linewidth = 1) +
    ggplot2::labs(title = "ACF", x = "Lag", y = "Autocorrelation") +
    ggplot2::theme_bw()
}

#' Function to generate empirical cumulative distribution
#' @noRd
plot_ecdf <- function(dat, ylab = 'y') {
  # Compute the empirical CDF
  range_y <- range(dat$y, na.rm = TRUE)
  data.frame(x = seq(range_y[1], range_y[2], length.out = 100)) %>%
    dplyr::mutate(y = ecdf(dat$y)(x)) %>%
    # Plot
    ggplot2::ggplot(ggplot2::aes(x, y)) +
    ggplot2::geom_line(colour = "#8F2727", linewidth = 0.75) +
    ggplot2::scale_y_continuous(limits = c(0, 1)) +
    ggplot2::labs(
      title = "CDF",
      x = ylab,
      y = "Empirical CDF"
    ) +
    ggplot2::theme_bw()
}

================================================ FILE: R/plot_mvgam_smooth.R ================================================
#' Plot smooth terms
from \pkg{mvgam} models #' #' This function plots posterior empirical quantiles for a series-specific #' smooth term #' #' @importFrom grDevices hcl.colors #' #' @importFrom stats quantile predict #' #' @inheritParams plot.mvgam #' #' @param object \code{list} object of class \code{mvgam}. See [mvgam()] #' #' @param series \code{integer} specifying which series in the set is to be #' plotted #' #' @param smooth Either a \code{character} or \code{integer} specifying which #' smooth term to be plotted #' #' @param residuals \code{logical}. If \code{TRUE}, posterior quantiles of #' partial residuals are added to plots of 1-D smooths as a series of ribbon #' rectangles. Partial residuals for a smooth term are the median Dunn-Smyth #' residuals that would be obtained by dropping the term concerned from the #' model, while leaving all other estimates fixed (i.e. the estimates for the #' term plus the original median Dunn-Smyth residuals). Note that because #' \code{mvgam} works with Dunn-Smyth residuals and not working residuals, #' which are used by \code{mgcv}, the magnitudes of partial residuals will be #' different to what you would expect from \code{\link[mgcv]{plot.gam}}. #' Interpretation is similar though, as these partial residuals should be #' evenly scattered around the smooth function if the function is well #' estimated #' #' @param n_resid_bins \code{integer} specifying the number of bins to group #' the covariate into when plotting partial residuals. Setting this argument #' too high can make for messy plots that are difficult to interpret, while #' setting it too low will likely mask some potentially useful patterns in #' the partial residuals. Default is \code{25} #' #' @param derivatives \code{logical}. If \code{TRUE}, an additional plot will #' be returned to show the estimated 1st derivative for the specified smooth #' (Note: this only works for univariate smooths) #' #' @param realisations \code{logical}. If \code{TRUE}, posterior realisations #' are shown as a spaghetti plot, making it easier to visualise the diversity #' of possible functions. If \code{FALSE}, the default, empirical quantiles #' of the posterior distribution are shown #' #' @param n_realisations \code{integer} specifying the number of posterior #' realisations to plot, if \code{realisations = TRUE}. Ignored otherwise #' #' @param newdata Optional \code{dataframe} for predicting the smooth, #' containing at least 'series' in addition to any other variables included #' in the linear predictor of the original model's \code{formula}. Note that #' this currently is only supported for plotting univariate smooths #' #' @details Smooth functions are shown as empirical quantiles (or spaghetti #' plots) of posterior partial expectations across a sequence of values #' between the variable's \code{min} and \code{max}, while zeroing out #' effects of all other variables. At present, only univariate and bivariate #' smooth plots are allowed, though note that bivariate smooths rely on #' default behaviour from \code{\link[mgcv]{plot.gam}}. `plot_mvgam_smooth` #' generates posterior predictions from an object of class \code{mvgam}, #' calculates posterior empirical quantiles and plots them. If #' `realisations = FALSE`, the returned plot shows 90, 60, 40 and 20 percent #' posterior quantiles (as ribbons of increasingly darker shades of red) as #' well as the posterior median (as a dark red line). If #' `realisations = TRUE`, a set of `n_realisations` posterior draws are #' shown. 
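#' For example (assuming a fitted model `mod` whose formula includes a
#' smooth of a covariate named `season`), a call such as
#' \preformatted{
#' plot_mvgam_smooth(mod, smooth = 'season', realisations = TRUE,
#'                   n_realisations = 30)
#' }
#' would overlay 30 posterior draws of the estimated function.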
For more nuanced visualisation, supply \code{newdata} just as you #' would when predicting from a \code{\link[mgcv]{gam}} model or use the more #' flexible \code{\link{conditional_effects.mvgam}}. Alternatively, if you #' prefer to use partial effect plots in the style of `gratia`, and if you #' have the `gratia` package installed, you can use `draw.mvgam`. See #' \code{\link{gratia_mvgam_enhancements}} for details. #' #' @return A base \code{R} graphics plot #' #' @seealso \code{\link[mgcv]{plot.gam}}, #' \code{\link{conditional_effects.mvgam}}, #' \code{\link{gratia_mvgam_enhancements}} #' #' @author Nicholas J Clark #' #' @export plot_mvgam_smooth = function( object, trend_effects = FALSE, series = 1, smooth, residuals = FALSE, n_resid_bins = 25, realisations = FALSE, n_realisations = 15, derivatives = FALSE, newdata ) { # Check arguments if (!(inherits(object, "mvgam"))) { stop('argument "object" must be of class "mvgam"') } object2 <- object if (trend_effects) { if (is.null(object$trend_call)) { stop( 'no trend_formula exists so there are no trend-level smooths to plot' ) } residuals <- FALSE object2$mgcv_model <- object2$trend_mgcv_model } if (sign(series) != 1) { stop('argument "series" must be a positive integer', call. = FALSE) } else { if (series %% 1 != 0) { stop('argument "series" must be a positive integer', call. = FALSE) } } if (series > NCOL(object2$ytimes)) { stop( paste0( 'object only contains data / predictions for ', NCOL(object2$ytimes), ' series' ), call. = FALSE ) } if (sign(n_resid_bins) != 1) { stop('argument "n_resid_bins" must be a positive integer', call. = FALSE) } else { if (n_resid_bins %% 1 != 0) { stop('argument "n_resid_bins" must be a positive integer', call. = FALSE) } } if (missing(smooth)) { smooth <- 1 } # Get smooth term names s_name <- levels(object2$obs_data$series)[series] data_train <- object2$obs_data smooth_terms <- unlist(purrr::map(object2$mgcv_model$smooth, 'label')) if (is.character(smooth)) { if (!grepl('\\(', smooth)) { smooth <- paste0('s(', smooth, ')') } if (!smooth %in% smooth_terms) { stop( smooth, ' not found in smooth terms of object2\nAppropriate names are: ', paste(smooth_terms, collapse = ', ') ) } smooth_int <- which(smooth_terms == smooth) } else { smooth_int <- smooth } # Check whether this type of smooth is even plottable if (!object2$mgcv_model$smooth[[smooth_int]]$plot.me) { stop( paste0( 'unable to plot ', object2$mgcv_model$smooth[[smooth_int]]$label, ' (class = ', attr(object2$mgcv_model$smooth[[smooth_int]], 'class')[1] ), ')' ) } if (is.numeric(smooth)) { if (!smooth %in% seq_along(smooth_terms)) { stop(smooth, ' not found in smooth terms of object') } smooth_int <- smooth smooth <- smooth_terms[smooth] } if (length(unlist(strsplit(smooth, ','))) > 3) { stop('mvgam cannot yet plot smooths of more than 3 dimensions') } # Check that this is not a random effect smooth smooth_labs <- do.call( rbind, lapply(seq_along(object2$mgcv_model$smooth), function(x) { data.frame( label = object2$mgcv_model$smooth[[x]]$label, class = class(object2$mgcv_model$smooth[[x]])[1] ) }) ) if (smooth_labs$class[smooth_int] == 'random.effect') { message('use function "plot_mvgam_randomeffects" to plot "re" bases') return(invisible) } # Be sure that parametric and by variables are included in newdata smooth_terms <- unique(trimws(strsplit( gsub('\\+', ',', as.character(object2$mgcv_model$pred.formula)[2]), ',' )[[1L]])) # Remove comma separated names as these won't match the column names in data smooth_terms[!grepl(',', smooth_terms)] -> 
smooth_terms # Change smooth name to the covariate that needs a sequence of prediction values smooth <- all.vars(parse(text = object2$mgcv_model$smooth[[smooth_int]]$term)) # Predictions and plots for multi-dimensional smooths if (length(unlist(strsplit(smooth, ','))) >= 2L) { # Use default mgcv plotting for bivariate smooths as it is quicker object2$mgcv_model <- relabel_gps(object2$mgcv_model) if ( inherits(object2$mgcv_model$smooth[[smooth_int]], 'tprs.smooth') | inherits(object2$mgcv_model$smooth[[smooth_int]], 't2smooth') | inherits(object2$mgcv_model$smooth[[smooth_int]], 'tensor.smooth') ) { suppressWarnings(plot( object2$mgcv_model, select = smooth_int, residuals = residuals, scheme = 2, main = '', too.far = 0, contour.col = 'black', hcolors = hcl.colors(25, palette = 'Reds 2'), lwd = 1, seWithMean = TRUE )) box(col = 'white') box(bty = 'l', lwd = 2) } else { suppressWarnings(plot( object2$mgcv_model, select = smooth_int, residuals = residuals, scheme = 2, main = '', too.far = 0, contour.col = 'black', hcolors = hcl.colors(25, palette = 'Reds 2'), lwd = 1, seWithMean = TRUE, ylab = 'Partial effect' )) box(col = 'white') box(bty = 'l', lwd = 2) } if (trend_effects) { title( sub( 'series', 'trend', object2$mgcv_model$smooth[[smooth_int]]$label, fixed = TRUE ), adj = 0 ) } else { title(object2$mgcv_model$smooth[[smooth_int]]$label, adj = 0) } } else { # Use posterior predictions to generate univariate smooth plots if (missing(newdata) && !inherits(data_train, 'list')) { data_train %>% dplyr::select(c(series, smooth_terms)) %>% dplyr::filter(series == s_name) %>% dplyr::mutate(series = s_name) -> pred_dat # Use a larger sample size when estimating derivatives so they can be better approximated if (derivatives) { pred_dat %>% dplyr::select(-smooth) %>% dplyr::distinct() %>% dplyr::slice_head(n = 1) %>% dplyr::slice(rep(1:dplyr::n(), each = 1000)) %>% dplyr::mutate( smooth.var = seq( min(pred_dat[, smooth]), max(pred_dat[, smooth]), length.out = 1000 ) ) -> pred_dat } else { pred_dat %>% dplyr::select(-smooth) %>% dplyr::distinct() %>% dplyr::slice_head(n = 1) %>% dplyr::slice(rep(1:dplyr::n(), each = 500)) %>% dplyr::mutate( smooth.var = seq( min(pred_dat[, smooth]), max(pred_dat[, smooth]), length.out = 500 ) ) -> pred_dat } colnames(pred_dat) <- gsub('smooth.var', smooth, colnames(pred_dat)) } else if (missing(newdata) && inherits(object2$obs_data, 'list')) { # Make fake data by zeroing all other terms apart from the selected smooth # and the series indicator pred_dat <- vector(mode = 'list') for (x in 1:length(data_train)) { if (is.matrix(data_train[[x]])) { pred_dat[[x]] <- matrix(0, nrow = 500, ncol = NCOL(data_train[[x]])) } else { pred_dat[[x]] <- rep(0, 500) } } names(pred_dat) <- names(object2$obs_data) pred_dat$series <- rep((levels(data_train$series)[series]), 500) if (!is.matrix(pred_dat[[smooth]])) { pred_dat[[smooth]] <- seq( min(data_train[[smooth]]), max(data_train[[smooth]]), length.out = 500 ) } else { pred_dat[[smooth]] <- matrix( seq( min(data_train[[smooth]]), max(data_train[[smooth]]), length.out = length(pred_dat[[smooth]]) ), nrow = nrow(pred_dat[[smooth]]), ncol = ncol(pred_dat[[smooth]]) ) } if ('lag' %in% names(pred_dat)) { pred_dat[['lag']] <- matrix( 0:(NCOL(data_train$lag) - 1), nrow(pred_dat$lag), NCOL(data_train$lag), byrow = TRUE ) } } else { pred_dat <- newdata # Add series factor variable if missing if (class(pred_dat)[1] != 'list') { if (!'series' %in% colnames(pred_dat)) { pred_dat$series <- factor('series1') } } if (class(pred_dat)[1] == 'list') { 
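      # The same fallback applies when newdata was supplied as a list
      # (the list format is used for models with matrix covariates,
      # e.g. lag matrices)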
if (!'series' %in% names(pred_dat)) { pred_dat$series <- factor('series1') } } } # Generate linear predictor matrix from fitted mgcv model if (trend_effects) { Xp <- trend_Xp_matrix( newdata = pred_dat, trend_map = object2$trend_map, mgcv_model = object2$trend_mgcv_model ) } else { Xp <- obs_Xp_matrix(newdata = pred_dat, mgcv_model = object2$mgcv_model) } # Zero out all other columns in Xp keeps <- object2$mgcv_model$smooth[[ smooth_int ]]$first.para:object2$mgcv_model$smooth[[smooth_int]]$last.para Xp[, !seq_len(length.out = NCOL(Xp)) %in% keeps] <- 0 # Prediction x-axis values if (class(pred_dat)[1] == 'list') { if (is.matrix(pred_dat[[smooth]])) { pred_vals <- as.vector(as.matrix(pred_dat[[smooth]][, 1])) } else { pred_vals <- as.vector(as.matrix(pred_dat[[smooth]])) } } else { pred_vals <- as.vector(as.matrix(pred_dat[, smooth])) } # If this term has a by variable, need to use mgcv's plotting utilities if (object2$mgcv_model$smooth[[smooth_int]]$by != "NA") { # Check if this is a gp() term gp_term <- FALSE if (!is.null(attr(object2$mgcv_model, 'gp_att_table'))) { gp_term <- object2$mgcv_model$smooth[[smooth_int]]$gp_term } if (gp_term) { object2$mgcv_model$smooth[[smooth_int]]$label <- gsub( 's\\(|ti\\(', 'gp(', object2$mgcv_model$smooth[[smooth_int]]$label ) # Check if this is a factor by variable is_fac <- is.factor(object2$obs_data[[ object2$mgcv_model$smooth[[smooth_int]]$by ]]) if (is_fac) { fac_levels <- levels(object2$obs_data[[ object2$mgcv_model$smooth[[smooth_int]]$by ]]) whichlevel <- vector() for (i in seq_along(fac_levels)) { whichlevel[i] <- grepl( fac_levels[i], object2$mgcv_model$smooth[[smooth_int]]$label, fixed = TRUE ) } pred_dat[[object2$mgcv_model$smooth[[smooth_int]]$by]] <- rep(fac_levels[whichlevel], length(pred_dat$series)) } if (!is_fac) { pred_dat[[object2$mgcv_model$smooth[[smooth_int]]$by]] <- rep(1, length(pred_dat$series)) } if (trend_effects) { Xp_term <- trend_Xp_matrix( newdata = pred_dat, trend_map = object2$trend_map, mgcv_model = object2$trend_mgcv_model ) } else { Xp_term <- obs_Xp_matrix( newdata = pred_dat, mgcv_model = object2$mgcv_model ) } Xp[, object2$mgcv_model$smooth[[ smooth_int ]]$first.para:object2$mgcv_model$smooth[[smooth_int]]$last.para ] <- Xp_term[, object2$mgcv_model$smooth[[ smooth_int ]]$first.para:object2$mgcv_model$smooth[[smooth_int]]$last.para ] } else { # Deal with by variables in non-gp() smooths by <- rep(1, length(pred_vals)) dat <- data.frame(x = pred_vals, by = by) names(dat) <- c( object2$mgcv_model$smooth[[smooth_int]]$term, object2$mgcv_model$smooth[[smooth_int]]$by ) Xp_term <- mgcv::PredictMat( object2$mgcv_model$smooth[[smooth_int]], dat ) Xp[, object2$mgcv_model$smooth[[ smooth_int ]]$first.para:object2$mgcv_model$smooth[[smooth_int]]$last.para ] <- Xp_term } } # Extract GAM coefficients if (trend_effects) { betas <- mcmc_chains(object2$model_output, 'b_trend') } else { betas <- mcmc_chains(object2$model_output, 'b') } # Calculate posterior marginal predictions preds <- matrix(NA, nrow = NROW(betas), ncol = NROW(Xp)) for (i in 1:NROW(betas)) { preds[i, ] <- (Xp %*% betas[i, ]) } if (residuals) { # Need to predict from a reduced set that zeroes out all terms apart from the # smooth of interest if (trend_effects) { Xp2 <- trend_Xp_matrix( newdata = object2$obs_data, trend_map = object2$trend_map, mgcv_model = object2$trend_mgcv_model ) } else { Xp2 <- obs_Xp_matrix( newdata = object2$obs_data, mgcv_model = object2$mgcv_model ) } if (!missing(newdata)) { stop('Partial residual plots not available when using 
newdata') } if (object2$mgcv_model$smooth[[smooth_int]]$by != "NA") { by <- rep(1, length(object2$obs_data$series)) dat <- data.frame( x = object2$obs_data[[object2$mgcv_model$smooth[[smooth_int]]$term]], by = by ) names(dat) <- c( object2$mgcv_model$smooth[[smooth_int]]$term, object2$mgcv_model$smooth[[smooth_int]]$by ) Xp_term <- mgcv::PredictMat( object2$mgcv_model$smooth[[smooth_int]], dat ) Xp2[, object2$mgcv_model$smooth[[ smooth_int ]]$first.para:object2$mgcv_model$smooth[[smooth_int]]$last.para ] <- Xp_term } # Find index for the end of training for this series and keep only those training # observations for the particular series if (class(pred_dat)[1] == 'list') { end_train <- length(which( object2$obs_data[['series']] == (levels(data_train$series)[series]) )) } else { end_train <- object2$obs_data %>% dplyr::filter(series == s_name) %>% NROW() } Xp2 <- Xp2[object2$ytimes[, series][1:end_train], ] # # Zero out all other columns in Xp2 Xp2[, !grepl(paste0('(', smooth, ')'), colnames(Xp), fixed = T)] <- 0 # Calculate residuals from full prediction set all_resids <- object2$resids[[series]][, 1:end_train] partial_resids <- matrix(NA, nrow = nrow(betas), ncol = NCOL(all_resids)) for (i in 1:NROW(betas)) { partial_resids[i, ] <- (Xp2 %*% betas[i, ]) + all_resids[i, ] } } # Plot quantiles of the smooth function, along with observed values # if specified probs = c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95) cred <- sapply( 1:NCOL(preds), function(n) quantile(preds[, n], probs = probs, na.rm = TRUE) ) c_light <- c("#DCBCBC") c_light_highlight <- c("#C79999") c_mid <- c("#B97C7C") c_mid_highlight <- c("#A25050") c_dark <- c("#8F2727") c_dark_highlight <- c("#7C0000") if (derivatives) { .pardefault <- par(no.readonly = T) on.exit(par(.pardefault)) par(mfrow = c(2, 1)) if (residuals) { plot( 1, type = "n", bty = 'L', xlab = smooth, ylab = 'Partial effect', xlim = c(min(pred_vals), max(pred_vals)), ylim = c( min(min(partial_resids, min(cred) - 0.4 * sd(preds), na.rm = T)), max(max(partial_resids, max(cred) + 0.4 * sd(preds), na.rm = T)) ) ) if (object2$mgcv_model$smooth[[smooth_int]]$by != "NA") { if (trend_effects) { title( sub( 'series', 'trend', object2$mgcv_model$smooth[[smooth_int]]$label, fixed = TRUE ), adj = 0 ) } else { title(object2$mgcv_model$smooth[[smooth_int]]$label, adj = 0) } } else { if (trend_effects) { title(paste0('s(', smooth, ')'), adj = 0) } else { title(paste0('s(', smooth, ')'), adj = 0) } } } else { plot( 1, type = "n", bty = 'L', xlab = smooth, ylab = 'Partial effect', xlim = c(min(pred_vals), max(pred_vals)), ylim = c(min(cred) - 0.9 * sd(preds), max(cred) + 0.9 * sd(preds)) ) if (object2$mgcv_model$smooth[[smooth_int]]$by != "NA") { if (trend_effects) { title( sub( 'series', 'trend', object2$mgcv_model$smooth[[smooth_int]]$label, fixed = TRUE ), adj = 0 ) } else { title(object2$mgcv_model$smooth[[smooth_int]]$label, adj = 0) } } else { if (trend_effects) { title(paste0('s(', smooth, ')'), adj = 0) } else { title(paste0('s(', smooth, ')'), adj = 0) } } } if (realisations) { for (i in 1:n_realisations) { index <- sample(1:NROW(preds), 1, replace = TRUE) lines(x = pred_vals, y = preds[index, ], col = 'white', lwd = 2.5) lines( x = pred_vals, y = preds[index, ], col = sample( c("#DCBCBC", "#C79999", "#B97C7C", "#A25050", "#7C0000"), 1 ), lwd = 2.25 ) } } else { if (residuals) { # Get x-axis values and bin if necessary to prevent overplotting sorted_x <- sort(unique(round(object2$obs_data[[smooth]], 6))) s_name <- levels(object2$obs_data$series)[series] obs_x <- 
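        # (observed covariate values for the focal series, rounded so that
        # they can be matched against the binned x-axis values below)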
round( data.frame( series = object2$obs_data$series, smooth_vals = object2$obs_data[[smooth]] ) %>% dplyr::filter(series == s_name) %>% dplyr::pull(smooth_vals), 6 ) if (length(sorted_x) > n_resid_bins) { sorted_x <- seq( min(sorted_x), max(sorted_x), length.out = n_resid_bins ) resid_probs <- do.call( rbind, lapply(2:n_resid_bins, function(i) { quantile( as.vector(partial_resids[, which( obs_x <= sorted_x[i] & obs_x > sorted_x[i - 1] )]), probs = probs, na.rm = TRUE ) }) ) resid_probs <- rbind( quantile( as.vector(partial_resids[, which(obs_x == sorted_x[1])]), probs = probs, na.rm = TRUE ), resid_probs ) } else { resid_probs <- do.call( rbind, lapply(sorted_x, function(i) { quantile( as.vector(partial_resids[, which(obs_x == i)]), probs = probs, na.rm = TRUE ) }) ) } # Get polygon coordinates and plot N <- length(sorted_x) idx <- rep(1:N, each = 2) repped_x <- rep(sorted_x, each = 2) x <- sapply( 1:length(idx), function(k) { if (k %% 2 == 0) { repped_x[k] + min(diff(sorted_x)) / 2 } else { repped_x[k] - min(diff(sorted_x)) / 2 } } ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = resid_probs[, 9], ybottom = resid_probs[, 1], col = c_light, border = 'transparent' ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = resid_probs[, 8], ybottom = resid_probs[, 2], col = c_light_highlight, border = 'transparent' ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = resid_probs[, 7], ybottom = resid_probs[, 3], col = c_mid, border = 'transparent' ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = resid_probs[, 6], ybottom = resid_probs[, 4], col = c_mid_highlight, border = 'transparent' ) for (k in 1:N) { lines( x = c(x[seq(1, N * 2, by = 2)][k], x[seq(2, N * 2, by = 2)][k]), y = c(resid_probs[k, 5], resid_probs[k, 5]), col = c_dark, lwd = 2 ) } # Overlay a minimalist version of the estimated smooth function polygon( c(pred_vals, rev(pred_vals)), c(cred[1, ], rev(cred[9, ])), col = rgb( red = 0, green = 0, blue = 0, alpha = 30, maxColorValue = 200 ), border = NA ) lines( pred_vals, cred[5, ], col = rgb( red = 0, green = 0, blue = 0, alpha = 45, maxColorValue = 200 ), lwd = 3 ) box(bty = 'L', lwd = 2) } else { polygon( c(pred_vals, rev(pred_vals)), c(cred[1, ], rev(cred[9, ])), col = c_light, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[2, ], rev(cred[8, ])), col = c_light_highlight, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[3, ], rev(cred[7, ])), col = c_mid, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[4, ], rev(cred[6, ])), col = c_mid_highlight, border = NA ) lines(pred_vals, cred[5, ], col = c_dark, lwd = 2.5) } } box(bty = 'L', lwd = 2) # Show observed values of the smooth as a rug if (class(object2$obs_data)[1] == 'list') { rug( (as.vector(as.matrix(pred_dat[[smooth]])))[which( pred_dat[['series']] == levels(pred_dat[['series']])[series] )], lwd = 1.75, ticksize = 0.025, col = c_mid_highlight ) } else { rug( (as.vector(as.matrix(data_train[, smooth])))[which( data_train$series == levels(data_train$series)[series] )], lwd = 1.75, ticksize = 0.025, col = c_mid_highlight ) } # Compute 1st derivatives first_derivs <- cbind(rep(NA, NROW(preds)), t(apply(preds, 1, diff))) cred <- sapply( 1:NCOL(first_derivs), function(n) quantile(first_derivs[, n], probs = probs, na.rm = T) ) plot( 1, type = "n", bty = 'L', xlab = smooth, ylab = '1st derivative', xlim = c(min(pred_vals), max(pred_vals)), ylim = c( min(cred, na.rm = T) - 
sd(first_derivs, na.rm = T), max(cred, na.rm = T) + sd(first_derivs, na.rm = T) ) ) if (realisations) { for (i in 1:n_realisations) { index <- sample(1:NROW(first_derivs), 1, replace = TRUE) lines( x = pred_vals, y = first_derivs[index, ], col = 'white', lwd = 2.5 ) lines( x = pred_vals, y = first_derivs[index, ], col = sample( c("#DCBCBC", "#C79999", "#B97C7C", "#A25050", "#7C0000"), 1 ), lwd = 2.25 ) } } else { polygon( c(pred_vals, rev(pred_vals)), c(cred[1, ], rev(cred[9, ])), col = c_light, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[2, ], rev(cred[8, ])), col = c_light_highlight, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[3, ], rev(cred[7, ])), col = c_mid, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[4, ], rev(cred[6, ])), col = c_mid_highlight, border = NA ) lines(pred_vals, cred[5, ], col = c_dark, lwd = 2.5) } box(bty = 'L', lwd = 2) abline(h = 0, lty = 'dashed', lwd = 2) invisible() } else { if (residuals) { plot( 1, type = "n", bty = 'L', xlab = smooth, ylab = 'Partial effect', xlim = c(min(pred_vals), max(pred_vals)), ylim = c( min(min(partial_resids, min(cred) - 0.4 * sd(preds), na.rm = T)), max(max(partial_resids, max(cred) + 0.4 * sd(preds), na.rm = T)) ) ) if (object2$mgcv_model$smooth[[smooth_int]]$by != "NA") { if (trend_effects) { title( sub( 'series', 'trend', object2$mgcv_model$smooth[[smooth_int]]$label, fixed = TRUE ), adj = 0 ) } else { title(object2$mgcv_model$smooth[[smooth_int]]$label, adj = 0) } } else { if (trend_effects) { title(paste0('s(', smooth, ')'), adj = 0) } else { title(paste0('s(', smooth, ')'), adj = 0) } } # Get x-axis values and bin if necessary to prevent overplotting sorted_x <- sort(unique(round(object2$obs_data[[smooth]], 6))) s_name <- levels(object2$obs_data$series)[series] obs_x <- round( data.frame( series = object2$obs_data$series, smooth_vals = object2$obs_data[[smooth]] ) %>% dplyr::filter(series == s_name) %>% dplyr::pull(smooth_vals), 6 ) if (length(sorted_x) > n_resid_bins) { sorted_x <- seq( min(sorted_x), max(sorted_x), length.out = n_resid_bins ) resid_probs <- do.call( rbind, lapply(2:n_resid_bins, function(i) { quantile( as.vector(partial_resids[, which( obs_x <= sorted_x[i] & obs_x > sorted_x[i - 1] )]), probs = probs, na.rm = TRUE ) }) ) resid_probs <- rbind( quantile( as.vector(partial_resids[, which(obs_x == sorted_x[1])]), probs = probs, na.rm = TRUE ), resid_probs ) } else { resid_probs <- do.call( rbind, lapply(sorted_x, function(i) { quantile( as.vector(partial_resids[, which(obs_x == i)]), probs = probs, na.rm = TRUE ) }) ) } # Get polygon coordinates and plot N <- length(sorted_x) idx <- rep(1:N, each = 2) repped_x <- rep(sorted_x, each = 2) x <- sapply( 1:length(idx), function(k) { if (k %% 2 == 0) { repped_x[k] + min(diff(sorted_x)) / 2 } else { repped_x[k] - min(diff(sorted_x)) / 2 } } ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = resid_probs[, 9], ybottom = resid_probs[, 1], col = c_light, border = 'transparent' ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = resid_probs[, 8], ybottom = resid_probs[, 2], col = c_light_highlight, border = 'transparent' ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = resid_probs[, 7], ybottom = resid_probs[, 3], col = c_mid, border = 'transparent' ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = resid_probs[, 6], ybottom = resid_probs[, 4], col = c_mid_highlight, border = 'transparent' ) for 
(k in 1:N) { lines( x = c(x[seq(1, N * 2, by = 2)][k], x[seq(2, N * 2, by = 2)][k]), y = c(resid_probs[k, 5], resid_probs[k, 5]), col = c_dark, lwd = 2 ) } # Overlay a minimalist version of the estimated smooth function polygon( c(pred_vals, rev(pred_vals)), c(cred[1, ], rev(cred[9, ])), col = rgb( red = 0, green = 0, blue = 0, alpha = 30, maxColorValue = 200 ), border = NA ) lines( pred_vals, cred[5, ], col = rgb( red = 0, green = 0, blue = 0, alpha = 45, maxColorValue = 200 ), lwd = 3 ) box(bty = 'L', lwd = 2) } else { plot( 1, type = "n", bty = 'L', xlab = smooth, ylab = 'Partial effect', xlim = c(min(pred_vals), max(pred_vals)), ylim = c(min(cred) - 0.9 * sd(preds), max(cred) + 0.9 * sd(preds)) ) if (object2$mgcv_model$smooth[[smooth_int]]$by != "NA") { if (trend_effects) { title( sub( 'series', 'trend', object2$mgcv_model$smooth[[smooth_int]]$label, fixed = TRUE ), adj = 0 ) } else { title(object2$mgcv_model$smooth[[smooth_int]]$label, adj = 0) } } else { if (trend_effects) { title(paste0('s(', smooth, ')'), adj = 0) } else { title(paste0('s(', smooth, ')'), adj = 0) } } if (realisations) { for (i in 1:n_realisations) { index <- sample(1:NROW(preds), 1, replace = TRUE) lines(x = pred_vals, y = preds[index, ], col = 'white', lwd = 2.5) lines( x = pred_vals, y = preds[index, ], col = sample( c("#DCBCBC", "#C79999", "#B97C7C", "#A25050", "#7C0000"), 1 ), lwd = 2.25 ) } } else { polygon( c(pred_vals, rev(pred_vals)), c(cred[1, ], rev(cred[9, ])), col = c_light, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[2, ], rev(cred[8, ])), col = c_light_highlight, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[3, ], rev(cred[7, ])), col = c_mid, border = NA ) polygon( c(pred_vals, rev(pred_vals)), c(cred[4, ], rev(cred[6, ])), col = c_mid_highlight, border = NA ) lines(pred_vals, cred[5, ], col = c_dark, lwd = 2.5) } box(bty = 'L', lwd = 2) } # Show observed values of the smooth as a rug if (class(object2$obs_data)[1] == 'list') { rug( (as.vector(as.matrix(data_train[[smooth]])))[which( data_train$series == levels(data_train$series)[series] )], lwd = 1.75, ticksize = 0.025, col = c_mid_highlight ) } else { rug( (as.vector(as.matrix(data_train[, smooth])))[which( data_train$series == levels(data_train$series)[series] )], lwd = 1.75, ticksize = 0.025, col = c_mid_highlight ) } } } } ================================================ FILE: R/plot_mvgam_trend.R ================================================ #' Plot latent trend predictions from \pkg{mvgam} models #' #' @importFrom graphics par lines polygon box abline #' #' @importFrom stats sd quantile #' #' @param object \code{list} object returned from \code{mvgam}. See [mvgam()] #' #' @param series \code{integer} specifying which series in the set is to be #' plotted #' #' @param newdata Optional \code{dataframe} or \code{list} of test data #' containing at least 'series' and 'time' in addition to any other #' variables included in the linear predictor of the original \code{formula}. #' #' @param data_test Deprecated. Still works in place of \code{newdata} but #' users are recommended to use \code{newdata} instead for more seamless #' integration into `R` workflows #' #' @param derivatives \code{logical}. If \code{TRUE}, an additional plot will #' be returned to show the estimated 1st derivative for the estimated trend #' #' @param realisations \code{logical}. If \code{TRUE}, posterior trend #' realisations are shown as a spaghetti plot, making it easier to visualise #' the diversity of possible trend paths. 
If \code{FALSE}, the default, #' empirical quantiles of the posterior distribution are shown #' #' @param n_realisations \code{integer} specifying the number of posterior #' realisations to plot, if \code{realisations = TRUE}. Ignored otherwise #' #' @param n_cores Deprecated. Parallel processing is no longer supported #' #' @param xlab Label for x axis #' #' @param ylab Label for y axis #' #' @return A `ggplot` object #' #' @examples #' \dontrun{ #' simdat <- sim_mvgam( #' n_series = 3, #' trend_model = AR() #' ) #' #' mod <- mvgam( #' y ~ s(season, bs = 'cc', k = 6), #' trend_model = AR(), #' noncentred = TRUE, #' data = simdat$data_train, #' chains = 2 #' ) #' #' # Plot estimated trends for some series #' plot_mvgam_trend(mod) #' plot_mvgam_trend(mod, series = 2) #' #' # Extrapolate trends forward in time and plot on response scale #' plot_mvgam_trend( #' mod, #' newdata = simdat$data_test #' ) #' #' plot_mvgam_trend( #' mod, #' newdata = simdat$data_test, #' series = 2 #' ) #' #' # But it is recommended to compute extrapolations for all series #' # first and then plot #' trend_fc <- forecast( #' mod, #' newdata = simdat$data_test #' ) #' #' plot(trend_fc, series = 1) #' plot(trend_fc, series = 2) #' } #' #' @author Nicholas J Clark #' #' @export plot_mvgam_trend = function( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, n_cores = 1, derivatives = FALSE, xlab, ylab ) { # Check arguments if (!(inherits(object, "mvgam"))) { stop('argument "object" must be of class "mvgam"') } validate_pos_integer(series) if (series > NCOL(object$ytimes)) { stop( paste0( 'object only contains data / predictions for ', NCOL(object$ytimes), ' series' ), call. = FALSE ) } if ( attr(object$model_data, 'trend_model') == 'None' & !object$use_lv ) { stop('no trend was estimated in object', call. 
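  # Note for the code below: with the 'stan' backend, posterior draws of the
  # latent trends are stored with columns interleaved by series, hence the
  # seq(series, ..., by = NCOL(object$ytimes)) indexing; other engines store
  # each series' draws in contiguous column blocks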
= FALSE) } if (!missing("newdata")) { data_test <- newdata } # Prediction indices for the particular series data_train <- object$obs_data ends <- seq( 0, dim(mcmc_chains(object$model_output, 'trend'))[2], length.out = NCOL(object$ytimes) + 1 ) starts <- ends + 1 starts <- c(1, starts[-c(1, (NCOL(object$ytimes) + 1))]) ends <- ends[-1] if (object$fit_engine == 'stan') { preds <- mcmc_chains(object$model_output, 'trend')[, seq( series, dim(mcmc_chains(object$model_output, 'trend'))[2], by = NCOL(object$ytimes) )] } else { preds <- mcmc_chains(object$model_output, 'trend')[, starts[series]:ends[series] ] } # If the posterior predictions do not already cover the data_test period, the forecast needs to be # generated using the latent trend dynamics; note, this assumes that there is no gap between the training and # testing datasets # Add variables to data_test if missing s_name <- levels(data_train$series)[series] if (!missing(data_test)) { if (!'y' %in% names(data_test)) { data_test$y <- rep(NA, NROW(data_test)) } if (!'series' %in% names(data_test)) { data_test$series <- factor('series1') } if (!'time' %in% names(data_test)) { stop('data_test does not contain a "time" column') } if (inherits(data_test, 'list')) { all_obs <- c( data.frame( y = data_train$y, series = data_train$series, time = data_train$time ) %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y), data.frame( y = data_test$y, series = data_test$series, time = data_test$time ) %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y) ) } else { all_obs <- c( data_train %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y), data_test %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y) ) } if (dim(preds)[2] != length(all_obs)) { fc_preds <- forecast( object, data_test = data_test, type = 'trend', n_cores = n_cores )$forecasts[[series]] preds <- cbind(preds, fc_preds) } } preds_last <- preds[1, ] pred_vals <- seq(1:length(preds_last)) # Plot quantiles of the smooth function, along with observed values # if specified probs <- c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95) cred <- sapply(1:NCOL(preds), function(n) { quantile(preds[, n], probs = probs, na.rm = TRUE) }) if (missing(xlab)) { xlab <- 'Time' } if (missing(ylab)) { ylab <- paste0('Estimated trend for ', levels(data_train$series)[series]) } # Create a base plot using posterior credible intervals and observations # for the specified series plot_dat <- data.frame( time = 1:NCOL(cred), med = cred[5, ], lower1 = cred[1, ], lower2 = cred[2, ], lower3 = cred[3, ], lower4 = cred[4, ], upper1 = cred[9, ], upper2 = cred[8, ], upper3 = cred[7, ], upper4 = cred[6, ] ) base_plot <- ggplot2::ggplot( data = plot_dat, mapping = ggplot2::aes(x = time, y = med) ) + ggplot2::theme_classic() + ggplot2::labs(x = xlab, y = ylab) # Add to the base plot accordingly if (realisations) { for (i in 1:n_realisations) { base_plot <- base_plot + ggplot2::geom_line( data = data.frame( y = preds[i, ], time = 1:NCOL(cred) ), mapping = ggplot2::aes(x = time, y = y), col = "white", linewidth = 1 ) + ggplot2::geom_line( data = data.frame( y = preds[i, ], time = 1:NCOL(cred) ), mapping = ggplot2::aes(x = time, y = y), col = sample( c("#DCBCBC", "#C79999", "#B97C7C", "#A25050", "#7C0000"), 1 ), linewidth = 0.75 ) } } 
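  # Otherwise summarise the posterior trend with nested credible-interval
  # ribbons (90%, 60%, 40% and 20%) plus the posterior median line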
else { base_plot <- base_plot + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower1, ymax = upper1), fill = "#DCBCBC" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower2, ymax = upper2), fill = "#C79999" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower3, ymax = upper3), fill = "#B97C7C" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower4, ymax = upper4), fill = "#A25050" ) + ggplot2::geom_line( mapping = ggplot2::aes(x = time, y = med), col = "#8F2727", linewidth = 1 ) } if (!missing(data_test)) { if (class(data_train)[1] == 'list') { base_plot <- base_plot + ggplot2::geom_vline( xintercept = length(data_train$y) / NCOL(object$ytimes), linetype = 'dashed' ) } else { base_plot <- base_plot + ggplot2::geom_vline( xintercept = NROW(data_train) / NCOL(object$ytimes), linetype = 'dashed' ) } } # Add the 1st derivative plot if necessary if (derivatives) { first_derivs <- cbind(rep(0, NROW(preds)), t(apply(preds, 1, diff))) cred <- sapply( 1:NCOL(first_derivs), function(n) quantile(first_derivs[, n], probs = probs, na.rm = TRUE) ) plot_dat <- data.frame( time = 1:NCOL(cred), med = cred[5, ], lower1 = cred[1, ], lower2 = cred[2, ], lower3 = cred[3, ], lower4 = cred[4, ], upper1 = cred[9, ], upper2 = cred[8, ], upper3 = cred[7, ], upper4 = cred[6, ] ) deriv_plot <- ggplot2::ggplot( data = plot_dat, mapping = ggplot2::aes(x = time, y = med) ) + ggplot2::theme_classic() + ggplot2::labs(x = xlab, y = '1st derivative') # Add to the base plot accordingly if (realisations) { for (i in 1:n_realisations) { deriv_plot <- deriv_plot + ggplot2::geom_line( data = data.frame( y = first_derivs[i, ], time = 1:NCOL(cred) ), mapping = ggplot2::aes(x = time, y = y), col = "white", linewidth = 1 ) + ggplot2::geom_line( data = data.frame( y = first_derivs[i, ], time = 1:NCOL(cred) ), mapping = ggplot2::aes(x = time, y = y), col = sample( c("#DCBCBC", "#C79999", "#B97C7C", "#A25050", "#7C0000"), 1 ), linewidth = 0.75 ) } } else { deriv_plot <- deriv_plot + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower1, ymax = upper1), fill = "#DCBCBC" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower2, ymax = upper2), fill = "#C79999" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower3, ymax = upper3), fill = "#B97C7C" ) + ggplot2::geom_ribbon( mapping = ggplot2::aes(ymin = lower4, ymax = upper4), fill = "#A25050" ) + ggplot2::geom_line( mapping = ggplot2::aes(x = time, y = med), col = "#8F2727", linewidth = 1 ) } if (!missing(data_test)) { if (class(data_train)[1] == 'list') { deriv_plot <- deriv_plot + ggplot2::geom_vline( xintercept = length(data_train$y) / NCOL(object$ytimes), linetype = 'dashed' ) } else { deriv_plot <- deriv_plot + ggplot2::geom_vline( xintercept = NROW(data_train) / NCOL(object$ytimes), linetype = 'dashed' ) } } out <- patchwork::wrap_plots(base_plot, deriv_plot, ncol = 1) } else { out <- base_plot } return(out) } ================================================ FILE: R/plot_mvgam_uncertainty.R ================================================ #' Plot forecast uncertainty contributions from \pkg{mvgam} models #' #' @importFrom graphics legend #' @importFrom stats predict #' #' @param object \code{list} object returned from \code{mvgam}. 
#' See [mvgam()]
#'
#' @param series \code{integer} specifying which series in the set is to be
#' plotted
#'
#' @param newdata A \code{dataframe} or \code{list} containing at least 'series'
#' and 'time' for the forecast horizon, in addition to any other variables
#' included in the linear predictor of \code{formula}
#'
#' @param data_test Deprecated. Still works in place of \code{newdata} but users
#' are recommended to use \code{newdata} instead for more seamless integration
#' into `R` workflows
#'
#' @param legend_position Position of the legend, given as a single keyword
#' from the list: "none", "bottomright", "bottom", "bottomleft", "left",
#' "topleft", "top", "topright", "right" and "center". The legend is drawn on
#' the inside of the plot frame at the given location, unless "none" is
#' supplied to suppress the legend.
#'
#' @param hide_xlabels \code{logical}. If \code{TRUE}, no x-axis labels are
#' printed, allowing the user to add custom labels using \code{axis} from base
#' \code{R}
#'
#' @details The basic idea of this function is to compute forecasts by ignoring
#' one of the two primary components in a correlated residual model (i.e. by
#' either ignoring the linear predictor effects or by ignoring the residual
#' dynamics). Some caution is required however, as this function was designed
#' early in the \pkg{mvgam} development cycle and there are now many types of
#' models that it cannot handle very well. For example, models with shared
#' latent states, or any type of State-Space models that include terms in the
#' `trend_formula`, will either fail or give nonsensical results. Improvements
#' are in the works to provide a more general way to decompose forecast
#' uncertainties, so please check back at a later date.
#'
#' @return A base \code{R} graphics plot
#'
#' @export
plot_mvgam_uncertainty = function(
  object,
  series = 1,
  newdata,
  data_test,
  legend_position = 'topleft',
  hide_xlabels = FALSE
) {
  # Check arguments
  if (!(inherits(object, "mvgam"))) {
    stop('argument "object" must be of class "mvgam"')
  }
  if (!is.null(object$trend_call)) {
    stop(
      'cannot yet plot uncertainty decompositions for models with trend_formulae',
      call. = FALSE
    )
  }
  if (sign(series) != 1) {
    stop('argument "series" must be a positive integer', call. = FALSE)
  } else {
    if (series %% 1 != 0) {
      stop('argument "series" must be a positive integer', call. = FALSE)
    }
  }
  if (series > NCOL(object$ytimes)) {
    stop(
      paste0(
        'object only contains data / predictions for ',
        NCOL(object$ytimes),
        ' series'
      ),
      call. = FALSE
    )
  }
  if (!missing(newdata)) {
    data_test <- newdata
  }
  if (missing(data_test) && missing(newdata)) {
    if (!is.null(object$test_data)) {
      data_test <- object$test_data
    } else {
      stop(
        'No newdata supplied; cannot calculate uncertainty contributions',
        call.
= FALSE ) } } # Prediction indices for the particular series data_train <- object$obs_data ends <- seq( 0, dim(mcmc_chains(object$model_output, 'ypred'))[2], length.out = NCOL(object$ytimes) + 1 ) starts <- ends + 1 starts <- c(1, starts[-c(1, (NCOL(object$ytimes) + 1))]) ends <- ends[-1] # Add series factor variable if missing if (class(data_train)[1] != 'list') { if (!'series' %in% colnames(data_test)) { data_test$series <- factor('series1') } } if (class(data_train)[1] == 'list') { if (!'series' %in% names(data_test)) { data_test$series <- factor('series1') } } # Generate linear predictor matrix for specified series if (class(data_test)[1] == 'list') { list_names <- names(data_test) indices_keep <- which(data_test$series == levels(data_train$series)[series]) series_test <- lapply(data_test, function(x) { if (is.matrix(x)) { matrix(x[indices_keep, ], ncol = NCOL(x)) } else { x[indices_keep] } }) names(series_test) <- list_names } else { series_test <- data_test[ which(data_test$series == levels(data_train$series)[series]), ] } Xp <- obs_Xp_matrix(newdata = series_test, object$mgcv_model) # Extract beta coefs betas <- mcmc_chains(object$model_output, 'b') # Extract current trend estimates if (object$fit_engine == 'stan') { trend <- mcmc_chains(object$model_output, 'trend')[, seq( series, dim(mcmc_chains(object$model_output, 'trend'))[2], by = NCOL(object$ytimes) )] } else { trend <- mcmc_chains(object$model_output, 'trend')[, starts[series]:ends[series] ] } if (length(unique(data_train$series)) == 1) { trend <- matrix(trend[, NCOL(trend)]) } else { if (class(data_test)[1] == 'list') { trend <- trend[, (length(data_train$series) / NCOL(object$ytimes) + 1):NCOL(trend) ] } else { trend <- trend[, (NROW(data_train) / NCOL(object$ytimes) + 1):NCOL(trend)] } } # Function to calculate intersection of two uncertainty distributions intersect_hist = function(fullpreds, gampreds) { from <- min(min(fullpreds, na.rm = T), min(gampreds, na.rm = T)) to <- max(max(fullpreds, na.rm = T), max(gampreds, na.rm = T)) fullhist <- hist( fullpreds, breaks = seq(from, to, length.out = 100), plot = F ) gamhist <- hist( gampreds, breaks = seq(from, to, length.out = 100), plot = F ) sum(gamhist$density / max(gamhist$density)) / sum(fullhist$density / max(fullhist$density)) } # Full predictions n_samples <- NROW(trend) if (class(data_test)[1] == 'list') { ncols <- length(series_test$series) } else { ncols <- NROW(series_test) } fullpreds <- matrix(NA, nrow = n_samples, ncol = ncols) for (i in 1:n_samples) { fullpreds[i, ] <- Xp %*% betas[i, ] + trend[i, ] + attr(Xp, 'model.offset') } # GAM only predictions gampreds <- matrix(NA, nrow = n_samples, ncol = ncols) for (i in 1:n_samples) { gampreds[i, ] <- Xp %*% betas[i, ] + attr(Xp, 'model.offset') } # GAM uncertainty contributions at each forecast horizon gam_cont <- vector() for (i in 1:NCOL(fullpreds)) { gam_cont[i] <- intersect_hist(fullpreds[, i], gampreds[, i]) } gam_cont[is.na(gam_cont)] <- 0.5 gam_cont[gam_cont > 1] <- 1 # Plot and return if (hide_xlabels) { plot( gam_cont, bty = "L", ylim = c(0, 1), type = 'n', ylab = paste0( 'Uncertainty contributions for ', levels(data_train$series)[series] ), xlab = "", xaxt = 'n' ) } else { plot( gam_cont, bty = 'L', ylim = c(0, 1), type = 'n', ylab = paste0( 'Uncertainty contributions for ', levels(data_train$series)[series] ), xlab = "Forecast horizon" ) } polygon( c(seq(1:(NCOL(gampreds))), rev(seq(1:NCOL(gampreds)))), c(gam_cont, rep(0, NCOL(gampreds))), col = "#7C0000", border = NA ) polygon( c(seq(1:(NCOL(gampreds))), 
rev(seq(1:NCOL(gampreds)))), c(gam_cont, rep(1, NCOL(gampreds))), col = '#DCBCBC', border = NA ) box(bty = 'L', lwd = 2) if (legend_position != 'none') { legend( legend_position, legend = c("Trend", "GAM"), bg = 'white', col = c('#DCBCBC', "#7C0000"), lty = 1, lwd = 6 ) } } ================================================ FILE: R/portal_data.R ================================================ #' Portal Project rodent capture survey data #' #' A dataset containing time series of total captures (across all control plots) for select rodent species from the Portal Project #' #' @format A `data.frame` containing the following fields: #' \describe{ #' \item{time}{time of sampling, in lunar monthly cycles} #' \item{series}{factor indicator of the time series, i.e. the species} #' \item{captures}{total captures across all control plots at each time point} #' \item{ndvi_ma12}{12-month moving average of the mean Normalised Difference Vegetation Index} #' \item{mintemp}{monthly mean of minimum temperature} #' } #' @source \url{https://github.com/weecology/PortalData/blob/main/SiteandMethods/Methods.md} "portal_data" ================================================ FILE: R/posterior_epred.mvgam.R ================================================ #' Draws from the expected value of the posterior predictive distribution for \pkg{mvgam} objects #' #' Compute posterior draws of the expected value of the posterior predictive #' distribution (i.e. the conditional expectation). Can be performed for the #' data used to fit the model (posterior predictive checks) or for new data. #' By definition, these predictions have smaller variance than the posterior #' predictions performed by the \code{\link{posterior_predict.mvgam}} method. #' This is because only the uncertainty in the expected value of the posterior #' predictive distribution is incorporated in the draws computed by #' \code{posterior_epred} while the residual error is ignored there. However, #' the estimated means of both methods averaged across draws should be very #' similar. #' #' @inheritParams predict.mvgam #' #' @param ndraws Positive \code{integer} indicating how many posterior draws #' should be used. If \code{NULL} (the default) all draws are used. #' #' @param process_error \code{logical}. If \code{TRUE} and \code{newdata} is #' supplied, expected uncertainty in the process model is accounted for by #' using draws from any latent trend SD parameters. If \code{FALSE}, #' uncertainty in the latent trend component is ignored when calculating #' predictions. If no \code{newdata} is supplied, draws from the fitted #' model's posterior predictive distribution will be used (which will always #' include uncertainty in any latent trend components) #' #' @method posterior_epred mvgam #' #' @details Note that for all types of predictions for models that did not #' include a `trend_formula`, uncertainty in the dynamic trend component can #' be ignored by setting \code{process_error = FALSE}. However, if a #' `trend_formula` was supplied in the model, predictions for this component #' cannot be ignored. If \code{process_error = TRUE}, trend predictions will #' ignore autocorrelation coefficients or GP length scale coefficients, #' ultimately assuming the process is stationary. This method is similar to #' the types of posterior predictions returned from `brms` models when using #' autocorrelated error predictions for newdata. 
This function is therefore
#' more suited to posterior simulation from the GAM components of a
#' \code{mvgam} model, while the forecasting functions
#' \code{\link{plot_mvgam_fc}} and \code{\link{forecast.mvgam}} are better
#' suited to generate h-step ahead forecasts that respect the temporal
#' dynamics of estimated latent trends.
#'
#' @return A \code{matrix} of dimension \code{n_samples x n_obs}, where
#' \code{n_samples} is the number of posterior samples from the fitted object
#' and \code{n_obs} is the number of observations in \code{newdata}
#'
#' @seealso \code{\link{hindcast.mvgam}},
#' \code{\link{posterior_linpred.mvgam}},
#' \code{\link{posterior_predict.mvgam}}
#'
#' @examples
#' \dontrun{
#' # Simulate some data and fit a model
#' simdat <- sim_mvgam(
#'   n_series = 1,
#'   trend_model = AR()
#' )
#'
#' mod <- mvgam(
#'   y ~ s(season, bs = 'cc'),
#'   trend_model = AR(),
#'   noncentred = TRUE,
#'   data = simdat$data_train,
#'   chains = 2,
#'   silent = 2
#' )
#'
#' # Compute posterior expectations
#' expectations <- posterior_epred(mod)
#' str(expectations)
#' }
#'
#' @author Nicholas J Clark
#'
#' @export
posterior_epred.mvgam = function(
  object,
  newdata,
  data_test,
  ndraws = NULL,
  process_error = TRUE,
  ...
) {
  if (missing(newdata) & missing(data_test)) {
    out <- .mvgam_fitted(object, type = 'expected')
  } else {
    out <- predict(
      object,
      newdata = newdata,
      data_test = data_test,
      process_error = process_error,
      type = 'expected',
      summary = FALSE
    )
  }
  if (!is.null(ndraws)) {
    validate_pos_integer(ndraws)
    # Only subsample when fewer draws are requested than are available
    if (ndraws <= NROW(out)) {
      idx <- sample(1:NROW(out), ndraws, replace = FALSE)
      out <- out[idx, ]
    }
  }
  return(out)
}

#' Posterior draws of the linear predictor for \pkg{mvgam} objects
#'
#' Compute posterior draws of the linear predictor, that is, draws before
#' applying any link functions or other transformations. Can be performed for
#' the data used to fit the model (posterior predictive checks) or for new data.
#'
#' @inheritParams posterior_epred.mvgam
#'
#' @param transform \code{logical}; if \code{FALSE} (the default), draws of
#' the linear predictor are returned. If \code{TRUE}, draws of the
#' transformed linear predictor, i.e. the conditional expectation, are
#' returned.
#'
#' @method posterior_linpred mvgam
#'
#' @details Note that for all types of predictions for models that did not
#' include a `trend_formula`, uncertainty in the dynamic trend component can
#' be ignored by setting \code{process_error = FALSE}. However, if a
#' `trend_formula` was supplied in the model, predictions for this component
#' cannot be ignored. If \code{process_error = TRUE}, trend predictions will
#' ignore autocorrelation coefficients or GP length scale coefficients,
#' ultimately assuming the process is stationary. This method is similar to
#' the types of posterior predictions returned from `brms` models when using
#' autocorrelated error predictions for newdata. This function is therefore
#' more suited to posterior simulation from the GAM components of a
#' \code{mvgam} model, while the forecasting functions
#' \code{\link{plot_mvgam_fc}} and \code{\link{forecast.mvgam}} are better
#' suited to generate h-step ahead forecasts that respect the temporal
#' dynamics of estimated latent trends.
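#'
#' As a brief, hedged illustration of the `transform` argument (assuming a
#' fitted model named `mod` with a log-link family such as the default
#' Poisson; setting `process_error = FALSE` makes the two calls directly
#' comparable):
#' \preformatted{
#' linpreds <- posterior_linpred(mod, process_error = FALSE)
#' expected <- posterior_linpred(mod, transform = TRUE, process_error = FALSE)
#' # For a log-link family these should agree closely:
#' # max(abs(exp(linpreds) - expected))
#' }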
#'
#' @return A \code{matrix} of dimension \code{n_samples x n_obs}, where
#' \code{n_samples} is the number of posterior samples from the fitted object
#' and \code{n_obs} is the number of observations in \code{newdata}
#'
#' @seealso \code{\link{hindcast.mvgam}},
#' \code{\link{posterior_epred.mvgam}},
#' \code{\link{posterior_predict.mvgam}}
#'
#' @examples
#' \dontrun{
#' # Simulate some data and fit a model
#' simdat <- sim_mvgam(
#'   n_series = 1,
#'   trend_model = AR()
#' )
#'
#' mod <- mvgam(
#'   y ~ s(season, bs = 'cc'),
#'   trend_model = AR(),
#'   noncentred = TRUE,
#'   data = simdat$data_train,
#'   chains = 2,
#'   silent = 2
#' )
#'
#' # Extract linear predictor values
#' linpreds <- posterior_linpred(mod)
#' str(linpreds)
#' }
#'
#' @author Nicholas J Clark
#'
#' @export
posterior_linpred.mvgam = function(
  object,
  transform = FALSE,
  newdata,
  ndraws = NULL,
  data_test,
  process_error = TRUE,
  ...
) {
  if (transform) {
    type <- 'expected'
  } else {
    type <- 'link'
  }
  out <- predict(
    object,
    newdata = newdata,
    data_test = data_test,
    process_error = process_error,
    type = type,
    summary = FALSE
  )
  if (!is.null(ndraws)) {
    validate_pos_integer(ndraws)
    # Only subsample when fewer draws are requested than are available
    if (ndraws <= NROW(out)) {
      idx <- sample(1:NROW(out), ndraws, replace = FALSE)
      out <- out[idx, ]
    }
  }
  return(out)
}

#' Draws from the posterior predictive distribution for \pkg{mvgam} objects
#'
#' Compute posterior draws of the posterior predictive distribution. Can be
#' performed for the data used to fit the model (posterior predictive checks)
#' or for new data. By definition, these draws have higher variance than draws
#' of the expected value of the posterior predictive distribution computed by
#' \code{\link{posterior_epred.mvgam}}. This is because the residual error is
#' incorporated in \code{posterior_predict}. However, the estimated means of
#' both methods averaged across draws should be very similar.
#'
#' @inheritParams predict.mvgam
#'
#' @inheritParams posterior_epred.mvgam
#'
#' @param process_error Logical. If \code{TRUE} and \code{newdata} is supplied,
#' expected uncertainty in the process model is accounted for by using draws
#' from any latent trend SD parameters. If \code{FALSE}, uncertainty in the
#' latent trend component is ignored when calculating predictions. If no
#' \code{newdata} is supplied, draws from the fitted model's posterior
#' predictive distribution will be used (which will always include uncertainty
#' in any latent trend components)
#'
#' @method posterior_predict mvgam
#'
#' @details Note that for all types of predictions for models that did not
#' include a `trend_formula`, uncertainty in the dynamic trend component can
#' be ignored by setting \code{process_error = FALSE}. However, if a
#' `trend_formula` was supplied in the model, predictions for this component
#' cannot be ignored. If \code{process_error = TRUE}, trend predictions will
#' ignore autocorrelation coefficients or GP length scale coefficients,
#' ultimately assuming the process is stationary. This method is similar to
#' the types of posterior predictions returned from `brms` models when using
#' autocorrelated error predictions for newdata. This function is therefore
#' more suited to posterior simulation from the GAM components of a
#' \code{mvgam} model, while the forecasting functions
#' \code{\link{plot_mvgam_fc}} and \code{\link{forecast.mvgam}} are better
#' suited to generate h-step ahead forecasts that respect the temporal
#' dynamics of estimated latent trends.
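#'
#' A small, hedged sketch of the variance relationship described above
#' (assuming a fitted model named `mod`): predictive draws incorporate
#' observation-level noise and so should be more dispersed than the
#' corresponding expectation draws, while their overall means remain similar:
#' \preformatted{
#' ppd <- posterior_predict(mod)
#' epd <- posterior_epred(mod)
#' # var(as.vector(ppd)) should exceed var(as.vector(epd)),
#' # while mean(ppd) and mean(epd) should be very close
#' }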
#'
#' @return A \code{matrix} of dimension \code{n_samples x n_obs}, where
#' \code{n_samples} is the number of posterior samples from the fitted object
#' and \code{n_obs} is the number of observations in \code{newdata}
#'
#' @seealso
#' \code{\link{hindcast.mvgam}},
#' \code{\link{posterior_linpred.mvgam}},
#' \code{\link{posterior_epred.mvgam}}
#'
#' @examples
#' \dontrun{
#' # Simulate some data and fit a model
#' simdat <- sim_mvgam(n_series = 1, trend_model = AR())
#'
#' mod <- mvgam(
#'   y ~ s(season, bs = 'cc'),
#'   trend_model = AR(),
#'   data = simdat$data_train,
#'   chains = 2,
#'   silent = 2
#' )
#'
#' # Compute posterior predictions
#' predictions <- posterior_predict(mod)
#' str(predictions)
#' }
#'
#' @author Nicholas J Clark
#'
#' @export
posterior_predict.mvgam = function(
  object,
  newdata,
  data_test,
  ndraws = NULL,
  process_error = TRUE,
  ...
) {
  out <- predict(
    object,
    newdata = newdata,
    data_test = data_test,
    process_error = process_error,
    type = 'response',
    summary = FALSE
  )
  if (!is.null(ndraws)) {
    validate_pos_integer(ndraws)
    # Only subsample when fewer draws are requested than are available
    if (ndraws <= NROW(out)) {
      idx <- sample(1:NROW(out), ndraws, replace = FALSE)
      out <- out[idx, ]
    }
  }
  return(out)
}

#' @export
#' @importFrom rstantools posterior_predict
rstantools::posterior_predict

#' @export
#' @importFrom rstantools posterior_epred
rstantools::posterior_epred

#' @export
#' @importFrom rstantools posterior_linpred
rstantools::posterior_linpred

#' Expected values of the posterior predictive distribution for \pkg{mvgam} objects
#'
#' This method extracts posterior estimates of the fitted values (i.e. the
#' actual predictions, including estimates for any trend states, that were
#' obtained when fitting the model). It also includes an option for obtaining
#' summaries of the computed draws.
#'
#' @inheritParams brms::fitted.brmsfit
#'
#' @inheritParams predict.mvgam
#'
#' @param object An object of class `mvgam`
#'
#' @details This method gives the actual fitted values from the model (i.e. what
#' you will see if you generate hindcasts from the fitted model using
#' \code{\link{hindcast.mvgam}} with `type = 'expected'`). These predictions
#' can be overly precise if a flexible dynamic trend component was included in
#' the model. This is in contrast to the set of predict functions (i.e.
#' \code{\link{posterior_epred.mvgam}} or \code{\link{predict.mvgam}}), which
#' will assume any dynamic trend component has reached stationarity when
#' returning hypothetical predictions.
#'
#' @return A \code{matrix} of predicted \emph{mean} response values.
#'
#' If \code{summary = FALSE} the output resembles those of
#' \code{\link{posterior_epred.mvgam}} and \code{\link{predict.mvgam}}.
#'
#' If \code{summary = TRUE} the output is an \code{n_observations} x \code{E}
#' matrix. The number of summary statistics \code{E} is equal to \code{2 +
#' length(probs)}: The \code{Estimate} column contains point estimates (either
#' mean or median depending on argument \code{robust}), while the
#' \code{Est.Error} column contains uncertainty estimates (either standard
#' deviation or median absolute deviation depending on argument
#' \code{robust}). The remaining columns starting with \code{Q} contain
#' quantile estimates as specified via argument \code{probs}.
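#'
#' As a hedged illustration of the two output formats (assuming a fitted
#' model named `mod` with \code{N} observations, \code{D} posterior draws and
#' the default \code{probs = c(0.025, 0.975)}):
#' \preformatted{
#' str(fitted(mod))                  # N x 4: Estimate, Est.Error, Q2.5, Q97.5
#' str(fitted(mod, summary = FALSE)) # D x N matrix of posterior draws
#' }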
#' #' @seealso #' \code{\link{hindcast.mvgam}} #' #' @examples #' \dontrun{ #' # Simulate some data and fit a model #' simdat <- sim_mvgam(n_series = 1, trend_model = AR()) #' #' mod <- mvgam( #' y ~ s(season, bs = 'cc'), #' trend_model = AR(), #' data = simdat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' # Extract fitted values (posterior expectations) #' expectations <- fitted(mod) #' str(expectations) #' } #' #' @author Nicholas J Clark #' #' @export fitted.mvgam <- function( object, process_error = TRUE, scale = c("response", "linear"), summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... ) { if (length(probs) != 2L) { stop("argument 'probs' must be a vector of length 2", call. = FALSE) } scale <- match.arg(scale) type <- switch(scale, "response" = "expected", "linear" = "link") preds <- .mvgam_fitted(object = object, type = type) # Preserve original data ordering data.frame( time = object$obs_data$index..time..index, order = object$obs_data$index..orig..order, series = object$obs_data$series ) %>% dplyr::arrange(time, series) %>% dplyr::pull(order) -> orig_order preds <- preds[, order(orig_order)] if (summary) { Qupper <- apply(preds, 2, quantile, probs = max(probs), na.rm = TRUE) Qlower <- apply(preds, 2, quantile, probs = min(probs), na.rm = TRUE) if (robust) { estimates <- apply(preds, 2, median, na.rm = TRUE) errors <- apply(abs(preds - estimates), 2, median, na.rm = TRUE) } else { estimates <- apply(preds, 2, mean, na.rm = TRUE) errors <- apply(preds, 2, sd, na.rm = TRUE) } out <- cbind(estimates, errors, Qlower, Qupper) colnames(out) <- c( 'Estimate', 'Est.Error', paste0('Q', 100 * min(probs)), paste0('Q', 100 * max(probs)) ) } else { out <- preds } return(out) } #' @noRd .mvgam_fitted = function(object, type = 'expected') { # Extract the linear predictor draws mus <- mcmc_chains(object$model_output, 'mus') # Need to know which series each observation belongs to so we can # pull out appropriate family-level parameters (overdispersions, shapes, etc...) 
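  # Illustrative sketch (hypothetical values): if a family parameter such as a
  # per-series sigma is stored as a draws x n_series matrix and
  # series_obs = c(1, 1, 2, ...) maps each observation to its series, then
  # sigma[, series_obs] expands it to a draws x n_obs matrix that lines up
  # with the vectorised linear predictor draws used further below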
if (is.null(object$test_data)) { all_dat <- data.frame( series = object$obs_data$series, time = object$obs_data$time, y = object$obs_data$y ) %>% dplyr::arrange(series, time) } else { all_dat <- data.frame( series = c(object$obs_data$series, object$test_data$series), time = c(object$obs_data$time, object$test_data$time), y = c(object$obs_data$y, object$test_data$y) ) %>% dplyr::arrange(series, time) } obs <- all_dat$y series_obs <- as.numeric(all_dat$series) # Family-specific parameters family <- object$family family_pars <- extract_family_pars(object = object) n_series <- NCOL(object$ytimes) # Family parameters spread into a vector family_extracts <- lapply(seq_along(family_pars), function(j) { if (is.matrix(family_pars[[j]])) { as.vector(family_pars[[j]][, series_obs]) } else { family_pars[[j]][] } }) names(family_extracts) <- names(family_pars) # Add trial information if this is a Binomial model if (object$family %in% c('binomial', 'beta_binomial')) { trials <- as.vector(matrix( rep(as.vector(attr(object$mgcv_model, 'trials')), NROW(mus)), nrow = NROW(mus), byrow = TRUE )) family_extracts$trials <- trials } # Expectations as a vector Xp <- as.matrix(as.vector(mus)) attr(Xp, 'model.offset') <- 0 if (family == 'nmix') { latent_lambdas <- exp(as.vector(mcmc_chains(object$model_output, 'trend'))) n_draws <- dim(mcmc_chains(object$model_output, 'ypred'))[1] cap <- as.vector(t(replicate(n_draws, object$obs_data$cap))) } else { latent_lambdas <- NULL cap <- NULL } pred_vec <- mvgam_predict( family = family, family_pars = family_extracts, latent_lambdas = latent_lambdas, cap = cap, type = type, Xp = Xp, betas = 1 ) # Convert back to matrix and return pred_mat <- matrix(pred_vec, nrow = NROW(mus)) return(pred_mat) } ================================================ FILE: R/ppc.mvgam.R ================================================ #' @title Plot conditional posterior predictive checks from \pkg{mvgam} models #' #' @importFrom stats quantile density ecdf formula terms #' @importFrom graphics hist abline box rect lines polygon par #' @importFrom grDevices rgb #' #' @name ppc.mvgam #' #' @param object \code{list} object returned from \code{mvgam}. See [mvgam()] #' #' @param newdata Optional \code{dataframe} or \code{list} of test data #' containing at least 'series' and 'time' for the forecast horizon, in #' addition to any other variables included in the linear predictor of #' \code{formula}. If included, the observed values in the test data are #' compared to the model's forecast distribution for exploring biases in #' model predictions. Note this is only useful if the same \code{newdata} #' was also included when fitting the original model. #' #' @param data_test Deprecated. Still works in place of \code{newdata} but #' users are recommended to use \code{newdata} instead for more seamless #' integration into `R` workflows #' #' @param series \code{integer} specifying which series in the set is to be #' plotted #' #' @param type \code{character} specifying the type of posterior predictive #' check to calculate and plot. Valid options are: 'rootogram', 'mean', #' 'hist', 'density', 'prop_zero', 'pit' and 'cdf' #' #' @param n_bins \code{integer} specifying the number of bins to use for #' binning observed values when plotting a rootogram or histogram. Default #' is `50` bins for a rootogram, which means that if there are >50 unique #' observed values, bins will be used to prevent overplotting and facilitate #' interpretation. 
Default for a histogram is to use the number of bins
#' returned by a call to `hist` in base `R`
#'
#' @param legend_position The position of the legend, given as a single
#' keyword from the list "bottomright", "bottom", "bottomleft", "left",
#' "topleft", "top", "topright", "right" and "center". This places the legend
#' on the inside of the plot frame at the given location. Alternatively, use
#' "none" to hide the legend.
#'
#' @param xlab Label for x axis
#'
#' @param ylab Label for y axis
#'
#' @param ... Further \code{\link[graphics]{par}} graphical parameters
#'
#' @details Conditional posterior predictions are drawn from the fitted
#' \code{mvgam} and compared against the empirical distribution of the
#' observed data for a specified series to help evaluate the model's ability
#' to generate unbiased predictions. For all plots apart from
#' `type = 'rootogram'`, posterior predictions can also be compared to out
#' of sample observations as long as these observations were included as
#' 'data_test' in the original model fit and supplied here. Rootograms are
#' currently only plotted using the 'hanging' style.
#'
#' Note that the predictions used for these plots are *conditional on
#' the observed data*, i.e. they are those predictions that have been
#' generated directly within the `mvgam()` model. They can be misleading if
#' the model included flexible dynamic trend components. For a broader range
#' of posterior checks that are created using *unconditional* "new data"
#' predictions, see \code{\link{pp_check.mvgam}}
#'
#' @return A base \code{R} graphics plot showing either a posterior rootogram
#' (for \code{type == 'rootogram'}), the predicted vs observed mean for the
#' series (for \code{type == 'mean'}), predicted vs observed proportion of
#' zeroes for the series (for \code{type == 'prop_zero'}), predicted vs
#' observed histogram for the series (for \code{type == 'hist'}), kernel
#' density or empirical CDF estimates for posterior predictions (for
#' \code{type == 'density'} or \code{type == 'cdf'}) or a Probability
#' Integral Transform histogram (for \code{type == 'pit'}).
#'
#' @author Nicholas J Clark
#'
#' @seealso \code{\link{pp_check.mvgam}}, \code{\link{predict.mvgam}}
#'
#' @examples
#' \dontrun{
#' # Simulate some smooth effects and fit a model
#' set.seed(0)
#'
#' dat <- mgcv::gamSim(
#'   1,
#'   n = 200,
#'   scale = 2
#' )
#'
#' mod <- mvgam(
#'   y ~ s(x0) + s(x1) + s(x2) + s(x3),
#'   data = dat,
#'   family = gaussian(),
#'   chains = 2,
#'   silent = 2
#' )
#'
#' # Posterior checks
#' ppc(mod, type = "hist")
#' ppc(mod, type = "density")
#' ppc(mod, type = "cdf")
#'
#' # Many more options are available with pp_check()
#' pp_check(mod)
#' pp_check(mod, type = "ecdf_overlay")
#' pp_check(mod, type = "freqpoly")
#' }
#'
#' @export
ppc <- function(object, ...) {
  UseMethod("ppc", object)
}

#' @rdname ppc.mvgam
#' @method ppc mvgam
#' @export
ppc.mvgam <- function(
  object,
  newdata,
  data_test,
  series = 1,
  type = "hist",
  n_bins,
  legend_position,
  xlab,
  ylab,
  ...
) {
  # Check arguments
  type <- match.arg(
    arg = type,
    choices = c(
      "rootogram",
      "mean",
      "histogram",
      "density",
      "pit",
      "cdf",
      "prop_zero"
    )
  )
  if (type == "histogram") {
    type <- "hist"
  }
  if (type == "rootogram") {
    if (
      !object$family %in%
        c(
          "poisson",
          "negative binomial",
          "tweedie",
          "nmix",
          "binomial",
          "beta_binomial"
        )
    ) {
      stop(
        "Rootograms not supported for checking non-count data",
        call. = FALSE
      )
    }
  }
  optional_args <- list(...)
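  # Note: type = "histogram" is accepted by the match.arg() call above as an
  # alias and converted to "hist", so e.g. ppc(object, type = "histogram")
  # and ppc(object, type = "hist") produce the same plot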
if (!(inherits(object, "mvgam"))) { stop('argument "object" must be of class "mvgam"') } if (sign(series) != 1) { stop('argument "series" must be a positive integer', call. = FALSE) } else { if (series %% 1 != 0) { stop('argument "series" must be a positive integer', call. = FALSE) } } if (series > NCOL(object$ytimes)) { stop( paste0( "object only contains data / predictions for ", NCOL(object$ytimes), " series" ), call. = FALSE ) } if (type == "rootogram" & missing(n_bins)) { n_bins <- 50 if (sign(n_bins) != 1) { stop('argument "n_bins" must be a positive integer', call. = FALSE) } else { if (n_bins %% 1 != 0) { stop('argument "n_bins" must be a positive integer', call. = FALSE) } } } if (!missing("newdata")) { data_test <- newdata # Ensure outcome is labelled 'y' when feeding data to the model for simplicity if (terms(formula(object$call))[[2]] != "y") { data_test$y <- data_test[[terms(formula(object$call))[[2]]]] } } # Pull out observations and posterior predictions for the specified series data_train <- object$obs_data ends <- seq( 0, dim(mcmc_chains(object$model_output, "ypred"))[2], length.out = NCOL(object$ytimes) + 1 ) starts <- ends + 1 starts <- c(1, starts[-c(1, (NCOL(object$ytimes) + 1))]) ends <- ends[-1] # Colours needed for plotting quantiles c_light <- c("#DCBCBC") c_light_highlight <- c("#C79999") c_mid <- c("#B97C7C") c_mid_highlight <- c("#A25050") c_dark <- c("#8F2727") c_dark_highlight <- c("#7C0000") s_name <- levels(data_train$series)[series] if (!missing(data_test)) { if (class(data_test)[1] == "list") { if (!"time" %in% names(data_test)) { stop('data_test does not contain a "time" column') } if (!"series" %in% names(data_test)) { data_test$series <- factor("series1") } } else { if (!"time" %in% colnames(data_test)) { stop('data_test does not contain a "time" column') } if (!"series" %in% colnames(data_test)) { data_test$series <- factor("series1") } } if (class(object$obs_data)[1] == "list") { truths <- data.frame( y = data_test$y, time = data_test$time, series = data_test$series ) %>% dplyr::arrange(time, series) %>% dplyr::filter(series == s_name) %>% dplyr::pull(y) if (object$fit_engine == "stan") { # For stan objects, ypred is stored as a vector in column-major order preds <- mcmc_chains(object$model_output, "ypred")[, seq( series, dim(mcmc_chains(object$model_output, "ypred"))[2], by = NCOL(object$ytimes) )] } else { preds <- mcmc_chains(object$model_output, "ypred")[, starts[series]:ends[series] ] } preds <- preds[, ((length(data_train$y) / NCOL(object$ytimes)) + 1):((length( data_train$y ) / NCOL(object$ytimes)) + length(truths)) ] } else { truths <- data_test %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y) if (object$fit_engine == "stan") { # For stan objects, ypred is stored as a vector in column-major order preds <- mcmc_chains(object$model_output, "ypred")[, seq( series, dim(mcmc_chains(object$model_output, "ypred"))[2], by = NCOL(object$ytimes) )] } else { preds <- mcmc_chains(object$model_output, "ypred")[, starts[series]:ends[series] ] } preds <- preds[, ((NROW(data_train) / NCOL(object$ytimes)) + 1):((NROW(data_train) / NCOL(object$ytimes)) + length(truths)) ] } if (NROW(preds) > 4000) { preds <- preds[sample(1:NROW(preds), 4000, F), ] } } else { if (class(object$obs_data)[1] == "list") { truths <- data.frame( y = data_train$y, time = data_train$time, series = data_train$series ) %>% dplyr::arrange(series, time) %>% dplyr::filter(series == s_name) %>% dplyr::pull(y) } 
else { truths <- data_train %>% dplyr::filter(series == s_name) %>% dplyr::select(time, y) %>% dplyr::distinct() %>% dplyr::arrange(time) %>% dplyr::pull(y) } if (object$fit_engine == "stan") { # For stan objects, ypred is stored as a vector in column-major order preds <- mcmc_chains(object$model_output, "ypred")[, seq( series, dim(mcmc_chains(object$model_output, "ypred"))[2], by = NCOL(object$ytimes) )] } else { preds <- mcmc_chains(object$model_output, "ypred")[, starts[series]:ends[series] ] } preds <- preds[, 1:length(truths), drop = FALSE] if (NROW(preds) > 4000) { preds <- preds[sample(1:NROW(preds), 4000, F), ] } } # Can't deal with missing values in these diagnostic plots preds[is.nan(preds)] <- NA preds <- preds[, !is.na(truths)] truths <- truths[!is.na(truths)] probs <- c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95) if (type == "prop_zero") { pred_props <- apply(preds, 1, function(x) length(which(x == 0)) / length(x)) lower <- quantile(pred_props, probs = 0.01, na.rm = TRUE) upper <- quantile(pred_props, probs = 0.99, na.rm = TRUE) if (lower == 0 & upper == 0) { stop("No predictions covered zero") } pred_props <- pred_props[-which(pred_props > upper)] if (lower != 0) { pred_props <- pred_props[-which(pred_props < lower)] } obs_prop <- length(which(truths == 0)) / length(truths) if (missing(ylab)) { ylab <- "Density" } if (missing(xlab)) { xlab <- paste0( "Predicted proportion of zeroes for ", levels(data_train$series)[series] ) } hist( pred_props, lwd = 2, xlim = c( min(min(pred_props), min(obs_prop)), max(max(pred_props), max(obs_prop)) ), main = "", breaks = seq(min(pred_props), max(pred_props), length.out = 15), border = "#B97C7C", col = "#C79999", ylab = ylab, xlab = xlab, ... ) abline(v = obs_prop, lwd = 3, col = "white") abline(v = obs_prop, lwd = 2.5, col = "black") box(bty = "L", lwd = 2) if (missing(legend_position)) { legend_position <- "topright" } if (legend_position != "none") { legend( legend_position, legend = c(expression(hat(y)[propzero]), expression(y[propzero])), bg = "white", col = c(c_mid, "black"), lty = 1, lwd = 2, bty = "n" ) } } if (type == "rootogram") { ymax <- floor(max(max(truths), quantile(preds, prob = 0.99, na.rm = TRUE))) ymin <- 0L xpos <- ymin:ymax # Bin if necessary to prevent overplotting if (length(xpos) > n_bins) { cutpoints <- seq(ymin, ymax, length.out = n_bins) xpos <- floor(cutpoints) # Find the cutpoint interval that each prediction falls in tpreds <- as.list(rep(NA, NROW(preds))) for (i in seq_along(tpreds)) { tpreds[[i]] <- table(xpos[findInterval(preds[i, ], cutpoints)]) matches <- match(xpos, rownames(tpreds[[i]])) tpreds[[i]] <- as.numeric(tpreds[[i]][matches]) } tpreds <- do.call(rbind, tpreds) tpreds[is.na(tpreds)] <- 0 tyquantile <- sqrt(t(apply( tpreds, 2, quantile, probs = probs, na.rm = TRUE ))) tyexp <- tyquantile[, 5] # Repeat for truths ty <- table(xpos[findInterval(truths, cutpoints)]) ty <- sqrt(as.numeric(ty[match(xpos, rownames(ty))])) } else { tpreds <- as.list(rep(NA, NROW(preds))) for (i in seq_along(tpreds)) { tpreds[[i]] <- table(as.vector(preds[i, ])) matches <- match(xpos, rownames(tpreds[[i]])) tpreds[[i]] <- as.numeric(tpreds[[i]][matches]) } tpreds <- do.call(rbind, tpreds) tpreds[is.na(tpreds)] <- 0 tyquantile <- sqrt(t(apply( tpreds, 2, quantile, probs = probs, na.rm = TRUE ))) tyexp <- tyquantile[, 5] ty <- table(truths) ty <- sqrt(as.numeric(ty[match(xpos, rownames(ty))])) } ty[is.na(ty)] <- 0 ypos <- ty / 2 ypos <- tyexp - ypos data <- data.frame(xpos, ypos, ty, tyexp, tyquantile) N <- length(xpos) 
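    # Geometry note for the plotting code below: each bin is drawn as a
    # rectangle spanning half the minimum bin width either side of its
    # midpoint, 'hanging' from the expected sqrt-frequency (tyexp) down to
    # tyexp - ty. Bar bottoms sitting above zero flag counts the model
    # over-predicts; bars dipping below zero flag counts it under-predicts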
idx <- rep(1:N, each = 2) repped_x <- rep(xpos, each = 2) x <- sapply( 1:length(idx), function(k) { if (k %% 2 == 0) { repped_x[k] + min(diff(xpos)) / 2 } else { repped_x[k] - min(diff(xpos)) / 2 } } ) if (missing(xlab)) { xlab <- expression(y) } if (missing(ylab)) { ylab <- expression(sqrt(frequency)) } # Plot the rootogram plot( 1, type = "n", bty = "L", xlab = xlab, ylab = ylab, xlim = range(xpos), ylim = range(c(data$tyexp, data[, 13], data[, 5], data$tyexp - data$ty)), ... ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = data$tyexp, ybottom = data$tyexp - data$ty, col = "grey80", border = "grey10", lwd = 2 ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = data[, 13], ybottom = data[, 5], col = "#DCBCBC85", border = "transparent" ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = data[, 12], ybottom = data[, 6], col = "#C7999985", border = "transparent" ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = data[, 11], ybottom = data[, 7], col = "#B97C7C85", border = "transparent" ) rect( xleft = x[seq(1, N * 2, by = 2)], xright = x[seq(2, N * 2, by = 2)], ytop = data[, 10], ybottom = data[, 8], col = "#A2505085", border = "transparent" ) abline(h = 0, col = "white", lwd = 3) abline(h = 0, col = "black", lwd = 2.5) for (k in 1:N) { lines( x = c(x[seq(1, N * 2, by = 2)][k], x[seq(2, N * 2, by = 2)][k]), y = c(data[k, 9], data[k, 9]), col = "#8F2727", lwd = 3 ) } box(bty = "L", lwd = 2) } if (type == "mean") { # Plot observed and predicted means pred_means <- apply(preds, 1, mean, na.rm = TRUE) lower <- quantile(pred_means, probs = 0.01, na.rm = TRUE) upper <- quantile(pred_means, probs = 0.99, na.rm = TRUE) pred_means <- pred_means[-which(pred_means > upper)] if (lower != 0) { pred_means <- pred_means[-which(pred_means < lower)] } obs_mean <- mean(truths) if (missing(ylab)) { ylab <- "Density" } if (missing(xlab)) { xlab <- paste0("Predicted mean for ", levels(data_train$series)[series]) } hist( pred_means, xlim = c( min(min(pred_means, na.rm = TRUE), min(obs_mean, na.rm = TRUE)), max(max(pred_means, na.rm = TRUE), max(obs_mean, na.rm = TRUE)) ), lwd = 2, main = "", breaks = seq( min(pred_means, na.rm = TRUE), max(pred_means, na.rm = TRUE), length.out = 20 ), border = "#B97C7C", col = "#C79999", ylab = ylab, xlab = xlab, ... 
) abline(v = obs_mean, lwd = 3, col = "white") abline(v = obs_mean, lwd = 2.5, col = "black") box(bty = "L", lwd = 2) if (missing(legend_position)) { legend_position <- "topright" } if (legend_position != "none") { legend( legend_position, legend = c(expression(hat(mu)), expression(mu)), bg = "white", col = c(c_mid, "black"), lty = 1, lwd = 2, bty = "n" ) } } # Generate a sample sequence and plot if (type == "density") { max_x <- max( max(density(preds[1, ], na.rm = TRUE)$x), max(density(truths, na.rm = TRUE)$x) ) min_x <- min( min(density(preds[1, ], na.rm = TRUE)$x), min(density(truths, na.rm = TRUE)$x) ) pred_densities <- do.call( rbind, (lapply(1:NROW(preds), function(x) { if (length(which(is.na(preds[x, ]))) > (length(preds[x, ]) - 3)) { rep( 0, length(density(truths, from = min_x, to = max_x, na.rm = TRUE)$y) ) } else { dens <- density(preds[x, ], from = min_x, to = max_x, na.rm = TRUE) dens$y } })) ) cred <- sapply( 1:NCOL(pred_densities), function(n) quantile(pred_densities[, n], probs = probs, na.rm = TRUE) ) true_dens <- density(truths, from = min_x, to = max_x, na.rm = TRUE) ymax <- max(c(max(cred, na.rm = TRUE), max(true_dens$y, na.rm = TRUE))) if (missing(ylab)) { ylab <- paste0( "Predictive density for ", levels(data_train$series)[series] ) } if (missing(xlab)) { xlab <- "" } if (object$family == "beta") { xlimits <- c(0, 1) } else if ( object$family %in% c("poisson", "negative binomial", "lognormal", "Gamma") ) { xlimits <- c(0, max_x) } else { xlimits <- c(min_x, max_x) } plot( 1, type = "n", bty = "L", xlab = xlab, ylab = ylab, xlim = xlimits, ylim = c(0, ymax), ... ) polygon( c(true_dens$x, rev(true_dens$x)), c(cred[1, ], rev(cred[9, ])), col = c_light, border = NA ) polygon( c(true_dens$x, rev(true_dens$x)), c(cred[2, ], rev(cred[8, ])), col = c_light_highlight, border = NA ) polygon( c(true_dens$x, rev(true_dens$x)), c(cred[3, ], rev(cred[7, ])), col = c_mid, border = NA ) polygon( c(true_dens$x, rev(true_dens$x)), c(cred[4, ], rev(cred[6, ])), col = c_mid_highlight, border = NA ) lines(true_dens$x, cred[5, ], col = c_dark, lwd = 2.5) lines(x = true_dens$x, y = true_dens$y, lwd = 3, col = "white") lines(x = true_dens$x, y = true_dens$y, lwd = 2.5, col = "black") if (missing(legend_position)) { legend_position <- "topright" } if (legend_position != "none") { legend( legend_position, legend = c(expression(hat(y)), "y"), bg = "white", col = c(c_mid, "black"), lty = 1, lwd = 2, bty = "n" ) } box(bty = "L", lwd = 2) } if (type == "hist") { if (missing(n_bins)) { n_bins <- max(c( length(hist(c(truths, as.vector(preds)), plot = F)$breaks), 20 )) } if (sign(n_bins) != 1) { stop('argument "n_bins" must be a positive integer', call. = FALSE) } else { if (n_bins %% 1 != 0) { stop('argument "n_bins" must be a positive integer', call. 
= FALSE) } } xlim <- c( min( min(density(preds[1, ], na.rm = TRUE)$x), min(density(truths, na.rm = TRUE)$x) ), max( max(density(preds[1, ], na.rm = TRUE)$x), max(density(truths, na.rm = TRUE)$x) ) ) if (object$family == "beta") { xlim <- c(0, 1) } else if ( object$family %in% c("poisson", "negative binomial", "lognormal", "Gamma") ) { xlim <- c(0, xlim[2]) } else { xlim <- xlim } breaks <- seq(xlim[1], xlim[2], length.out = n_bins) truths <- truths[truths <= xlim[2]] truths <- truths[truths >= xlim[1]] preds <- preds[preds <= xlim[2]] preds <- preds[preds >= xlim[1]] ylim <- c( 0, max( c( max(hist(truths, breaks = breaks, plot = F)$density, na.rm = TRUE), max(hist(preds, breaks = breaks, plot = F)$density, na.rm = TRUE) ), na.rm = TRUE ) ) if (missing(xlab)) { xlab <- paste0("Count") } if (missing(ylab)) { ylab <- "" } hist( preds, breaks = breaks, lwd = 2, main = "", xlab = xlab, ylab = ylab, ylim = ylim, xlim = xlim, border = "#B97C7C", col = "#C79999", freq = F, ... ) par(lwd = 2) hist( truths, breaks = breaks, main = "", xlab = "", ylim = ylim, xlim = xlim, ylab = "", yaxt = "n", col = rgb(red = 0, green = 0, blue = 0, alpha = 0), border = "black", add = T, freq = F ) par(lwd = 1) box(bty = "L", lwd = 2) if (missing(legend_position)) { legend_position <- "topright" } if (legend_position != "none") { legend( legend_position, legend = c(expression(hat(y)), "y"), bg = "white", col = c(c_mid, "black"), lty = 1, lwd = 2, bty = "n" ) } } if (type == "cdf") { ecdf_plotdat <- function(vals, x) { if (length(which(is.na(vals))) > (length(vals) - 3)) {} else { func <- ecdf(vals) func(x) } } plot_x <- seq( from = min(truths, na.rm = T), to = max(truths, na.rm = T), length.out = 100 ) pred_cdfs <- do.call( rbind, (lapply(1:NROW(preds), function(x) { ecdf_plotdat(preds[x, ], x = plot_x) })) ) probs <- c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95) cred <- sapply( 1:NCOL(pred_cdfs), function(n) quantile(pred_cdfs[, n], probs = probs, na.rm = TRUE) ) if (missing(ylab)) { ylab <- paste0("Predictive CDF for ", levels(data_train$series)[series]) } if (missing(xlab)) { xlab <- "" } plot( 1, type = "n", bty = "L", xlab = xlab, ylab = ylab, xlim = c(min(plot_x, na.rm = TRUE), max(plot_x, na.rm = TRUE)), ylim = c(0, 1), ... 
)
  polygon(
    c(plot_x, rev(plot_x)),
    c(cred[1, ], rev(cred[9, ])),
    col = c_light,
    border = NA
  )
  polygon(
    c(plot_x, rev(plot_x)),
    c(cred[2, ], rev(cred[8, ])),
    col = c_light_highlight,
    border = NA
  )
  polygon(
    c(plot_x, rev(plot_x)),
    c(cred[3, ], rev(cred[7, ])),
    col = c_mid,
    border = NA
  )
  polygon(
    c(plot_x, rev(plot_x)),
    c(cred[4, ], rev(cred[6, ])),
    col = c_mid_highlight,
    border = NA
  )
  lines(plot_x, cred[5, ], col = c_dark, lwd = 2.5)
  lines(x = plot_x, y = ecdf_plotdat(truths, plot_x), col = "white", lwd = 3)
  lines(
    x = plot_x,
    y = ecdf_plotdat(truths, plot_x),
    col = "black",
    lwd = 2.5
  )
  if (missing(legend_position)) {
    legend_position <- "bottomright"
  }
  if (legend_position != "none") {
    legend(
      legend_position,
      legend = c(expression(hat(y)), "y"),
      bg = "white",
      col = c(c_mid, "black"),
      lty = 1,
      lwd = 2,
      bty = "n"
    )
  }
  box(bty = "L", lwd = 2)
  }

  if (type == "pit") {
    # Calculate the empirical cumulative distribution function as the
    # proportion of predictive draws satisfying (y_predicted <= y_true);
    # preds is stored as draws x observations, so index columns per observation
    n_pred <- nrow(preds)
    P_x <- vapply(
      seq_along(truths),
      function(i) {
        sum(preds[, i] <= truths[i]) / n_pred
      },
      .0
    )
    P_xm1 <- vapply(
      seq_along(truths),
      function(i) {
        sum(preds[, i] <= truths[i] - 1.e-6) / n_pred
      },
      .0
    )

    # 1000 replicates for randomised PIT
    u <- replicate(1000, P_xm1 + stats::runif(length(truths)) * (P_x - P_xm1))
    pit_hist <- hist(u, breaks = seq(0, 1, by = 0.1), plot = FALSE)$density
    pit_hist <- (pit_hist / sum(pit_hist)) * 10
    barplot(
      pit_hist,
      lwd = 2,
      col = "#B97C7C",
      xlab = paste0("Predictive PIT for ", levels(data_train$series)[series]),
      border = NA,
      ...
    )
    abline(h = 1, col = "#FFFFFF60", lwd = 2.85)
    abline(h = 1, col = "black", lwd = 2.5, lty = "dashed")
    box(bty = "L", lwd = 2)
  }
}

#' Posterior Predictive Checks for \code{mvgam} models
#'
#' Perform unconditional posterior predictive checks with the help
#' of the \pkg{bayesplot} package.
#'
#' @aliases pp_check
#'
#' @inheritParams brms::pp_check
#' @inheritParams brms::prepare_predictions.brmsfit
#'
#' @importFrom insight get_predictors
#' @importFrom brms do_call
#' @importFrom bayesplot pp_check
#'
#' @param object An object of class \code{mvgam}
#'
#' @param newdata Optional \code{dataframe} or \code{list} of test data containing the
#' variables included in the linear predictor of \code{formula}. If not supplied,
#' predictions are generated for the original observations used for the model fit.
#' Ignored if using one of the residual plots (i.e. 'resid_hist')
#'
#' @param ... Further arguments passed to \code{\link{predict.mvgam}}
#' as well as to the PPC function specified in \code{type}
#'
#' @return A ggplot object that can be further
#' customized using the \pkg{ggplot2} package.
#'
#' @details Unlike the conditional posterior checks provided by \code{\link{ppc}},
#' this function computes *unconditional* posterior predictive checks (i.e. it generates
#' predictions for fake data without considering the true observations associated with those
#' fake data). For a detailed explanation of each of the ppc functions,
#' see the \code{\link[bayesplot:PPC-overview]{PPC}}
#' documentation of the \pkg{\link[bayesplot:bayesplot-package]{bayesplot}}
#' package.
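#'
#' As a hedged pointer to how the residual types are handled internally
#' (assuming a fitted model named `mod`): the 'resid_' types first compute
#' randomised quantile (Dunn-Smyth) residuals and then reuse the
#' corresponding \pkg{bayesplot} machinery:
#' \preformatted{
#' pp_check(mod, type = "resid_hist")   # reuses bayesplot's error_hist plots
#' pp_check(mod, type = "resid_ribbon", # reuses bayesplot's ribbon plots
#'          x = "season")
#' }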
#' #' @seealso \code{\link{ppc}}, \code{\link{predict.mvgam}} #' #' @examples #' \dontrun{ #' simdat <- sim_mvgam(seasonality = "hierarchical") #' mod <- mvgam( #' y ~ series + #' s(season, bs = "cc", k = 6) + #' s(season, series, bs = "fs", k = 4), #' data = simdat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' # Use pp_check(mod, type = "xyz") for a list of available plot types #' #' # Default is a density overlay for all observations #' pp_check(mod) #' #' # Rootograms particularly useful for count data #' pp_check(mod, type = "rootogram") #' #' # Grouping plots by series is useful #' pp_check(mod, #' type = "bars_grouped", #' group = "series", ndraws = 50 #' ) #' pp_check(mod, #' type = "ecdf_overlay_grouped", #' group = "series", ndraws = 50 #' ) #' pp_check(mod, #' type = "stat_freqpoly_grouped", #' group = "series", ndraws = 50 #' ) #' #' # Several types can be used to plot distributions of randomized #' # quantile residuals #' pp_check( #' object = mod, #' x = "season", #' type = "resid_ribbon" #' ) #' pp_check( #' object = mod, #' x = "season", #' group = "series", #' type = "resid_ribbon_grouped" #' ) #' pp_check(mod, #' ndraws = 5, #' type = "resid_hist_grouped", #' group = "series" #' ) #' #' # Custom functions accepted #' pp_check(mod, type = "stat", stat = function(x) mean(x == 0)) #' pp_check(mod, #' type = "stat_grouped", #' stat = function(x) mean(x == 0), #' group = "series" #' ) #' #' # Some functions accept covariates to set the x-axes #' pp_check(mod, #' x = "season", #' type = "ribbon_grouped", #' prob = 0.5, #' prob_outer = 0.8, #' group = "series" #' ) #' #' # Many plots can be made without the observed data #' pp_check(mod, prefix = "ppd") #' } #' #' @export pp_check #' #' @author Nicholas J Clark #' #' @export pp_check.mvgam <- function( object, type, ndraws = NULL, prefix = c("ppc", "ppd"), group = NULL, x = NULL, newdata = NULL, ... ) { # Set red colour scheme col_scheme <- attr(color_scheme_get(), "scheme_name") color_scheme_set("red") dots <- list(...) if (missing(type)) { type <- "dens_overlay" } prefix <- match.arg(prefix) ndraws_given <- "ndraws" %in% names(match.call()) if (is.null(newdata)) { newdata <- object$obs_data } if (prefix == "ppc") { # No type checking for prefix 'ppd' yet valid_types <- sort( c( as.character(bayesplot::available_ppc("")), "ppc_resid_hist", "ppc_resid_hist_grouped", "ppc_resid_ribbon", "ppc_resid_ribbon_grouped" ) ) valid_types <- sub("^ppc_", "", valid_types) if (!type %in% valid_types) { stop( "Type '", type, "' is not a valid ppc type. ", "Valid types are:\n", paste0("'", valid_types, "'", collapse = ", "), call. = FALSE ) } } bptype <- type if (bptype %in% c("resid_hist", "resid_hist_grouped")) { if (is.null(object$resids)) { object <- add_residuals(object) } bptype <- sub("resid", "error", bptype) } if (bptype %in% c("resid_ribbon", "resid_ribbon_grouped")) { if (is.null(object$resids)) { object <- add_residuals(object) } bptype <- sub("resid_", "", bptype) } ppc_fun <- get(paste0(prefix, "_", bptype), asNamespace("bayesplot")) family <- object$family if (family == "nmix") { stop("'pp_check' is not implemented for this family.", call. = FALSE) } valid_vars <- names(get_predictors(object)) if ("group" %in% names(formals(ppc_fun))) { if (is.null(group)) { stop( "Argument 'group' is required for ppc type '", type, "'.", call. = FALSE ) } if (!group %in% valid_vars) { stop( "Variable '", group, "' could not be found in the data.", call. 
= FALSE
      )
    }
  }
  if ("x" %in% names(formals(ppc_fun))) {
    if (!is.null(x) && !x %in% valid_vars) {
      stop("Variable '", x, "' could not be found in the data.", call. = FALSE)
    }
  }
  if (type == "error_binned") {
    method <- "posterior_epred"
  } else {
    method <- "posterior_predict"
  }
  if (!ndraws_given) {
    aps_types <- c(
      "error_scatter_avg",
      "error_scatter_avg_vs_x",
      "intervals",
      "intervals_grouped",
      "loo_intervals",
      "loo_pit",
      "loo_pit_overlay",
      "loo_pit_qq",
      "loo_ribbon",
      "pit_ecdf",
      "pit_ecdf_grouped",
      "ribbon",
      "ribbon_grouped",
      "rootogram",
      "scatter_avg",
      "scatter_avg_grouped",
      "stat",
      "stat_2d",
      "stat_freqpoly_grouped",
      "stat_grouped",
      "violin_grouped"
    )
    if (type %in% aps_types) {
      ndraws <- NULL
      message("Using all posterior draws for ppc type '", type, "' by default.")
    } else {
      ndraws <- 10
      message("Using 10 posterior draws for ppc type '", type, "' by default.")
    }
  }
  y <- NULL
  if (prefix == "ppc") {
    # y is ignored in prefix 'ppd' plots; get the response variable,
    # but take care that binomial models use the cbind() lhs
    resp_terms <- as.character(terms(formula(object$call))[[2]])
    if (length(resp_terms) == 1) {
      out_name <- as.character(terms(object$call)[[2]])
    } else {
      if (any(grepl("cbind", resp_terms))) {
        # Drop the 'cbind' term via logical negation (rather than negative
        # indexing with a logical vector) and keep the response name
        resp_terms <- resp_terms[!grepl("cbind", resp_terms)]
        out_name <- resp_terms[1]
      }
    }
    y <- newdata[[out_name]]
  }

  # For plotting DS residuals, set y to zero and take
  # -1 * residual so that errors are in the correct direction
  if (grepl("resid", type)) {
    y[!is.na(y)] <- 0
    yrep <- t(-1 * residuals(object, summary = FALSE))
    if (!is.null(ndraws)) {
      yrep <- yrep[1:ndraws, ]
    }
  } else {
    pred_args <- list(
      object,
      newdata = newdata,
      ndraws = ndraws,
      ...
    )
    yrep <- do_call(method, pred_args)
  }
  if (anyNA(y)) {
    warning("NA responses are not shown in 'pp_check'.")
    take <- !is.na(y)
    y <- y[take]
    yrep <- yrep[, take, drop = FALSE]
  } else {
    take <- NULL
  }

  # Prepare plotting arguments
  ppc_args <- list()
  if (prefix == "ppc") {
    ppc_args$y <- y
    ppc_args$yrep <- yrep
  } else if (prefix == "ppd") {
    ppc_args$ypred <- yrep
  }
  if (!is.null(group)) {
    if (!exists(group, newdata)) {
      stop(paste0("Variable ", group, " not in newdata"), call. = FALSE)
    }
    ppc_args$group <- newdata[[group]]
    if (!is.null(take)) {
      ppc_args$group <- ppc_args$group[take]
    }
  }
  is_like_factor <- function(x) {
    is.factor(x) || is.character(x) || is.logical(x)
  }
  if (!is.null(x)) {
    ppc_args$x <- newdata[[x]]
    if (!is_like_factor(ppc_args$x)) {
      ppc_args$x <- as.numeric(ppc_args$x)
    }
    if (!is.null(take)) {
      ppc_args$x <- ppc_args$x[take]
    }
  }
  if ("psis_object" %in% setdiff(names(formals(ppc_fun)), names(ppc_args))) {
    # ppc_args$psis_object <- do_call(
    #   compute_loo, c(pred_args, criterion = "psis")
    # )
    # compute_loo() not available yet for mvgam
    ppc_args$psis_object <- NULL
  }
  if ("lw" %in% setdiff(names(formals(ppc_fun)), names(ppc_args))) {
    # ppc_args$lw <- weights(
    #   do_call(compute_loo, c(pred_args, criterion = "psis"))
    # )
    # compute_loo() not available yet for mvgam
    ppc_args$lw <- NULL
  }

  # Most ...
arguments are meant for the prediction function for_pred <- names(dots) %in% names(formals(posterior_predict.mvgam)) ppc_args <- c(ppc_args, dots[!for_pred]) # Generate plot out_plot <- do_call(ppc_fun, ppc_args) if ("x" %in% names(formals(ppc_fun)) && !is.null(x)) { out_plot <- out_plot + ggplot2::labs(x = x) } # Improve labels for residual plots if (type %in% c("resid_hist", "resid_hist_grouped")) { out_plot <- out_plot + ggplot2::labs(x = "DS residuals") } if (type %in% c("resid_ribbon", "resid_ribbon_grouped")) { out_plot <- out_plot + ggplot2::theme(legend.position = "none") + ggplot2::labs(y = "DS residuals") } # Reset color scheme and return the plot color_scheme_set(col_scheme) return(out_plot) } ================================================ FILE: R/predict.mvgam.R ================================================ #' Predict from a fitted \pkg{mvgam} model #' #' @importFrom stats predict #' #' @inheritParams brms::fitted.brmsfit #' #' @param object \code{list} object of class \code{mvgam} or \code{jsdgam}. #' See [mvgam()] #' #' @param newdata Optional \code{dataframe} or \code{list} of test data #' containing the same variables that were included in the original `data` #' used to fit the model. If not supplied, predictions are generated for the #' original observations used for the model fit. #' #' @param data_test Deprecated. Still works in place of \code{newdata} but #' users are recommended to use \code{newdata} instead for more seamless #' integration into `R` workflows #' #' @param type When this has the value \code{link} (default) the linear #' predictor is calculated on the link scale. If \code{expected} is used, #' predictions reflect the expectation of the response (the mean) but ignore #' uncertainty in the observation process. When \code{response} is used, the #' predictions take uncertainty in the observation process into account to #' return predictions on the outcome scale. When \code{variance} is used, the #' variance of the response with respect to the mean (mean-variance #' relationship) is returned. When `type = "terms"`, each component of the #' linear predictor is returned separately in the form of a `list` (possibly #' with standard errors, if `summary = TRUE`): this includes parametric model #' components, followed by each smooth component, but excludes any offset and #' any intercept. Two special cases are also allowed: type `latent_N` will #' return the estimated latent abundances from an N-mixture distribution, #' while type `detection` will return the estimated detection probability from #' an N-mixture distribution #' #' @param process_error Logical. If \code{TRUE} and a dynamic trend model was #' fit, expected uncertainty in the process model is accounted for by using #' draws from a stationary, zero-centred multivariate Normal distribution #' using any estimated process variance-covariance parameters. If #' \code{FALSE}, uncertainty in the latent trend component is ignored when #' calculating predictions #' #' @param ... Ignored #' #' @details Note that if your model included a latent temporal trend (i.e. if #' you used something other than `"None"` for the `trend_model` argument), the #' predictions returned by this function will ignore autocorrelation #' coefficients or GP length scale coefficients by *assuming the process is #' stationary*. 
This approach is similar to how predictions are computed from #' other types of regression models that can include correlated residuals, #' *ultimately treating the temporal dynamics as random effect nuisance #' parameters*. The `predict` function is therefore more suited to #' scenario-based posterior simulation from the GAM components of a #' \code{mvgam} model, while the hindcast / forecast functions #' [hindcast.mvgam()] and [forecast.mvgam()] are better suited to generate #' predictions that respect the temporal dynamics of estimated latent trends #' at the actual time points supplied in `data` and `newdata`. #' #' @return Predicted values on the appropriate scale. #' #' If \code{summary = FALSE} and `type != "terms"`, the output is a matrix of #' dimension `n_draw x n_observations` containing predicted values for each #' posterior draw in `object`. #' #' If \code{summary = TRUE} and `type != "terms"`, the output is an #' \code{n_observations} x \code{E} matrix. The number of summary statistics #' \code{E} is equal to \code{2 + length(probs)}: The \code{Estimate} column #' contains point estimates (either mean or median depending on argument #' \code{robust}), while the \code{Est.Error} column contains uncertainty #' estimates (either standard deviation or median absolute deviation depending #' on argument \code{robust}). The remaining columns starting with \code{Q} #' contain quantile estimates as specified via argument \code{probs}. #' #' If `type = "terms"` and `summary = FALSE`, the output is a named `list` #' containing a separate slot for each effect, with the effects returned as #' matrices of dimension `n_draw x 1`. If `summary = TRUE`, the output #' resembles that from \code{\link[mgcv]{predict.gam}} when using the call #' `predict.gam(object, type = "terms", se.fit = TRUE)`, where mean #' contributions from each effect are returned in `matrix` form while standard #' errors (representing the interval: `(max(probs) - min(probs)) / 2`) are #' returned in a separate `matrix` #' #' @author Nicholas J Clark #' #' @seealso #' [hindcast.mvgam()], #' [forecast.mvgam()], #' [fitted.mvgam()], #' [augment.mvgam()] #' #' @examples #' \dontrun{ #' # Simulate 4 time series with hierarchical seasonality #' # and independent AR1 dynamic processes #' set.seed(123) #' simdat <- sim_mvgam( #' seasonality = 'hierarchical', #' prop_trend = 0.75, #' trend_model = AR(), #' family = gaussian() #' ) #' #' # Fit a model with shared seasonality #' # and AR(1) dynamics #' mod1 <- mvgam( #' y ~ s(season, bs = 'cc', k = 6), #' data = simdat$data_train, #' family = gaussian(), #' trend_model = AR(), #' noncentred = TRUE, #' chains = 2, #' silent = 2 #' ) #' #' # Generate predictions against observed data #' preds <- predict( #' mod1, #' summary = TRUE #' ) #' head(preds) #' #' # Generate predictions against test data #' preds <- predict( #' mod1, #' newdata = simdat$data_test, #' summary = TRUE #' ) #' head(preds) #' #' # Use plot_predictions(), which relies on predict() #' # to more easily see how the latent AR(1) dynamics are #' # being ignored when using predict() #' plot_predictions( #' mod1, #' by = c('time', 'series', 'series'), #' points = 0.5 #' ) #' #' # Using the hindcast() function will give a more accurate #' # representation of how the AR(1) processes were estimated to give #' # accurate predictions to the in-sample training data #' hc <- hindcast(mod1) #' plot(hc) + #' plot(hc, series = 2) + #' plot(hc, series = 3) #' } #' #' @export predict.mvgam = function( object, newdata, data_test, type 
= 'link', process_error = FALSE, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... ) { # Argument checks if (!missing("data_test")) { newdata <- data_test } if (missing(newdata)) { newdata <- object$obs_data } if (length(probs) != 2L) { stop("argument 'probs' must be a vector of length 2", call. = FALSE) } validate_proportional(min(probs)) validate_proportional(max(probs)) # Check names of supplied variables against those required # for prediction validate_predictors(object, newdata) # newdata needs to have a 'series' indicator in it for integrating # over the trend uncertainties if (inherits(object, 'jsdgam')) { newdata <- validate_series_time( data = newdata, trend_model = attr(object$model_data, 'prepped_trend_model'), check_levels = FALSE, check_times = FALSE ) } else { newdata <- validate_series_time( data = newdata, trend_model = object$trend_model, check_levels = FALSE, check_times = FALSE ) } type <- match.arg( arg = type, choices = c( "link", "expected", "response", "variance", "latent_N", "detection", "terms" ) ) if (type == 'latent_N' & object$family != 'nmix') { stop('"latent_N" type only available for N-mixture models', call. = FALSE) } if (type == 'detection' & object$family != 'nmix') { stop('"detection" type only available for N-mixture models', call. = FALSE) } # terms is the easiest return type, so evaluate it first if (type == 'terms') { out <- list() out$obs_effects <- terms_preds( object = object, newdata = newdata, summary = summary, robust = robust, probs = probs, trend_effects = FALSE ) if (!is.null(object$trend_call)) { out$process_effects <- terms_preds( object = object, newdata = newdata, summary = summary, robust = robust, probs = probs, trend_effects = TRUE ) } } else { # If a linear predictor was supplied for the latent process models, calculate # predictions by assuming the trend is stationary (this is basically what brms # does when predicting for autocor() models) if (!is.null(object$trend_call)) { # Linear predictor matrix for the latent process models Xp <- trend_Xp_matrix( newdata = newdata, trend_map = object$trend_map, series = 'all', mgcv_model = object$trend_mgcv_model ) # Extract process error estimates if ( attr(object$model_data, 'trend_model') %in% c('None', 'RW', 'AR1', 'AR2', 'AR3', 'CAR1', 'ZMVN') ) { if (object$family == 'nmix') { family_pars <- list(sigma_obs = .Machine$double.eps) } else { family_pars <- list( sigma_obs = mcmc_chains(object$model_output, 'sigma') ) } } if (attr(object$model_data, 'trend_model') %in% c('VAR1')) { if (object$use_lv) { family_pars <- list( sigma_obs = mcmc_chains(object$model_output, 'Sigma')[, seq(1, object$n_lv^2, by = object$n_lv + 1) ] ) } else { family_pars <- list( sigma_obs = mcmc_chains(object$model_output, 'Sigma')[, seq(1, NCOL(object$ytimes)^2, by = NCOL(object$ytimes) + 1) ] ) } } # Indicators of which trend to use for each observation if (inherits(newdata, 'list')) { data.frame(series = newdata$series) %>% dplyr::left_join(object$trend_map, by = 'series') %>% dplyr::pull(trend) -> trend_inds newdata_trend <- newdata newdata_trend$trend <- trend_inds } else { newdata %>% dplyr::left_join(object$trend_map, by = 'series') -> newdata_trend } trend_ind <- as.numeric(newdata_trend$trend) # Beta coefficients for GAM process model component betas <- mcmc_chains(object$model_output, 'b_trend') # Family parameters spread into a vector family_extracts <- lapply(seq_along(family_pars), function(j) { if (is.matrix(family_pars[[j]])) { as.vector(family_pars[[j]][, trend_ind]) } else { 
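      # Scalar family parameters apply to every observation, so the vector
      # of posterior draws can be used as-is without trend-level indexing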
family_pars[[j]] } }) names(family_extracts) <- names(family_pars) # Trend stationary predictions if (!process_error) { family_extracts <- list(sigma_obs = .Machine$double.eps) } if (inherits(object, 'jsdgam')) { # JSDMs should generate one set of predictions per latent variable and then # create a weighted set of predictions based on the loading estimates lv_coefs <- mcmc_chains(object$model_output, 'lv_coefs') n_draws <- dim(mcmc_chains(object$model_output, 'b'))[1] series_ind <- as.numeric(newdata$series) trend_predictions_raw <- lapply(1:object$n_lv, function(x) { # Linear predictor matrix for the latent process models Xp <- trend_Xp_matrix( newdata = newdata, trend_map = data.frame( trend = x, series = unique(object$trend_map$series) ), series = 'all', mgcv_model = object$trend_mgcv_model ) all_linpreds <- as.matrix(as.vector(t(apply( as.matrix(betas), 1, function(row) Xp %*% row + attr(Xp, 'model.offset') )))) attr(all_linpreds, 'model.offset') <- 0 pred_vec <- mvgam_predict( family = 'gaussian', Xp = all_linpreds, type = 'response', betas = 1, family_pars = family_extracts ) matrix(pred_vec, nrow = NROW(betas)) }) # Create weighted set of predictions using the loadings weighted_mat = function(pred_matrices, weights, draw = 1, obs = 1) { lv_draws <- unlist( lapply(pred_matrices, function(x) x[draw, obs]), use.names = FALSE ) as.vector(lv_draws %*% weights) } trend_predictions <- matrix( NA, nrow = n_draws, ncol = length(newdata[[1]]) ) n_lv <- object$n_lv for (i in 1:n_draws) { for (x in 1:length(newdata[[1]])) { trend_predictions[i, x] <- weighted_mat( trend_predictions_raw, matrix(lv_coefs[i, ], nrow = n_lv)[, series_ind[x]], draw = i, obs = x ) } } trend_predictions <- as.vector(trend_predictions) } else { # Pre-multiply the linear predictors all_linpreds <- as.matrix(as.vector(t(apply( as.matrix(betas), 1, function(row) Xp %*% row + attr(Xp, 'model.offset') )))) attr(all_linpreds, 'model.offset') <- 0 trend_predictions <- mvgam_predict( family = 'gaussian', Xp = all_linpreds, type = 'response', betas = 1, family_pars = family_extracts ) } } else if ( attr(object$model_data, 'trend_model') != 'None' & process_error ) { # If no linear predictor for the trends but a dynamic trend model was used, # and the process_error flag is set to TRUE, # simulate from stationary time series to capture uncertainty # in the dynamic trend component n_draws <- dim(mcmc_chains(object$model_output, 'b'))[1] series_ind <- as.numeric(newdata$series) # Draw from fixed sigma for latent variable models if (object$use_lv & is.null(object$trend_map)) { if (attr(object$model_data, 'trend_model') != 'GP') { trends <- array( rnorm( n_draws * object$n_lv * length(newdata[[1]]), mean = 0, sd = 0.1 ), dim = c(n_draws, object$n_lv, length(newdata[[1]])) ) } else { trends <- array( rnorm( n_draws * object$n_lv * length(newdata[[1]]), mean = 0, sd = 0.25 ), dim = c(n_draws, object$n_lv, length(newdata[[1]])) ) } lv_coefs <- mcmc_chains(object$model_output, 'lv_coefs') trend_predictions <- matrix( NA, nrow = n_draws, ncol = length(newdata[[1]]) ) for (i in 1:n_draws) { for (x in 1:length(newdata[[1]])) { trend_predictions[i, x] <- t(trends[i, , series_ind[x]]) %*% matrix(lv_coefs[i, ], nrow = object$n_lv)[, series_ind[x]] } } trend_predictions <- as.vector(trend_predictions) } if (!object$use_lv | !is.null(object$trend_map)) { if ( attr(object$model_data, 'trend_model') %in% c('RW', 'AR1', 'AR2', 'AR3', 'VAR1', 'CAR1', 'ZMVN') ) { family_pars <- list( sigma_obs = mcmc_chains(object$model_output, 'sigma') ) } if 
(attr(object$model_data, 'trend_model') %in% c('GP')) { family_pars <- list( sigma_obs = mcmc_chains(object$model_output, 'alpha_gp') ) } if ( attr(object$model_data, 'trend_model') %in% c('PWlogistic', 'PWlinear') ) { trend_hcs <- hindcast(object, type = 'trend') sigma_obs <- unlist( lapply(seq_along(trend_hcs$hindcasts), function(x) { mean(apply(trend_hcs$hindcasts[[x]], 2, sd)) }), use.names = FALSE ) family_pars <- list( sigma_obs = sigma_obs ) } # Indicators of which trend to use for each observation if (!is.null(object$trend_map)) { newdata %>% dplyr::left_join(object$trend_map, by = 'series') -> newdata_trend trend_ind <- as.numeric(newdata_trend$trend) } else { trend_ind <- as.numeric(newdata$series) } # Create a fake design matrix of 1s betas <- matrix( 0, ncol = 1, nrow = dim(mcmc_chains(object$model_output, 'b'))[1] ) Xp <- matrix(1, ncol = 1, nrow = length(newdata$time)) attr(Xp, 'model.offset') <- 0 # Family parameters spread into a vector family_extracts <- lapply(seq_along(family_pars), function(j) { if (is.matrix(family_pars[[j]])) { as.vector(family_pars[[j]][, trend_ind]) } else { family_pars[[j]] } }) names(family_extracts) <- names(family_pars) # Pre-multiply the linear predictors all_linpreds <- as.matrix(as.vector(t(apply( as.matrix(betas), 1, function(row) Xp %*% row + attr(Xp, 'model.offset') )))) attr(all_linpreds, 'model.offset') <- 0 # Trend stationary predictions trend_predictions <- mvgam_predict( family = 'gaussian', Xp = all_linpreds, type = 'response', betas = 1, family_pars = family_extracts ) } } else { # If no trend_model was used, or if process_error == FALSE, # ignore uncertainty in any latent trend component trend_predictions <- 0 } #### Once trend predictions are made, calculate observation predictions #### # Generate linear predictor matrix from the mgcv observation model Xp <- obs_Xp_matrix(newdata = newdata, mgcv_model = object$mgcv_model) # Beta coefficients for GAM component betas <- mcmc_chains(object$model_output, 'b') # Family of model family <- object$family # Family-specific parameters family_pars <- extract_family_pars(object = object) # Determine which series each observation belongs to series_ind <- as.numeric(newdata$series) # Family parameters spread into long vectors family_extracts <- lapply(seq_along(family_pars), function(j) { if (is.matrix(family_pars[[j]])) { as.vector(family_pars[[j]][, series_ind]) } else { as.vector(matrix( rep(family_pars[[j]], NROW(Xp)), nrow = NROW(betas), byrow = FALSE )) } }) names(family_extracts) <- names(family_pars) # Add trial information if this is a Binomial model if (object$family %in% c('binomial', 'beta_binomial')) { resp_terms <- as.character(terms(formula(object))[[2]]) resp_terms <- resp_terms[-grepl('cbind', resp_terms)] trial_name <- resp_terms[2] if (!trial_name %in% names(newdata)) { stop( paste0('variable ', trial_name, ' not found in newdata'), call. = FALSE ) } trials <- newdata[[trial_name]] trials <- as.vector(matrix( rep(as.vector(trials), NROW(betas)), nrow = NROW(betas), byrow = TRUE )) family_extracts$trials <- trials } # Pre-multiply the linear predictors, including any offset and trend # predictions if applicable if (family == 'nmix') { all_linpreds <- as.matrix(as.vector(t(apply( as.matrix(betas), 1, function(row) Xp %*% row + attr(Xp, 'model.offset') )))) latent_lambdas <- exp(trend_predictions) if (!(exists('cap', where = newdata))) { stop( 'Max abundances must be supplied as a variable named "cap" for N-mixture models', call. 
= FALSE ) } validate_pos_integers(newdata$cap) if (any(is.na(newdata$cap)) | any(is.infinite(newdata$cap))) { stop( paste0('Missing or infinite values found for some "cap" terms'), call. = FALSE ) } cap <- as.vector(t(replicate(NROW(betas), newdata$cap))) } else { all_linpreds <- as.matrix( as.vector(t(apply( as.matrix(betas), 1, function(row) Xp %*% row + attr(Xp, 'model.offset') ))) + trend_predictions ) latent_lambdas <- NULL cap <- NULL } attr(all_linpreds, 'model.offset') <- 0 # Calculate vectorized predictions predictions_vec <- mvgam_predict( family = family, Xp = all_linpreds, latent_lambdas = latent_lambdas, cap = cap, type = type, betas = 1, family_pars = family_extracts ) # Convert back to matrix preds <- matrix(predictions_vec, nrow = NROW(betas)) if (summary) { Qupper <- apply(preds, 2, quantile, probs = max(probs), na.rm = TRUE) Qlower <- apply(preds, 2, quantile, probs = min(probs), na.rm = TRUE) if (robust) { estimates <- apply(preds, 2, median, na.rm = TRUE) errors <- apply(abs(preds - estimates), 2, median, na.rm = TRUE) } else { estimates <- apply(preds, 2, mean, na.rm = TRUE) errors <- apply(preds, 2, sd, na.rm = TRUE) } out <- cbind(estimates, errors, Qlower, Qupper) colnames(out) <- c( 'Estimate', 'Est.Error', paste0('Q', 100 * min(probs)), paste0('Q', 100 * max(probs)) ) } else { out <- preds } } return(out) } #' Term-specific predictions and uncertainties #' @noRd terms_preds = function( object, newdata, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), trend_effects = FALSE ) { if (trend_effects) { Xp <- trend_Xp_matrix( newdata = newdata, trend_map = object$trend_map, series = 'all', mgcv_model = object$trend_mgcv_model ) betas <- mcmc_chains(object$model_output, 'b_trend') effect_names <- colnames(predict( relabel_gps(object$trend_mgcv_model), type = 'terms', se.fit = FALSE )) effect_names <- gsub('series', 'trend', effect_names, fixed = TRUE) coef_names <- names(coef(object$trend_mgcv_model)) coef_names <- gsub('series', 'trend', coef_names, fixed = TRUE) } else { Xp <- obs_Xp_matrix(newdata = newdata, mgcv_model = object$mgcv_model) betas <- mcmc_chains(object$model_output, 'b') effect_names <- colnames(predict( relabel_gps(object$mgcv_model), type = 'terms', se.fit = FALSE )) coef_names <- names(coef(object$mgcv_model)) } # Contributions considering full uncertainties contributions <- serrors <- vector( mode = 'list', length = length(effect_names) ) for (i in seq_along(effect_names)) { effect_idxs <- grep(effect_names[i], coef_names, fixed = TRUE) linpred <- as.matrix(as.vector(t(apply( as.matrix(betas[, effect_idxs, drop = FALSE]), 1, function(row) Xp[, effect_idxs, drop = FALSE] %*% row )))) contributions[[i]] <- matrix(linpred, nrow = NROW(betas)) if (summary) { serrors[[i]] <- (apply( contributions[[i]], 2, function(x) quantile(x, probs = max(probs), na.rm = TRUE) ) - apply( contributions[[i]], 2, function(x) quantile(x, probs = min(probs), na.rm = TRUE) )) / 2 if (robust) { contributions[[i]] <- apply(contributions[[i]], 2, median) } else { contributions[[i]] <- apply(contributions[[i]], 2, mean) } } } if (summary) { out <- list() contributions <- do.call(cbind, contributions) serrors <- do.call(cbind, serrors) colnames(contributions) <- colnames(serrors) <- effect_names out$fit <- contributions out$se.fit <- serrors } else { names(contributions) <- effect_names out <- contributions } return(out) } ================================================ FILE: R/print.mvgam.R ================================================ #' Print a fitted \pkg{mvgam} 
object #' #' This function takes a fitted \code{mvgam} or \code{jsdgam} object and prints #' a quick summary. #' #' @param x \code{list} object returned from \code{mvgam} #' #' @param ... Ignored #' #' @details A brief summary of the model's call is printed #' #' @return A \code{list} is printed on-screen #' #' @author Nicholas J Clark #' #' @export print.mvgam = function(x, ...) { object <- x # Use shared extractor functions to eliminate code duplication model_spec <- extract_model_spec(object) sampling_info <- extract_sampling_info(object) # Print model specification with simplified logic for print.mvgam print_model_specification_simple(model_spec) # Print sampling information using shared helper print_sampling_information(sampling_info) } #' Print model specification section for print.mvgam (simplified version) #' @param model_spec Model specification from extract_model_spec #' @noRd print_model_specification_simple <- function(model_spec) { # Print formulas if (!is.null(model_spec$formulas$process)) { cat("GAM observation formula:\n") print(model_spec$formulas$observation) cat("\nGAM process formula:\n") print(model_spec$formulas$process) } else { cat("GAM formula:\n") print(model_spec$formulas$observation) } # Print family and link cat("\nFamily:\n") cat(paste0(model_spec$family, '\n')) cat("\nLink function:\n") cat(paste0(model_spec$link, '\n')) # Print trend model if (!model_spec$is_jsdgam) { cat("\nTrend model:\n") if (is.call(model_spec$trend_model)) { print(model_spec$trend_model) cat('\n') } else { cat(paste0(model_spec$trend_model, '\n')) } } # Print latent variable info (simplified - always "latent factors" for print.mvgam) if (!is.null(model_spec$latent_variables)) { cat("\nN latent factors:\n") cat(model_spec$latent_variables$count, '\n') } # Print dimensions if (model_spec$is_jsdgam) { cat('\nN species:\n') cat(model_spec$dimensions$n_species, '\n') cat('\nN sites:\n') cat(model_spec$dimensions$n_sites, '\n') } else { cat('\nN series:\n') cat(model_spec$dimensions$n_series, '\n') cat('\nN timepoints:\n') cat(model_spec$dimensions$n_timepoints, '\n') } # Print upper bounds if present if (!is.null(model_spec$upper_bounds)) { cat('\nUpper bounds:\n') cat(model_spec$upper_bounds, '\n') } } #'@export print.mvgam_prefit = function(x, ...) { object <- x # Use shared extractor function for model specification model_spec <- extract_model_spec(object) # Print model specification using shared helper print_model_specification(model_spec) # Add prefit-specific status message cat('\nStatus:\n') cat('Not fitted', '\n') } ================================================ FILE: R/residual_cor.R ================================================ #' Extract residual correlations based on latent factors #' #' Compute residual correlation estimates from Joint Species Distribution #' (\code{jsdgam}) or \code{mvgam} models that either used latent factors #' or included correlated process errors directly #' #' @name residual_cor.jsdgam #' #' @inheritParams brms::residuals.brmsfit #' #' @param object \code{list} object of class \code{mvgam} resulting from a #' call to [jsdgam()] or a call to [mvgam()] in which either #' `use_lv = TRUE` or a multivariate process was used with `cor = TRUE` #' (see [RW()] and [VAR()] for examples) #' #' @param robust If `FALSE` (the default) the mean is used as a measure of #' central tendency. If `TRUE`, the median is used instead. Only used if #' `summary` is `TRUE` #' #' @param ... 
ignored
#'
#' @return If `summary = TRUE`, a `list` of
#' \code{\link{mvgam_residcor-class}} with the following components:
#' \item{cor, cor_lower, cor_upper}{A set of \eqn{p \times p} correlation
#' matrices, containing either the posterior median or mean estimate, plus
#' lower and upper limits of the corresponding credible intervals supplied
#' to `probs`}
#' \item{sig_cor}{A \eqn{p \times p} correlation matrix containing only
#' correlations whose credible interval does not contain zero. All other
#' correlations are set to zero}
#' \item{prec, prec_lower, prec_upper}{A set of \eqn{p \times p} precision
#' matrices, containing either the posterior median or mean estimate, plus
#' lower and upper limits of the corresponding credible intervals supplied
#' to `probs`}
#' \item{sig_prec}{A \eqn{p \times p} precision matrix containing only
#' precisions whose credible interval does not contain zero. All other
#' precisions are set to zero}
#' \item{cov}{A \eqn{p \times p} posterior median or mean covariance
#' matrix}
#' \item{trace}{The median/mean point estimator of the trace (sum of the
#' diagonal elements) of the residual covariance matrix `cov`}
#'
#' If `summary = FALSE`, this function returns a `list` containing the
#' following components:
#' \item{all_cormat}{A \eqn{n_{draws} \times p \times p} `array` of
#' posterior residual correlation matrix draws}
#' \item{all_covmat}{A \eqn{n_{draws} \times p \times p} `array` of
#' posterior residual covariance matrix draws}
#' \item{all_precmat}{A \eqn{n_{draws} \times p \times p} `array` of
#' posterior residual precision matrix draws}
#' \item{all_trace}{A \eqn{n_{draws}} `vector` of posterior covariance
#' trace draws}
#'
#' @details
#' See \code{\link{mvgam_residcor-class}} for a description of the quantities
#' that are computed and returned by this function, along with key references.
#'
#' @references Hui, F. K. C. (2016). boral – Bayesian Ordination and
#' Regression Analysis of Multivariate Abundance Data in R. \emph{Methods
#' in Ecology and Evolution}, 7(6), 744-750.
#' \doi{10.1111/2041-210X.12514}
#'
#' @seealso [jsdgam()], [lv_correlations()], \code{\link{mvgam_residcor-class}}
#'
#' @export
residual_cor <- function(object, ...) {
  UseMethod("residual_cor", object)
}

#' @rdname residual_cor.jsdgam
#' @method residual_cor mvgam
#' @export
residual_cor.mvgam <- function(
  object,
  summary = TRUE,
  robust = FALSE,
  probs = c(0.025, 0.975),
  ...
) {
  # Only applicable if this is a dynamic factor model or a model
  # that included a process error variance-covariance matrix
  if (
    any(
      grepl(
        'Sigma',
        variables(object)$trend_pars$orig_name
      )
    ) |
      object$use_lv
  ) {
    class(object) <- c('jsdgam', 'mvgam')
    return(
      residual_cor(
        object = object,
        summary = summary,
        robust = robust,
        probs = probs,
        ...
      )
    )
  } else {
    stop(
      paste0(
        'Cannot compute residual correlations if no latent factors ',
        'or correlated process errors were modelled'
      ),
      call.
= FALSE ) } } #' @rdname residual_cor.jsdgam #' @method residual_cor jsdgam #' @examples #'\dontrun{ #' # Fit a JSDGAM to the portal_data captures #' mod <- jsdgam( #' formula = captures ~ #' # Fixed effects of NDVI and mintemp, row effect as a GP of time #' ndvi_ma12:series + mintemp:series + gp(time, k = 15), #' factor_formula = ~ -1, #' data = portal_data, #' unit = time, #' species = series, #' family = poisson(), #' n_lv = 2, #' silent = 2, #' chains = 2 #' ) #' #' # Plot residual correlations #' plot( #' residual_cor(mod) #' ) #' #' # Compare to a residual ordination biplot #' if(requireNamespace('ggrepel', quietly = TRUE)){ #' ordinate(mod) #' } #' #' # Not needed for general use; cleans up connections for automated testing #' closeAllConnections() #' } #' @export residual_cor.jsdgam <- function( object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... ) { insight::check_if_installed("corpcor") if (length(probs) != 2L) { stop("argument 'probs' must be a vector of length 2", call. = FALSE) } validate_proportional(min(probs)) validate_proportional(max(probs)) # Initiate objects to store all posterior correlation and covariance matrices p <- NCOL(object$ytimes) sp_names <- levels(object$obs_data$series) ndraws <- brms::ndraws(as_draws_array(object, variable = 'betas')) all_cormat <- all_covmat <- all_precmat <- array( 0, dim = c(ndraws, p, p) ) all_trace_rescor <- numeric(ndraws) # Check whether this model included a full variance-covariance matrix use_lv <- TRUE if ( any( grepl( 'Sigma', variables(object)$trend_pars$orig_name ) ) ) { # Use the factors if they were supplied; otherwise # use the full variance-covariance matrix if (object$use_lv) { use_lv <- TRUE } else { use_lv <- FALSE } } if (use_lv) { # Take draws of factor loadings to compute residual correlations, # covariances, and precisions n_lv <- object$n_lv loadings <- as.matrix(object$model_output, 'lv_coefs') # Calculate posterior covariance, correlation, precision and trace estimates for (i in 1:ndraws) { lv_coefs <- matrix(loadings[i, ], nrow = p, ncol = n_lv) lambdalambdaT <- tcrossprod(lv_coefs) all_covmat[i, , ] <- lambdalambdaT all_trace_rescor[i] <- sum(diag(lambdalambdaT)) all_cormat[i, , ] <- cov2cor(lambdalambdaT) all_precmat[i, , ] <- corpcor::cor2pcor(lambdalambdaT) } } else { # If the model already included a variance-covariance matrix, # compute directly Sigma_post <- as.matrix( object, variable = "Sigma", regex = TRUE ) for (i in 1:ndraws) { cov <- matrix( Sigma_post[i, ], nrow = p, ncol = p ) all_covmat[i, , ] <- cov all_trace_rescor[i] <- sum(diag(cov)) all_cormat[i, , ] <- cov2cor(cov) all_precmat[i, , ] <- corpcor::cor2pcor(cov) } } if (!summary) { out <- list( all_cormat = all_cormat, all_covmat = all_covmat, all_precmat = all_precmat, all_trace = all_trace_rescor ) } else { #### If summary, calculate summary statistics #### # Initiate summary correlation and covariance matrices sig_cormat <- cormat <- cormat_lower <- cormat_upper <- sig_precmat <- precmat <- precmat_lower <- precmat_upper <- covmat <- matrix(0, nrow = p, ncol = p) rownames(cormat) <- rownames(cormat_lower) <- rownames(cormat_upper) <- rownames(sig_cormat) <- rownames(precmat) <- rownames(precmat_lower) <- rownames(precmat_upper) <- rownames(sig_precmat) <- rownames(covmat) <- colnames(cormat) <- colnames(cormat_lower) <- colnames( cormat_upper ) <- colnames(sig_cormat) <- colnames(precmat) <- colnames( precmat_lower ) <- colnames(precmat_upper) <- colnames(sig_precmat) <- colnames( covmat ) <- sp_names # Calculate posterior 
summaries for (j in 1:p) { for (j2 in 1:p) { if (robust) { covmat[j, j2] <- median(all_covmat[, j, j2]) cormat[j, j2] <- median(all_cormat[, j, j2]) precmat[j, j2] <- median(all_precmat[, j, j2]) } else { covmat[j, j2] <- mean(all_covmat[, j, j2]) cormat[j, j2] <- mean(all_cormat[, j, j2]) precmat[j, j2] <- mean(all_precmat[, j, j2]) } sig_cormat[j, j2] <- cormat[j, j2] cormat_lower[j, j2] <- quantile( all_cormat[, j, j2], probs = min(probs), na.rm = TRUE ) cormat_upper[j, j2] <- quantile( all_cormat[, j, j2], probs = max(probs), na.rm = TRUE ) if (0 > cormat_lower[j, j2] & 0 < cormat_upper[j, j2]) { sig_cormat[j, j2] <- 0 } sig_precmat[j, j2] <- precmat[j, j2] precmat_lower[j, j2] <- quantile( all_precmat[, j, j2], probs = min(probs), na.rm = TRUE ) precmat_upper[j, j2] <- quantile( all_precmat[, j, j2], probs = max(probs), na.rm = TRUE ) if (0 > precmat_lower[j, j2] & 0 < precmat_upper[j, j2]) { sig_precmat[j, j2] <- 0 } } } if (robust) { final_trace <- median(all_trace_rescor) } else { final_trace <- mean(all_trace_rescor) } out <- structure( list( cor = cormat, cor_lower = cormat_lower, cor_upper = cormat_upper, sig_cor = sig_cormat, cov = covmat, prec = precmat, prec_lower = precmat_lower, prec_upper = precmat_upper, sig_prec = sig_precmat, trace = final_trace ), class = 'mvgam_residcor' ) } return(out) } ================================================ FILE: R/residuals.mvgam.R ================================================ #' Posterior draws of residuals from \pkg{mvgam} models #' #' This method extracts posterior draws of Dunn-Smyth (randomized quantile) #' residuals in the order in which the data were supplied to the model. It #' includes additional arguments for obtaining summaries of the computed #' residuals. #' #' @inheritParams brms::residuals.brmsfit #' #' @param object An object of class `mvgam` #' #' @param ... Ignored #' #' @details This method gives residuals as Dunn-Smyth (randomized quantile) #' residuals. Any observations that were missing (i.e. `NA`) in the original #' data will have missing values in the residuals. #' #' @return An \code{array} of randomized quantile residual values. #' #' If \code{summary = FALSE} the output resembles those of #' \code{\link{posterior_epred.mvgam}} and \code{\link{predict.mvgam}}. #' #' If \code{summary = TRUE} the output is an \code{n_observations} x \code{E} #' matrix. The number of summary statistics \code{E} is equal to \code{2 + #' length(probs)}. The \code{Estimate} column contains point estimates (either #' mean or median depending on argument \code{robust}), while the #' \code{Est.Error} column contains uncertainty estimates (either standard #' deviation or median absolute deviation depending on argument #' \code{robust}). The remaining columns starting with \code{Q} contain #' quantile estimates as specified via argument \code{probs}. #' #' @seealso #' \code{\link{augment.mvgam}} #' #' @author Nicholas J Clark #' #' @examples #' \dontrun{ #' # Simulate some data and fit a model #' simdat <- sim_mvgam(n_series = 1, trend_model = AR()) #' #' mod <- mvgam( #' y ~ s(season, bs = 'cc'), #' trend_model = AR(), #' noncentred = TRUE, #' data = simdat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' # Extract posterior residuals #' resids <- residuals(mod) #' str(resids) #' #' # Or add them directly to the observed data, along with fitted values #' augment(mod, robust = FALSE, probs = c(0.25, 0.75)) #' } #' #' @export residuals.mvgam <- function( object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... 
) {
  if (length(probs) != 2L) {
    stop("argument 'probs' must be a vector of length 2", call. = FALSE)
  }
  validate_proportional(min(probs))
  validate_proportional(max(probs))

  # What was the original time / series order?
  orig_order <- data.frame(
    series = object$obs_data$series,
    time = object$obs_data$index..time..index
  )
  series_numeric <- as.numeric(orig_order$series)
  time_numeric <- match(orig_order$time, unique(orig_order$time))

  # Build a matrix to return residuals in this order
  resid_matrix <- matrix(
    NA,
    nrow = NROW(orig_order),
    ncol = NROW(object$resids[[1]])
  )
  for (i in 1:NROW(resid_matrix)) {
    resid_matrix[i, ] <- object$resids[[series_numeric[i]]][, time_numeric[i]]
  }

  if (summary) {
    Qupper <- apply(resid_matrix, 1, quantile, probs = max(probs), na.rm = TRUE)
    Qlower <- apply(resid_matrix, 1, quantile, probs = min(probs), na.rm = TRUE)
    if (robust) {
      estimates <- apply(resid_matrix, 1, median, na.rm = TRUE)
      errors <- apply(abs(resid_matrix - estimates), 1, median, na.rm = TRUE)
    } else {
      estimates <- apply(resid_matrix, 1, mean, na.rm = TRUE)
      errors <- apply(resid_matrix, 1, sd, na.rm = TRUE)
    }
    out <- cbind(estimates, errors, Qlower, Qupper)
    colnames(out) <- c(
      'Estimate',
      'Est.Error',
      paste0('Q', 100 * min(probs)),
      paste0('Q', 100 * max(probs))
    )
  } else {
    out <- resid_matrix
  }
  return(out)
}



================================================
FILE: R/sanitise_modelfile.R
================================================
#' Clean up a stan file
#' @noRd
sanitise_modelfile = function(model_file) {
  # Remove empty lines
  clean_up <- vector()
  for (x in 1:length(model_file)) {
    clean_up[x] <- trimws(model_file[x]) == "" | trimws(model_file[x]) == "NA"
  }
  clean_up[is.na(clean_up)] <- FALSE
  model_file <- model_file[!clean_up]

  # Insert a newline before '//' comment lines so commented blocks
  # are visually separated and the model is more readable
  hashes <- vector()
  hashes[1] <- FALSE
  for (x in 2:length(model_file)) {
    hashes[x] <- grepl('//', model_file[x], fixed = TRUE) &
      trimws(model_file[x - 1]) != "" &
      (!grepl('{', model_file[x - 1], fixed = TRUE) |
        grepl('}', model_file[x - 1], fixed = TRUE)) &
      !grepl(';', model_file[x], fixed = TRUE)
  }
  if (any(hashes)) {
    model_file[hashes] <- paste0('\n', model_file[hashes])
  }
  model_file <- readLines(textConnection(model_file), n = -1)
  return(model_file)
}
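# A minimal sketch of the transformation performed by sanitise_modelfile(),
# using a hypothetical toy Stan fragment (wrapped in `if (FALSE)` so it is
# never executed at package load): the empty line is dropped, the eligible
# '//' comment line gains a leading newline, and the result is re-split into
# lines via readLines()
if (FALSE) {
  toy_stan <- c(
    "data {",
    "int<lower=0> n;",
    "",
    "// total number of series",
    "int<lower=0> n_series;",
    "}"
  )
  sanitise_modelfile(toy_stan)
}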


================================================
FILE: R/score.mvgam_forecast.R
================================================
#' @title Compute probabilistic forecast scores for \pkg{mvgam} models
#'
#' @param object `mvgam_forecast` object. See [forecast.mvgam()]. If the test
#' data supplied to \code{forecast.mvgam} contained out of sample test
#' observations, the calibration of probabilistic forecasts can be scored
#' using proper scoring rules
#'
#' @param ... Ignored
#'
#' @param score \code{character} specifying the type of proper scoring rule
#' to use for evaluation. Options are: `sis` (i.e. the Scaled Interval
#' Score), `energy`, `variogram`, `elpd` (i.e. the Expected log pointwise
#' Predictive Density), `drps` (i.e. the Discrete Ranked Probability Score),
#' `crps` (the Continuous Ranked Probability Score) or `brier` (the latter
#' of which is only applicable for `bernoulli` models). Note that when
#' choosing `elpd`, the supplied object must have forecasts on the `link`
#' scale so that expectations can be calculated prior to scoring. If
#' choosing `brier`, the object must have forecasts on the `expected` scale
#' (i.e. probability predictions). For all other scores, forecasts should
#' be supplied on the `response` scale (i.e. posterior predictions)
#'
#' @param log \code{logical}. Should the forecasts and truths be logged
#' prior to scoring? This is often appropriate for comparing performance
#' of models when series vary in their observation ranges. Ignored if
#' `score = 'brier'`
#'
#' @param weights optional \code{vector} of weights (where
#' \code{length(weights) == n_series}) for weighting pairwise correlations
#' when evaluating the variogram score for multivariate forecasts. Useful
#' for down-weighting series that have larger magnitude observations or
#' that are of less interest when forecasting. Ignored if
#' \code{score != 'variogram'}
#'
#' @param interval_width proportional value on `[0.05,0.95]` defining the
#' forecast interval for calculating coverage and, if `score = 'sis'`, for
#' calculating the interval score. Ignored if `score = 'brier'`
#'
#' @param n_cores \code{integer} specifying number of cores for calculating
#' scores in parallel
#'
#' @return A \code{list} containing scores and interval coverages per
#' forecast horizon. If \code{score %in% c('drps', 'crps', 'elpd', 'brier')},
#' the list will also contain the sum of all series-level scores
#' per horizon. If \code{score %in% c('energy','variogram')}, no
#' series-level scores are computed and the only score returned will be
#' for all series. For all scores apart from `elpd` and `brier`, the
#' `in_interval` column in each series-level slot is a binary indicator of
#' whether or not the true value was within the forecast's corresponding
#' posterior empirical quantiles. Intervals are not calculated when using
#' `elpd` because forecasts will only contain the linear predictors
#'
#' @author Nicholas J Clark
#'
#' @method score mvgam_forecast
#'
#' @references Gneiting, T. and Raftery, A. E. (2007). Strictly Proper
#' Scoring Rules, Prediction, and Estimation. \emph{Journal of the American
#' Statistical Association}, 102(477), 359-378.
#' \doi{10.1198/016214506000001437}
#'
#' @seealso \code{\link{forecast.mvgam}}, \code{\link{ensemble}}
#'
#' @examples
#' \dontrun{
#' # Simulate observations for three count-valued time series
#' data <- sim_mvgam()
#'
#' # Fit a dynamic model using 'newdata' to automatically produce forecasts
#' mod <- mvgam(
#'   y ~ 1,
#'   trend_model = RW(),
#'   data = data$data_train,
#'   newdata = data$data_test,
#'   chains = 2,
#'   silent = 2
#' )
#'
#' # Extract forecasts into a 'mvgam_forecast' object
#' fc <- forecast(mod)
#' plot(fc)
#'
#' # Compute Discrete Ranked Probability Scores and 0.90 interval coverages
#' fc_scores <- score(fc, score = 'drps')
#' str(fc_scores)
#'
#' # An example using binary data
#' data <- sim_mvgam(family = bernoulli())
#'
#' mod <- mvgam(
#'   y ~ s(season, bs = 'cc', k = 6),
#'   trend_model = AR(),
#'   data = data$data_train,
#'   newdata = data$data_test,
#'   family = bernoulli(),
#'   chains = 2,
#'   silent = 2
#' )
#'
#' # Extract forecasts on the expectation (probability) scale
#' fc <- forecast(mod, type = 'expected')
#' plot(fc)
#'
#' # Compute Brier scores
#' fc_scores <- score(fc, score = 'brier')
#' str(fc_scores)
#' }
#'
#' @export
score.mvgam_forecast = function(
  object,
  score = 'crps',
  log = FALSE,
  weights,
  interval_width = 0.9,
  n_cores = 1,
  ...
) {
  score <- match.arg(
    arg = score,
    choices = c('crps', 'drps', 'brier', 'elpd', 'sis', 'energy', 'variogram')
  )

  if (object$type == 'trend') {
    stop(
      'cannot evaluate accuracy of latent trend forecasts. Use "type == response" when forecasting instead',
      call. = FALSE
    )
  }
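  # Each score also has scale requirements: elpd must be computed from
  # link-scale linear predictors, brier from expected-scale probability
  # predictions, and the remaining scores from response-scale posterior
  # predictions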
  if (object$type != 'link' & score == 'elpd') {
    stop(
      'cannot evaluate elpd scores unless linear predictors are supplied. Use "type == link" when forecasting instead',
      call. = FALSE
    )
  }
  if (object$type != 'expected' & score == 'brier') {
    stop(
      'cannot evaluate brier scores unless probability predictions are supplied. Use "type == expected" when forecasting instead',
      call. = FALSE
    )
  }
  validate_pos_integer(n_cores)
  validate_proportional(interval_width)
  if (interval_width < 0.05 || interval_width > 0.95) {
    stop('interval width must be between 0.05 and 0.95, inclusive')
  }

  # Get truths (out of sample) into correct format
  n_series <- length(object$series_names)
  truths <- do.call(
    rbind,
    lapply(seq_len(n_series), function(series) {
      object$test_observations[[series]]
    })
  )

  if (score == 'elpd') {
    # Get linear predictor forecasts into the correct format
    linpreds <- do.call(cbind, object$forecasts)

    # Build a dataframe for indexing which series each observation belongs to
    newdata <- data.frame(
      series = factor(
        sort(rep(object$series_names, NCOL(object$forecasts[[1]]))),
        levels = levels(object$series_names)
      ),
      y = unname(unlist(object$test_observations))
    )
    class(object) <- c('mvgam', class(object))

    # Calculate log-likelihoods
    elpd_score <- logLik(
      object = object,
      linpreds = linpreds,
      newdata = newdata,
      family_pars = object$family_pars,
      n_cores = n_cores
    )
    elpd_score <- apply(elpd_score, 2, log_mean_exp)

    # Construct series-level score dataframes
    series_score <- lapply(seq_len(n_series), function(series) {
      data.frame(
        score = elpd_score[which(
          newdata$series == levels(object$series_names)[series]
        )],
        eval_horizon = seq(1, NCOL(object$forecasts[[1]])),
        score_type = 'elpd'
      )
    })
    names(series_score) <- object$series_names

    all_scores <- data.frame(
      score = rowSums(do.call(
        cbind,
        lapply(seq_len(n_series), function(series) {
          series_score[[series]]$score
        })
      )),
      eval_horizon = seq(1, NCOL(object$forecasts[[1]])),
      score_type = 'sum_elpd'
    )
    series_score$all_series <- all_scores
  }

  if (score %in% c('energy', 'variogram')) {
    if (missing(weights)) {
      weights <- rep(1, length(object$series_names))
    }

    # Calculate coverage using one of the univariate scores
    series_score <- lapply(seq_len(n_series), function(series) {
      DRPS <- data.frame(drps_mcmc_object(
        truths[series, ],
        object$forecasts[[series]],
        log = log,
        interval_width = interval_width
      ))
      colnames(DRPS) <- c('score', 'in_interval')
      DRPS$interval_width <- interval_width
      DRPS$eval_horizon <- seq(1, NCOL(object$forecasts[[1]]))
      DRPS[, 2:4]
    })
    names(series_score) <- object$series_names

    if (score == 'variogram') {
      var_score <- variogram_mcmc_object(
        truths = truths,
        fcs = object$forecasts,
        log = log,
        weights = weights
      )
      series_score$all_series <- data.frame(
        score = var_score,
        eval_horizon = 1:NCOL(object$forecasts[[1]]),
        score_type = 'variogram'
      )
    }
    if (score == 'energy') {
      en_score <- energy_mcmc_object(
        truths = truths,
        fcs = object$forecasts,
        log = log
      )
      series_score$all_series <- data.frame(
        score = en_score,
        eval_horizon = 1:NCOL(object$forecasts[[1]]),
        score_type = 'energy'
      )
    }
  }

  if (score == 'sis') {
    series_score <- lapply(seq_len(n_series), function(series) {
      SIS <- data.frame(sis_mcmc_object(
        truths[series, ],
        object$forecasts[[series]],
        log = log,
        interval_width = interval_width
      ))
      colnames(SIS) <- c('score', 'in_interval')
      SIS$interval_width <- interval_width
      SIS$eval_horizon <- seq(1, NCOL(object$forecasts[[1]]))
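      # Label rows with the score type so series-level results can be
      # distinguished from other proper scoring rules after binding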
SIS$score_type <- 'sis' SIS }) names(series_score) <- object$series_names all_scores <- data.frame( score = rowSums(do.call( cbind, lapply(seq_len(n_series), function(series) { series_score[[series]]$score }) )), eval_horizon = seq(1, NCOL(object$forecasts[[1]])), score_type = 'sum_sis' ) series_score$all_series <- all_scores } if (score == 'drps') { series_score <- lapply(seq_len(n_series), function(series) { DRPS <- data.frame(drps_mcmc_object( truths[series, ], object$forecasts[[series]], log = log, interval_width = interval_width )) colnames(DRPS) <- c('score', 'in_interval') DRPS$interval_width <- interval_width DRPS$eval_horizon <- seq(1, NCOL(object$forecasts[[1]])) DRPS$score_type <- 'drps' DRPS }) names(series_score) <- object$series_names all_scores <- data.frame( score = rowSums(do.call( cbind, lapply(seq_len(n_series), function(series) { series_score[[series]]$score }) )), eval_horizon = seq(1, NCOL(object$forecasts[[1]])), score_type = 'sum_drps' ) series_score$all_series <- all_scores } if (score == 'crps') { series_score <- lapply(seq_len(n_series), function(series) { CRPS <- data.frame(crps_mcmc_object( truths[series, ], object$forecasts[[series]], log = log, interval_width = interval_width )) colnames(CRPS) <- c('score', 'in_interval') CRPS$interval_width <- interval_width CRPS$eval_horizon <- seq(1, NCOL(object$forecasts[[1]])) CRPS$score_type <- 'crps' CRPS }) names(series_score) <- object$series_names all_scores <- data.frame( score = rowSums(do.call( cbind, lapply(seq_len(n_series), function(series) { series_score[[series]]$score }) )), eval_horizon = seq(1, NCOL(object$forecasts[[1]])), score_type = 'sum_crps' ) series_score$all_series <- all_scores } if (score == 'brier') { if (object$family != 'bernoulli') { stop('brier score only applicable for Bernoulli forecasts', call. = FALSE) } series_score <- lapply(seq_len(n_series), function(series) { BRIER <- data.frame(brier_mcmc_object( truths[series, ], object$forecasts[[series]], log = log )) colnames(BRIER) <- c('score', 'in_interval') BRIER$interval_width <- interval_width BRIER$eval_horizon <- seq(1, NCOL(object$forecasts[[1]])) BRIER$score_type <- 'brier' BRIER }) names(series_score) <- object$series_names all_scores <- data.frame( score = rowSums(do.call( cbind, lapply(seq_len(n_series), function(series) { series_score[[series]]$score }) )), eval_horizon = seq(1, NCOL(object$forecasts[[1]])), score_type = 'sum_brier' ) series_score$all_series <- all_scores } series_score } #'@name score.mvgam_forecast #'@param object `mvgam_forecast` object. See [forecast.mvgam()]. #'@param ... Ignored #'@export score = function(object, ...) { UseMethod("score", object) } ================================================ FILE: R/series_to_mvgam.R ================================================ #' Convert timeseries object to format necessary for \pkg{mvgam} models #' #' This function converts univariate or multivariate time series (\code{xts} or #' \code{ts} objects) to the format necessary for \code{\link{mvgam}}. #' #' @importFrom stats is.ts ts start time frequency #' #' @importFrom utils head #' #' @param series \code{\link[xts]{xts}} or \code{\link[stats]{ts}} object to be #' converted to \code{\link{mvgam}} format #' #' @param freq \code{integer}. The seasonal frequency of the series #' #' @param train_prop \code{numeric} stating the proportion of data to use for #' training. 
Should be between \code{0.25} and \code{0.95} #' #' @return A \code{list} object containing outputs needed for #' \code{\link{mvgam}}, including 'data_train' and 'data_test' #' #' @examples #' # A ts object example #' data("sunspots") #' series <- cbind(sunspots, sunspots) #' colnames(series) <- c('blood', 'bone') #' head(series) #' series_to_mvgam(series, frequency(series), 0.85) #' #' # An xts object example #' library(xts) #' dates <- seq(as.Date("2001-05-01"), length = 30, by = "quarter") #' #' data <- cbind( #' c(gas = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001)))), #' c(oil = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001)))) #' ) #' #' series <- xts(x = data, order.by = dates) #' colnames(series) <- c('gas', 'oil') #' head(series) #' series_to_mvgam(series, freq = 4, train_prop = 0.85) #' #' @export series_to_mvgam <- function(series, freq, train_prop = 0.85) { # Check for xts and lubridate packages insight::check_if_installed("xts") insight::check_if_installed("lubridate") # Check series format type <- 'wrong' if (is.ts(series)) { type <- 'ts' } if (xts::is.xts(series)) { type <- 'xts' } if (type == 'wrong') { stop("series must be either a ts or xts object") } # Extract information on years and seasons from the series object if (type == 'ts') { dates <- lubridate::date_decimal(as.numeric(time(series))) years <- lubridate::year(dates) seasons <- as.vector(1 + ((time(series) %% 1) * frequency(series))) } # Function to convert xts to ts object xts.to.ts <- function(x, freq = 52) { start_time <- head( 1 + (round( (lubridate::yday(lubridate::date(time(series))) / 365) * freq, 0 )), 1 ) ts( as.numeric(x), start = c(lubridate::year(start(x)), start_time), frequency = freq ) } if (type == 'xts') { dates <- lubridate::date(time(series)) years <- lubridate::year(time(series)) seasons <- as.vector( 1 + ((time(xts.to.ts(series[, 1], freq = freq)) %% 1) * freq) ) } # Extract remaining information and put into correct format n_series <- NCOL(series) T <- NROW(series) series_names <- factor(colnames(series), levels = colnames(series)) if (length(levels(series_names)) == 0) { series_names <- factor( paste0('series_', seq(1, n_series)), levels = paste0('series_', seq(1, n_series)) ) } mvgam_data = data.frame( y = as.vector(series), season = rep(seasons, n_series), year = rep(years, n_series), date = rep(dates, n_series), series = as.factor(sort(rep(series_names, T))) ) %>% dplyr::arrange(year, season, series) mvgam_data %>% dplyr::left_join( mvgam_data %>% dplyr::select(year, season) %>% dplyr::distinct() %>% dplyr::arrange(year, season) %>% dplyr::mutate(time = dplyr::row_number()), by = c('season', 'year') ) -> mvgam_data # Split into training and testing and return last_time <- floor(max(mvgam_data$time) * train_prop) return(list( data_train = mvgam_data %>% dplyr::filter(time <= last_time), data_test = mvgam_data %>% dplyr::filter(time > last_time) )) } ================================================ FILE: R/shared_obs_params.R ================================================ #' Updates for allowing shared observation params across series #' @noRd shared_obs_params = function(model_file, family) { if (family == 'poisson') { message( 'Context share_obs_params: Poisson family has no additional observation params' ) model_file <- model_file } if (family == 'nmix') { message( 'Context share_obs_params: nmix family has no additional observation params' ) model_file <- model_file } if (family %in% c('student', 'gaussian', 'lognormal')) { model_file[grep( "vector[n_series] 
sigma_obs;", model_file, fixed = TRUE )] <- "real sigma_obs;" model_file <- model_file[ -grep( "flat_sigma_obs = rep_each(sigma_obs, n)[obs_ind];", model_file, fixed = TRUE ) ] model_file <- model_file[ -grep("vector[n_nonmissing] flat_sigma_obs;", model_file, fixed = TRUE) ] model_file[grep("flat_sigma_obs);", model_file, fixed = TRUE)] <- 'sigma_obs);' if (any(grepl("flat_sigma_obs,", model_file, fixed = TRUE))) { model_file[grep("flat_sigma_obs,", model_file, fixed = TRUE)] <- "sigma_obs," model_file[grep( "data vector Y, matrix X, vector b, vector sigma_obs, real alpha) {", model_file, fixed = TRUE )] <- "data vector Y, matrix X, vector b, real sigma_obs, real alpha) {" model_file[grep( "ptarget += normal_id_glm_lpdf(Y[start:end] | X[start:end], alpha, b, sigma_obs[start:end]);", model_file, fixed = TRUE )] <- "ptarget += normal_id_glm_lpdf(Y[start:end] | X[start:end], alpha, b, sigma_obs);" } model_file[grep( "sigma_obs_vec[1:n,s] = rep_vector(sigma_obs[s], n);", model_file, fixed = TRUE )] <- "sigma_obs_vec[1:n,s] = rep_vector(sigma_obs, n);" } if (family == 'student') { model_file[grep( "vector[n_series] nu;", model_file, fixed = TRUE )] <- "real nu;" model_file <- model_file[ -grep("flat_nu = rep_each(nu, n)[obs_ind];", model_file, fixed = TRUE) ] model_file <- model_file[ -grep("vector[n_nonmissing] flat_nu;", model_file, fixed = TRUE) ] model_file[grep( "flat_ys ~ student_t(flat_nu,", model_file, fixed = TRUE )] <- "flat_ys ~ student_t(nu," model_file[grep( "nu_vec[1:n,s] = rep_vector(nu[s], n);", model_file, fixed = TRUE )] <- "nu_vec[1:n,s] = rep_vector(nu, n);" } if (family == 'negative binomial') { model_file[grep( 'vector[n_series] phi_inv;', model_file, fixed = TRUE )] <- 'real phi_inv;' model_file <- model_file[ -grep( 'flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]);', model_file, fixed = TRUE ) ] model_file <- model_file[ -grep("real flat_phis[n_nonmissing];", model_file, fixed = TRUE) ] model_file[grep("inv(flat_phis));", model_file, fixed = TRUE)] <- 'inv(phi_inv));' model_file[grep("phi = inv(phi_inv);", model_file, fixed = TRUE)] <- "phi = rep_vector(inv(phi_inv), n_series);" } if (family == 'beta') { model_file[grep( 'vector[n_series] phi;', model_file, fixed = TRUE )] <- 'real phi;' model_file <- model_file[ -grep('flat_phis = rep_each(phi, n)[obs_ind];', model_file, fixed = TRUE) ] model_file <- model_file[ -grep("vector[n_nonmissing] flat_phis;", model_file, fixed = TRUE) ] model_file[grep( "inv_logit(flat_xs * b) .* flat_phis,", model_file, fixed = TRUE )] <- "inv_logit(flat_xs * b) .* phi," model_file[grep( "(1 - inv_logit(flat_xs * b)) .* flat_phis);", model_file, fixed = TRUE )] <- "(1 - inv_logit(flat_xs * b)) .* phi);" model_file[grep( "inv_logit(append_col(flat_xs, flat_trends) * append_row(b, 1.0)) .* flat_phis,", model_file, fixed = TRUE )] <- "inv_logit(append_col(flat_xs, flat_trends) * append_row(b, 1.0)) .* phi," model_file[grep( "(1 - inv_logit(append_col(flat_xs, flat_trends) * append_row(b, 1.0))) .* flat_phis);", model_file, fixed = TRUE )] <- "(1 - inv_logit(append_col(flat_xs, flat_trends) * append_row(b, 1.0))) .* phi);" model_file[grep( "phi_vec[1:n,s] = rep_vector(phi[s], n);", model_file, fixed = TRUE )] <- "phi_vec[1:n,s] = rep_vector(phi, n);" } if (family == 'Gamma') { model_file[grep( "vector[n_series] shape;", model_file, fixed = TRUE )] <- "real shape;" model_file <- model_file[ -grep( "flat_shapes = rep_each(shape, n)[obs_ind];", model_file, fixed = TRUE ) ] model_file <- model_file[ -grep("vector[n_nonmissing] flat_shapes;", 
model_file, fixed = TRUE) ] model_file[grep( "flat_shapes, flat_shapes ./ exp(flat_xs * b));", model_file, fixed = TRUE )] <- "shape, shape ./ exp(flat_xs * b));" model_file[grep( "flat_shapes, flat_shapes ./ exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0)));", model_file, fixed = TRUE )] <- "shape, shape ./ exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0)));" model_file[grep( "shape_vec[1:n,s] = rep_vector(shape[s], n);", model_file, fixed = TRUE )] <- "shape_vec[1:n,s] = rep_vector(shape, n);" } return(model_file) } ================================================ FILE: R/sim_mvgam.R ================================================ #' Simulate a set of time series for modelling in \pkg{mvgam} #' #' This function simulates sets of time series data for fitting a #' multivariate GAM that includes shared seasonality and dependence on #' State-Space latent dynamic factors. Random dependencies among series, #' i.e. correlations in their long-term trends, are included in the form of #' correlated loadings on the latent dynamic factors #' #' @importFrom stats rnorm rbeta rpois rlnorm rgamma cor cov2cor cov ts #' @importFrom brms lognormal #' #' @param T \code{integer}. Number of observations (timepoints) #' #' @param n_series \code{integer}. Number of discrete time series #' #' @param seasonality \code{character}. Either \code{shared}, meaning that #' all series share the exact same seasonal pattern, or #' \code{hierarchical}, meaning that there is a global seasonality but #' each series' pattern can deviate slightly #' #' @param use_lv \code{logical}. If \code{TRUE}, use dynamic factors to #' estimate series' latent trends in a reduced dimension format. If #' \code{FALSE}, estimate independent latent trends for each series #' #' @param n_lv \code{integer}. Number of latent dynamic factors for #' generating the series' trends. Defaults to `0`, meaning that dynamics #' are estimated independently for each series #' #' @param trend_model \code{character} specifying the time series dynamics #' for the latent trend. Options are: #' \itemize{ #' \item `None` (no latent trend component; i.e. the GAM component is #' all that contributes to the linear predictor, and the observation #' process is the only source of error; similarly to what is estimated #' by \code{\link[mgcv]{gam}}) #' \item `RW` (random walk with possible drift) #' \item `AR1` (with possible drift) #' \item `AR2` (with possible drift) #' \item `AR3` (with possible drift) #' \item `VAR1` (contemporaneously uncorrelated VAR1) #' \item `VAR1cor` (contemporaneously correlated VAR1) #' \item `GP` (Gaussian Process with squared exponential kernel) #' } #' See [mvgam_trends] for more details #' #' @param drift \code{logical}, simulate a drift term for each trend #' #' @param prop_trend \code{numeric}. Relative importance of the trend for #' each series. Should be between \code{0} and \code{1} #' #' @param trend_rel Deprecated. Use `prop_trend` instead #' #' @param freq \code{integer}. The seasonal frequency of the series #' #' @param family \code{family} specifying the exponential observation #' family for the series. Currently supported families are: `nb()`, #' `poisson()`, `bernoulli()`, `tweedie()`, `gaussian()`, `betar()`, #' `lognormal()`, `student()` and `Gamma()` #' #' @param phi \code{vector} of dispersion parameters for the series #' (i.e. `size` for `nb()` or `phi` for `betar()`). If #' \code{length(phi) < n_series}, the first element of `phi` will be #' replicated `n_series` times. 
Defaults to \code{5} for `nb()` and
#' `tweedie()`; \code{10} for `betar()`
#'
#' @param shape \code{vector} of shape parameters for the series
#' (i.e. `shape` for `gamma()`). If \code{length(shape) < n_series},
#' the first element of `shape` will be replicated `n_series` times.
#' Defaults to \code{1}
#'
#' @param sigma \code{vector} of scale parameters for the series
#' (i.e. `sd` for `gaussian()` or `student()`, `log(sd)` for
#' `lognormal()`). If \code{length(sigma) < n_series}, the first element
#' of `sigma` will be replicated `n_series` times. Defaults to
#' \code{0.5} for `gaussian()` and `student()`; \code{0.2} for
#' `lognormal()`
#'
#' @param nu \code{vector} of degrees of freedom parameters for the series
#' (i.e. `nu` for `student()`). If \code{length(nu) < n_series}, the
#' first element of `nu` will be replicated `n_series` times. Defaults
#' to \code{3}
#'
#' @param mu \code{vector} of location parameters for the series. If
#' \code{length(mu) < n_series}, the first element of `mu` will be
#' replicated `n_series` times. Defaults to small random values between
#' `-0.5` and `0.5` on the link scale
#'
#' @param prop_missing \code{numeric} stating proportion of observations
#' that are missing. Should be between \code{0} and \code{0.8}, inclusive
#'
#' @param prop_train \code{numeric} stating the proportion of data to use
#' for training. Should be between \code{0.2} and \code{1}
#'
#' @return A \code{list} object containing outputs needed for
#' \code{\link{mvgam}}, including 'data_train' and 'data_test', as well
#' as some additional information about the simulated seasonality and
#' trend dependencies
#'
#' @references Clark, N. J. and Wells, K. (2022). Dynamic generalised
#' additive models (DGAMs) for forecasting discrete ecological time
#' series. \emph{Methods in Ecology and Evolution}, 13(11), 2388-2404.
#' \doi{10.1111/2041-210X.13974}
#'
#' @examples
#' # Simulate series with observations bounded at 0 and 1 (Beta responses)
#' sim_data <- sim_mvgam(
#'   family = betar(),
#'   trend_model = RW(),
#'   prop_trend = 0.6
#' )
#' plot_mvgam_series(data = sim_data$data_train, series = 'all')
#'
#' # Now simulate series with overdispersed discrete observations
#' sim_data <- sim_mvgam(
#'   family = nb(),
#'   trend_model = RW(),
#'   prop_trend = 0.6,
#'   phi = 10
#' )
#' plot_mvgam_series(data = sim_data$data_train, series = 'all')
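#'
#' # Hierarchical seasonality: each series deviates slightly from a
#' # shared global seasonal pattern (see the 'seasonality' argument)
#' sim_data <- sim_mvgam(
#'   seasonality = 'hierarchical',
#'   trend_model = AR(),
#'   prop_trend = 0.4
#' )
#' plot_mvgam_series(data = sim_data$data_train, series = 'all')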
#'
#' @export
sim_mvgam = function(
  T = 100,
  n_series = 3,
  seasonality = 'shared',
  use_lv = FALSE,
  n_lv = 0,
  trend_model = RW(),
  drift = FALSE,
  prop_trend = 0.2,
  trend_rel,
  freq = 12,
  family = poisson(),
  phi,
  shape,
  sigma,
  nu,
  mu,
  prop_missing = 0,
  prop_train = 0.85
) {
  # Validate the family argument
  family <- validate_family(family)
  family_char <- match.arg(
    arg = family$family,
    choices = c(
      'negative binomial',
      "poisson",
      "bernoulli",
      "tweedie",
      "beta",
      "gaussian",
      "lognormal",
      "student",
      "Gamma"
    )
  )

  # Validate the trend arguments
  trend_model <- validate_trend_model(trend_model, drift = drift, warn = FALSE)
  if (trend_model %in% c('VAR1', 'VAR1cor')) {
    use_lv <- FALSE
  }
  if (trend_model %in% c('RWcor', 'AR1cor', 'AR2cor', 'AR3cor')) {
    warning(paste0(
      'Simulation of correlated AR or RW trends not yet supported.\n',
      'Reverting to uncorrelated trends'
    ))
  }
  if (missing(trend_rel)) {
    trend_rel <- prop_trend
  }
  validate_proportional(trend_rel)

  # Check n_series
  validate_pos_integer(n_series)

  # Check prop_missing
  validate_proportional(prop_missing)

  # Check n_lv
  if (n_lv == 0) {
    use_lv <- FALSE
    n_lv <- n_series
  } else {
    validate_pos_integer(n_lv)
    use_lv <- TRUE
  }
  if (use_lv) {
    if (n_lv > n_series) {
      warning(
        'Argument "n_lv" cannot be greater than n_series; changing n_lv to match n_series'
      )
      n_lv <- n_series
    }
  }

  # Check seasonality
  if (!seasonality %in% c('shared', 'hierarchical')) {
    stop('seasonality must be either shared or hierarchical')
  }

  # Check family-specific parameters
  if (missing(phi)) {
    if (family_char == 'beta') {
      phi <- rep(10, n_series)
    } else {
      phi <- rep(5, n_series)
    }
  }
  if (any(phi <= 0)) {
    stop('Argument "phi" must be a positive real number', call. = FALSE)
  }
  if (missing(shape)) {
    shape <- rep(1, n_series)
  }
  if (any(shape <= 0)) {
    stop('Argument "shape" must be a positive real number', call. = FALSE)
  }
  if (missing(sigma)) {
    if (family_char == 'lognormal') {
      sigma <- rep(0.2, n_series)
    } else {
      sigma <- rep(0.5, n_series)
    }
  }
  if (any(sigma <= 0)) {
    stop('Argument "sigma" must be a positive real number', call. = FALSE)
  }
  if (missing(nu)) {
    nu <- rep(3, n_series)
  }
  if (any(nu <= 0)) {
    stop('Argument "nu" must be a positive real number', call. = FALSE)
  }
  if (missing(mu)) {
    mu <- runif(n_series, min = -0.5, max = 0.5)
  }
  if (length(phi) < n_series) {
    phi <- rep(phi[1], n_series)
  }
  if (length(shape) < n_series) {
    shape <- rep(shape[1], n_series)
  }
  if (length(sigma) < n_series) {
    sigma <- rep(sigma[1], n_series)
  }
  if (length(nu) < n_series) {
    nu <- rep(nu[1], n_series)
  }
  if (length(mu) < n_series) {
    mu <- rep(mu[1], n_series)
  }

  # Check data splitting
  if (prop_train < 0.2 || prop_train > 1) {
    stop(
      'Argument "prop_train" must be a proportion ranging from 0.2 to 1, inclusive',
      call.
  # Check data splitting (prop_train defaults to 0.85 via the signature)
  if (prop_train < 0.2 || prop_train > 1) {
    stop(
      'Argument "prop_train" must be a proportion ranging from 0.2 to 1, inclusive',
      call. = FALSE
    )
  }

  # Set trend parameters
  if (trend_model %in% c('RW', 'RWcor')) {
    ar1s <- rep(1, n_lv)
    ar2s <- rep(0, n_lv)
    ar3s <- rep(0, n_lv)
  }
  if (trend_model %in% c('AR1', 'AR1cor')) {
    ar1s <- rnorm(n_lv, sd = 0.5)
    ar2s <- rep(0, n_lv)
    ar3s <- rep(0, n_lv)
  }
  if (trend_model %in% c('AR2', 'AR2cor')) {
    ar1s <- rnorm(n_lv, sd = 0.5)
    ar2s <- rnorm(n_lv, sd = 0.5)
    ar3s <- rep(0, n_lv)
  }
  if (trend_model %in% c('AR3', 'AR3cor')) {
    ar1s <- rnorm(n_lv, sd = 0.5)
    ar2s <- rnorm(n_lv, sd = 0.5)
    ar3s <- rnorm(n_lv, sd = 0.5)
  }

  if (trend_model %in% c('RW', 'AR1', 'AR2', 'AR3', 'VAR1', 'VAR1cor')) {
    # Sample trend drift terms so they are (hopefully) not too correlated
    if (drift) {
      trend_alphas <- rnorm(n_lv, sd = 0.5)
    } else {
      trend_alphas <- rep(0, n_lv)
    }

    # Simulate latent trends
    if (!trend_model %in% c('VAR1', 'VAR1cor')) {
      trends <- do.call(
        cbind,
        lapply(seq_len(n_lv), function(x) {
          sim_ar3(
            drift = 0,
            ar1 = ar1s[x],
            ar2 = ar2s[x],
            ar3 = ar3s[x],
            tau = 1,
            last_trends = rnorm(3),
            h = T
          ) +
            trend_alphas[x] * 1:T
        })
      )
    }

    if (trend_model %in% c('VAR1', 'VAR1cor')) {
      if (trend_model == 'VAR1') {
        # Simulate the Sigma matrix (contemporaneously uncorrelated)
        Sigma <- matrix(0, n_lv, n_lv)
        sigma <- runif(n_lv, 0.4, 1.2)
        diag(Sigma) <- sigma
      }
      if (trend_model == 'VAR1cor') {
        # Use the LKJ distribution to sample correlation matrices
        # with nice properties.
        # Sample trend SD parameters and construct Sigma
        sigma <- runif(n_lv, 0.4, 1.2)
        Sigma <- outer(sigma, sigma) * lkj_corr(n_series = n_lv)
      }

      # Create a stationary VAR coefficient matrix
      A <- stationary_VAR_phi(p = 1, n_series = n_lv)[[1]]

      # Simulate the VAR trends
      trends <- sim_var1(
        drift = trend_alphas,
        A = A,
        Sigma = Sigma,
        last_trends = mvnfast::rmvn(n = 1, mu = rep(0, n_lv), sigma = Sigma),
        h = T
      )
    }
  }

  if (trend_model == 'GP') {
    # Sample alpha and rho parameters
    trend_alphas <- runif(n_lv, 0.75, 1.25)
    trend_rhos <- runif(n_lv, 3, 8)

    # Generate latent GP trends
    trends <- do.call(
      cbind,
      lapply(seq_len(n_lv), function(lv) {
        Sigma <- trend_alphas[lv]^2 *
          exp(-0.5 * ((outer(1:T, 1:T, "-") / trend_rhos[lv])^2)) +
          diag(1e-9, T)
        mvnfast::rmvn(1, mu = rep(0, T), sigma = Sigma)[1, ]
      })
    )
  }

  if (use_lv) {
    Sigma <- random_Sigma(n_series)
    loadings <- as.matrix(matrix(
      mvnfast::rmvn(n = n_lv, mu = rep(0, n_series), sigma = Sigma),
      ncol = n_series
    ))
  } else {
    # Else use independent trend loadings
    loadings <- diag(n_lv)
  }

  # Simulate the global seasonal pattern
  glob_season <- periodic_gp(T, period = freq, rho = runif(1, 0.5, 1.2))

  # Simulate observed series as dependent on seasonality and trend
  obs_trends <- matrix(NA, nrow = T, ncol = n_series)
  for (s in 1:n_series) {
    obs_trends[, s] <- as.vector(scale(as.vector(loadings[, s] %*% t(trends))))
  }

  obs_ys <- c(unlist(lapply(seq_len(n_series), function(x) {
    if (seasonality == 'shared') {
      dynamics <- (glob_season * (1 - trend_rel)) +
        (obs_trends[, x] * trend_rel)
    } else {
      yseason <- as.vector(scale(stats::stl(
        ts(rnorm(T, glob_season, sd = 2), frequency = freq),
        'periodic'
      )$time.series[, 1]))
      dynamics <- (yseason * (1 - trend_rel)) + (obs_trends[, x] * trend_rel)
    }

    if (family_char == 'negative binomial') {
      out <- rnbinom(
        length(dynamics),
        size = phi[x],
        mu = exp(mu[x] + dynamics)
      )
    }
    if (family_char == 'poisson') {
      out <- rpois(length(dynamics), lambda = exp(mu[x] + dynamics))
    }
    if (family_char == 'bernoulli') {
      out <- rbinom(length(dynamics), size = 1, prob = plogis(mu[x] + dynamics))
    }
    if (family_char == 'tweedie') {
      out <- rpois(
        n = length(dynamics),
        lambda = tweedie::rtweedie(
          length(dynamics),
power = 1.5, phi = phi[x] ) ) } if (family_char == 'gaussian') { out <- rnorm(length(dynamics), mean = mu[x] + dynamics, sd = sigma[x]) } if (family_char == 'student') { out <- rstudent_t( n = length(dynamics), df = nu[x], mu = mu[x] + dynamics, sigma = sigma[x] ) } if (family_char == 'lognormal') { out <- rlnorm( length(dynamics), meanlog = mu[x] + (dynamics * 0.3), sdlog = sigma[x] ) } if (family_char == 'Gamma') { out <- rgamma( length(dynamics), rate = shape[x] / exp(mu[x] + dynamics), shape = shape[x] ) } if (family_char == 'beta') { shape_pars <- beta_shapes(mu = plogis(mu[x] + dynamics), phi = phi[x]) out <- rbeta( length(dynamics), shape1 = shape_pars$shape1, shape2 = shape_pars$shape2 ) } out[is.infinite(out)] <- NA if (prop_missing > 0) { out[sample(seq(1, length(out)), floor(length(out) * prop_missing))] <- NA } out }))) # Return simulated data in the format that is ready for mvgam analysis sim_data = data.frame( y = obs_ys, season = rep(rep(seq(1, freq), ceiling(T / freq))[1:T], n_series), year = rep(sort(rep(seq(1, ceiling(T / freq)), freq))[1:T], n_series), series = as.factor(paste0('series_', sort(rep(seq(1, n_series), T)))) ) %>% dplyr::group_by(series) %>% dplyr::arrange(year, season) %>% dplyr::mutate(time = 1:dplyr::n()) %>% dplyr::ungroup() data_train <- sim_data %>% dplyr::filter(time <= floor(max(sim_data$time) * prop_train)) %>% dplyr::ungroup() %>% dplyr::group_by(series) %>% dplyr::arrange(time) data_test <- sim_data %>% dplyr::filter(time > max(data_train$time)) %>% dplyr::ungroup() %>% dplyr::group_by(series) %>% dplyr::arrange(time) if (!use_lv) { if (trend_model %in% c('RW', 'AR1', 'AR2', 'AR3')) { trend_params = list(ar1 = ar1s, ar2 = ar2s, ar3 = ar3s) } if (trend_model %in% c('VAR1', 'VAR1cor')) { trend_params = list(var1 = A, Sigma = Sigma) } if (trend_model == 'GP') { trend_params = list(alpha = trend_alphas, rho = trend_rhos) } out <- list( data_train = data.frame(data_train), data_test = data.frame(data_test), true_corrs = cov2cor(cov(obs_trends)), true_trends = obs_trends, global_seasonality = glob_season, trend_params = trend_params ) } else { out <- list( data_train = data.frame(data_train), data_test = data.frame(data_test), true_corrs = cov2cor(cov(obs_trends)), true_trends = obs_trends, global_seasonality = glob_season ) } return(out) } #' Simulate a fixed seasonal pattern #' @noRd sim_seasonal = function(T, freq = 12) { beta1 <- runif(1, 0.2, 0.6) beta2 <- runif(1, -0.5, 0.5) cov1 <- sin(2 * pi * (1:T) / freq) cov2 <- cos(2 * pi * (1:T) / freq) rnorm(T, mean = beta1 * cov1 + beta2 * cov2, sd = 0.1) } #' Simulate from a periodic GP #' @noRd periodic_gp <- function(T, period = 12, rho = 1) { time <- 1:T cov_matrix = array(0, c(length(time), length(time))) for (i in 1:length(time)) { cov_matrix[i, i] = 1 + 0.00000001 if (i < length(time)) { for (j in (i + 1):length(time)) { covariance = exp( -2 * (sin(pi * abs(time[i] - time[j]) / period)^2) / (rho^2) ) cov_matrix[i, j] = covariance cov_matrix[j, i] = covariance } } } chol_cov <- t(chol(cov_matrix)) values <- as.vector(scale(chol_cov %*% rnorm(length(time)))) return(values) } #' Simulate from the LKJ distribution #' @noRd lkj_corr <- function(n_series, eta = 0.8) { alpha <- eta + (n_series - 2) / 2 r12 <- 2 * rbeta(1, alpha, alpha) - 1 R <- matrix(0, n_series, n_series) R[1, 1] <- 1 R[1, 2] <- r12 R[2, 2] <- sqrt(1 - r12^2) if (n_series > 2) { for (m in 2:(n_series - 1)) { alpha <- alpha - 0.5 y <- rbeta(1, m / 2, alpha) z <- rnorm(m, 0, 1) z <- z / sqrt(crossprod(z)[1]) R[1:m, m + 1] <- sqrt(y) * z 
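
# Descriptive note: this loop follows the 'onion'-style construction for LKJ
# sampling. The new column sqrt(y) * z extends the upper-triangular factor R
# so that crossprod(R) remains a valid correlation matrix, with sqrt(1 - y)
# placed on the diagonal to preserve unit variances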
R[m + 1, m + 1] <- sqrt(1 - y) } } return(crossprod(R)) } #' Generate a random covariance matrix #' @noRd random_Sigma = function(N) { L_Omega <- matrix(0, N, N) L_Omega[1, 1] <- 1 for (i in 2:N) { bound <- 1 for (j in 1:(i - 1)) { is_sparse <- rbinom(1, 1, 0.6) if (is_sparse) { L_Omega[i, j] <- runif(1, -0.05, 0.05) } else { L_Omega[i, j] <- runif(1, -sqrt(bound), sqrt(bound)) } bound <- bound - L_Omega[i, j]^2 } L_Omega[i, i] <- sqrt(bound) } Sigma <- L_Omega %*% t(L_Omega) return(Sigma) } ================================================ FILE: R/stability.R ================================================ #' Calculate measures of latent VAR community stability #' #' Compute reactivity, return rates and contributions of interactions to #' stationary forecast variance from \pkg{mvgam} models with Vector #' Autoregressive dynamics. #' #' @name stability.mvgam #' #' @param object \code{list} object of class \code{mvgam} resulting from a call #' to [mvgam()] that used a Vector Autoregressive latent process model (either #' as `VAR(cor = FALSE)` or `VAR(cor = TRUE)`) #' #' @param ... Ignored #' #' @details These measures of stability can be used to assess how important #' inter-series dependencies are to the variability of a multivariate system #' and to ask how systems are expected to respond to environmental #' perturbations. Using the formula for a latent VAR(1) as: #' #' \deqn{ #' \mu_t \sim \text{MVNormal}(A(\mu_{t - 1}), \Sigma) #' } #' #' this function will calculate the long-term stationary forecast distribution #' of the system, which has mean \eqn{\mu_{\infty}} and variance #' \eqn{\Sigma_{\infty}}, to then calculate the following quantities: #' #' \itemize{ #' \item `prop_int`: Proportion of the volume of the stationary forecast #' distribution that is attributable to lagged interactions: #' \deqn{ det(A)^2 } #' #' \item `prop_int_adj`: Same as `prop_int` but scaled by the number of #' series \eqn{p}: #' \deqn{ det(A)^{2/p} } #' #' \item `prop_int_offdiag`: Sensitivity of `prop_int` to inter-series #' interactions (off-diagonals of \eqn{A}): #' \deqn{ [2~det(A) (A^{-1})^T] } #' #' \item `prop_int_diag`: Sensitivity of `prop_int` to intra-series #' interactions (diagonals of \eqn{A}): #' \deqn{ [2~det(A) (A^{-1})^T] } #' #' \item `prop_cov_offdiag`: Sensitivity of \eqn{\Sigma_{\infty}} to #' inter-series error correlations: #' \deqn{ [2~det(\Sigma_{\infty}) (\Sigma_{\infty}^{-1})^T] } #' #' \item `prop_cov_diag`: Sensitivity of \eqn{\Sigma_{\infty}} to error #' variances: #' \deqn{ [2~det(\Sigma_{\infty}) (\Sigma_{\infty}^{-1})^T] } #' #' \item `reactivity`: Degree to which the system moves away from a stable #' equilibrium following a perturbation. If \eqn{\sigma_{max}(A)} is the #' largest singular value of \eqn{A}: #' \deqn{ \log\sigma_{max}(A) } #' #' \item `mean_return_rate`: Asymptotic return rate of the mean of the #' transition distribution to the stationary mean: #' \deqn{ \max(\lambda_{A}) } #' #' \item `var_return_rate`: Asymptotic return rate of the variance of the #' transition distribution to the stationary variance: #' \deqn{ \max(\lambda_{A \otimes A}) } #' } #' #' Major advantages of using \pkg{mvgam} to compute these metrics are that #' well-calibrated uncertainties are available and that VAR processes are #' forced to be stationary. These properties make it simple and insightful to #' calculate and inspect aspects of both long-term and short-term stability. 
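#'
#' Internally, the stationary covariance \eqn{\Sigma_{\infty}} is obtained by
#' solving the discrete Lyapunov equation in vectorised form:
#' \deqn{ vec(\Sigma_{\infty}) = (I - A \otimes A)^{-1} vec(\Sigma) }
#' which is well-defined whenever the process is stationary (all eigenvalues
#' of \eqn{A} lie within the unit circle)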
#' #' You can also inspect interactions among the time series in a latent VAR #' process using \code{\link{irf}} for impulse response functions or #' \code{\link{fevd}} for forecast error variance decompositions. #' #' @return A \code{data.frame} containing posterior draws for each stability #' metric. #' #' @references #' AR Ives, B Dennis, KL Cottingham & SR Carpenter (2003). Estimating #' community stability and ecological interactions from time-series data. #' *Ecological Monographs*, 73, 301–330. #' #' @author Nicholas J Clark #' #' @seealso #' \code{\link{VAR}}, #' \code{\link{irf}}, #' \code{\link{fevd}} #' #' @examples #' \dontrun{ #' # Simulate some time series that follow a latent VAR(1) process #' simdat <- sim_mvgam( #' family = gaussian(), #' n_series = 4, #' trend_model = VAR(cor = TRUE), #' prop_trend = 1 #' ) #' #' plot_mvgam_series(data = simdat$data_train, series = 'all') #' #' # Fit a model that uses a latent VAR(1) #' mod <- mvgam( #' y ~ -1, #' trend_formula = ~ 1, #' trend_model = VAR(cor = TRUE), #' family = gaussian(), #' data = simdat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' # Calculate stability metrics for this system #' metrics <- stability(mod) #' #' # Proportion of stationary forecast distribution attributable to interactions #' hist( #' metrics$prop_int, #' xlim = c(0, 1), #' xlab = 'Prop_int', #' main = '', #' col = '#B97C7C', #' border = 'white' #' ) #' #' # Inter- vs intra-series interaction contributions #' layout(matrix(1:2, nrow = 2)) #' hist( #' metrics$prop_int_offdiag, #' xlim = c(0, 1), #' xlab = '', #' main = 'Inter-series interactions', #' col = '#B97C7C', #' border = 'white' #' ) #' #' hist( #' metrics$prop_int_diag, #' xlim = c(0, 1), #' xlab = 'Contribution to interaction effect', #' main = 'Intra-series interactions (density dependence)', #' col = 'darkblue', #' border = 'white' #' ) #' layout(1) #' #' # Inter- vs intra-series contributions to forecast variance #' layout(matrix(1:2, nrow = 2)) #' hist( #' metrics$prop_cov_offdiag, #' xlim = c(0, 1), #' xlab = '', #' main = 'Inter-series covariances', #' col = '#B97C7C', #' border = 'white' #' ) #' #' hist( #' metrics$prop_cov_diag, #' xlim = c(0, 1), #' xlab = 'Contribution to forecast variance', #' main = 'Intra-series variances', #' col = 'darkblue', #' border = 'white' #' ) #' layout(1) #' #' # Reactivity: system response to perturbation #' hist( #' metrics$reactivity, #' main = '', #' xlab = 'Reactivity', #' col = '#B97C7C', #' border = 'white', #' xlim = c( #' -1 * max(abs(metrics$reactivity)), #' max(abs(metrics$reactivity)) #' ) #' ) #' abline(v = 0, lwd = 2.5) #' } #' #' @export stability <- function(object, ...) { UseMethod("stability", object) } #'@rdname stability.mvgam #'@method stability mvgam #'@export stability.mvgam = function(object, ...) { # Check trend_model trend_model <- attr(object$model_data, 'trend_model') if (!trend_model %in% c('VAR', 'VARcor', 'VAR1', 'VAR1cor')) { stop( 'Only VAR(1) models currently supported for calculating stability metrics', call. 
      = FALSE
    )
  }

  # Take posterior draws of the interaction matrix
  B_post <- mcmc_chains(object$model_output, 'A')

  # Take posterior draws of Sigma
  Sigma_post <- mcmc_chains(object$model_output, 'Sigma')

  # Number of series in the VAR process
  n_series <- object$n_lv
  if (is.null(n_series)) {
    n_series <- nlevels(object$obs_data$series)
  }

  metrics <- do.call(
    rbind,
    lapply(seq_len(NROW(B_post)), function(i) {
      B <- matrix(B_post[i, ], nrow = n_series, ncol = n_series, byrow = TRUE)
      p <- dim(B)[1]

      # Get the variance of the stationary distribution (Sigma_inf)
      Sigma <- matrix(
        Sigma_post[i, ],
        nrow = n_series,
        ncol = n_series,
        byrow = TRUE
      )
      vecS_inf <- solve(diag(p * p) - kronecker(B, B)) %*% as.vector(Sigma)
      Sigma_inf <- matrix(vecS_inf, nrow = p)

      # The difference in volume between Sigma_inf and Sigma is:
      # det(Sigma_inf - Sigma) = det(Sigma_inf) * det(B) ^ 2
      # according to Ives et al 2003 (eqn 24).
      # We can take partial derivatives to determine which elements of
      # Sigma_inf contribute most to rates of change in the
      # proportion of Sigma_inf that is due to process error.
      # Thanks to Mark Scheuerell for providing inspirational code:
      # https://github.com/mdscheuerell/safs-quant-sem-2022/blob/main/lwa_analysis.R
      int_env <- det(Sigma_inf) * t(solve(Sigma_inf))

      # Proportion of inter-series covariance contributions to the overall
      # environmental variation contribution (i.e. how important are
      # correlated errors for controlling the shape of the stationary
      # forecast distribution?)
      dat <- data.frame(
        prop_cov_offdiag = mean(abs(int_env[lower.tri(int_env)])) /
          (mean(abs(diag(int_env))) +
            mean(abs(int_env[lower.tri(int_env)])))
      )

      # Proportion of error variances to stationary forecast distribution
      dat$prop_cov_diag <- 1 - dat$prop_cov_offdiag

      # Proportion of volume of Sigma_inf attributable to series interactions,
      # measuring the degree to which interactions increase
      # the variance of the stationary distribution (Sigma_inf) relative
      # to the variance of the process error (Sigma);
      # lower values = more stability
      dat$prop_int <- abs(det(B))^2

      # Ives et al 2003 suggest scaling this by the number of series for more
      # direct comparisons among different studies
      dat$prop_int_adj <- abs(det(B))^(2 / p)

      # Sensitivity of the species interaction proportion to particular
      # interactions is also calculated using partial derivatives
      # (note the use of 2 here because we squared det(B) in the above eqn)
      int_sens <- 2 * det(B) * t(solve(B))

      # Proportion of interspecific contributions to the overall
      # interaction contribution
      dat$prop_int_offdiag <- mean(abs(int_sens[lower.tri(int_sens)])) /
        (mean(abs(diag(int_sens))) +
          mean(abs(int_sens[lower.tri(int_sens)])))

      # Proportion of density-dependent contributions to the overall
      # interaction contribution
      dat$prop_int_diag <- 1 - dat$prop_int_offdiag

      # Reactivity, measuring the degree to which the system moves
      # away from a stable equilibrium following a perturbation;
      # values > 0 suggest the system is reactive, whereby a
      # perturbation in one period can be amplified in the next period.
      # Following Neubert et al 2009 Ecology (Detecting reactivity)
      dat$reactivity <- log(max(svd(B)$d))

      # Return rate of the transition distribution to the stationary
      # distribution. Asymptotic return rate of the mean;
      # lower values = more stability
      dat$mean_return_rate <- max(abs(eigen(B)$values))

      # Asymptotic return rate of the variance;
      # lower values = more stability
      dat$var_return_rate <- max(abs(eigen(B %x% B)$values))
      dat
    })
  )
  return(metrics)
}
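
# A minimal standalone sketch (not part of the package API) of the per-draw
# calculations used in stability.mvgam(), applied to a hypothetical stable
# 2-series VAR(1). The values of B and Sigma below are illustrative
# assumptions only:
if (FALSE) {
  B <- matrix(c(0.6, 0.1, -0.2, 0.5), nrow = 2) # lagged interaction matrix
  Sigma <- diag(c(0.3, 0.4)) # process error covariance

  # Stationary covariance via the vectorised Lyapunov solution
  Sigma_inf <- matrix(
    solve(diag(4) - kronecker(B, B)) %*% as.vector(Sigma),
    nrow = 2
  )

  abs(det(B))^2 # prop_int
  log(max(svd(B)$d)) # reactivity
  max(abs(eigen(B)$values)) # mean_return_rate
}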
================================================
FILE: R/stan_utils.R
================================================
#' Stan code and data objects for \pkg{mvgam} models
#'
#' Generate Stan code and data objects for \pkg{mvgam} models
#'
#' @param object An object of class `mvgam` or `mvgam_prefit`,
#' returned from a call to \code{mvgam}
#' @return Either a character string containing the fully commented \pkg{Stan}
#' code to fit a \pkg{mvgam} model, or a named list containing the data
#' objects needed to fit the model in Stan.
#' @export
#' @examples
#'\dontrun{
#' simdat <- sim_mvgam()
#' mod <- mvgam(y ~ s(season) +
#'   s(time, by = series),
#'   family = poisson(),
#'   data = simdat$data_train,
#'   run_model = FALSE)
#'
#' # View Stan model code
#' stancode(mod)
#'
#' # View Stan model data
#' sdata <- standata(mod)
#' str(sdata)
#' }
#'
code = function(object) {
  if (!inherits(object, c('mvgam', 'mvgam_prefit'))) {
    stop('argument "object" must be of class "mvgam" or "mvgam_prefit"')
  }
  scode <- readLines(textConnection(object$model_file), n = -1)
  class(scode) <- c("character", "mvgammodel")
  scode
}

#' @export
print.mvgammodel = function(x, ...) {
  cat(x, sep = '\n')
  invisible(x)
}

#' @export
#' @importFrom brms stancode
brms::stancode

#' @export
#' @param ... ignored
#' @rdname code
stancode.mvgam_prefit = function(object, ...) {
  code(object)
}

#' @export
#' @rdname code
stancode.mvgam = function(object, ...) {
  code(object)
}

#' @export
#' @importFrom brms standata
brms::standata

#' @export
#' @param ... ignored
#' @rdname code
standata.mvgam_prefit = function(object, ...) {
  object$model_data
}

#' @noRd
remove_likelihood = function(model_file) {
  like_line <- grep('// likelihood functions', model_file)
  all_open_braces <- grep('{', model_file, fixed = TRUE)
  all_close_braces <- grep('}', model_file, fixed = TRUE)
  open_distances <- like_line - all_open_braces
  open_distances[open_distances < 0] <- NA
  start_remove <- all_open_braces[which.min(open_distances)]
  close_distances <- like_line - all_close_braces
  close_distances[close_distances > 0] <- NA
  end_remove <- all_close_braces[which.max(close_distances)]
  model_file[-(start_remove:end_remove)]
}

#### Replacement for MCMCvis functions to remove dependence on rstan for
# working with stanfit objects ####
#' @noRd
mcmc_summary = function(
  object,
  params = 'all',
  excl = NULL,
  ISB = TRUE,
  exact = TRUE,
  probs = c(0.025, 0.5, 0.975),
  hpd_prob = 0.95,
  HPD = FALSE,
  pg0 = FALSE,
  digits = NULL,
  round = NULL,
  Rhat = TRUE,
  n.eff = TRUE,
  func = NULL,
  func_name = NULL,
  variational = FALSE
) {
  if (variational) {
    Rhat <- FALSE
    n.eff <- FALSE
  }

  # SORTING BLOCK
  if (methods::is(object, 'matrix')) {
    object2 <- mcmc_chains(
      object,
      params,
      excl,
      ISB,
      exact = exact,
      mcmc.list = FALSE
    )
  } else {
    if (methods::is(object, 'stanfit')) {
      object2 <- object
    } else {
      # rstanarm
      if (methods::is(object, 'stanreg')) {
        object2 <- object$stanfit
      } else {
        # brms
        if (methods::is(object, 'brmsfit')) {
          object2 <- object$fit
        } else {
          # jagsUI
          if (methods::is(object, 'jagsUI')) {
            object2 <- mcmc_chains(object)
          } else {
            object2 <- mcmc_chains(
              object,
              params,
              excl,
              ISB,
              exact = exact,
              mcmc.list = TRUE
            )
          }
        }
      }
    }
  }

  #--------------------------------------------------------------------------------------------------------------
  # PROCESSING BLOCK - JAGS AND MATRIX MCMC OUTPUT
  if (coda::is.mcmc.list(object2) == TRUE | methods::is(object, 'matrix')) {
    if (methods::is(object, 'matrix')) {
      np <- NCOL(object2)
      ch_bind <- object2
    } else {
      np <- NCOL(object2[[1]])
      if (np > 1) {
        ch_bind <- do.call("rbind", object2)
      } else {
        ch_bind <-
as.matrix(object2) } } x <- list() # mean, sd, and quantiles if (!is.null(digits)) { if (!is.null(round)) { warning( "'digits' and 'round' arguments cannot be used together. Using 'digits'." ) } bind_mn <- data.frame(signif(apply(ch_bind, 2, mean), digits = digits)) bind_sd <- data.frame(signif( apply(ch_bind, 2, stats::sd), digits = digits )) colnames(bind_mn) <- "mean" colnames(bind_sd) <- "sd" if (HPD == FALSE) { if (length(probs) == 1) { bind_q <- data.frame(signif( apply(ch_bind, 2, stats::quantile, probs = probs), digits = digits )) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } else { bind_q <- data.frame(t(signif( apply(ch_bind, 2, stats::quantile, probs = probs), digits = digits ))) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } } if (HPD == TRUE) { if (length(hpd_prob) > 1) { stop( 'specify only a single probability for HPD interval computation.' ) } bind_q <- data.frame(signif( coda::HPDinterval(coda::as.mcmc(ch_bind), prob = hpd_prob), digits = digits )) colnames(bind_q) <- c( paste0(signif(hpd_prob * 100, digits = 3), "%_HPDL"), paste0(signif(hpd_prob * 100, digits = 3), "%_HPDU") ) } } if (is.null(digits) & !is.null(round)) { bind_mn <- data.frame(round(apply(ch_bind, 2, mean), digits = round)) bind_sd <- data.frame(round(apply(ch_bind, 2, stats::sd), digits = round)) colnames(bind_mn) <- "mean" colnames(bind_sd) <- "sd" if (HPD == FALSE) { if (length(probs) == 1) { bind_q <- data.frame(round( apply(ch_bind, 2, stats::quantile, probs = probs), digits = round )) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } else { bind_q <- data.frame(t(round( apply(ch_bind, 2, stats::quantile, probs = probs), digits = round ))) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } } if (HPD == TRUE) { if (length(hpd_prob) > 1) { stop( 'specify only a single probability for HPD interval computation.' ) } bind_q <- data.frame(round( coda::HPDinterval(coda::as.mcmc(ch_bind), prob = hpd_prob), digits = round )) colnames(bind_q) <- c( paste0(signif(hpd_prob * 100, digits = 3), "%_HPDL"), paste0(signif(hpd_prob * 100, digits = 3), "%_HPDU") ) } } if (is.null(digits) & is.null(round)) { bind_mn <- data.frame(apply(ch_bind, 2, mean)) bind_sd <- data.frame(apply(ch_bind, 2, stats::sd)) colnames(bind_mn) <- "mean" colnames(bind_sd) <- "sd" if (HPD == FALSE) { if (length(probs) == 1) { bind_q <- data.frame(apply( ch_bind, 2, stats::quantile, probs = probs )) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } else { bind_q <- data.frame(t(apply( ch_bind, 2, stats::quantile, probs = probs ))) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } } if (HPD == TRUE) { if (length(hpd_prob) > 1) { stop( 'specify only a single probability for HPD interval computation.' 
) } bind_q <- data.frame(coda::HPDinterval( coda::as.mcmc(ch_bind), prob = hpd_prob )) colnames(bind_q) <- c( paste0(signif(hpd_prob * 100, digits = 3), "%_HPDL"), paste0(signif(hpd_prob * 100, digits = 3), "%_HPDU") ) } } x[[1]] <- cbind(bind_mn, bind_sd, bind_q) # rhat if (Rhat == TRUE) { if (!methods::is(object, 'matrix')) { if (length(object2) > 1) { # If > 750 params use loop to calculate Rhat if (NCOL(object2[[1]]) > 750) { r_hat <- c(rep(NA, NCOL(object2[[1]]))) for (v in 1:length(r_hat)) { r_hat[v] <- round( coda::gelman.diag(object2[, v])$psrf[, 1], digits = 2 ) } r_hat <- data.frame(r_hat) colnames(r_hat) <- "Rhat" } else { r_hat <- data.frame(round( coda::gelman.diag(object2, multivariate = FALSE)$psrf[, 1], digits = 2 )) colnames(r_hat) <- "Rhat" } } else { warning( "Rhat statistic cannot be calculated with one chain. NAs inserted." ) r_hat <- data.frame(rep(NA, np)) colnames(r_hat) <- "Rhat" } } else { warning( "Rhat statistic cannot be calculated with one chain (matrix input). NAs inserted." ) r_hat <- data.frame(rep(NA, np)) colnames(r_hat) <- "Rhat" } x[[(length(x) + 1)]] <- r_hat } # neff if (n.eff == TRUE) { if (!methods::is(object, 'matrix')) { neff <- data.frame(round(coda::effectiveSize(object2), digits = 0)) colnames(neff) <- "n_eff" } else { warning( 'Number of effective samples cannot be calculated without individual chains (matrix input). NAs inserted.' ) neff <- data.frame(rep(NA, np)) colnames(neff) <- "n_eff" } x[[(length(x) + 1)]] <- neff } # p>0 if (pg0 == TRUE) { tpg <- data.frame(apply( ch_bind, 2, function(x) round(sum(x > 0) / length(x), 2) )) colnames(tpg) <- 'p>0' x[[(length(x) + 1)]] <- tpg } # custom function if (!is.null(func)) { if (!is.null(digits)) { tmp <- signif(apply(ch_bind, 2, func), digits = digits) } if (is.null(digits) & !is.null(round)) { tmp <- round(apply(ch_bind, 2, func), digits = round) } if (is.null(digits) & is.null(round)) { tmp <- apply(ch_bind, 2, func) } if (!is.null(dim(tmp))) { tmp <- data.frame(t(tmp)) } else { tmp <- data.frame(tmp) } if (!is.null(func_name)) { if (length(func_name) != NCOL(tmp)) { stop("length(func_name) must equal number of func outputs") } colnames(tmp) <- func_name } else { colnames(tmp) <- 'func' } x[[(length(x) + 1)]] <- tmp } # bind them mcmc_summary <- do.call("cbind", x) } #-------------------------------------------------------------------------------------------------------------- # PROCESSING BLOCK - STAN OR JAGSUI MCMC OUTPUT if (methods::is(object2, 'stanfit') | methods::is(object, 'jagsUI')) { if (methods::is(object2, 'stanfit')) { # rhat and n_eff directly from rstan output all_params <- row.names(rstan::summary(object2)$summary) rs_df <- data.frame(rstan::summary(object2)$summary) #if brms, reassign names without b_ and r_ (as in MCMCchains) if (methods::is(object, 'brmsfit')) { sp_names_p <- names(object2@sim$samples[[1]]) #remove b_ and r_ st_nm <- substr(sp_names_p, start = 1, stop = 2) sp_names <- rep(NA, length(sp_names_p)) b_idx <- which(st_nm == 'b_') r_idx <- which(st_nm == 'r_') ot_idx <- which(st_nm != 'b_' & st_nm != 'r_') #fill names vec with b_ and r_ removed sp_names[b_idx] <- gsub('b_', '', sp_names_p[b_idx]) sp_names[r_idx] <- gsub('r_', '', sp_names_p[r_idx]) sp_names[ot_idx] <- sp_names_p[ot_idx] #assign names to df all_params <- sp_names row.names(rs_df) <- all_params } } if (methods::is(object, 'jagsUI')) { all_params <- row.names(object$summary) rs_df <- data.frame(object$summary) } # filtering of parameters from rstan/jagsUI object - from MCMCchains if (ISB == TRUE) { 
names <- vapply( strsplit(all_params, split = "[", fixed = TRUE), `[`, 1, FUN.VALUE = character(1) ) } else { names <- all_params } x <- list() # INDEX BLOCK exclusions if (!is.null(excl)) { rm_ind <- c() for (i in 1:length(excl)) { if (ISB == TRUE) { n_excl <- vapply( strsplit(excl, split = "[", fixed = TRUE), `[`, 1, FUN.VALUE = character(1) ) } else { n_excl <- excl } if (exact == TRUE) { ind_excl <- which(names %in% n_excl[i]) } else { ind_excl <- grep(n_excl[i], names, fixed = FALSE) } if (length(ind_excl) < 1) { warning(paste0( "\"", excl[i], "\"", " not found in MCMC output. Check 'ISB'' and 'exact' arguments to make sure the desired parsing methods are being used." )) } rm_ind <- c(rm_ind, ind_excl) } if (length(rm_ind) > 0) { dups <- which(duplicated(rm_ind)) if (length(dups) > 0) { rm_ind2 <- rm_ind[-dups] } else { rm_ind2 <- rm_ind } } else { excl <- NULL } } # selections if (length(params) == 1) { if (params == "all") { if (is.null(excl)) { f_ind <- 1:length(names) } else { f_ind <- (1:length(names))[-rm_ind2] } } else { if (exact == TRUE) { get_ind <- which(names %in% params) } else { get_ind <- grep(paste(params), names, fixed = FALSE) } if (length(get_ind) < 1) { stop(paste0( "\"", params, "\"", " not found in MCMC output. Check 'ISB' and 'exact' arguments to make sure the desired parsing methods are being used." )) } if (!is.null(excl)) { if (identical(get_ind, rm_ind2)) { stop("No parameters selected.") } matched <- stats::na.omit(match(rm_ind2, get_ind)) if (length(matched) > 0) { f_ind <- get_ind[-matched] } else { f_ind <- get_ind } } else { f_ind <- get_ind } } } else { grouped <- c() for (i in 1:length(params)) { if (exact == TRUE) { get_ind <- which(names %in% params[i]) } else { get_ind <- grep(paste(params[i]), names, fixed = FALSE) } if (length(get_ind) < 1) { warning(paste0( "\"", params[i], "\"", " not found in MCMC output. Check 'ISB' and 'exact' arguments to make sure the desired parsing methods are being used." )) (next)() } grouped <- c(grouped, get_ind) } if (!is.null(excl)) { if (identical(grouped, rm_ind2)) { stop("No parameters selected.") } matched <- stats::na.omit(match(rm_ind2, grouped)) if (length(matched) > 0) { t_ind <- grouped[-matched] } else { t_ind <- grouped } to.rm <- which(duplicated(t_ind)) if (length(to.rm) > 0) { f_ind <- t_ind[-to.rm] } else { f_ind <- t_ind } } else { to.rm <- which(duplicated(grouped)) if (length(to.rm) > 0) { f_ind <- grouped[-to.rm] } else { f_ind <- grouped } } } # end sort # convert object to matrix if computing non default intervals or using custom func if ( !is.null(func) | HPD == TRUE | identical(probs, c(0.025, 0.5, 0.975)) == FALSE | pg0 == TRUE ) { if (methods::is(object2, 'stanfit')) { #ensure is matrix, not vector ch_bind <- as.matrix(as.matrix(object2)[, f_ind]) } if (methods::is(object, 'jagsUI')) { ch_bind <- mcmc_chains(object, params, excl, ISB) } } # mean, sd, and quantiles if (!is.null(digits)) { if (!is.null(round)) { warning( "'digits' and 'round' arguments cannot be used together. Using 'digits'." 
) } bind_mn <- data.frame(signif(rs_df["mean"][f_ind, 1], digits = digits)) bind_sd <- data.frame(signif(rs_df["sd"][f_ind, 1], digits = digits)) colnames(bind_mn) <- "mean" colnames(bind_sd) <- "sd" if (HPD == FALSE) { if (length(probs) == 1) { bind_q <- data.frame(signif( apply(ch_bind, 2, stats::quantile, probs = probs), digits = digits )) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } else { if (identical(probs, c(0.025, 0.5, 0.975)) == TRUE) { bind_LCI <- signif(rs_df["X2.5."][f_ind, 1], digits = digits) bind_med <- signif(rs_df["X50."][f_ind, 1], digits = digits) bind_UCI <- signif(rs_df["X97.5."][f_ind, 1], digits = digits) bind_q <- data.frame(cbind(bind_LCI, bind_med, bind_UCI)) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } else { bind_q <- data.frame(t(signif( apply(ch_bind, 2, stats::quantile, probs = probs), digits = digits ))) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } } } if (HPD == TRUE) { if (length(hpd_prob) > 1) { stop( 'Specify only a single probability for HPD interval computation.' ) } bind_q <- data.frame(signif( coda::HPDinterval(coda::as.mcmc(ch_bind), prob = hpd_prob), digits = digits )) colnames(bind_q) <- c( paste0(signif(hpd_prob * 100, digits = 3), "%_HPDL"), paste0(signif(hpd_prob * 100, digits = 3), "%_HPDU") ) } } if (is.null(digits) & !is.null(round)) { bind_mn <- data.frame(round(rs_df["mean"][f_ind, 1], digits = round)) bind_sd <- data.frame(round(rs_df["sd"][f_ind, 1], digits = round)) colnames(bind_mn) <- "mean" colnames(bind_sd) <- "sd" if (HPD == FALSE) { if (length(probs) == 1) { bind_q <- data.frame(round( apply(ch_bind, 2, stats::quantile, probs = probs), digits = round )) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } else { if (identical(probs, c(0.025, 0.5, 0.975)) == TRUE) { bind_LCI <- round(rs_df["X2.5."][f_ind, 1], digits = round) bind_med <- round(rs_df["X50."][f_ind, 1], digits = round) bind_UCI <- round(rs_df["X97.5."][f_ind, 1], digits = round) bind_q <- data.frame(cbind(bind_LCI, bind_med, bind_UCI)) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } else { bind_q <- data.frame(t(round( apply(ch_bind, 2, stats::quantile, probs = probs), digits = round ))) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } } } if (HPD == TRUE) { if (length(hpd_prob) > 1) { stop( 'Specify only a single probability for HPD interval computation.' 
) } bind_q <- data.frame(round( coda::HPDinterval(coda::as.mcmc(ch_bind), prob = hpd_prob), digits = round )) colnames(bind_q) <- c( paste0(signif(hpd_prob * 100, digits = 3), "%_HPDL"), paste0(signif(hpd_prob * 100, digits = 3), "%_HPDU") ) } } if (is.null(digits) & is.null(round)) { bind_mn <- data.frame(rs_df["mean"][f_ind, 1]) bind_sd <- data.frame(rs_df["sd"][f_ind, 1]) colnames(bind_mn) <- "mean" colnames(bind_sd) <- "sd" if (HPD == FALSE) { if (length(probs) == 1) { bind_q <- data.frame(apply( ch_bind, 2, stats::quantile, probs = probs )) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } else { if (identical(probs, c(0.025, 0.5, 0.975)) == TRUE) { bind_LCI <- rs_df["X2.5."][f_ind, 1] bind_med <- rs_df["X50."][f_ind, 1] bind_UCI <- rs_df["X97.5."][f_ind, 1] bind_q <- data.frame(cbind(bind_LCI, bind_med, bind_UCI)) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } else { bind_q <- data.frame(t(apply( ch_bind, 2, stats::quantile, probs = probs ))) colnames(bind_q) <- paste0(signif(probs * 100, digits = 3), "%") } } } if (HPD == TRUE) { if (length(hpd_prob) > 1) { stop( 'Specify only a single probability for HPD interval computation.' ) } bind_q <- data.frame(coda::HPDinterval( coda::as.mcmc(ch_bind), prob = hpd_prob )) colnames(bind_q) <- c( paste0(signif(hpd_prob * 100, digits = 3), "%_HPDL"), paste0(signif(hpd_prob * 100, digits = 3), "%_HPDU") ) } } x[[1]] <- cbind(bind_mn, bind_sd, bind_q) # rhat - rhat in Stan calculated within chain (different than with coda package) if (Rhat == TRUE) { r_hat <- data.frame(round(rs_df["Rhat"][f_ind, 1], digits = 2)) colnames(r_hat) <- "Rhat" x[[(length(x) + 1)]] <- r_hat } # neff - neff in Stan is calculated within chain (different than with coda package) if (n.eff == TRUE) { if (methods::is(object2, 'stanfit')) { neff <- data.frame(round(rs_df["n_eff"][f_ind, 1], digits = 0)) } if (methods::is(object, 'jagsUI')) { neff <- data.frame(round(rs_df["n.eff"][f_ind, 1], digits = 0)) } colnames(neff) <- "n_eff" x[[(length(x) + 1)]] <- neff } # p>0 if (pg0 == TRUE) { tpg <- data.frame(apply( ch_bind, 2, function(x) round(sum(x > 0) / length(x), 2) )) colnames(tpg) <- 'p>0' x[[(length(x) + 1)]] <- tpg } # custom function if (!is.null(func)) { if (!is.null(digits)) { tmp <- signif(apply(ch_bind, 2, func), digits = digits) } if (is.null(digits) & !is.null(round)) { tmp <- round(apply(ch_bind, 2, func), digits = round) } if (is.null(digits) & is.null(round)) { tmp <- apply(ch_bind, 2, func) } if (!is.null(dim(tmp))) { tmp <- data.frame(t(tmp)) } else { tmp <- data.frame(tmp) } if (!is.null(func_name)) { if (length(func_name) != NCOL(tmp)) { stop("length(func_name) must equal number of func outputs") } colnames(tmp) <- func_name } else { colnames(tmp) <- 'func' } x[[(length(x) + 1)]] <- tmp } # bind them mcmc_summary <- do.call("cbind", x) row.names(mcmc_summary) <- all_params[f_ind] if (variational) { mcmc_summary$Rhat <- NaN mcmc_summary$n.eff <- NaN } } return(mcmc_summary) } #' @noRd mcmc_chains = function( object, params = 'all', excl = NULL, ISB = TRUE, exact = TRUE, mcmc.list = FALSE, chain_num = NULL ) { #for rstanarm/brms objects - set to NULL by default sp_names <- NULL #if mcmc object (from nimble) - convert to mcmc.list if (methods::is(object, 'mcmc')) { object <- coda::mcmc.list(object) } #if list object of matrices (from nimble) - convert to mcmc.list if (methods::is(object, 'list')) { object <- coda::mcmc.list(lapply(object, function(x) coda::mcmc(x))) } if ( coda::is.mcmc.list(object) != TRUE & 
!methods::is(object, 'matrix') & !methods::is(object, 'mcmc') & !methods::is(object, 'list') & !methods::is(object, 'rjags') & !methods::is(object, 'stanfit') & !methods::is(object, 'brmsfit') & !methods::is(object, 'jagsUI') & !methods::is(object, 'CmdStanMCMC') ) { stop( 'Invalid object type. Input must be stanfit object (rstan), CmdStanMCMC object (cmdstanr), stanreg object (rstanarm), brmsfit object (brms), mcmc.list object (coda/rjags), mcmc object (coda/nimble), list object (nimble), rjags object (R2jags), jagsUI object (jagsUI), or matrix with MCMC chains.' ) } #NAME SORTING BLOCK if (methods::is(object, 'stanfit')) { #convert to mcmc.list temp_in <- rstan::As.mcmc.list(object) #assign new colnames for mcmc.list object if object exists (for stanreg and brms objs so parameter names are interpretable) - do not rename params for model fit directly with Stan if (!is.null(sp_names)) { coda::varnames(temp_in) <- sp_names } if (ISB == TRUE) { names <- vapply( strsplit(colnames(temp_in[[1]]), split = '[', fixed = TRUE), `[`, 1, FUN.VALUE = character(1) ) } else { names <- colnames(temp_in[[1]]) } } if (methods::is(object, 'jagsUI')) { object <- object$samples } if (methods::is(object, 'CmdStanMCMC')) { object <- cmdstanr::as_mcmc.list(object) } if (coda::is.mcmc.list(object) == TRUE) { temp_in <- object if (is.null(colnames(temp_in[[1]]))) { warning('No parameter names provided. Assigning arbitrary names.') sub_cn <- paste0('Param_', 1:NCOL(temp_in[[1]])) colnames(temp_in[[1]]) <- sub_cn } if (ISB == TRUE) { names <- vapply( strsplit(colnames(temp_in[[1]]), split = "[", fixed = TRUE), `[`, 1, FUN.VALUE = character(1) ) } else { names <- colnames(temp_in[[1]]) } } if (methods::is(object, 'matrix')) { temp_in <- object if (is.null(colnames(temp_in))) { warning( 'No parameter names (column names) provided. Assigning arbitrary names.' ) sub_cn <- paste0('Param_', 1:NCOL(temp_in)) colnames(temp_in) <- sub_cn } if (ISB == TRUE) { names <- vapply( strsplit(colnames(temp_in), split = "[", fixed = TRUE), `[`, 1, FUN.VALUE = character(1) ) } else { names <- colnames(temp_in) } } if (methods::is(object, 'rjags')) { temp_in <- object$BUGSoutput$sims.matrix if (ISB == TRUE) { names <- vapply( strsplit( rownames(object$BUGSoutput$summary), split = "[", fixed = TRUE ), `[`, 1, FUN.VALUE = character(1) ) } else { names <- rownames(object$BUGSoutput$summary) } } #INDEX BLOCK #exclusions if (!is.null(excl)) { rm_ind <- c() for (i in 1:length(excl)) { if (ISB == TRUE) { n_excl <- vapply( strsplit(excl, split = "[", fixed = TRUE), `[`, 1, FUN.VALUE = character(1) ) } else { n_excl <- excl } if (exact == TRUE) { ind_excl <- which(names %in% n_excl[i]) } else { ind_excl <- grep(n_excl[i], names, fixed = FALSE) } if (length(ind_excl) < 1) { warning(paste0( "\"", excl[i], "\"", " not found in MCMC output. Check 'ISB' and 'exact' arguments to make sure the desired parsing methods are being used." )) } rm_ind <- c(rm_ind, ind_excl) } if (length(rm_ind) > 0) { dups <- which(duplicated(rm_ind)) if (length(dups) > 0) { rm_ind2 <- rm_ind[-dups] } else { rm_ind2 <- rm_ind } } else { excl <- NULL } } #selections if (length(params) == 1) { if (params == 'all') { if (is.null(excl)) { f_ind <- 1:length(names) } else { f_ind <- (1:length(names))[-rm_ind2] } } else { if (exact == TRUE) { get_ind <- which(names %in% params) } else { get_ind <- grep(paste(params), names, fixed = FALSE) } if (length(get_ind) < 1) { stop(paste0( "\"", params, "\"", " not found in MCMC output. 
Check `ISB` and `exact` arguments to make sure the desired parsing methods are being used." )) } if (!is.null(excl)) { if (identical(get_ind, rm_ind2)) { stop('No parameters selected.') } matched <- stats::na.omit(match(rm_ind2, get_ind)) if (length(matched) > 0) { f_ind <- get_ind[-matched] } else { f_ind <- get_ind } } else { f_ind <- get_ind } } } else { grouped <- c() for (i in 1:length(params)) { if (exact == TRUE) { get_ind <- which(names %in% params[i]) } else { get_ind <- grep(paste(params[i]), names, fixed = FALSE) } if (length(get_ind) < 1) { warning(paste0( "\"", params[i], "\"", " not found in MCMC output. Check 'ISB' and 'exact' arguments to make sure the desired parsing methods are being used." )) next() } grouped <- c(grouped, get_ind) } if (!is.null(excl)) { if (identical(grouped, rm_ind2)) { stop('No parameters selected.') } matched <- stats::na.omit(match(rm_ind2, grouped)) if (length(matched) > 0) { t_ind <- grouped[-matched] } else { t_ind <- grouped } to.rm <- which(duplicated(t_ind)) if (length(to.rm) > 0) { f_ind <- t_ind[-to.rm] } else { f_ind <- t_ind } } else { to.rm <- which(duplicated(grouped)) if (length(to.rm) > 0) { f_ind <- grouped[-to.rm] } else { f_ind <- grouped } } } #PROCESSING BLOCK if (is.null(chain_num)) { if (coda::is.mcmc.list(object) == TRUE | typeof(object) == 'S4') { if (length(f_ind) > 1) { dsort_mcmc <- do.call(coda::mcmc.list, temp_in[, f_ind]) OUT <- do.call('rbind', dsort_mcmc) } else { dsort_mcmc <- do.call(coda::mcmc.list, temp_in[, f_ind, drop = FALSE]) OUT <- as.matrix( do.call(coda::mcmc.list, temp_in[, f_ind, drop = FALSE]), ncol = 1 ) } } if (methods::is(object, 'matrix')) { OUT <- temp_in[, f_ind, drop = FALSE] if (mcmc.list == TRUE) { stop('Cannot produce mcmc.list output with matrix input') } } if (methods::is(object, 'rjags')) { OUT <- temp_in[, f_ind, drop = FALSE] if (mcmc.list == TRUE) { #modified coda::as.mcmc (removing ordering of param names) x <- object$BUGSoutput mclist <- vector("list", x$n.chains) mclis <- vector("list", x$n.chains) ord <- dimnames(x$sims.array)[[3]] for (i in 1:x$n.chains) { tmp1 <- x$sims.array[, i, ord] mclis[[i]] <- coda::mcmc(tmp1, thin = x$n.thin) } temp2 <- coda::as.mcmc.list(mclis) #end mod as.mcmc dsort_mcmc <- do.call(coda::mcmc.list, temp2[, f_ind, drop = FALSE]) } } } if (!is.null(chain_num)) { if (coda::is.mcmc.list(object) == TRUE | typeof(object) == 'S4') { if (length(f_ind) > 1) { dsort <- do.call(coda::mcmc.list, temp_in[, f_ind]) if (chain_num > length(dsort)) { stop('Invalid value for chain_num specified.') } dsort_mcmc <- dsort[[chain_num]] OUT <- as.matrix(dsort_mcmc) } else { dsort <- do.call(coda::mcmc.list, temp_in[, f_ind, drop = FALSE]) if (chain_num > length(dsort)) { stop('Invalid value for chain_num specified.') } dsort_mcmc <- dsort[[chain_num]] OUT <- as.matrix(dsort_mcmc) } } if (methods::is(object, 'matrix')) { stop( 'Cannot extract posterior information for individual chains from matrix input.' 
) } if (methods::is(object, 'rjags')) { #modified coda::as.mcmc (removing ordering of param names) x <- object$BUGSoutput mclist <- vector("list", x$n.chains) mclis <- vector("list", x$n.chains) ord <- dimnames(x$sims.array)[[3]] for (i in 1:x$n.chains) { tmp1 <- x$sims.array[, i, ord] mclis[[i]] <- coda::mcmc(tmp1, thin = x$n.thin) } temp2 <- coda::as.mcmc.list(mclis) #end mod as.mcmc dsort <- do.call(coda::mcmc.list, temp2[, f_ind, drop = FALSE]) if (chain_num > length(dsort)) { stop('Invalid value for chain_num specified.') } dsort_mcmc <- dsort[[chain_num]] OUT <- as.matrix(dsort_mcmc) } } if (mcmc.list == FALSE) { return(OUT) } if (mcmc.list == TRUE) { return(dsort_mcmc) } } #### Vectorise a stan model's likelihood for quicker computation #### #' @noRd #' @param model_file Stan model file to be edited #' @param model_data Prepared mvgam data for Stan modelling #' @param family \code{character} #' @param trend_model \code{character} specifying the time series dynamics for the latent trend. #' @param offset \code{logical} #' @param drift \code{logical} #' @param threads \code{integer} Experimental option to use multithreading for within-chain #'parallelisation in \code{Stan}. We recommend its use only if you are experienced with #'\code{Stan}'s `reduce_sum` function and have a slow running model that cannot be sped #'up by any other means #' @return A `list` containing the updated Stan model and model data vectorise_stan_lik = function( model_file, model_data, family = 'poisson', trend_model = 'None', use_lv = FALSE, offset = FALSE, drift = FALSE, threads = 1 ) { if (family %in% c('binomial', 'beta_binomial', 'bernoulli')) { family <- 'poisson' } if (use_lv & trend_model %in% c('None', 'ZMVN')) { trend_model <- 'RW' } # Hack for adding VAR1 models if (trend_model %in% c('VAR1', 'VAR1cor')) { VAR1 <- TRUE trend_model <- 'RW' } else { VAR1 <- FALSE } # Similar hack for adding piecewise trends if (trend_model %in% c('PWlinear', 'PWlogistic')) { trend_model <- 'RW' } #### Family specifications #### if (threads > 1) { if (family == 'gaussian') { if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', 'real partial_log_lik(int[] seq, int start, int end,\n', ifelse( offset, 'data vector Y, matrix X, vector b, vector sigma_obs, vector alpha) {\n', 'data vector Y, matrix X, vector b, vector sigma_obs, real alpha) {\n' ), 'real ptarget = 0;\n', ifelse( offset, 'ptarget += normal_id_glm_lpdf(Y[start:end] | X[start:end], alpha[start:end], b, sigma_obs[start:end]);\n', 'ptarget += normal_id_glm_lpdf(Y[start:end] | X[start:end], alpha, b, sigma_obs[start:end]);\n' ), 'return ptarget;\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', 'real partial_log_lik(int[] seq, int start, int end,\n', ifelse( offset, 'data vector Y, matrix X, vector b, vector sigma_obs, vector alpha) {\n', 'data vector Y, matrix X, vector b, vector sigma_obs, real alpha) {\n' ), 'real ptarget = 0;\n', ifelse( offset, 'ptarget += normal_id_glm_lpdf(Y[start:end] | X[start:end], alpha[start:end], b, sigma_obs[start:end]);\n', 'ptarget += normal_id_glm_lpdf(Y[start:end] | X[start:end], alpha, b, sigma_obs[start:end]);\n' ), 'return ptarget;\n', '}\n}\n' ) } } if (family == 'poisson') { if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', 'real 
partial_log_lik(int[] seq, int start, int end,\n', ifelse( offset, 'data array[] int Y, matrix X, vector b, vector alpha) {\n', 'data array[] int Y, matrix X, vector b, real alpha) {\n' ), 'real ptarget = 0;\n', ifelse( offset, 'ptarget += poisson_log_glm_lpmf(Y[start:end] | X[start:end], alpha[start:end], b);\n', 'ptarget += poisson_log_glm_lpmf(Y[start:end] | X[start:end], alpha, b);\n' ), 'return ptarget;\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', 'real partial_log_lik(int[] seq, int start, int end,\n', ifelse( offset, 'data array[] int Y, matrix X, vector b, vector alpha) {\n', 'data array[] int Y, matrix X, vector b, real alpha) {\n' ), 'real ptarget = 0;\n', ifelse( offset, 'ptarget += poisson_log_glm_lpmf(Y[start:end] | X[start:end], alpha[start:end], b);\n', 'ptarget += poisson_log_glm_lpmf(Y[start:end] | X[start:end], alpha, b);\n' ), 'return ptarget;\n', '}\n}\n' ) } } if (family == 'lognormal') { if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', 'real partial_log_lik(array[] int seq, int start, int end,\n', 'data vector Y, vector mu, vector sigma_obs) {\n', 'real ptarget = 0;\n', 'ptarget += lognormal_lpdf(Y[start:end] | mu[start:end],\n', 'sigma_obs[start:end]);\n', 'return ptarget;\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', 'real partial_log_lik(array[] int seq, int start, int end,\n', 'data vector Y, vector mu, vector sigma_obs) {\n', 'real ptarget = 0;\n', 'ptarget += lognormal_lpdf(Y[start:end] | mu[start:end],\n', 'sigma_obs[start:end]);\n', 'return ptarget;\n', '}\n}' ) } } if (family == 'beta') { if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', 'real partial_log_lik(array[] int seq, int start, int end,\n', 'data vector Y, vector mu, vector phi) {\n', 'real ptarget = 0;\n', 'ptarget += beta_lpdf(Y[start:end] | inv_logit(mu[start:end]) .* phi[start:end],\n', '(1 - inv_logit(mu[start:end])) .* phi[start:end]);\n', 'return ptarget;\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', 'real partial_log_lik(array[] int seq, int start, int end,\n', 'data vector Y, vector mu, vector phi) {\n', 'real ptarget = 0;\n', 'ptarget += beta_lpdf(Y[start:end] | inv_logit(mu[start:end]) .* phi[start:end],\n', '(1 - inv_logit(mu[start:end])) .* phi[start:end]);\n', 'return ptarget;\n', '}\n}' ) } } if (family == 'student') { if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', 'real partial_log_lik(array[] int seq, int start, int end,\n', 'data vector Y, vector mu, vector sigma_obs, vector nu) {\n', 'real ptarget = 0;\n', 'ptarget += student_t_lpdf(Y[start:end] | nu[start:end], mu[start:end],\n', 'sigma_obs[start:end]);\n', 'return ptarget;\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', 'real partial_log_lik(array[] int seq, int start, int end,\n', 'data vector Y, vector mu, vector sigma_obs, vector nu) {\n', 'real ptarget = 0;\n', 'ptarget += student_t_lpdf(Y[start:end] | nu[start:end], mu[start:end],\n', 
'sigma_obs[start:end]);\n', 'return ptarget;\n', '}\n}' ) } } if (family == 'negative binomial') { if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', 'real partial_log_lik(array[] int seq, int start, int end,\n', 'data array[] int Y, vector mu, array[] real phi) {\n', 'real ptarget = 0;\n', 'ptarget += neg_binomial_2_lpmf(Y[start:end] | mu[start:end],\n', 'inv(phi[start:end]));\n', 'return ptarget;\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', 'real partial_log_lik(array[] int seq, int start, int end,\n', 'data array[] int Y, vector mu, array[] real phi) {\n', 'real ptarget = 0;\n', 'ptarget += neg_binomial_2_lpmf(Y[start:end] | mu[start:end],\n', 'inv(phi[start:end]));\n', 'return ptarget;\n', '}\n}' ) } } if (family == 'Gamma') { if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', 'real partial_log_lik(array[] int seq, int start, int end,\n', 'data vector Y, vector mu, vector shape) {\n', 'real ptarget = 0;\n', 'ptarget += gamma_lpdf(Y[start:end] | shape[start:end], shape[start:end] ./ mu[start:end]);\n', 'return ptarget;\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', 'real partial_log_lik(array[] int seq, int start, int end,\n', 'data vector Y, vector mu, vector shape) {\n', 'real ptarget = 0;\n', 'ptarget += gamma_lpdf(Y[start:end] | shape[start:end], shape[start:end] ./ mu[start:end]);\n', 'return ptarget;\n', '}\n}' ) } } model_file <- readLines(textConnection(model_file), n = -1) } lik_line <- grep('// likelihood functions', model_file, fixed = TRUE) model_file <- model_file[-c(lik_line:(lik_line + 6))] if (family == 'gaussian') { if (threads > 1) { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'vector[n_nonmissing] flat_sigma_obs;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_sigma_obs = rep_each(sigma_obs, n)[obs_ind];\n', 'target += reduce_sum(partial_log_lik, seq,\n', 'grainsize,\n', 'flat_ys,\n', 'append_col(flat_xs, flat_trends),\n', 'append_row(b, 1.0),\n', 'flat_sigma_obs,\n', ifelse(offset, 'offset[obs_ind],\n);\n}\n', '0.0);\n}\n}\n') ) } else { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'vector[n_nonmissing] flat_sigma_obs;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_sigma_obs = rep_each(sigma_obs, n)[obs_ind];\n', 'flat_ys ~ normal_id_glm(append_col(flat_xs, flat_trends),\n', ifelse(offset, 'offset[obs_ind],', '0.0,'), 'append_row(b, 1.0),\n', 'flat_sigma_obs);\n}\n}\n' ) } } if (family == 'poisson') { if (threads > 1) { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'target += reduce_sum(partial_log_lik, seq,\n', 'grainsize,\n', 'flat_ys,\n', 'append_col(flat_xs, flat_trends),\n', 'append_row(b, 1.0),\n', ifelse(offset, 'offset[obs_ind]);\n}\n', '0.0);\n}\n}\n') ) } else { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends),\n', ifelse(offset, 'offset[obs_ind],', '0.0,'), 'append_row(b, 
1.0));\n}\n}\n' ) } } if (family == 'lognormal') { if (threads > 1) { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'vector[n_nonmissing] flat_sigma_obs;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_sigma_obs = rep_each(sigma_obs, n)[obs_ind];\n', 'target += reduce_sum(partial_log_lik, seq,\n', 'grainsize,\n', 'flat_ys,\n', ifelse( offset, 'append_col(flat_xs, flat_trends) * append_row(b, 1.0) + offset[obs_ind],\n', 'append_col(flat_xs, flat_trends) * append_row(b, 1.0),\n' ), 'flat_sigma_obs);\n}\n}\n' ) } else { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'vector[n_nonmissing] flat_sigma_obs;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_sigma_obs = rep_each(sigma_obs, n)[obs_ind];\n', 'flat_ys ~ lognormal(\n', ifelse( offset, 'append_col(flat_xs, flat_trends) * append_row(b, 1.0) + offset[obs_ind],\n', 'append_col(flat_xs, flat_trends) * append_row(b, 1.0),\n' ), 'flat_sigma_obs);\n}\n}\n' ) } model_file <- readLines(textConnection(model_file), n = -1) } if (family == 'beta') { if (threads > 1) { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'vector[n_nonmissing] flat_phis;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_phis = rep_each(phi, n)[obs_ind];\n', 'target += reduce_sum(partial_log_lik, seq,\n', 'grainsize,\n', 'flat_ys,\n', ifelse( offset, 'append_col(flat_xs, flat_trends) * append_row(b, 1.0) + offset[obs_ind],\n', 'append_col(flat_xs, flat_trends) * append_row(b, 1.0),\n' ), 'flat_phis);\n}\n}\n' ) } else { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'vector[n_nonmissing] flat_phis;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_phis = rep_each(phi, n)[obs_ind];\n', 'flat_ys ~ beta(\n', ifelse( offset, 'inv_logit(append_col(flat_xs, flat_trends) * append_row(b, 1.0) + offset[obs_ind]) .* flat_phis,\n', 'inv_logit(append_col(flat_xs, flat_trends) * append_row(b, 1.0)) .* flat_phis,\n' ), ifelse( offset, '(1 - inv_logit(append_col(flat_xs, flat_trends) * append_row(b, 1.0) + offset[obs_ind])) .* flat_phis);\n}\n}\n', '(1 - inv_logit(append_col(flat_xs, flat_trends) * append_row(b, 1.0))) .* flat_phis);\n}\n}\n' ) ) } model_file <- readLines(textConnection(model_file), n = -1) } if (family == 'Gamma') { if (threads > 1) { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'vector[n_nonmissing] flat_shapes;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_shapes = rep_each(shape, n)[obs_ind];\n', 'target += reduce_sum(partial_log_lik, seq,\n', 'grainsize,\n', 'flat_ys,\n', ifelse( offset, 'exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0) + offset[obs_ind]),\n', 'exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0)),\n' ), 'flat_shapes);\n}\n}\n' ) } else { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'vector[n_nonmissing] flat_shapes;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_shapes = rep_each(shape, n)[obs_ind];\n', 'flat_ys ~ gamma(\n', 'flat_shapes, flat_shapes ./ ', ifelse( offset, 'exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0) + offset[obs_ind])', 'exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0))' ), ');\n}\n}\n' ) } model_file <- readLines(textConnection(model_file), n = -1) } if (family == 'student') { if (threads > 1) { 
model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'vector[n_nonmissing] flat_sigma_obs;\n', 'vector[n_nonmissing] flat_nu;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_sigma_obs = rep_each(sigma_obs, n)[obs_ind];\n', 'flat_nu = rep_each(nu, n)[obs_ind];\n', 'target += reduce_sum(partial_log_lik, seq,\n', 'grainsize,\n', 'flat_ys,\n', ifelse( offset, 'append_col(flat_xs, flat_trends) * append_row(b, 1.0) + offset[obs_ind],\n', 'append_col(flat_xs, flat_trends) * append_row(b, 1.0),\n' ), 'flat_sigma_obs, flat_nu);\n}\n}\n' ) } else { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'vector[n_nonmissing] flat_sigma_obs;\n', 'vector[n_nonmissing] flat_nu;\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_sigma_obs = rep_each(sigma_obs, n)[obs_ind];\n', 'flat_nu = rep_each(nu, n)[obs_ind];\n', 'flat_ys ~ student_t(flat_nu,\n', ifelse( offset, 'append_col(flat_xs, flat_trends) * append_row(b, 1.0) + offset[obs_ind],\n', 'append_col(flat_xs, flat_trends) * append_row(b, 1.0),\n' ), 'flat_sigma_obs);\n}\n}\n' ) } model_file <- readLines(textConnection(model_file), n = -1) } if (family == 'negative binomial') { if (threads > 1) { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'real flat_phis[n_nonmissing];\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]);\n', 'target += reduce_sum(partial_log_lik, seq,\n', 'grainsize,\n', 'flat_ys,\n', ifelse( offset, 'exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0) + offset[obs_ind]),\n', 'exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0)),\n' ), 'flat_phis);\n}\n}\n' ) } else { model_file[lik_line] <- paste0( '{\n// likelihood functions\n', 'vector[n_nonmissing] flat_trends;\n', 'real flat_phis[n_nonmissing];\n', 'flat_trends = (to_vector(trend))[obs_ind];\n', 'flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]);\n', 'flat_ys ~ neg_binomial_2(\n', ifelse( offset, 'exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0) + offset[obs_ind]),\n', 'exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0)),\n' ), 'inv(flat_phis));\n}\n}\n' ) } model_file <- readLines(textConnection(model_file), n = -1) } # Add the rep_each function to replicate series-varying parameters for particular families if ( family %in% c( 'negative binomial', 'gaussian', 'lognormal', 'student', 'Gamma', 'beta' ) ) { model_file <- readLines(textConnection(model_file), n = -1) if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', 'vector rep_each(vector x, int K) {\n', 'int N = rows(x);\n', 'vector[N * K] y;\n', 'int pos = 1;\n', 'for (n in 1:N) {\n', 'for (k in 1:K) {\n', 'y[pos] = x[n];\n', 'pos += 1;\n', '}\n', '}\n', 'return y;\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', 'vector rep_each(vector x, int K) {\n', 'int N = rows(x);\n', 'vector[N * K] y;\n', 'int pos = 1;\n', 'for (n in 1:N) {\n', 'for (k in 1:K) {\n', 'y[pos] = x[n];\n', 'pos += 1;\n', '}\n', '}\n', 'return y;\n', '}\n}' ) } model_file <- readLines(textConnection(model_file), n = -1) } #### Data modifications #### # Gather the number of nonmissing observations model_data$n_nonmissing <- length(which(model_data$y_observed == 1)) # Grab indices of nonmissing ys and include 
reduced sets of ys and Xs model_data$obs_ind <- which(as.vector(model_data$y_observed) == 1) model_data$flat_ys <- as.vector(model_data$y)[which( as.vector(model_data$y_observed) == 1 )] model_data$X <- t(model_data$X) model_data$flat_xs <- as.matrix(model_data$X[ as.vector(model_data$ytimes)[model_data$obs_ind], ]) # Add a grainsize integer if (threads > 1) { model_data$seq <- 1:model_data$n_nonmissing model_data$grainsize <- max( 100, floor(length(as.vector(model_data$y)) / threads) ) } # Update the data statement obs_line <- grep( 'int y_observed[n, n_series]; // indices of missing vs observed', model_file, fixed = TRUE ) model_file <- model_file[-c(obs_line:(obs_line + 2))] obs_format <- 'int flat_ys[n_nonmissing];' if (family %in% c('gaussian', 'student')) { obs_format <- 'vector[n_nonmissing] flat_ys;' } if (family %in% c('Gamma', 'lognormal')) { obs_format <- 'vector[n_nonmissing] flat_ys;' } if (family == 'beta') { obs_format <- 'vector[n_nonmissing] flat_ys;' } if (threads > 1) { model_file[obs_line] <- paste0( 'int n_nonmissing;', ' // number of nonmissing observations\n', obs_format, ' // flattened nonmissing observations\n', 'matrix[n_nonmissing, num_basis] flat_xs;', ' // X values for nonmissing observations\n', 'int obs_ind[n_nonmissing];', ' // indices of nonmissing observations\n', 'int grainsize;', ' // grainsize for reduce_sum threading\n', 'int seq[n_nonmissing];', ' // an integer sequence for reduce_sum slicing\n', '}' ) } else { model_file[obs_line] <- paste0( 'int n_nonmissing;', ' // number of nonmissing observations\n', obs_format, ' // flattened nonmissing observations\n', 'matrix[n_nonmissing, num_basis] flat_xs;', ' // X values for nonmissing observations\n', 'int obs_ind[n_nonmissing];', ' // indices of nonmissing observations\n', '}' ) } # Some final edits to improve efficiency of the Stan models model_file <- gsub( 'row_vector[num_basis] b_raw;', 'vector[num_basis] b_raw;', model_file, fixed = TRUE ) model_file <- gsub( 'row_vector[num_basis] b;', 'vector[num_basis] b;', model_file, fixed = TRUE ) model_file <- gsub( 'matrix[num_basis, total_obs] X; // transposed mgcv GAM design matrix', 'matrix[total_obs, num_basis] X; // mgcv GAM design matrix', model_file, fixed = TRUE ) model_file <- model_file[ -(grep( '// GAM contribution to expectations (log scale)', model_file, fixed = TRUE ):(grep( '// GAM contribution to expectations (log scale)', model_file, fixed = TRUE ) + 5)) ] if (trend_model == 'GP') { model_file <- model_file[ -(grep('eta = to_vector(b * X);', model_file, fixed = TRUE)) ] model_file <- model_file[ -((grep( 'mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];', model_file, fixed = TRUE ) - 1):(grep( 'mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];', model_file, fixed = TRUE ) + 1)) ] } else { model_file <- model_file[ -(grep('eta = to_vector(b * X);', model_file, fixed = TRUE):(grep( 'eta = to_vector(b * X);', model_file, fixed = TRUE ) + 4)) ] } model_file <- model_file[ -((grep('// posterior predictions', model_file, fixed = TRUE) + 1):(grep( '// posterior predictions', model_file, fixed = TRUE ) + 3)) ] model_file[grep( 'generated quantities {', model_file, fixed = TRUE )] <- paste0( 'generated quantities {\n', 'vector[total_obs] eta;\n', 'matrix[n, n_series] mus;' ) if (family == 'poisson') { model_file[grep( '// posterior predictions', model_file, fixed = TRUE )] <- paste0( '// posterior predictions\n', ifelse(offset, 'eta = X * b + offset;\n', 'eta = X * b;\n'), 'for(s in 1:n_series){ \n', 'mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, 
s];\n', 'ypred[1:n, s] = poisson_log_rng(mus[1:n, s]);\n', '}' ) } if (family == 'negative binomial') { model_file[grep( '// posterior predictions', model_file, fixed = TRUE )] <- paste0( '// posterior predictions\n', ifelse(offset, 'eta = X * b + offset;\n', 'eta = X * b;\n'), 'for(s in 1:n_series){ \n', 'mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];\n', 'ypred[1:n, s] = neg_binomial_2_rng(exp(mus[1:n, s]), phi_vec[1:n, s]);\n', '}' ) } if (family == 'gaussian') { model_file[grep( 'array[n, n_series] int ypred;', model_file, fixed = TRUE )] <- 'array[n, n_series] real ypred;' model_file = readLines(textConnection(model_file), n = -1) model_file[grep( 'vector[num_basis] b_raw;', model_file, fixed = TRUE )] <- paste0( 'vector[num_basis] b_raw;\n', '// gaussian observation error\n', 'vector[n_series] sigma_obs;' ) model_file[ grep('// likelihood functions', model_file, fixed = TRUE) - 1 ] <- paste0( '// priors for observation error parameters\n', 'sigma_obs ~ student_t(3, 0, 2);\n', '{' ) model_file[grep( 'matrix[n, n_series] mus;', model_file, fixed = TRUE )] <- paste0( 'matrix[n, n_series] sigma_obs_vec;\n', 'matrix[n, n_series] mus;' ) model_file[grep( '// posterior predictions', model_file, fixed = TRUE )] <- paste0( '// posterior predictions\n', ifelse(offset, 'eta = X * b + offset;\n', 'eta = X * b;\n'), 'for (s in 1:n_series) {\n', 'sigma_obs_vec[1:n,s] = rep_vector(sigma_obs[s], n);\n', '}\n', 'for(s in 1:n_series){ \n', 'mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];\n', 'ypred[1:n, s] = normal_rng(mus[1:n, s], sigma_obs_vec[1:n, s]);\n', '}' ) } if (family == 'student') { model_file[grep( 'array[n, n_series] int ypred;', model_file, fixed = TRUE )] <- 'array[n, n_series] real ypred;' model_file = readLines(textConnection(model_file), n = -1) model_file[grep( 'vector[num_basis] b_raw;', model_file, fixed = TRUE )] <- paste0( 'vector[num_basis] b_raw;\n', '// student-t observation error\n', 'vector[n_series] sigma_obs;\n', '// student-t df parameters\n', 'vector[n_series] nu;' ) model_file[ grep('// likelihood functions', model_file, fixed = TRUE) - 1 ] <- paste0( '// priors for observation error parameters\n', 'sigma_obs ~ student_t(3, 0, 2);\n', '// priors for df parameters\n', 'nu ~ gamma(2, 0.1);\n', '{' ) model_file[grep( 'matrix[n, n_series] mus;', model_file, fixed = TRUE )] <- paste0( 'matrix[n, n_series] sigma_obs_vec;\n', 'matrix[n, n_series] nu_vec;\n', 'matrix[n, n_series] mus;' ) model_file[grep( '// posterior predictions', model_file, fixed = TRUE )] <- paste0( '// posterior predictions\n', ifelse(offset, 'eta = X * b + offset;\n', 'eta = X * b;\n'), 'for (s in 1:n_series) {\n', 'sigma_obs_vec[1:n,s] = rep_vector(sigma_obs[s], n);\n', 'nu_vec[1:n,s] = rep_vector(nu[s], n);\n', '}\n', 'for(s in 1:n_series){ \n', 'mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];\n', 'ypred[1:n, s] = student_t_rng(nu_vec[1:n, s], mus[1:n, s], sigma_obs_vec[1:n, s]);\n', '}' ) } if (family == 'lognormal') { model_file[grep( 'array[n, n_series] int ypred;', model_file, fixed = TRUE )] <- 'real ypred[n, n_series];' model_file = readLines(textConnection(model_file), n = -1) model_file[grep( 'vector[num_basis] b_raw;', model_file, fixed = TRUE )] <- paste0( 'vector[num_basis] b_raw;\n', '// lognormal observation error\n', 'vector[n_series] sigma_obs;' ) model_file[ grep('// likelihood functions', model_file, fixed = TRUE) - 1 ] <- paste0( '// priors for log(observation error) parameters\n', 'sigma_obs ~ student_t(3, 0, 1);\n', '{' ) model_file[grep( 'matrix[n, n_series] mus;', 
model_file, fixed = TRUE )] <- paste0( 'matrix[n, n_series] sigma_obs_vec;\n', 'matrix[n, n_series] mus;' ) model_file[grep( '// posterior predictions', model_file, fixed = TRUE )] <- paste0( '// posterior predictions\n', ifelse(offset, 'eta = X * b + offset;\n', 'eta = X * b;\n'), 'for (s in 1:n_series) {\n', 'sigma_obs_vec[1:n,s] = rep_vector(sigma_obs[s], n);\n', '}\n', 'for(s in 1:n_series){ \n', 'mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];\n', 'ypred[1:n, s] = lognormal_rng(mus[1:n, s], sigma_obs_vec[1:n, s]);\n', '}' ) } if (family == 'beta') { model_file[grep( 'array[n, n_series] int ypred;', model_file, fixed = TRUE )] <- 'real ypred[n, n_series];' model_file = readLines(textConnection(model_file), n = -1) model_file[grep( 'vector[num_basis] b_raw;', model_file, fixed = TRUE )] <- paste0( 'vector[num_basis] b_raw;\n', '// Beta precision parameters\n', 'vector[n_series] phi;' ) model_file[ grep('// likelihood functions', model_file, fixed = TRUE) - 1 ] <- paste0( '// priors for precision parameters\n', 'phi ~ gamma(0.01, 0.01);\n', '{' ) model_file[grep( 'matrix[n, n_series] mus;', model_file, fixed = TRUE )] <- paste0('matrix[n, n_series] phi_vec;\n', 'matrix[n, n_series] mus;') model_file[grep( '// posterior predictions', model_file, fixed = TRUE )] <- paste0( '// posterior predictions\n', ifelse(offset, 'eta = X * b + offset;\n', 'eta = X * b;\n'), 'for (s in 1:n_series) {\n', 'phi_vec[1:n,s] = rep_vector(phi[s], n);\n', '}\n', 'for(s in 1:n_series){ \n', 'mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];\n', 'ypred[1:n, s] = beta_rng(inv_logit(mus[1:n, s]) .* phi_vec[1:n, s], (1 - inv_logit(mus[1:n, s])) .* phi_vec[1:n, s]);\n', '}' ) } if (family == 'Gamma') { model_file[grep( 'array[n, n_series] int ypred;', model_file, fixed = TRUE )] <- 'real ypred[n, n_series];' model_file = readLines(textConnection(model_file), n = -1) model_file[grep( 'vector[num_basis] b_raw;', model_file, fixed = TRUE )] <- paste0( 'vector[num_basis] b_raw;\n', '// Gamma shape parameters\n', 'vector[n_series] shape;' ) model_file[ grep('// likelihood functions', model_file, fixed = TRUE) - 1 ] <- paste0( '// priors for shape parameters\n', 'shape ~ gamma(0.01, 0.01);\n', '{' ) model_file[grep( 'matrix[n, n_series] mus;', model_file, fixed = TRUE )] <- paste0('matrix[n, n_series] shape_vec;\n', 'matrix[n, n_series] mus;') model_file[grep( '// posterior predictions', model_file, fixed = TRUE )] <- paste0( '// posterior predictions\n', ifelse(offset, 'eta = X * b + offset;\n', 'eta = X * b;\n'), 'for (s in 1:n_series) {\n', 'shape_vec[1:n,s] = rep_vector(shape[s], n);\n', '}\n', 'for(s in 1:n_series){ \n', 'mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];\n', 'ypred[1:n, s] = gamma_rng(shape_vec[1:n, s], shape_vec[1:n, s] ./ exp(mus[1:n, s]));\n', '}' ) } #### Trend modifications #### # Vectorise trend models if (trend_model == 'RW') { if (any(grepl('// dynamic factor estimates', model_file, fixed = TRUE))) { init_trend_line <- grep( 'LV_raw[1, j] ~ normal(0, 0.1)', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(init_trend_line:(init_trend_line + 2))] model_file[init_trend_line] <- 'LV_raw[1, 1:n_lv] ~ normal(0, 0.1);' if (drift) {} else { remainder_line <- grep( 'LV_raw[2:n, j] ~ normal(LV_raw[1:(n - 1), j], 0.1)', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(remainder_line:(remainder_line + 2))] model_file[remainder_line] <- paste0( 'for(j in 1:n_lv){\n', 'LV_raw[2:n, j] ~ normal(LV_raw[1:(n - 1), j], 0.1);\n', '}' ) } model_file = readLines(textConnection(model_file), 
n = -1) } else { init_trend_line <- grep( 'trend[1, s] ~ normal(0, sigma[s])', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(init_trend_line:(init_trend_line + 2))] model_file[init_trend_line] <- 'trend[1, 1:n_series] ~ normal(0, sigma);' if (drift) {} else { remainder_line <- grep( 'trend[2:n, s] ~ normal(trend[1:(n - 1), s], sigma[s])', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(remainder_line:(remainder_line + 2))] model_file[remainder_line] <- paste0( 'for(s in 1:n_series){\n', 'trend[2:n, s] ~ normal(trend[1:(n - 1), s], sigma[s]);\n', '}' ) model_file = readLines(textConnection(model_file), n = -1) } } } if (trend_model == 'CAR1') { if (any(grepl('// dynamic factor estimates', model_file, fixed = TRUE))) { init_trend_line <- grep( 'LV_raw[1, j] ~ normal(0, 0.1)', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(init_trend_line:(init_trend_line + 2))] model_file[init_trend_line] <- 'LV_raw[1, 1:n_lv] ~ normal(0, 0.1);' remainder_line <- grep( 'LV_raw[2:n, j] ~ normal(ar1[j] * LV_raw[1:(n - 1), j], 0.1)', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(remainder_line:(remainder_line + 2))] model_file[remainder_line] <- paste0( 'for(j in 1:n_lv){\n', 'LV_raw[2:n, j] ~ normal(pow(ar1[j], to_vector(time_dis[2:n, j])) .* LV_raw[1:(n - 1), j], 0.1);\n', '}' ) model_file = readLines(textConnection(model_file), n = -1) } else { init_trend_line <- grep( 'trend[1, s] ~ normal(0, sigma[s])', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(init_trend_line:(init_trend_line + 2))] model_file[init_trend_line] <- 'trend[1, 1:n_series] ~ normal(0, sigma);' remainder_line <- grep( 'trend[2:n, s] ~ normal(ar1[s] * trend[1:(n - 1), s], sigma[s])', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(remainder_line:(remainder_line + 2))] model_file[remainder_line] <- paste0( 'for(s in 1:n_series){\n', 'trend[2:n, s] ~ normal(pow(ar1[s], to_vector(time_dis[2:n, s])) ', '.* trend[1:(n - 1), s], ', 'sigma[s] * (1 - ar1[s]^(2*to_vector(time_dis[2:n, s]))) / (1 - ar1[s]^2));\n', '}' ) model_file = readLines(textConnection(model_file), n = -1) } } if (trend_model == 'AR1') { if (any(grepl('// dynamic factor estimates', model_file, fixed = TRUE))) { init_trend_line <- grepws( 'LV_raw[1, j] ~ normal(0, 0.1)', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(init_trend_line:(init_trend_line + 2))] model_file[init_trend_line] <- 'LV_raw[1, 1:n_lv] ~ normal(0, 0.1);' if (drift) {} else { remainder_line <- grepws( 'LV_raw[2:n, j] ~ normal(ar1[j] * LV_raw[1:(n - 1), j], 0.1)', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(remainder_line:(remainder_line + 2))] model_file[remainder_line] <- paste0( 'for(j in 1:n_lv){\n', 'LV_raw[2:n, j] ~ normal(ar1[j] * LV_raw[1:(n - 1), j], 0.1);\n', '}' ) } model_file = readLines(textConnection(model_file), n = -1) } else { init_trend_line <- grepws( 'trend[1, s] ~ normal(0, sigma[s])', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(init_trend_line:(init_trend_line + 2))] model_file[init_trend_line] <- 'trend[1, 1:n_series] ~ normal(0, sigma);' if (drift) {} else { remainder_line <- grepws( 'trend[2:n, s] ~ normal(ar1[s] * trend[1:(n - 1), s], sigma[s])', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(remainder_line:(remainder_line + 2))] model_file[remainder_line] <- paste0( 'for(s in 1:n_series){\n', 'trend[2:n, s] ~ normal(ar1[s] * trend[1:(n - 1), s], sigma[s]);\n', '}' ) model_file = readLines(textConnection(model_file), n = -1) } } } if (trend_model == 'AR2') { if 
(any(grepl('// dynamic factor estimates', model_file, fixed = TRUE))) { init_trend_line <- grepws( 'LV_raw[1, j] ~ normal(0, 0.1)', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(init_trend_line:(init_trend_line + 2))] model_file[init_trend_line] <- 'LV_raw[1, 1:n_lv] ~ normal(0, 0.1);' if (drift) {} else { second_line <- grepws( 'LV_raw[2, j] ~ normal(LV_raw[1, j] * ar1[j], 0.1)', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(second_line:(second_line + 2))] model_file[second_line] <- 'LV_raw[2, 1:n_lv] ~ normal(LV_raw[1, 1:n_lv] * ar1, 0.1);' remainder_line <- grepws( 'LV_raw[i, j] ~ normal(ar1[j] * LV_raw[i - 1, j] + ar2[j] * LV_raw[i - 2, j]', model_file, fixed = TRUE ) - 2 model_file <- model_file[-c(remainder_line:(remainder_line + 3))] model_file[remainder_line] <- paste0( 'for(j in 1:n_lv){\n', 'LV_raw[3:n, j] ~ normal(ar1[j] * LV_raw[2:(n - 1), j] + ar2[j] * LV_raw[1:(n - 2), j], 0.1);\n', '}' ) } model_file = readLines(textConnection(model_file), n = -1) } else { init_trend_line <- grepws( 'trend[1, s] ~ normal(0, sigma[s])', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(init_trend_line:(init_trend_line + 2))] model_file[init_trend_line] <- 'trend[1, 1:n_series] ~ normal(0, sigma);' if (drift) {} else { second_line <- grep( 'trend[2, s] ~ normal(trend[1, s] * ar1[s], sigma[s])', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(second_line:(second_line + 2))] model_file[second_line] <- 'trend[2, 1:n_series] ~ normal(trend[1, 1:n_series] * ar1, sigma);' remainder_line <- grep( 'trend[i, s] ~ normal(ar1[s] * trend[i - 1, s] + ar2[s] * trend[i - 2, s]', model_file, fixed = TRUE ) - 2 model_file <- model_file[-c(remainder_line:(remainder_line + 3))] model_file[remainder_line] <- paste0( 'for(s in 1:n_series){\n', 'trend[3:n, s] ~ normal(ar1[s] * trend[2:(n - 1), s] + ar2[s] * trend[1:(n - 2), s], sigma[s]);\n', '}' ) model_file = readLines(textConnection(model_file), n = -1) } } } if (trend_model == 'AR3') { if (any(grepl('// dynamic factor estimates', model_file, fixed = TRUE))) { init_trend_line <- grepws( 'LV_raw[1, j] ~ normal(0, 0.1)', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(init_trend_line:(init_trend_line + 2))] model_file[init_trend_line] <- 'LV_raw[1, 1:n_lv] ~ normal(0, 0.1);' if (drift) {} else { second_line <- grep( 'LV_raw[2, j] ~ normal(LV_raw[1, j] * ar1[j], 0.1)', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(second_line:(second_line + 2))] model_file[second_line] <- 'LV_raw[2, 1:n_lv] ~ normal(LV_raw[1, 1:n_lv] * ar1, 0.1);' third_line <- grep( 'LV_raw[3, j] ~ normal(LV_raw[2, j] * ar1[j] + LV_raw[1, j] * ar2[j]', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(third_line:(third_line + 2))] model_file[third_line] <- 'LV_raw[3, 1:n_lv] ~ normal(LV_raw[2, 1:n_lv] * ar1 + LV_raw[1, 1:n_lv] * ar2, 0.1);' remainder_line <- grep( 'LV_raw[i, j] ~ normal(ar1[j] * LV_raw[i - 1, j] + ar2[j] * LV_raw[i - 2, j] + ar3[j] * LV_raw[i - 3, j]', model_file, fixed = TRUE ) - 2 model_file <- model_file[-c(remainder_line:(remainder_line + 3))] model_file[remainder_line] <- paste0( 'for(j in 1:n_lv){\n', 'LV_raw[4:n, j] ~ normal(ar1[j] * LV_raw[3:(n - 1), j] + ar2[j] * LV_raw[2:(n - 2), j] + ar3[j] * LV_raw[1:(n - 3), j], 0.1);\n', '}' ) } model_file = readLines(textConnection(model_file), n = -1) } else { init_trend_line <- grepws( 'trend[1, s] ~ normal(0, sigma[s])', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(init_trend_line:(init_trend_line + 2))] model_file[init_trend_line] <- 
'trend[1, 1:n_series] ~ normal(0, sigma);' if (drift) {} else { second_line <- grepws( 'trend[2, s] ~ normal(trend[1, s] * ar1[s], sigma[s])', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(second_line:(second_line + 2))] model_file[second_line] <- 'trend[2, 1:n_series] ~ normal(trend[1, 1:n_series] * ar1, sigma);' third_line <- grepws( 'trend[3, s] ~ normal(trend[2, s] * ar1[s] + trend[1, s] * ar2[s]', model_file, fixed = TRUE ) - 1 model_file <- model_file[-c(third_line:(third_line + 2))] model_file[third_line] <- 'trend[3, 1:n_series] ~ normal(trend[2, 1:n_series] * ar1 + trend[1, 1:n_series] * ar2, sigma);' remainder_line <- grepws( 'trend[i, s] ~ normal(ar1[s] * trend[i - 1, s] + ar2[s] * trend[i - 2, s] + ar3[s] * trend[i - 3, s]', model_file, fixed = TRUE ) - 2 model_file <- model_file[-c(remainder_line:(remainder_line + 3))] model_file[remainder_line] <- paste0( 'for(s in 1:n_series){\n', 'trend[4:n, s] ~ normal(ar1[s] * trend[3:(n - 1), s] + ar2[s] * trend[2:(n - 2), s] + ar3[s] * trend[1:(n - 3), s], sigma[s]);\n', '}' ) model_file = readLines(textConnection(model_file), n = -1) } } } # Clean to remove trend components if this is a 'None' trend model if (trend_model == 'None') { model_file = readLines(textConnection(model_file), n = -1) model_file <- gsub(' + trend[1:n, s]', '', model_file, fixed = TRUE) model_file <- gsub( 'exp(append_col(flat_xs, flat_trends)', 'exp(flat_xs', model_file, fixed = TRUE ) model_file <- gsub( 'append_col(flat_xs, flat_trends)', 'flat_xs', model_file, fixed = TRUE ) model_file <- gsub('append_row(b, 1.0)', 'b', model_file, fixed = TRUE) model_file <- model_file[ -grep('vector[n_nonmissing] flat_trends;', model_file, fixed = TRUE) ] model_file <- model_file[ -grep( 'flat_trends = (to_vector(trend))[obs_ind];', model_file, fixed = TRUE ) ] } # New additions for VAR1 models if (VAR1) { model_file <- model_file[ -grep('vector[n_series] tau;', model_file, fixed = TRUE) ] model_file[grep('// latent trends', model_file, fixed = TRUE)] <- '// raw latent trends' model_file[grep('matrix[n, n_series] trend;', model_file, fixed = TRUE)] <- 'vector[n_series] trend_raw[n];' model_file[ grep('// latent trend variance parameters', model_file, fixed = TRUE) - 1 ] <- paste0( '\n// latent trend VAR1 terms\n', 'matrix[n_series, n_series] A;\n' ) model_file = readLines(textConnection(model_file), n = -1) model_file[grep('vector[num_basis] b;', model_file, fixed = TRUE)] <- paste0( 'vector[num_basis] b;', '\n// trend estimates in matrix-form\n', 'matrix[n, n_series] trend;\n', '\nfor(i in 1:n){\n', 'trend[i, 1:n_series] = to_row_vector(trend_raw[i]);\n', '}\n' ) model_file = readLines(textConnection(model_file), n = -1) model_file[grep('model {', model_file, fixed = TRUE)] <- paste0( 'model {\n', '// latent trend mean parameters\n', 'vector[n_series] mu[n - 1];\n' ) model_file[grep('sigma ~ exponential(2);', model_file, fixed = TRUE)] <- paste0( 'sigma ~ inv_gamma(2.3693353, 0.7311319);\n\n', '// VAR coefficients\n', 'to_vector(A) ~ normal(0, 0.5);\n\n', '// trend means\n', 'for(i in 2:n){\n', 'mu[i - 1] = A * trend_raw[i - 1];\n', '}\n\n', '// stochastic latent trends (contemporaneously uncorrelated)\n', 'trend_raw[1] ~ normal(0, sigma);\n', 'for(i in 2:n){\n', 'trend_raw[i] ~ normal(mu[i - 1], sigma);\n', '}\n' ) model_file = readLines(textConnection(model_file), n = -1) model_file <- model_file[ -c( (grep( "trend[1, 1:n_series] ~ normal(0, sigma);", model_file, fixed = TRUE ) - 2):(grep( "trend[1, 1:n_series] ~ normal(0, sigma);", model_file, fixed = TRUE 
) + 3) ) ] model_file[grep("generated quantities {", model_file, fixed = TRUE)] <- paste0('generated quantities {\n', 'matrix[n_series, n_series] Sigma;') model_file = readLines(textConnection(model_file), n = -1) model_file <- model_file[ -c( (grep("tau[s] = pow(sigma[s], -2.0);", model_file, fixed = TRUE) - 1):(grep("tau[s] = pow(sigma[s], -2.0);", model_file, fixed = TRUE) + 1) ) ] model_file[ grep("// posterior predictions", model_file, fixed = TRUE) - 1 ] <- paste0('Sigma = diag_matrix(square(sigma));\n') model_file = readLines(textConnection(model_file), n = -1) } # Add time_dis array for tracking length between observations for # continuous time AR models if (trend_model == 'CAR1') { model_file[grep( 'int ytimes[n, n_series]; //', model_file, fixed = TRUE )] <- paste0( 'int ytimes[n, n_series]; // time-ordered matrix (which col in X belongs to each [time, series] observation?)\n', 'array[n, n_series] real time_dis; // temporal distances between observations' ) model_file = readLines(textConnection(model_file), n = -1) } # Change variable 'offset' to 'off_set' to avoid any issues with later # versions of cmdstan if (any(grepl('offset', model_file, fixed = TRUE))) { model_file <- gsub('offset', 'off_set', model_file) model_file <- gsub('off_set vector', 'offset vector', model_file) model_data$off_set <- model_data$offset model_data$offset <- NULL } # Tidying the representation if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file <- model_file[ -(grep( '// Stan model code generated by package mvgam', model_file, fixed = TRUE )) ] model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0('// Stan model code generated by package mvgam\n', 'functions {') } return(list( model_file = readLines(textConnection(model_file), n = -1), model_data = model_data )) } #### Modifications to Stan code for setting up trend mapping #### #' @noRd trend_map_mods = function( model_file, model_data, trend_map, trend_model, n_lv, data_train, ytimes ) { if (trend_model == 'ZMVN') { trend_model <- 'RW' } if (trend_model != 'VAR1') { # Model code should be modified to remove any priors and modelling for the # latent variable coefficients and sign corrections model_file <- model_file[ -c( grep( '// dynamic factor lower triangle loading coefficients', model_file, fixed = TRUE ):(grep( '// dynamic factor lower triangle loading coefficients', model_file, fixed = TRUE ) + 2) ) ] model_file <- model_file[ -c( grep( '// Number of non-zero lower triangular factor loadings', model_file, fixed = TRUE ):(grep( '// Number of non-zero lower triangular factor loadings', model_file, fixed = TRUE ) + 3) ) ] model_file <- model_file[ -c( grep( '// constraints allow identifiability of loadings', model_file, fixed = TRUE ):(grep( '// constraints allow identifiability of loadings', model_file, fixed = TRUE ) + 15) ) ] model_file <- model_file[ -grep('matrix[n_series, n_lv] lv_coefs_raw;', model_file, fixed = TRUE) ] model_file <- model_file[ -grep('matrix[n_series, n_lv] lv_coefs;', model_file, fixed = TRUE) ] model_file <- model_file[ -c( grep( '// priors for dynamic factor loading coefficients', model_file, fixed = TRUE ):(grep( '// priors for dynamic factor loading coefficients', model_file, fixed = TRUE ) + 2) ) ] model_file <- model_file[ -c( grep( '// Sign correct factor loadings and factors', model_file, fixed = TRUE ):(grep( '// Sign correct factor loadings and factors', model_file, fixed = TRUE ) + 9) ) ] model_file <- model_file[ -grep('matrix[n, n_lv] LV;', model_file, fixed = TRUE) ] 
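# When a trend_map is supplied the loadings are not estimated: the raw
# factors and loadings are renamed below, and the loading matrix is fixed to
# the binary Z matrix that is passed in as data (constructed near the end of
# this function). Illustration only, for a hypothetical map sending series
# s1 and s2 to trend 1 and series s3 to trend 2:
#   Z <- matrix(c(1, 1, 0,
#                 0, 0, 1), ncol = 2)
# With exactly one 1 per row, each series' trend is simply a copy of its
# assigned latent state.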
model_file <- gsub('LV_raw', 'LV', model_file) model_file <- gsub('lv_coefs_raw', 'lv_coefs', model_file) model_file[grep("matrix[n, n_series] trend;", model_file, fixed = TRUE)] <- paste0('matrix[n, n_series] trend;\n', 'matrix[n_series, n_lv] lv_coefs;') model_file[grep("// derived latent trends", model_file, fixed = TRUE)] <- paste0('// derived latent trends\n', 'lv_coefs = Z;') model_file <- readLines(textConnection(model_file), n = -1) # We can estimate the variance parameters if a trend map is supplied if (trend_model %in% c('None', 'RW', 'AR1', 'AR2', 'AR3', 'CAR1')) { model_file <- model_file[ -grep('vector[num_basis] b_raw;', model_file, fixed = TRUE) ] model_file[grep("// raw basis coefficients", model_file, fixed = TRUE)] <- paste0( '// raw basis coefficients\n', 'vector[num_basis] b_raw;\n\n', '// latent factor SD terms\n', 'vector[n_lv] sigma;' ) model_file[grep( "// dynamic factor estimates", model_file, fixed = TRUE )] <- paste0( '// priors for factor SD parameters\n', 'sigma ~ exponential(2);\n', '// dynamic factor estimates' ) model_file[grep( "penalty = rep_vector(100.0, n_lv);", model_file, fixed = TRUE )] <- "penalty = 1.0 / (sigma .* sigma);" model_file[grep( "LV[1, 1:n_lv] ~ normal(0, 0.1);", model_file, fixed = TRUE )] <- 'LV[1, 1:n_lv] ~ normal(0, sigma);' model_file <- readLines(textConnection(model_file), n = -1) model_file <- gsub('j], 0.1', 'j], sigma[j]', model_file) } } if (trend_model == 'VAR1') { model_file[grep("// raw latent trends", model_file, fixed = TRUE)] <- "// dynamic factors" model_file[grep( "vector[n_series] trend_raw[n];", model_file, fixed = TRUE )] <- "vector[n_lv] LV[n];" model_file[grep( "// trend estimates in matrix-form", model_file, fixed = TRUE )] <- "// trends and dynamic factor loading matrix" model_file[grep("matrix[n, n_series] trend;", model_file, fixed = TRUE)] <- paste0("matrix[n, n_series] trend;\n", "matrix[n_series, n_lv] lv_coefs;") model_file <- readLines(textConnection(model_file), n = -1) model_file <- model_file[ -c( (grep( "trend[i, 1:n_series] = to_row_vector(trend_raw[i]);", model_file, fixed = TRUE ) - 1):(grep( "trend[i, 1:n_series] = to_row_vector(trend_raw[i]);", model_file, fixed = TRUE ) + 1) ) ] model_file[grep( "matrix[n_series, n_lv] lv_coefs;", model_file, fixed = TRUE )] <- paste0( "matrix[n_series, n_lv] lv_coefs;\n", "// derived latent trends\n", "lv_coefs = Z;\n", "for (i in 1:n){\n", "for (s in 1:n_series){\n", "trend[i, s] = dot_product(lv_coefs[s,], LV[i]);\n", "}\n", "}\n" ) model_file <- readLines(textConnection(model_file), n = -1) model_file <- gsub('trend_raw', 'LV', model_file) model_file[grep( "vector[n_series] sigma;", model_file, fixed = TRUE )] <- "vector[n_lv] sigma;" model_file[grep( "matrix[n_series, n_series] P_real;", model_file, fixed = TRUE )] <- "matrix[n_lv, n_lv] P_real;" model_file[grep( "matrix[n_series, n_series] A;", model_file, fixed = TRUE )] <- "matrix[n_lv, n_lv] A;" model_file[grep("vector[n_series] mu[n - 1];", model_file, fixed = TRUE)] <- "vector[n_lv] mu[n];" model_file[grep( "array[n] vector[n_series] mu;", model_file, fixed = TRUE )] <- "array[n] vector[n_lv] mu;" model_file[grep( "matrix[n_series, n_series] Sigma;", model_file, fixed = TRUE )] <- "matrix[n_lv, n_lv] Sigma;" model_file[grep( "matrix[n_series, n_series] P[1];", model_file, fixed = TRUE )] <- "matrix[n_lv, n_lv] P[1];" model_file[grep( "matrix[n_series, n_series] phiGamma[2, 1];", model_file, fixed = TRUE )] <- "matrix[n_lv, n_lv] phiGamma[2, 1];" model_file[ grep( "diagonal(P_real) ~ normal(Pmu[1], 1 / 
sqrt(Pomega[1]));", model_file, fixed = TRUE ) + 1 ] <- "for(i in 1:n_lv) {" model_file[ grep( "diagonal(P_real) ~ normal(Pmu[1], 1 / sqrt(Pomega[1]));", model_file, fixed = TRUE ) + 2 ] <- "for(j in 1:n_lv) {" model_file[grep( "int n; // number of timepoints per series", model_file, fixed = TRUE )] <- paste0( "int n; // number of timepoints per series\n", "int n_lv; // number of dynamic factors" ) model_file <- readLines(textConnection(model_file), n = -1) if ( any(grepl( "matrix[n_series, n_series] L_Sigma;", model_file, fixed = TRUE )) ) { model_file[grep( "matrix[n_series, n_series] L_Sigma;", model_file, fixed = TRUE )] <- "matrix[n_lv, n_lv] L_Sigma;" model_file[grep( "cov_matrix[n_series] Sigma;", model_file, fixed = TRUE )] <- "cov_matrix[n_lv] Sigma;" model_file[grep( "cov_matrix[n_series] Gamma;", model_file, fixed = TRUE )] <- "cov_matrix[n_lv] Gamma;" model_file[grep( "cholesky_factor_corr[n_series] L_Omega;", model_file, fixed = TRUE )] <- "cholesky_factor_corr[n_lv] L_Omega;" model_file[grep( "vector[n_series] trend_zeros = rep_vector(0.0, n_series);", model_file, fixed = TRUE )] <- "vector[n_lv] trend_zeros = rep_vector(0.0, n_lv);" model_file <- readLines(textConnection(model_file), n = -1) } } # Need to formulate the lv_coefs matrix and # supply it as data model_file[grep( "int n_series; // number of series", model_file, fixed = TRUE )] <- paste0( "int n_series; // number of series\n", "matrix[n_series, n_lv] Z; // matrix mapping series to latent trends" ) model_file <- readLines(textConnection(model_file), n = -1) # Z <- matrix(0, NCOL(ytimes), n_lv) # for(i in 1:NROW(trend_map)){ # Z[as.numeric(data_train$series)[trend_map$series[i]], # trend_map$trend[i]] <- 1 # } Z <- matrix(0, NCOL(ytimes), n_lv) for (i in 1:NROW(trend_map)) { rowid <- which(levels(data_train$series) == trend_map$series[i]) Z[rowid, trend_map$trend[i]] <- 1 } model_data$Z <- Z return(list(model_file = model_file, model_data = model_data)) } #### Modifications to Stan code for adding predictors to trend models #### #' @noRd add_trend_predictors = function( trend_formula, trend_knots, trend_map, trend_model, data_train, data_test, model_file, model_data, drop_trend_int = TRUE, drift = FALSE ) { #### Creating the trend mvgam model file and data structures #### if (trend_model == 'ZMVN') { trend_model <- 'RW' } # Replace any terms labelled 'trend' with 'series' for creating the necessary # structures trend_formula <- formula(paste( gsub('trend', 'series', as.character(trend_formula), fixed = TRUE), collapse = " " )) if (missing(trend_knots)) { trend_knots <- rlang::missing_arg() } # Drop any intercept from the formula if this is not an N-mixture model # or a trend_map was supplied, as the intercept will almost surely be unidentifiable if (drop_trend_int) { if (attr(terms(trend_formula), 'intercept') == 1) { trend_formula <- update(trend_formula, trend_y ~ . - 1) } else { trend_formula <- update(trend_formula, trend_y ~ .) } } else { trend_formula <- update(trend_formula, trend_y ~ .) 
} trend_train <- data_train trend_train$time <- trend_train$index..time..index trend_train$trend_y <- rnorm(length(trend_train$time)) # Add indicators of trend names as factor levels using the trend_map trend_indicators <- vector(length = length(trend_train$time)) for (i in 1:length(trend_train$time)) { trend_indicators[i] <- trend_map$trend[which( trend_map$series == trend_train$series[i] )] } trend_indicators <- factor( paste0('trend', trend_indicators), levels = paste0('trend', 1:max(trend_map$trend)) ) trend_train$series <- trend_indicators trend_train$y <- NULL # Only keep one time observation per trend data.frame( series = trend_train$series, time = trend_train$time, row_num = 1:length(trend_train$time) ) %>% dplyr::group_by(series, time) %>% dplyr::slice_head(n = 1) %>% dplyr::pull(row_num) -> inds_keep if (inherits(trend_train, 'list')) { trend_train <- lapply(trend_train, function(x) { if (is.matrix(x)) { matrix(x[inds_keep, ], ncol = NCOL(x)) } else { x[inds_keep] } }) } else { trend_train <- trend_train[inds_keep, ] } if (!is.null(data_test)) { # If newdata supplied, also create a fake design matrix # for the test data trend_test <- data_test trend_test$time <- trend_test$index..time..index trend_test$trend_y <- rnorm(length(trend_test$time)) trend_indicators <- vector(length = length(trend_test$time)) for (i in 1:length(trend_test$time)) { trend_indicators[i] <- trend_map$trend[which( trend_map$series == trend_test$series[i] )] } trend_indicators <- as.factor(paste0('trend', trend_indicators)) trend_test$series <- trend_indicators trend_test$y <- NULL data.frame( series = trend_test$series, time = trend_test$time, row_num = 1:length(trend_test$time) ) %>% dplyr::group_by(series, time) %>% dplyr::slice_head(n = 1) %>% dplyr::pull(row_num) -> inds_keep if (inherits(trend_test, 'list')) { trend_test <- lapply(trend_test, function(x) { if (is.matrix(x)) { matrix(x[inds_keep, ], ncol = NCOL(x)) } else { x[inds_keep] } }) } else { trend_test <- trend_test[inds_keep, ] } # Construct the model file and data structures for testing and training trend_mvgam <- mvgam( trend_formula, knots = trend_knots, data = trend_train, newdata = trend_test, family = gaussian(), trend_model = 'None', return_model_data = TRUE, run_model = FALSE, autoformat = FALSE, noncentred = FALSE ) } else { # Construct the model file and data structures for training only trend_mvgam <- mvgam( trend_formula, knots = trend_knots, data = trend_train, family = gaussian(), trend_model = 'None', return_model_data = TRUE, run_model = FALSE, autoformat = FALSE, noncentred = FALSE ) } trend_model_file <- trend_mvgam$model_file #### Modifying the model_file and model_data #### # Add lines for the raw trend basis coefficients model_file[grep("vector[num_basis] b_raw;", model_file, fixed = TRUE)] <- paste0("vector[num_basis] b_raw;\n", "vector[num_basis_trend] b_raw_trend;") # Add lines to data declarations for trend design matrix model_file[grep( "matrix[total_obs, num_basis] X; // mgcv GAM design matrix", model_file, fixed = TRUE )] <- paste0( "matrix[total_obs, num_basis] X; // mgcv GAM design matrix\n", "matrix[n * n_lv, num_basis_trend] X_trend; // trend model design matrix" ) model_file[grep( "int num_basis; // total number of basis coefficients", model_file, fixed = TRUE )] <- paste0( "int num_basis; // total number of basis coefficients\n", "int num_basis_trend; // number of trend basis coefficients" ) model_file[grep( "int ytimes[n, n_series]; // time-ordered matrix (which col in X belongs to each [time, series] 
observation?)", model_file, fixed = TRUE )] <- paste0( "int ytimes[n, n_series]; // time-ordered matrix (which col in X belongs to each [time, series] observation?)\n", "int ytimes_trend[n, n_lv]; // time-ordered matrix for latent trends" ) model_data$ytimes_trend <- trend_mvgam$model_data$ytimes model_data$num_basis_trend <- trend_mvgam$model_data$num_basis model_data$X_trend <- trend_mvgam$model_data$X # Update names to reflect process models rather than latent factors model_file[grep( "// trends and dynamic factor loading matrix", model_file, fixed = TRUE )] <- "// latent states and loading matrix" if (trend_model %in% c('None', 'RW', 'AR1', 'AR2', 'AR3', 'CAR1')) { model_file[grep("// latent factor SD terms", model_file, fixed = TRUE)] <- "// latent state SD terms" model_file[grep( "// priors for factor SD parameters", model_file, fixed = TRUE )] <- "// priors for latent state SD parameters" } model_file[grep( "// derived latent trends", model_file, fixed = TRUE )] <- "// derived latent states" # Add beta_trend lines b_trend_lines <- trend_model_file[grep('b[', trend_model_file, fixed = TRUE)] b_trend_lines <- gsub('\\bb\\b', 'b_trend', b_trend_lines) b_trend_lines <- gsub('raw', 'raw_trend', b_trend_lines) b_trend_lines <- gsub('num_basis', 'num_basis_trend', b_trend_lines) b_trend_lines <- gsub('idx', 'trend_idx', b_trend_lines) b_trend_lines <- gsub('l_gp', 'l_gp_trend', b_trend_lines) b_trend_lines <- gsub('k_gp', 'k_gp_trend', b_trend_lines) b_trend_lines <- gsub('alpha_gp', 'alpha_gp_trend', b_trend_lines) b_trend_lines <- gsub('rho_gp', 'rho_gp_trend', b_trend_lines) b_trend_lines <- gsub('z_gp', 'z_gp_trend', b_trend_lines) model_file[grep("// derived latent states", model_file, fixed = TRUE)] <- paste0( '// process model basis coefficients\n', paste(b_trend_lines, collapse = '\n'), '\n\n// derived latent states' ) model_file[grep("vector[num_basis] b;", model_file, fixed = TRUE)] <- paste0("vector[num_basis] b;\n", "vector[num_basis_trend] b_trend;") b1_lines <- model_file[min(grep('b[1', model_file, fixed = TRUE))] model_file[min(grep('b[1', model_file, fixed = TRUE))] <- paste0('// observation model basis coefficients\n', b1_lines) model_file <- readLines(textConnection(model_file), n = -1) trend_smooths_included <- FALSE # Add any multinormal smooth lines if ( any(grepl('multi_normal_prec', trend_model_file)) | any(grepl('// priors for smoothing parameters', trend_model_file)) | any(grepl('// prior for gp', trend_model_file)) ) { trend_smooths_included <- TRUE # Replace any indices from trend model so names aren't # conflicting with any possible indices in the observation model if (any(grepl('idx', trend_model_file))) { trend_model_file <- gsub('idx', 'trend_idx', trend_model_file) idx_data <- trend_mvgam$model_data[grep( 'idx', names(trend_mvgam$model_data) )] names(idx_data) <- gsub('idx', 'trend_idx', names(idx_data)) model_data <- append(model_data, idx_data) idx_lines <- c( grep('int trend_idx', trend_model_file), grep('// gp basis coefficient indices', trend_model_file), grep('// monotonic basis coefficient indices', trend_model_file) ) model_file[min(grep('data {', model_file, fixed = TRUE))] <- paste0('data {\n', paste(trend_model_file[idx_lines], collapse = '\n')) model_file <- readLines(textConnection(model_file), n = -1) } # Check for gp() terms if ( any(grepl('l_gp', trend_model_file)) & any(grepl('k_gp', trend_model_file)) & any(grepl('z_gp', trend_model_file)) ) { # Add spd_cov functions if (any(grepl('spd_gp_exp_quad', trend_model_file, fixed = TRUE))) { 
model_file <- add_gp_spd_funs(model_file, kernel = 'exp_quad') } if (any(grepl('spd_gp_exponential', trend_model_file, fixed = TRUE))) { model_file <- add_gp_spd_funs(model_file, kernel = 'exponential') } if (any(grepl('spd_gp_matern32', trend_model_file, fixed = TRUE))) { model_file <- add_gp_spd_funs(model_file, kernel = 'matern32') } if (any(grepl('spd_gp_matern52', trend_model_file, fixed = TRUE))) { model_file <- add_gp_spd_funs(model_file, kernel = 'matern52') } # Update gp param names to include 'trend' trend_model_file <- gsub('l_gp', 'l_gp_trend', trend_model_file) trend_model_file <- gsub('k_gp', 'k_gp_trend', trend_model_file) trend_model_file <- gsub('alpha_gp', 'alpha_gp_trend', trend_model_file) trend_model_file <- gsub('rho_gp', 'rho_gp_trend', trend_model_file) trend_model_file <- gsub('z_gp', 'z_gp_trend', trend_model_file) idx_data <- trend_mvgam$model_data[grep( 'l_gp', names(trend_mvgam$model_data) )] names(idx_data) <- gsub('l_gp', 'l_gp_trend', names(idx_data)) model_data <- append(model_data, idx_data) l_lines <- grep( '// approximate gp eigenvalues', trend_model_file, fixed = TRUE ) model_file[min(grep('data {', model_file, fixed = TRUE))] <- paste0('data {\n', paste(trend_model_file[l_lines], collapse = '\n')) model_file <- readLines(textConnection(model_file), n = -1) } if (any(grepl('k_gp', trend_model_file))) { idx_data <- trend_mvgam$model_data[grep( 'k_gp', names(trend_mvgam$model_data) )] names(idx_data) <- gsub('k_gp', 'k_gp_trend', names(idx_data)) model_data <- append(model_data, idx_data) k_lines <- grep( '// basis functions for approximate gp', trend_model_file, fixed = TRUE ) model_file[min(grep('data {', model_file, fixed = TRUE))] <- paste0('data {\n', paste(trend_model_file[k_lines], collapse = '\n')) model_file <- readLines(textConnection(model_file), n = -1) # Update the parameters block with gp params start <- grep("// gp term sd parameters", trend_model_file, fixed = TRUE) end <- grep( "// gp term latent variables", trend_model_file, fixed = TRUE ) + 1 last <- end for (i in end:(end + 50)) { if (grepl('vector[k_gp_trend', trend_model_file[i], fixed = TRUE)) { last <- i } else { break } } gp_params <- paste(trend_model_file[start:last], collapse = '\n') model_file[min(grep('parameters {', model_file, fixed = TRUE))] <- paste0('parameters {\n', gp_params) model_file <- readLines(textConnection(model_file), n = -1) } if ( any(grepl( "int n_sp; // number of smoothing parameters", model_file, fixed = TRUE )) ) { model_file[grep( "int n_sp; // number of smoothing parameters", model_file, fixed = TRUE )] <- paste0( "int n_sp; // number of smoothing parameters\n", "int n_sp_trend; // number of trend smoothing parameters" ) } else { model_file[grep( "int n; // number of timepoints per series", model_file, fixed = TRUE )] <- paste0( "int n; // number of timepoints per series\n", "int n_sp_trend; // number of trend smoothing parameters" ) } model_data$n_sp_trend <- trend_mvgam$model_data$n_sp spline_coef_headers <- trend_model_file[ grep('multi_normal_prec', trend_model_file) - 1 ] if (any(grepl('normal(0, lambda', trend_model_file, fixed = TRUE))) { idx_headers <- trend_model_file[ grep('normal(0, lambda', trend_model_file, fixed = TRUE) - 1 ] spline_coef_headers <- c( spline_coef_headers, grep('//', idx_headers, value = TRUE) ) } if (any(grepl('// prior for gp', trend_model_file))) { spline_coef_headers <- c( spline_coef_headers, trend_model_file[grep( '// prior for gp', trend_model_file, fixed = TRUE )] ) } spline_coef_headers <- gsub( '...', 
'_trend...', spline_coef_headers, fixed = TRUE ) spline_coef_lines <- trend_model_file[grepl( 'multi_normal_prec', trend_model_file )] if (any(grepl('normal(0, lambda', trend_model_file, fixed = TRUE))) { lambda_normals <- (grep( 'normal(0, lambda', trend_model_file, fixed = TRUE )) for (i in 1:length(lambda_normals)) { spline_coef_lines <- c( spline_coef_lines, paste(trend_model_file[lambda_normals[i]], collapse = '\n') ) } } all_gp_prior_lines = function(model_file, prior_line, max_break = 10) { last <- prior_line + max_break for (i in prior_line:(prior_line + max_break)) { if (!grepl('b_raw[', model_file[i], fixed = TRUE)) {} else { last <- i break } } (prior_line + 1):last } if (any(grepl('// prior for gp', trend_model_file))) { starts <- grep('// prior for gp', trend_model_file, fixed = TRUE) ends <- grep('// prior for gp', trend_model_file, fixed = TRUE) + 4 for (i in seq_along(starts)) { spline_coef_lines <- c( spline_coef_lines, paste( trend_model_file[all_gp_prior_lines( trend_model_file, starts[i], max_break = 10 )], collapse = '\n' ) ) } } spline_coef_lines <- gsub('_raw', '_raw_trend', spline_coef_lines) spline_coef_lines <- gsub('lambda', 'lambda_trend', spline_coef_lines) spline_coef_lines <- gsub('zero', 'zero_trend', spline_coef_lines) spline_coef_lines <- gsub('S', 'S_trend', spline_coef_lines, fixed = TRUE) for (i in seq_along(spline_coef_lines)) { spline_coef_lines[i] <- paste0( spline_coef_headers[i], '\n', spline_coef_lines[i] ) } lambda_prior_line <- sub( 'lambda', 'lambda_trend', trend_model_file[grep('lambda ~', trend_model_file, fixed = TRUE)] ) lambda_param_line <- sub( 'lambda', 'lambda_trend', trend_model_file[grep( 'vector[n_sp] lambda;', trend_model_file, fixed = TRUE )] ) lambda_param_line <- sub('n_sp', 'n_sp_trend', lambda_param_line) if (any(grepl('// dynamic process models', model_file, fixed = TRUE))) { model_file[ grep('// dynamic process models', model_file, fixed = TRUE) + 1 ] <- paste0( model_file[ grep('// dynamic process models', model_file, fixed = TRUE) + 1 ], '\n', paste(spline_coef_lines, collapse = '\n'), '\n', lambda_prior_line, '\n' ) } else { if (trend_model != 'VAR1') { model_file[grep( "// dynamic factor estimates", model_file, fixed = TRUE )] <- paste0( '// dynamic process models\n', paste(spline_coef_lines, collapse = '\n'), '\n', lambda_prior_line ) } else { model_file[grep( '// stochastic latent trends', model_file, fixed = TRUE )] <- paste0( '// dynamic process models\n', paste(spline_coef_lines, collapse = '\n'), '\n', lambda_prior_line ) } } if (any(grepl("vector[n_sp] lambda;", model_file, fixed = TRUE))) { model_file[grep("// dynamic factors", model_file, fixed = TRUE)] <- "// latent states" model_file[grep( "vector[n_sp] lambda;", model_file, fixed = TRUE )] <- paste0( "vector[n_sp] lambda;\n", "vector[n_sp_trend] lambda_trend;" ) } else { if (trend_model != 'VAR1') { model_file <- model_file[ -grep("matrix[n, n_lv] LV;", model_file, fixed = TRUE) ] model_file[grep("// dynamic factors", model_file, fixed = TRUE)] <- paste0( "// latent states\n", "matrix[n, n_lv] LV;\n\n", "// smoothing parameters\n", "vector[n_sp_trend] lambda_trend;" ) } else { model_file <- model_file[ -grep("vector[n_lv] LV[n];", model_file, fixed = TRUE) ] model_file[grep("// dynamic factors", model_file, fixed = TRUE)] <- paste0( "// latent states\n", "vector[n_lv] LV[n];\n\n", "// smoothing parameters\n", "vector[n_sp_trend] lambda_trend;" ) } } if ( any(grepl('mgcv smooth penalty matrix', trend_model_file, fixed = TRUE)) ) { S_lines <- 
trend_model_file[grep( 'mgcv smooth penalty matrix', trend_model_file, fixed = TRUE )] S_lines <- gsub('S', 'S_trend', S_lines, fixed = TRUE) model_file[grep( "int n_nonmissing; // number of nonmissing observations", model_file, fixed = TRUE )] <- paste0( "int n_nonmissing; // number of nonmissing observations\n", paste(S_lines, collapse = '\n') ) # Pull out S matrices (don't always start at 1!) S_mats <- trend_mvgam$model_data[grepl( "S[0-9]", names(trend_mvgam$model_data) )] names(S_mats) <- gsub('S', 'S_trend', names(S_mats)) model_data <- append(model_data, S_mats) } if (!is.null(trend_mvgam$model_data$zero)) { model_file[grep( "int num_basis_trend; // number of trend basis coefficients", model_file, fixed = TRUE )] <- paste0( "int num_basis_trend; // number of trend basis coefficients\n", "vector[num_basis_trend] zero_trend; // prior locations for trend basis coefficients" ) model_data$zero_trend <- trend_mvgam$model_data$zero } if (any(grepl("vector[n_sp] rho;", model_file, fixed = TRUE))) { model_file[grep("vector[n_sp] rho;", model_file, fixed = TRUE)] <- paste0("vector[n_sp] rho;\n", "vector[n_sp_trend] rho_trend;") model_file[grep("rho = log(lambda);", model_file, fixed = TRUE)] <- paste0("rho = log(lambda);\n", "rho_trend = log(lambda_trend);") } else { model_file[grep("matrix[n, n_series] mus;", model_file, fixed = TRUE)] <- paste0("matrix[n, n_series] mus;\n", "vector[n_sp_trend] rho_trend;") model_file[grep("// posterior predictions", model_file, fixed = TRUE)] <- paste0("rho_trend = log(lambda_trend);\n\n", "// posterior predictions") } model_file <- readLines(textConnection(model_file), n = -1) } # Add any parametric effect beta lines if ( length(attr(trend_mvgam$mgcv_model$pterms, 'term.labels')) != 0L || attr(terms(trend_formula), 'intercept') == 1 ) { trend_parametrics <- TRUE smooth_labs <- do.call( rbind, lapply(seq_along(trend_mvgam$mgcv_model$smooth), function(x) { data.frame( label = trend_mvgam$mgcv_model$smooth[[x]]$label, term = paste(trend_mvgam$mgcv_model$smooth[[x]]$term, collapse = ','), class = class(trend_mvgam$mgcv_model$smooth[[x]])[1] ) }) ) lpmat <- predict( trend_mvgam$mgcv_model, type = 'lpmatrix', exclude = smooth_labs$label ) pindices <- which(apply(lpmat, 2, function(x) !all(x == 0)) == TRUE) pnames <- names(pindices) pnames <- gsub('series', 'trend', pnames) # pnames <- attr(trend_mvgam$mgcv_model$pterms, 'term.labels') # pindices <- colnames(attr(trend_mvgam$mgcv_model$terms, 'factors')) plines <- vector() for (i in seq_along(pnames)) { plines[i] <- paste0( '// prior for ', pnames[i], '_trend...', '\n', 'b_raw_trend[', pindices[i], '] ~ student_t(3, 0, 2);\n' ) } if (any(grepl('// dynamic process models', model_file, fixed = TRUE))) { model_file[grep("// dynamic process models", model_file, fixed = TRUE)] <- paste0( '// dynamic process models\n', paste0(paste(plines, collapse = '\n')) ) } else { if (any(grepl("// dynamic factor estimates", model_file, fixed = TRUE))) { model_file[grep( "// dynamic factor estimates", model_file, fixed = TRUE )] <- paste0( '// dynamic process models\n', paste0(paste(plines, collapse = '\n')) ) } if (any(grepl("// trend means", model_file, fixed = TRUE))) { model_file[grep("// trend means", model_file, fixed = TRUE)] <- paste0( '// dynamic process models\n', paste0(paste(plines, collapse = '\n'), '// trend means') ) } } } model_file <- readLines(textConnection(model_file), n = -1) # Add any random effect beta lines trend_random_included <- FALSE if (any(grepl('mu_raw[', trend_model_file, fixed = TRUE))) { 
trend_random_included <- TRUE smooth_labs <- do.call( rbind, lapply(seq_along(trend_mvgam$mgcv_model$smooth), function(x) { data.frame( label = trend_mvgam$mgcv_model$smooth[[x]]$label, first.para = trend_mvgam$mgcv_model$smooth[[x]]$first.para, last.para = trend_mvgam$mgcv_model$smooth[[x]]$last.para, class = class(trend_mvgam$mgcv_model$smooth[[x]])[1] ) }) ) random_inds <- vector() for (i in 1:NROW(smooth_labs)) { if (smooth_labs$class[i] == 'random.effect') { random_inds[i] <- paste0( smooth_labs$first.para[i], ':', smooth_labs$last.para[i] ) } } random_inds <- random_inds[!is.na(random_inds)] trend_rand_idxs <- unlist(lapply(seq_along(random_inds), function(x) { seq( as.numeric(sub("\\:.*", "", random_inds[x])), sub(".*\\:", "", random_inds[x]) ) })) model_data$trend_rand_idxs <- trend_rand_idxs model_file[grep( "int obs_ind[n_nonmissing]; // indices of nonmissing observations", model_file, fixed = TRUE )] <- paste0( "int obs_ind[n_nonmissing]; // indices of nonmissing observations\n", paste0( "int trend_rand_idxs[", length(trend_rand_idxs), ']; // trend random effect indices' ) ) random_param_lines <- trend_model_file[c( grep("// random effect variances", trend_model_file, fixed = TRUE) + 1, grep("// random effect means", trend_model_file, fixed = TRUE) + 1 )] random_param_lines <- gsub('raw', 'raw_trend', random_param_lines) model_file[grepws( "vector[num_basis_trend] b_raw_trend;", model_file, fixed = TRUE )] <- paste0( "vector[num_basis_trend] b_raw_trend;\n\n", "// trend random effects\n", paste(random_param_lines, collapse = '\n') ) if (trend_model %in% c('None', 'RW', 'AR1', 'AR2', 'AR3', 'CAR1')) { model_file[grepws( "LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE )] <- paste0( "sigma_raw_trend ~ exponential(0.5);\n", "mu_raw_trend ~ std_normal();\n", paste0("b_raw_trend[", 'trend_rand_idxs', "] ~ std_normal();\n"), "LV[1, 1:n_lv] ~ normal(0, sigma);" ) } if (trend_model == 'VAR1') { if ( any(grepl( "cholesky_factor_corr[n_lv] L_Omega;", model_file, fixed = TRUE )) ) { model_file[grep( "LV[1] ~ multi_normal(trend_zeros, Gamma);", model_file, fixed = TRUE )] <- paste0( "sigma_raw_trend ~ exponential(0.5);\n", "mu_raw_trend ~ std_normal();\n", paste0("b_raw_trend[", 'trend_rand_idxs', "] ~ std_normal();\n"), "LV[1] ~ multi_normal(trend_zeros, Gamma);" ) } else { model_file[grep( "LV[1] ~ normal(0, sigma);", model_file, fixed = TRUE )] <- paste0( "sigma_raw_trend ~ exponential(0.5);\n", "mu_raw_trend ~ std_normal();\n", paste0("b_raw_trend[", 'trend_rand_idxs', "] ~ std_normal();\n"), "LV[1] ~ normal(0, sigma);" ) } } model_file <- readLines(textConnection(model_file), n = -1) } # Update the trend model statements model_file[grep( "// latent states and loading matrix", model_file, fixed = TRUE )] <- paste0( "// latent states and loading matrix\n", "vector[n * n_lv] trend_mus;" ) model_file[grep("// derived latent states", model_file, fixed = TRUE)] <- paste0( "// latent process linear predictors\n", "trend_mus = X_trend * b_trend;\n\n", "// derived latent states" ) model_file <- readLines(textConnection(model_file), n = -1) #### Trend model specific updates #### if (trend_model == 'None') { model_file <- model_file[ -c( grep("for(j in 1:n_lv){", model_file, fixed = TRUE):(grep( "for(j in 1:n_lv){", model_file, fixed = TRUE ) + 2) ) ] model_file[grep( "LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE )] <- paste0( "for(j in 1:n_lv){\n", "for(i in 1:n){\n", "LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]], sigma[j]);\n", "}\n}" ) model_file <- 
readLines(textConnection(model_file), n = -1)
  }
  if (trend_model == 'RW') {
    model_file <- model_file[
      -c(grep("for(j in 1:n_lv){", model_file, fixed = TRUE):(grep("for(j in 1:n_lv){", model_file, fixed = TRUE) + 2))
    ]
    if (drift) {
      model_file[grep("LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE)] <- paste0(
        "for(j in 1:n_lv){\n",
        "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);\n",
        "for(i in 2:n){\n",
        "LV[i, j] ~ normal(drift[j] * (i - 1) + trend_mus[ytimes_trend[i, j]] + LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]], sigma[j]);\n",
        "}\n}"
      )
    } else {
      model_file[grep("LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE)] <- paste0(
        "for(j in 1:n_lv){\n",
        "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);\n",
        "for(i in 2:n){\n",
        "LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]] + LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]], sigma[j]);\n",
        "}\n}"
      )
    }
    model_file <- readLines(textConnection(model_file), n = -1)
  }
  if (trend_model == 'CAR1') {
    model_file[grepws('// latent factor AR1 terms', model_file, fixed = TRUE)] <- '// latent state AR1 terms'
    model_file <- model_file[
      -c(grepws("for(j in 1:n_lv){", model_file, fixed = TRUE):(grepws("for(j in 1:n_lv){", model_file, fixed = TRUE) + 2))
    ]
    model_file[grepws("LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE)] <- paste0(
      "for(j in 1:n_lv){\n",
      "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);\n",
      "for(i in 2:n){\n",
      "LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]] + pow(ar1[j], time_dis[i, j]) * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]), sigma[j]);\n",
      "}\n}"
    )
    model_file <- readLines(textConnection(model_file), n = -1)
  }
  if (trend_model == 'AR1') {
    model_file[grep('// latent factor AR1 terms', model_file, fixed = TRUE)] <- '// latent state AR1 terms'
    model_file <- model_file[
      -c(grep("for(j in 1:n_lv){", model_file, fixed = TRUE):(grep("for(j in 1:n_lv){", model_file, fixed = TRUE) + 2))
    ]
    if (drift) {
      model_file[grep("LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE)] <- paste0(
        "for(j in 1:n_lv){\n",
        "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);\n",
        "for(i in 2:n){\n",
        "LV[i, j] ~ normal(drift[j] * (i - 1) + trend_mus[ytimes_trend[i, j]] + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]), sigma[j]);\n",
        "}\n}"
      )
    } else {
      model_file[grep("LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE)] <- paste0(
        "for(j in 1:n_lv){\n",
        "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);\n",
        "for(i in 2:n){\n",
        "LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]] + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]), sigma[j]);\n",
        "}\n}"
      )
    }
    model_file <- readLines(textConnection(model_file), n = -1)
  }
  if (trend_model == 'AR2') {
    model_file[grep('// latent factor AR1 terms', model_file, fixed = TRUE)] <- '// latent state AR1 terms'
    model_file[grep('// latent factor AR2 terms', model_file, fixed = TRUE)] <- '// latent state AR2 terms'
    model_file <- model_file[
      -c(grep("for(j in 1:n_lv){", model_file, fixed = TRUE):(grep("for(j in 1:n_lv){", model_file, fixed = TRUE) + 2))
    ]
    if (drift) {
      model_file[grep("LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE)] <- paste0(
        "for(j in 1:n_lv){\n",
        "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);\n",
        "LV[2, j] ~ normal(drift[j] + trend_mus[ytimes_trend[2, j]] + ar1[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]), sigma[j]);\n",
        "for(i in 3:n){\n",
        "LV[i, j] ~ normal(drift[j] * (i - 1) + trend_mus[ytimes_trend[i, j]] + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]) + ar2[j] * (LV[i - 2, j] - trend_mus[ytimes_trend[i - 2, j]]), sigma[j]);\n",
        "}\n}"
      )
      model_file <- model_file[
        -grep("LV[2, 1:n_lv] ~ normal(drift + LV[1, 1:n_lv] * ar1, 0.1);", model_file, fixed = TRUE)
      ]
    } else {
      model_file[grep("LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE)] <- paste0(
        "for(j in 1:n_lv){\n",
        "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);\n",
        "LV[2, j] ~ normal(trend_mus[ytimes_trend[2, j]] + ar1[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]), sigma[j]);\n",
        "for(i in 3:n){\n",
        "LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]] + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]) + ar2[j] * (LV[i - 2, j] - trend_mus[ytimes_trend[i - 2, j]]), sigma[j]);\n",
        "}\n}"
      )
      model_file <- model_file[
        -grep("LV[2, 1:n_lv] ~ normal(LV[1, 1:n_lv] * ar1, 0.1);", model_file, fixed = TRUE)
      ]
    }
    model_file <- readLines(textConnection(model_file), n = -1)
  }
  if (trend_model == 'AR3') {
    model_file[grep('// latent factor AR1 terms', model_file, fixed = TRUE)] <- '// latent state AR1 terms'
    model_file[grep('// latent factor AR2 terms', model_file, fixed = TRUE)] <- '// latent state AR2 terms'
    model_file[grep('// latent factor AR3 terms', model_file, fixed = TRUE)] <- '// latent state AR3 terms'
    model_file <- model_file[
      -c(grep("for(j in 1:n_lv){", model_file, fixed = TRUE):(grep("for(j in 1:n_lv){", model_file, fixed = TRUE) + 2))
    ]
    if (drift) {
      model_file[grep("LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE)] <- paste0(
        "for(j in 1:n_lv){\n",
        "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);\n",
        "LV[2, j] ~ normal(drift[j] + trend_mus[ytimes_trend[2, j]] + ar1[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]), sigma[j]);\n",
        "LV[3, j] ~ normal(drift[j] * 2 + trend_mus[ytimes_trend[3, j]] + ar1[j] * (LV[2, j] - trend_mus[ytimes_trend[2, j]]) + ar2[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]), sigma[j]);\n",
        "for(i in 4:n){\n",
        "LV[i, j] ~ normal(drift[j] * (i - 1) + trend_mus[ytimes_trend[i, j]] + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]) + ar2[j] * (LV[i - 2, j] - trend_mus[ytimes_trend[i - 2, j]]) + ar3[j] * (LV[i - 3, j] - trend_mus[ytimes_trend[i - 3, j]]), sigma[j]);\n",
        "}\n}"
      )
      model_file <- model_file[
        -grep("LV[2, 1:n_lv] ~ normal(drift + LV[1, 1:n_lv] * ar1, 0.1);", model_file, fixed = TRUE)
      ]
      model_file <- model_file[
        -grep('LV[3, 1:n_lv] ~ normal(drift * 2 + LV[2, 1:n_lv] * ar1 + LV[1, 1:n_lv] * ar2, 0.1);', model_file, fixed = TRUE)
      ]
    } else {
      model_file[grep("LV[1, 1:n_lv] ~ normal(0, sigma);", model_file, fixed = TRUE)] <- paste0(
        "for(j in 1:n_lv){\n",
        "LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]);\n",
        "LV[2, j] ~ normal(trend_mus[ytimes_trend[2, j]] + ar1[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]), sigma[j]);\n",
        "LV[3, j] ~ normal(trend_mus[ytimes_trend[3, j]] + ar1[j] * (LV[2, j] - trend_mus[ytimes_trend[2, j]]) + ar2[j] * (LV[1, j] - trend_mus[ytimes_trend[1, j]]), sigma[j]);\n",
        "for(i in 4:n){\n",
        "LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]] + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]) + ar2[j] * (LV[i - 2, j] - trend_mus[ytimes_trend[i - 2, j]]) + ar3[j] * (LV[i - 3, j] - trend_mus[ytimes_trend[i - 3, j]]), sigma[j]);\n",
        "}\n}"
      )
      model_file <- model_file[
        -grep("LV[2, 1:n_lv] ~ normal(LV[1, 1:n_lv] * ar1, 0.1);", model_file, fixed = TRUE)
      ]
      model_file <- model_file[
        -grep('LV[3, 1:n_lv] ~ normal(LV[2, 1:n_lv] * ar1 + LV[1, 1:n_lv] * ar2, 0.1);', model_file, fixed = TRUE)
      ]
    }
    model_file <-
readLines(textConnection(model_file), n = -1) } if (trend_model == 'VAR1') { model_file <- gsub('trend means', 'latent state means', model_file) model_file[grep('mu[i - 1] = A * LV[i - 1];', model_file, fixed = TRUE)] <- 'mu[i] = A * (LV[i - 1] - trend_mus[ytimes_trend[i - 1, 1:n_lv]]);' model_file[grep('vector[n_series] mu[n - 1];', model_file, fixed = TRUE)] <- "vector[n_series] mu[n];" if ( any(grepl( "cholesky_factor_corr[n_lv] L_Omega;", model_file, fixed = TRUE )) ) { model_file <- model_file[ -grep( "vector[n_lv] trend_zeros = rep_vector(0.0, n_lv);", model_file, fixed = TRUE ) ] model_file[grep( "LV[1] ~ multi_normal(trend_zeros, Gamma);", model_file, fixed = TRUE )] <- "LV[1] ~ multi_normal(trend_mus[ytimes_trend[1, 1:n_lv]], Gamma);" model_file[grep( "LV[i] ~ multi_normal_cholesky(mu[i - 1], L_Sigma);", model_file, fixed = TRUE )] <- "LV[i] ~ multi_normal_cholesky(trend_mus[ytimes_trend[i, 1:n_lv]] + mu[i], L_Sigma);" } else { model_file[grep("LV[1] ~ normal(0, sigma);", model_file, fixed = TRUE)] <- "LV[1] ~ normal(trend_mus[ytimes_trend[1, 1:n_lv]], sigma);" model_file[grep( "LV[i] ~ normal(mu[i - 1], sigma);", model_file, fixed = TRUE )] <- "LV[i] ~ normal(trend_mus[ytimes_trend[i, 1:n_lv]] + mu[i], sigma);" } model_file <- readLines(textConnection(model_file), n = -1) } model_file <- gsub('latent trend', 'latent state', model_file) # Any final tidying for trend_level terms model_file <- gsub('byseriestrend', 'bytrendtrend', model_file) model_file <- gsub(':seriestrend', ':trendtrend', model_file) names(model_data) <- gsub('byseriestrend', 'bytrendtrend', names(model_data)) names(model_data) <- gsub(':seriestrend', ':trendtrend', names(model_data)) names(trend_mvgam$mgcv_model$coefficients) <- gsub( 'byseriestrend', 'bytrendtrend', names(trend_mvgam$mgcv_model$coefficients) ) names(trend_mvgam$mgcv_model$coefficients) <- gsub( ':seriestrend', ':trendtrend', names(trend_mvgam$mgcv_model$coefficients) ) return(list( model_file = model_file, model_data = model_data, trend_mgcv_model = trend_mvgam$mgcv_model, trend_sp_names = trend_mvgam$sp_names, trend_smooths_included = trend_smooths_included, trend_random_included = trend_random_included )) } #### Stan diagnostic checks #### #' Check transitions that ended with a divergence #' @param fit A stanfit object #' @param quiet Logical (verbose or not?) #' @details Utility function written by Michael Betancourt (https://betanalpha.github.io/) #' @noRd check_div <- function(fit, quiet = FALSE, sampler_params) { if (missing(sampler_params)) { sampler_params <- rstan::get_sampler_params(fit, inc_warmup = FALSE) } divergent <- do.call(rbind, sampler_params)[, 'divergent__'] n = sum(divergent) N = length(divergent) if (round(100 * n / N, 4) > 2) { if (!quiet) { insight::print_color( sprintf( '\u2716 %s of %s iterations ended with a divergence (%s%%)\n', n, N, round(100 * n / N, 4) ), "bred" ) } insight::print_color( ' Try a larger adapt_delta to remove divergences\n', "bred" ) if (quiet) return(FALSE) } else { if (!quiet) { insight::print_color('\u2714', "green") cat(' No issues with divergences\n') } if (quiet) return(TRUE) } } #' Check transitions that ended prematurely due to maximum tree depth limit #' @param fit A stanfit object #' @param quiet Logical (verbose or not?) 
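#' @param max_depth The tree depth limit used during sampling (default \code{10},
#'   matching Stan's \code{max_treedepth} control argument)
#' @param sampler_params Optional list of sampler diagnostics, as returned by
#'   \code{rstan::get_sampler_params(fit, inc_warmup = FALSE)}; extracted from
#'   \code{fit} when missing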
#' @details Utility function written by Michael Betancourt (https://betanalpha.github.io/) #' @noRd check_treedepth <- function( fit, max_depth = 10, quiet = FALSE, sampler_params ) { if (missing(sampler_params)) { sampler_params <- rstan::get_sampler_params(fit, inc_warmup = FALSE) } treedepths <- do.call(rbind, sampler_params)[, 'treedepth__'] n = length(treedepths[sapply(treedepths, function(x) x >= max_depth)]) N = length(treedepths) if (round(100 * n / N, 4) > 2) { if (!quiet) { insight::print_color( sprintf( '\u2716 %s of %s iterations saturated the maximum tree depth of %s (%s%%)\n', n, N, max_depth, round(100 * n / N, 4) ), "bred" ) } insight::print_color( ' Try a larger max_treedepth to avoid saturation\n', "bred" ) if (quiet) return(FALSE) } else { if (!quiet) { insight::print_color('\u2714', "green") cat(' No issues with maximum tree depth\n') } if (quiet) return(TRUE) } } #' Check the effective sample size per iteration #' @param fit A stanfit object #' @param quiet Logical (verbose or not?) #' @details Utility function written by Michael Betancourt (https://betanalpha.github.io/) #' @noRd check_n_eff <- function( fit, quiet = FALSE, fit_summary, ignore_b_trend = FALSE ) { if (missing(fit_summary)) { fit_summary <- rstan::summary(fit, probs = c(0.5))$summary } fit_summary <- fit_summary[-grep('ypred', rownames(fit_summary)), ] if (any(grep('LV', rownames(fit_summary)))) { fit_summary <- fit_summary[-grep('LV', rownames(fit_summary)), ] fit_summary <- fit_summary[-grep('lv_coefs', rownames(fit_summary)), ] if (any(grepl('L[', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('L[', rownames(fit_summary), fixed = TRUE), ] } if (any(grepl('L_diag[', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('L_diag[', rownames(fit_summary), fixed = TRUE), ] } if (any(grepl('L_lower[', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('L_lower[', rownames(fit_summary), fixed = TRUE), ] } if (any(grepl('LV_raw[', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('LV_raw[', rownames(fit_summary), fixed = TRUE), ] } } if (ignore_b_trend) { if (any(grepl('_trend', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('_trend', rownames(fit_summary), fixed = TRUE), ] } if (any(grepl('trend_mus[', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('trend_mus[', rownames(fit_summary), fixed = TRUE), ] } } iter <- dim(rstan::extract(fit)[[1]])[[1]] neffs <- fit_summary[, 'n_eff'] ratios <- neffs / iter no_warning <- TRUE if (min(ratios, na.rm = TRUE) < 0.001) { no_warning <- FALSE } if (no_warning) { if (!quiet) { insight::print_color('\u2714', "green") cat(' No issues with effective samples per iteration\n') } if (quiet) return(TRUE) } else { if (!quiet) { insight::print_color( paste0( '\u2716 n_eff / iter below 0.001 found for ', length(which(ratios < 0.001)), ' parameters\n Effective sample size is inaccurate for these parameters\n' ), "bred" ) } if (quiet) return(FALSE) } } #' Check the potential scale reduction factors #' @param fit A stanfit object #' @param quiet Logical (verbose or not?) 
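#' @param fit_summary Optional pre-computed summary matrix, as returned by
#'   \code{rstan::summary(fit, probs = c(0.5))$summary}; computed from
#'   \code{fit} when missing
#' @param ignore_b_trend Logical; if \code{TRUE}, trend-model parameters
#'   (names containing \code{'_trend'} or \code{'trend_mus'}) are excluded
#'   before checking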
#' @details Utility function written by Michael Betancourt (https://betanalpha.github.io/) #' @noRd check_rhat <- function( fit, quiet = FALSE, fit_summary, ignore_b_trend = FALSE ) { if (missing(fit_summary)) { fit_summary <- rstan::summary(fit, probs = c(0.5))$summary } fit_summary <- fit_summary[-grep('ypred', rownames(fit_summary)), ] if (any(grep('LV', rownames(fit_summary)))) { fit_summary <- fit_summary[-grep('LV', rownames(fit_summary)), ] fit_summary <- fit_summary[-grep('lv_coefs', rownames(fit_summary)), ] if (any(grepl('L[', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('L[', rownames(fit_summary), fixed = TRUE), ] } if (any(grepl('L_diag[', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('L_diag[', rownames(fit_summary), fixed = TRUE), ] } if (any(grepl('L_lower[', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('L_lower[', rownames(fit_summary), fixed = TRUE), ] } if (any(grepl('LV_raw[', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('LV_raw[', rownames(fit_summary), fixed = TRUE), ] } } if (ignore_b_trend) { if (any(grepl('_trend', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('_trend', rownames(fit_summary), fixed = TRUE), ] } if (any(grepl('trend_mus[', rownames(fit_summary), fixed = TRUE))) { fit_summary <- fit_summary[ -grep('trend_mus[', rownames(fit_summary), fixed = TRUE), ] } } no_warning <- TRUE rhats <- fit_summary[, 'Rhat'] N = length(rhats[!is.na(rhats)]) n = length(which(rhats > 1.05)) if (round(100 * n / N, 4) > 2) { no_warning <- FALSE } if (no_warning) { if (!quiet) { insight::print_color('\u2714', "green") cat(' Rhat looks good for all parameters\n') } if (quiet) return(TRUE) } else { if (!quiet) { insight::print_color( paste0( '\u2716 Rhats above 1.05 found for some', ' parameters\n', ' Use pairs() and mcmc_plot() to investigate\n' ), "bred" ) } if (quiet) return(FALSE) } } #' Run all diagnostic checks #' @param fit A stanfit object #' @param quiet Logical (verbose or not?) #' @details Utility function written by Michael Betancourt (https://betanalpha.github.io/) #' @noRd check_all_diagnostics <- function( fit, max_treedepth = 10, ignore_b_trend = FALSE ) { sampler_params <- rstan::get_sampler_params(fit, inc_warmup = FALSE) fit_summary <- rstan::summary(fit, probs = c(0.5))$summary check_n_eff(fit, fit_summary = fit_summary, ignore_b_trend = ignore_b_trend) check_rhat(fit, fit_summary = fit_summary, ignore_b_trend = ignore_b_trend) check_div(fit, sampler_params = sampler_params) check_treedepth( fit, max_depth = max_treedepth, sampler_params = sampler_params ) } #' @noRd is_try_error = function(x) { inherits(x, "try-error") } #' evaluate an expression without printing output or messages #' @param expr expression to be evaluated #' @param type type of output to be suppressed (see ?sink) #' @param try wrap evaluation of expr in 'try' and #' not suppress outputs if evaluation fails? #' @param silent actually evaluate silently? #' @noRd eval_silent <- function( expr, type = "output", try = FALSE, silent = TRUE, ... ) { try <- as_one_logical(try) silent <- as_one_logical(silent) type <- match.arg(type, c("output", "message")) expr <- substitute(expr) envir <- parent.frame() if (silent) { if (try && type == "message") { try_out <- try(utils::capture.output( out <- eval(expr, envir), type = type, ... 
)) if (is_try_error(try_out)) { # try again without suppressing error messages out <- eval(expr, envir) } } else { utils::capture.output(out <- eval(expr, envir), type = type, ...) } } else { out <- eval(expr, envir) } out } #' @noRd nlist = function(...) { m <- match.call() dots <- list(...) no_names <- is.null(names(dots)) has_name <- if (no_names) FALSE else nzchar(names(dots)) if (all(has_name)) { return(dots) } nms <- as.character(m)[-1] if (no_names) { names(dots) <- nms } else { names(dots)[!has_name] <- nms[!has_name] } dots } #' @noRd `c<-` = function(x, value) { c(x, value) } #' @noRd grepws = function(pattern, x, fixed = TRUE, ...) { grep(trimws(tolower(pattern)), trimws(tolower(x)), fixed = fixed, ...) } ================================================ FILE: R/stationarise_VAR.R ================================================ #### Modifications to Stan code for stationary VAR1 processes #### # All functions and reparameterisations use code supplied generously by Sarah Heaps: # Heaps, Sarah E. "Enforcing stationarity through the prior in vector autoregressions." # Journal of Computational and Graphical Statistics (2022): 1-10. #' @noRd stationarise_VAR = function(model_file) { # Remove previous prior for VAR coefficients model_file <- model_file[ -c( (grep('// VAR coefficients', model_file, fixed = TRUE)):(grep( '// VAR coefficients', model_file, fixed = TRUE ) + 1) ) ] model_file <- model_file[ -c( (grep("// latent trend VAR1 terms", model_file, fixed = TRUE)):(grep( "// latent trend VAR1 terms", model_file, fixed = TRUE ) + 1) ) ] model_file <- model_file[ -grep("matrix[n_series, n_series] Sigma;", model_file, fixed = TRUE) ] model_file <- model_file[ -grep("Sigma = diag_matrix(square(sigma));", model_file, fixed = TRUE) ] # Add Heaps' functions for constrained VAR1 process priors if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', '/* Function to compute the matrix square root */\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix sqrtm(matrix A) {\n', 'int m = rows(A);\n', 'vector[m] root_root_evals = sqrt(sqrt(eigenvalues_sym(A)));\n', 'matrix[m, m] evecs = eigenvectors_sym(A);\n', 'matrix[m, m] eprod = diag_post_multiply(evecs, root_root_evals);\n', 'return tcrossprod(eprod);\n', '}\n', '/* Function to transform P_real to P */\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix P_realtoP(matrix P_real) {\n', 'int m = rows(P_real);\n', 'matrix[m, m] B = tcrossprod(P_real);\n', 'for(i in 1:m) B[i, i] += 1.0;\n', 'return mdivide_left_spd(sqrtm(B), P_real);\n', '}\n', '/* Function to perform the reverse mapping*/\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix[,] rev_mapping(matrix[] P, matrix Sigma) {\n', 'int p = size(P);\n', 'int m = rows(Sigma);\n', 'matrix[m, m] phi_for[p, p]; matrix[m, m] phi_rev[p, p];\n', 'matrix[m, m] Sigma_for[p+1]; matrix[m, m] Sigma_rev[p+1];\n', 'matrix[m, m] S_for; matrix[m, m] S_rev;\n', 'matrix[m, m] S_for_list[p+1];\n', 'matrix[m, m] Gamma_trans[p+1];\n', 'matrix[m, m] phiGamma[2, p];\n', '// Step 1:\n', 'Sigma_for[p+1] = Sigma;\n', 'S_for_list[p+1] = sqrtm(Sigma);\n', 'for(s in 1:p) {\n', '// In this block of code S_rev is B^{-1} and S_for is a working matrix\n', 'S_for = - tcrossprod(P[p-s+1]);\n', 'for(i in 1:m) S_for[i, i] += 1.0;\n', 'S_rev = sqrtm(S_for);\n', 'S_for_list[p-s+1] = 
mdivide_right_spd(mdivide_left_spd(S_rev,\n', 'sqrtm(quad_form_sym(Sigma_for[p-s+2], S_rev))), S_rev);\n', 'Sigma_for[p-s+1] = tcrossprod(S_for_list[p-s+1]);\n', '}\n', '// Step 2:\n', 'Sigma_rev[1] = Sigma_for[1];\n', 'Gamma_trans[1] = Sigma_for[1];\n', 'for(s in 0:(p-1)) {\n', 'S_for = S_for_list[s+1];\n', 'S_rev = sqrtm(Sigma_rev[s+1]);\n', 'phi_for[s+1, s+1] = mdivide_right_spd(S_for * P[s+1], S_rev);\n', "phi_rev[s+1, s+1] = mdivide_right_spd(S_rev * P[s+1]', S_for);\n", 'Gamma_trans[s+2] = phi_for[s+1, s+1] * Sigma_rev[s+1];\n', 'if(s>=1) {\n', 'for(k in 1:s) {\n', 'phi_for[s+1, k] = phi_for[s, k] - phi_for[s+1, s+1] * phi_rev[s, s-k+1];\n', 'phi_rev[s+1, k] = phi_rev[s, k] - phi_rev[s+1, s+1] * phi_for[s, s-k+1];\n', '}\n', 'for(k in 1:s) Gamma_trans[s+2] = Gamma_trans[s+2] + phi_for[s, k] * Gamma_trans[s+2-k];\n', '}\n', "Sigma_rev[s+2] = Sigma_rev[s+1] - quad_form_sym(Sigma_for[s+1],phi_rev[s+1, s+1]');\n", '}\n', 'for(i in 1:p) phiGamma[1, i] = phi_for[p, i];\n', "for(i in 1:p) phiGamma[2, i] = Gamma_trans[i]';\n", 'return phiGamma;\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model code generated by package mvgam\n', 'functions {\n', '/* Function to compute the matrix square root */\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix sqrtm(matrix A) {\n', 'int m = rows(A);\n', 'vector[m] root_root_evals = sqrt(sqrt(eigenvalues_sym(A)));\n', 'matrix[m, m] evecs = eigenvectors_sym(A);\n', 'matrix[m, m] eprod = diag_post_multiply(evecs, root_root_evals);\n', 'return tcrossprod(eprod);\n', '}\n', '/* Function to transform P_real to P */\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix P_realtoP(matrix P_real) {\n', 'int m = rows(P_real);\n', 'matrix[m, m] B = tcrossprod(P_real);\n', 'for(i in 1:m) B[i, i] += 1.0;\n', 'return mdivide_left_spd(sqrtm(B), P_real);\n', '}\n', '/* Function to perform the reverse mapping*/\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix[,] rev_mapping(matrix[] P, matrix Sigma) {\n', 'int p = size(P);\n', 'int m = rows(Sigma);\n', 'matrix[m, m] phi_for[p, p]; matrix[m, m] phi_rev[p, p];\n', 'matrix[m, m] Sigma_for[p+1]; matrix[m, m] Sigma_rev[p+1];\n', 'matrix[m, m] S_for; matrix[m, m] S_rev;\n', 'matrix[m, m] S_for_list[p+1];\n', 'matrix[m, m] Gamma_trans[p+1];\n', 'matrix[m, m] phiGamma[2, p];\n', '// Step 1:\n', 'Sigma_for[p+1] = Sigma;\n', 'S_for_list[p+1] = sqrtm(Sigma);\n', 'for(s in 1:p) {\n', '// In this block of code S_rev is B^{-1} and S_for is a working matrix\n', 'S_for = - tcrossprod(P[p-s+1]);\n', 'for(i in 1:m) S_for[i, i] += 1.0;\n', 'S_rev = sqrtm(S_for);\n', 'S_for_list[p-s+1] = mdivide_right_spd(mdivide_left_spd(S_rev,\n', 'sqrtm(quad_form_sym(Sigma_for[p-s+2], S_rev))), S_rev);\n', 'Sigma_for[p-s+1] = tcrossprod(S_for_list[p-s+1]);\n', '}\n', '// Step 2:\n', 'Sigma_rev[1] = Sigma_for[1];\n', 'Gamma_trans[1] = Sigma_for[1];\n', 'for(s in 0:(p-1)) {\n', 'S_for = S_for_list[s+1];\n', 'S_rev = sqrtm(Sigma_rev[s+1]);\n', 'phi_for[s+1, s+1] = mdivide_right_spd(S_for * P[s+1], S_rev);\n', "phi_rev[s+1, s+1] = mdivide_right_spd(S_rev * P[s+1]', S_for);\n", 'Gamma_trans[s+2] = phi_for[s+1, s+1] * Sigma_rev[s+1];\n', 'if(s>=1) {\n', 'for(k in 1:s) {\n', 'phi_for[s+1, k] = phi_for[s, k] - phi_for[s+1, s+1] * phi_rev[s, s-k+1];\n', 'phi_rev[s+1, k] = phi_rev[s, k] - phi_rev[s+1, s+1] * phi_for[s, s-k+1];\n', '}\n', 'for(k in 1:s) Gamma_trans[s+2] = Gamma_trans[s+2] + 
phi_for[s, k] * Gamma_trans[s+2-k];\n', '}\n', "Sigma_rev[s+2] = Sigma_rev[s+1] - quad_form_sym(Sigma_for[s+1],phi_rev[s+1, s+1]');\n", '}\n', 'for(i in 1:p) phiGamma[1, i] = phi_for[p, i];\n', "for(i in 1:p) phiGamma[2, i] = Gamma_trans[i]';\n", 'return phiGamma;\n', '}\n', '}' ) } model_file <- readLines(textConnection(model_file), n = -1) # Add transformed data lines for Heaps stationarity constraints if (any(grepl('transformed data {', model_file, fixed = TRUE))) { model_file[grep('transformed data {', model_file, fixed = TRUE)] <- paste0( 'transformed data {\n', 'vector[n_series] trend_zeros = rep_vector(0.0, n_series);\n', '// exchangeable partial autocorrelation hyperparameters\n', '// see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)\n', 'vector[2] es;\n', 'vector[2] fs;\n', 'vector[2] gs;\n', 'vector[2] hs;\n', 'es[1] = 0;\nes[2] = 0;\n', 'fs[1] = sqrt(0.455);\nfs[2] = sqrt(0.455);\n', 'gs[1] = 1.365;\ngs[2] = 1.365;\n', 'hs[1] = 0.071175;\nhs[2] = 0.071175;\n' ) } else { params_line <- min(which(grepl('parameters {', model_file, fixed = TRUE))) model_file[params_line] <- paste0( 'transformed data {\n', 'vector[n_series] trend_zeros = rep_vector(0.0, n_series);\n', '// exchangeable partial autocorrelation hyperparameters\n', '// see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)\n', 'vector[2] es;\n', 'vector[2] fs;\n', 'vector[2] gs;\n', 'vector[2] hs;\n', 'es[1] = 0;\nes[2] = 0;\n', 'fs[1] = sqrt(0.455);\nfs[2] = sqrt(0.455);\n', 'gs[1] = 1.365;\ngs[2] = 1.365;\n', 'hs[1] = 0.071175;\nhs[2] = 0.071175;\n', '}\n\nparameters {' ) model_file <- readLines(textConnection(model_file), n = -1) } # Add parameters for real-valued partial autocorrelations model_file[grep( "vector[n_series] sigma;", model_file, fixed = TRUE )] <- paste0( "vector[n_series] sigma;\n\n", '// unconstrained VAR1 partial autocorrelations\n', 'matrix[n_series, n_series] P_real;\n', '// partial autocorrelation hyperparameters\n', 'vector[2] Pmu;\n', 'vector[2] Pomega;\n' ) model_file <- readLines(textConnection(model_file), n = -1) # Add transformed parameters for partial autocorrelation reverse mapping model_file[grep( 'transformed parameters {', model_file, fixed = TRUE )] <- paste0( 'transformed parameters {\n', "// latent trend VAR1 autoregressive terms\n", "matrix[n_series, n_series] A;\n", '// stationary trend covariance\n', "matrix[n_series, n_series] Sigma;" ) model_file[ grep( 'trend[i, 1:n_series] = to_row_vector(trend_raw[i]);', model_file, fixed = TRUE ) + 1 ] <- paste0( '}\n\n', "Sigma = diag_matrix(square(sigma));\n", '// stationary VAR reparameterisation\n', '{\n', 'matrix[n_series, n_series] P[1];\n', 'matrix[n_series, n_series] phiGamma[2, 1];\n', 'P[1] = P_realtoP(P_real);\n', 'phiGamma = rev_mapping(P, Sigma);\n', 'A = phiGamma[1, 1];\n', '}' ) model_file <- readLines(textConnection(model_file), n = -1) # Add priors for partial autocorrelations model_file[grep("// trend means", model_file, fixed = TRUE)] <- paste0( '// partial autocorrelation hyperpriors\n', 'Pmu ~ normal(es, fs);\n', 'Pomega ~ gamma(gs, hs);\n', '// unconstrained partial autocorrelations\n', 'diagonal(P_real) ~ normal(Pmu[1], 1 / sqrt(Pomega[1]));\n', 'for(i in 1:n_series) {\n', 'for(j in 1:n_series) {\n', 'if(i != j) P_real[i, j] ~ normal(Pmu[2], 1 / sqrt(Pomega[2]));\n', '}\n', '}\n', "// trend means" ) model_file <- readLines(textConnection(model_file), n = -1) # Return the updated model file return(model_file) } #' Modifications for a VAR1 with possible contemporaneously 
correlated trend #' errors #' @noRd stationarise_VARcor = function(model_file) { # Remove previous prior for VAR coefficients model_file <- model_file[ -c( (grep('// VAR coefficients', model_file, fixed = TRUE)):(grep( '// VAR coefficients', model_file, fixed = TRUE ) + 1) ) ] model_file <- model_file[ -c( (grep("// latent trend VAR1 terms", model_file, fixed = TRUE)):(grep( "// latent trend VAR1 terms", model_file, fixed = TRUE ) + 1) ) ] model_file <- model_file[ -grep("matrix[n_series, n_series] Sigma;", model_file, fixed = TRUE) ] model_file <- model_file[ -grep("Sigma = diag_matrix(square(sigma));", model_file, fixed = TRUE) ] # Add Heaps' functions for constrained VAR1 process priors if (any(grepl('functions {', model_file, fixed = TRUE))) { model_file[grep('functions {', model_file, fixed = TRUE)] <- paste0( 'functions {\n', '/* Function to compute the matrix square root */\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix sqrtm(matrix A) {\n', 'int m = rows(A);\n', 'vector[m] root_root_evals = sqrt(sqrt(eigenvalues_sym(A)));\n', 'matrix[m, m] evecs = eigenvectors_sym(A);\n', 'matrix[m, m] eprod = diag_post_multiply(evecs, root_root_evals);\n', 'return tcrossprod(eprod);\n', '}\n', '/* Function to transform P_real to P */\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix P_realtoP(matrix P_real) {\n', 'int m = rows(P_real);\n', 'matrix[m, m] B = tcrossprod(P_real);\n', 'for(i in 1:m) B[i, i] += 1.0;\n', 'return mdivide_left_spd(sqrtm(B), P_real);\n', '}\n', '/* Function to perform the reverse mapping*/\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix[,] rev_mapping(matrix[] P, matrix Sigma) {\n', 'int p = size(P);\n', 'int m = rows(Sigma);\n', 'matrix[m, m] phi_for[p, p]; matrix[m, m] phi_rev[p, p];\n', 'matrix[m, m] Sigma_for[p+1]; matrix[m, m] Sigma_rev[p+1];\n', 'matrix[m, m] S_for; matrix[m, m] S_rev;\n', 'matrix[m, m] S_for_list[p+1];\n', 'matrix[m, m] Gamma_trans[p+1];\n', 'matrix[m, m] phiGamma[2, p];\n', '// Step 1:\n', 'Sigma_for[p+1] = Sigma;\n', 'S_for_list[p+1] = sqrtm(Sigma);\n', 'for(s in 1:p) {\n', '// In this block of code S_rev is B^{-1} and S_for is a working matrix\n', 'S_for = - tcrossprod(P[p-s+1]);\n', 'for(i in 1:m) S_for[i, i] += 1.0;\n', 'S_rev = sqrtm(S_for);\n', 'S_for_list[p-s+1] = mdivide_right_spd(mdivide_left_spd(S_rev,\n', 'sqrtm(quad_form_sym(Sigma_for[p-s+2], S_rev))), S_rev);\n', 'Sigma_for[p-s+1] = tcrossprod(S_for_list[p-s+1]);\n', '}\n', '// Step 2:\n', 'Sigma_rev[1] = Sigma_for[1];\n', 'Gamma_trans[1] = Sigma_for[1];\n', 'for(s in 0:(p-1)) {\n', 'S_for = S_for_list[s+1];\n', 'S_rev = sqrtm(Sigma_rev[s+1]);\n', 'phi_for[s+1, s+1] = mdivide_right_spd(S_for * P[s+1], S_rev);\n', "phi_rev[s+1, s+1] = mdivide_right_spd(S_rev * P[s+1]', S_for);\n", 'Gamma_trans[s+2] = phi_for[s+1, s+1] * Sigma_rev[s+1];\n', 'if(s>=1) {\n', 'for(k in 1:s) {\n', 'phi_for[s+1, k] = phi_for[s, k] - phi_for[s+1, s+1] * phi_rev[s, s-k+1];\n', 'phi_rev[s+1, k] = phi_rev[s, k] - phi_rev[s+1, s+1] * phi_for[s, s-k+1];\n', '}\n', 'for(k in 1:s) Gamma_trans[s+2] = Gamma_trans[s+2] + phi_for[s, k] * Gamma_trans[s+2-k];\n', '}\n', "Sigma_rev[s+2] = Sigma_rev[s+1] - quad_form_sym(Sigma_for[s+1],phi_rev[s+1, s+1]');\n", '}\n', 'for(i in 1:p) phiGamma[1, i] = phi_for[p, i];\n', "for(i in 1:p) phiGamma[2, i] = Gamma_trans[i]';\n", 'return phiGamma;\n', '}\n' ) } else { model_file[grep('Stan model code', model_file)] <- paste0( '// Stan model 
code generated by package mvgam\n', 'functions {\n', '/* Function to compute the matrix square root */\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix sqrtm(matrix A) {\n', 'int m = rows(A);\n', 'vector[m] root_root_evals = sqrt(sqrt(eigenvalues_sym(A)));\n', 'matrix[m, m] evecs = eigenvectors_sym(A);\n', 'matrix[m, m] eprod = diag_post_multiply(evecs, root_root_evals);\n', 'return tcrossprod(eprod);\n', '}\n', '/* Function to transform P_real to P */\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix P_realtoP(matrix P_real) {\n', 'int m = rows(P_real);\n', 'matrix[m, m] B = tcrossprod(P_real);\n', 'for(i in 1:m) B[i, i] += 1.0;\n', 'return mdivide_left_spd(sqrtm(B), P_real);\n', '}\n', '/* Function to perform the reverse mapping*/\n', '/* see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)*/\n', 'matrix[,] rev_mapping(matrix[] P, matrix Sigma) {\n', 'int p = size(P);\n', 'int m = rows(Sigma);\n', 'matrix[m, m] phi_for[p, p]; matrix[m, m] phi_rev[p, p];\n', 'matrix[m, m] Sigma_for[p+1]; matrix[m, m] Sigma_rev[p+1];\n', 'matrix[m, m] S_for; matrix[m, m] S_rev;\n', 'matrix[m, m] S_for_list[p+1];\n', 'matrix[m, m] Gamma_trans[p+1];\n', 'matrix[m, m] phiGamma[2, p];\n', '// Step 1:\n', 'Sigma_for[p+1] = Sigma;\n', 'S_for_list[p+1] = sqrtm(Sigma);\n', 'for(s in 1:p) {\n', '// In this block of code S_rev is B^{-1} and S_for is a working matrix\n', 'S_for = - tcrossprod(P[p-s+1]);\n', 'for(i in 1:m) S_for[i, i] += 1.0;\n', 'S_rev = sqrtm(S_for);\n', 'S_for_list[p-s+1] = mdivide_right_spd(mdivide_left_spd(S_rev,\n', 'sqrtm(quad_form_sym(Sigma_for[p-s+2], S_rev))), S_rev);\n', 'Sigma_for[p-s+1] = tcrossprod(S_for_list[p-s+1]);\n', '}\n', '// Step 2:\n', 'Sigma_rev[1] = Sigma_for[1];\n', 'Gamma_trans[1] = Sigma_for[1];\n', 'for(s in 0:(p-1)) {\n', 'S_for = S_for_list[s+1];\n', 'S_rev = sqrtm(Sigma_rev[s+1]);\n', 'phi_for[s+1, s+1] = mdivide_right_spd(S_for * P[s+1], S_rev);\n', "phi_rev[s+1, s+1] = mdivide_right_spd(S_rev * P[s+1]', S_for);\n", 'Gamma_trans[s+2] = phi_for[s+1, s+1] * Sigma_rev[s+1];\n', 'if(s>=1) {\n', 'for(k in 1:s) {\n', 'phi_for[s+1, k] = phi_for[s, k] - phi_for[s+1, s+1] * phi_rev[s, s-k+1];\n', 'phi_rev[s+1, k] = phi_rev[s, k] - phi_rev[s+1, s+1] * phi_for[s, s-k+1];\n', '}\n', 'for(k in 1:s) Gamma_trans[s+2] = Gamma_trans[s+2] + phi_for[s, k] * Gamma_trans[s+2-k];\n', '}\n', "Sigma_rev[s+2] = Sigma_rev[s+1] - quad_form_sym(Sigma_for[s+1],phi_rev[s+1, s+1]');\n", '}\n', 'for(i in 1:p) phiGamma[1, i] = phi_for[p, i];\n', "for(i in 1:p) phiGamma[2, i] = Gamma_trans[i]';\n", 'return phiGamma;\n', '}\n', '}' ) } model_file <- readLines(textConnection(model_file), n = -1) # Add transformed data lines for Heaps stationarity constraints if (any(grepl('transformed data {', model_file, fixed = TRUE))) { model_file[grep('transformed data {', model_file, fixed = TRUE)] <- paste0( 'transformed data {\n', 'vector[n_series] trend_zeros = rep_vector(0.0, n_series);\n', '// exchangeable partial autocorrelation hyperparameters\n', '// see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)\n', 'vector[2] es;\n', 'vector[2] fs;\n', 'vector[2] gs;\n', 'vector[2] hs;\n', 'es[1] = 0;\nes[2] = 0;\n', 'fs[1] = sqrt(0.455);\nfs[2] = sqrt(0.455);\n', 'gs[1] = 1.365;\ngs[2] = 1.365;\n', 'hs[1] = 0.071175;\nhs[2] = 0.071175;\n' ) } else { params_line <- min(which(grepl('parameters {', model_file, fixed = TRUE))) model_file[params_line] <- paste0( 'transformed data 
{\n', 'vector[n_series] trend_zeros = rep_vector(0.0, n_series);\n', '// exchangeable partial autocorrelation hyperparameters\n', '// see Heaps 2022 for details (https://doi.org/10.1080/10618600.2022.2079648)\n', 'vector[2] es;\n', 'vector[2] fs;\n', 'vector[2] gs;\n', 'vector[2] hs;\n', 'es[1] = 0;\nes[2] = 0;\n', 'fs[1] = sqrt(0.455);\nfs[2] = sqrt(0.455);\n', 'gs[1] = 1.365;\ngs[2] = 1.365;\n', 'hs[1] = 0.071175;\nhs[2] = 0.071175;\n', '}\n\nparameters {' ) model_file <- readLines(textConnection(model_file), n = -1) } # Add parameters for real-valued partial autocorrelations model_file[grep( "vector[n_series] sigma;", model_file, fixed = TRUE )] <- paste0( "cholesky_factor_corr[n_series] L_Omega;\n", "vector[n_series] sigma;\n\n", '// unconstrained VAR1 partial autocorrelations\n', 'matrix[n_series, n_series] P_real;\n', '// partial autocorrelation hyperparameters\n', 'vector[2] Pmu;\n', 'vector[2] Pomega;\n' ) model_file <- readLines(textConnection(model_file), n = -1) # Add transformed parameters for partial autocorrelation reverse mapping model_file[grep( 'transformed parameters {', model_file, fixed = TRUE )] <- paste0( 'transformed parameters {\n', "// latent trend VAR1 autoregressive terms\n", "matrix[n_series, n_series] A;\n", '// LKJ form of covariance matrix\n', "matrix[n_series, n_series] L_Sigma;\n", '// computed error covariance matrix\n', 'cov_matrix[n_series] Sigma;\n', '// initial trend covariance\n', 'cov_matrix[n_series] Gamma;' ) model_file[ grep( 'trend[i, 1:n_series] = to_row_vector(trend_raw[i]);', model_file, fixed = TRUE ) + 1 ] <- paste0( '}\n\n', 'L_Sigma = diag_pre_multiply(sigma, L_Omega);\n', 'Sigma = multiply_lower_tri_self_transpose(L_Sigma);\n', '// stationary VAR reparameterisation\n', '{\n', 'matrix[n_series, n_series] P[1];\n', 'matrix[n_series, n_series] phiGamma[2, 1];\n', 'P[1] = P_realtoP(P_real);\n', 'phiGamma = rev_mapping(P, Sigma);\n', 'A = phiGamma[1, 1];\n', 'Gamma = phiGamma[2, 1];\n', '}' ) model_file <- readLines(textConnection(model_file), n = -1) # Add priors for partial autocorrelations model_file[grep("// trend means", model_file, fixed = TRUE)] <- paste0( '// LKJ error correlation prior\n', 'L_Omega ~ lkj_corr_cholesky(2);\n', '// partial autocorrelation hyperpriors\n', 'Pmu ~ normal(es, fs);\n', 'Pomega ~ gamma(gs, hs);\n', '// unconstrained partial autocorrelations\n', 'diagonal(P_real) ~ normal(Pmu[1], 1 / sqrt(Pomega[1]));\n', 'for(i in 1:n_series) {\n', 'for(j in 1:n_series) {\n', 'if(i != j) P_real[i, j] ~ normal(Pmu[2], 1 / sqrt(Pomega[2]));\n', '}\n', '}\n', "// trend means" ) model_file <- readLines(textConnection(model_file), n = -1) # Update prior for error SD parameters sigma model_file[ grep( "// priors for latent trend variance parameters", model_file, fixed = TRUE ) + 1 ] <- paste0( 'sigma ~ inv_gamma(2.3693353, 0.7311319);' ) model_file <- readLines(textConnection(model_file), n = -1) # Update trend model to use multinormal model_file[grep( "trend_raw[1] ~ normal(0, sigma);", model_file, fixed = TRUE )] <- "trend_raw[1] ~ multi_normal(trend_zeros, Gamma);" model_file[grep( "trend_raw[i] ~ normal(mu[i - 1], sigma);", model_file, fixed = TRUE )] <- "trend_raw[i] ~ multi_normal_cholesky(mu[i - 1], L_Sigma);" model_file <- gsub( 'contemporaneously uncorrelated', 'contemporaneously correlated', model_file ) model_file <- readLines(textConnection(model_file), n = -1) # Return the updated model file return(model_file) } ================================================ FILE: R/summary.mvgam.R 
================================================ #' Summary for fitted \pkg{mvgam} models #' #' These functions take a fitted \code{mvgam} or \code{jsdgam} object and #' return various useful summaries #' #' @importFrom stats printCoefmat #' #' @param object \code{list} object of class `mvgam` #' #' @param include_betas Logical. Print a summary that includes posterior #' summaries of all linear predictor beta coefficients (including spline #' coefficients)? Defaults to \code{TRUE} but use \code{FALSE} for a more #' concise summary #' #' @param smooth_test Logical. Compute estimated degrees of freedom and #' approximate p-values for smooth terms? Defaults to \code{TRUE}, but users #' may wish to set to \code{FALSE} for complex models with many smooth or #' random effect terms #' #' @param digits The number of significant digits for printing out the summary; #' defaults to \code{2}. #' #' @param ... Ignored #' #' @author Nicholas J Clark #' #' @details `summary.mvgam` and `summary.mvgam_prefit` return brief summaries of #' the model's call, along with posterior intervals for some of the key #' parameters in the model. Note that some smooths have extra penalties on the #' null space, so summaries for the \code{rho} parameters may include more #' penalty terms than the number of smooths in the original model formula. #' Approximate p-values for smooth terms are also returned, with methods used #' for their calculation following those used for `mgcv` equivalents (see #' \code{\link[mgcv]{summary.gam}} for details). The Estimated Degrees of #' Freedom (edf) for smooth terms is computed using either `edf.type = 1` for #' models with no trend component, or `edf.type = 0` for models with trend #' components. These are described in the documentation for #' \code{\link[mgcv]{jagam}}. Experiments suggest these p-values tend to be #' more conservative than those that might be returned from an equivalent model #' fit with \code{\link[mgcv]{summary.gam}} using `method = 'REML'` #' #' `coef.mvgam` returns either summaries or full posterior estimates for `GAM` #' component coefficients #' #' @return For `summary.mvgam`, an object of class \code{mvgam_summary} containing: #' \itemize{ #' \item \code{model_spec}: Model specification details (formulas, family, dimensions) #' \item \code{parameters}: Parameter estimates and significance tests #' \item \code{diagnostics}: MCMC convergence diagnostics #' \item \code{sampling_info}: Sampling algorithm details #' } #' #' For `summary.mvgam_prefit`, a \code{list} is printed on-screen showing #' the model specifications #' #' For `coef.mvgam`, either a \code{matrix} of posterior coefficient #' distributions (if \code{summarise == FALSE}) or a \code{data.frame} of #' coefficient summaries (if \code{summarise == TRUE}) #' @examples #' \dontrun{ #' simdat <- sim_mvgam(seasonality = "hierarchical") #' #' mod <- mvgam( #' y ~ series + #' s(season, bs = "cc", k = 6) + #' s(season, series, bs = "fs", k = 4), #' data = simdat$data_train, #' chains = 2, #' silent = 2 #' ) #' #' mod_summary <- summary(mod) #' mod_summary #' } #' @export summary.mvgam = function( object, include_betas = TRUE, smooth_test = TRUE, digits = 2, ... 
) { #### Some adjustments for cleaner summaries #### if ( attr(object$model_data, 'trend_model') == 'None' & object$use_lv & object$family != 'nmix' ) { attr(object$model_data, 'trend_model') <- 'RW' } variational <- object$algorithm %in% c('fullrank', 'meanfield', 'laplace', 'pathfinder') #### Smooth tests #### if (smooth_test) { if (inherits(object$trend_model, 'mvgam_trend')) { trend_model <- object$trend_model$label } else { trend_model <- object$trend_model } object$mgcv_model <- compute_edf( object$mgcv_model, object, 'rho', 'sigma_raw', conservative = trend_model == 'None' ) if (!is.null(object$trend_call) & !inherits(object, 'jsdgam')) { object$trend_mgcv_model <- compute_edf( object$trend_mgcv_model, object, 'rho_trend', 'sigma_raw_trend' ) } } #### Create structured summary object using extractors #### summary_obj <- structure( list( model_spec = extract_model_spec(object), parameters = extract_parameters( object, include_betas, smooth_test, digits, variational ), diagnostics = extract_diagnostics(object, digits, variational), sampling_info = extract_sampling_info(object) ), class = c("mvgam_summary", "list") ) return(summary_obj) } #' Print method for mvgam_summary objects #' #' @param x An object of class \code{mvgam_summary} #' @param ... Additional arguments (ignored) #' @return Invisibly returns the input object after printing #' @export print.mvgam_summary <- function(x, ...) { print_model_specification(x$model_spec) print_sampling_information(x$sampling_info) print_parameters(x$parameters) print_diagnostics(x$diagnostics) cat('\nUse how_to_cite() to get started describing this model') invisible(x) } #' Print model specification section #' @param model_spec Model specification from mvgam_summary #' @noRd print_model_specification <- function(model_spec) { # Print formulas if (!is.null(model_spec$formulas$process)) { cat("GAM observation formula:\n") print(model_spec$formulas$observation) cat("\nGAM process formula:\n") print(model_spec$formulas$process) } else { cat("GAM formula:\n") print(model_spec$formulas$observation) } # Print family and link cat("\nFamily:\n") cat(paste0(model_spec$family, '\n')) cat("\nLink function:\n") cat(paste0(model_spec$link, '\n')) # Print trend model if (!model_spec$is_jsdgam) { cat("\nTrend model:\n") if (is.call(model_spec$trend_model)) { print(model_spec$trend_model) } else { cat(paste0(model_spec$trend_model, '\n')) } } # Print latent variable info if (!is.null(model_spec$latent_variables)) { if (model_spec$latent_variables$type == "process_models") { cat("\nN process models:\n") cat(model_spec$latent_variables$count, '\n') } else { cat("\nN latent factors:\n") cat(model_spec$latent_variables$count, '\n') } } # Print dimensions if (model_spec$is_jsdgam) { cat('\nN species:\n') cat(model_spec$dimensions$n_species, '\n') cat('\nN sites:\n') cat(model_spec$dimensions$n_sites, '\n') } else { cat('\nN series:\n') cat(model_spec$dimensions$n_series, '\n') cat('\nN timepoints:\n') cat(model_spec$dimensions$n_timepoints, '\n') } # Print upper bounds if present if (!is.null(model_spec$upper_bounds)) { cat('\nUpper bounds:\n') cat(model_spec$upper_bounds, '\n') } } #' Print sampling information section #' @param sampling_info Sampling information from mvgam_summary #' @noRd print_sampling_information <- function(sampling_info) { cat('\nStatus:\n') if (sampling_info$fit_engine == 'jags') { cat('Fitted using JAGS', '\n') } else if (sampling_info$fit_engine == 'stan') { cat('Fitted using Stan', '\n') if (!is.null(sampling_info$chains)) { cat( 
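# illustrative output of the call below, e.g.
# "4 chains, each with iter = 2000; warmup = 1000; thin = 1"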
sampling_info$chains, " chains, each with iter = ", sampling_info$iter, "; warmup = ", sampling_info$warmup, "; thin = ", sampling_info$thin, " \n", "Total post-warmup draws = ", sampling_info$total_draws, "\n", sep = '' ) } } } #' Print parameters section #' @param parameters Parameters from mvgam_summary #' @noRd print_parameters <- function(parameters) { # Print family parameters family_param_labels <- c( "observation_error", "log_observation_error", "observation_df", "observation_shape", "observation_precision", "observation_dispersion" ) for (label in family_param_labels) { if (!is.null(parameters[[label]])) { cat(paste0("\n", format_param_header(label), ":\n")) print(parameters[[label]]) } } # Print GAM coefficients if (!is.null(parameters$gam_coefficients)) { cat("\nGAM coefficient (beta) estimates:\n") print(parameters$gam_coefficients) } if (!is.null(parameters$gam_obs_coefficients)) { cat("\nGAM observation model coefficient (beta) estimates:\n") print(parameters$gam_obs_coefficients) } # Print group-level parameters if (!is.null(parameters$gam_group_level)) { cat("\nGAM group-level estimates:\n") print(parameters$gam_group_level) } if (!is.null(parameters$gam_obs_group_level)) { cat("\nGAM observation model group-level estimates:\n") print(parameters$gam_obs_group_level) } # Print GP parameters if (!is.null(parameters$gam_gp_parameters)) { cat( "\nGAM gp term marginal deviation (alpha) and length scale (rho) estimates:\n" ) print(parameters$gam_gp_parameters) } if (!is.null(parameters$gam_obs_gp_parameters)) { cat( "\nGAM observation model gp term marginal deviation (alpha) and length scale (rho) estimates:\n" ) print(parameters$gam_obs_gp_parameters) } # Print smooth tests if (!is.null(parameters$gam_smooth_tests)) { cat("\nApproximate significance of GAM smooths:\n") suppressWarnings(printCoefmat( parameters$gam_smooth_tests, digits = 4, signif.stars = getOption("show.signif.stars"), has.Pvalue = TRUE, na.print = "NA", cs.ind = 1 )) } if (!is.null(parameters$gam_obs_smooth_tests)) { cat("\nApproximate significance of GAM observation smooths:\n") suppressWarnings(printCoefmat( parameters$gam_obs_smooth_tests, digits = 4, signif.stars = getOption("show.signif.stars"), has.Pvalue = TRUE, na.print = "NA", cs.ind = 1 )) } # Print trend parameters (using labels from param_info) trend_param_patterns <- c( "drift_parameter", "standard_deviation", "precision_parameter", "autoregressive_coef", "var_coefficient", "marginal_deviation", "length_scale", "growth_rate", "offset_parameter" ) for (pattern in trend_param_patterns) { matching_params <- names(parameters)[grepl(pattern, names(parameters))] for (param_name in matching_params) { if (!is.null(parameters[[param_name]])) { cat(paste0("\n", format_trend_header(param_name), ":\n")) print(parameters[[param_name]]) } } } # Print hierarchical correlation if (!is.null(parameters$hierarchical_correlation)) { cat( "\nHierarchical correlation weighting parameter (alpha_cor) estimates:\n" ) print(parameters$hierarchical_correlation) } # Print trend GAM parameters if (!is.null(parameters$gam_process_coefficients)) { cat("\nGAM process model coefficient (beta) estimates:\n") print(parameters$gam_process_coefficients) } if (!is.null(parameters$gam_process_group_level)) { cat("\nGAM process model group-level estimates:\n") print(parameters$gam_process_group_level) } if (!is.null(parameters$gam_process_gp_parameters)) { cat( "\nGAM process model gp term marginal deviation (alpha) and length scale (rho) estimates:\n" ) 
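# process-model gp() terms reuse the same alpha/rho summary layout printed
# above for observation-model gp() terms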
print(parameters$gam_process_gp_parameters) } if (!is.null(parameters$gam_process_smooth_tests)) { cat("\nApproximate significance of GAM process smooths:\n") suppressWarnings(printCoefmat( parameters$gam_process_smooth_tests, digits = 4, signif.stars = getOption("show.signif.stars"), has.Pvalue = TRUE, na.print = "NA", cs.ind = 1 )) } } #' Print diagnostics section #' @param diagnostics Diagnostics from mvgam_summary #' @noRd print_diagnostics <- function(diagnostics) { if (diagnostics$fit_engine == 'stan' && diagnostics$algorithm == 'sampling') { if (diagnostics$stan_diagnostics_available) { cat('\nStan MCMC diagnostics:\n') if (!is.null(diagnostics$sampler_message)) { cat(insight::format_message(diagnostics$sampler_message, indent = "")) cat('\n') } } } else if (diagnostics$algorithm != 'sampling') { if (!is.null(diagnostics$message)) { cat(paste0('\n', diagnostics$message, '\n')) } } else if (diagnostics$fit_engine == 'jags') { cat('\nJAGS MCMC diagnostics:\n') if (!is.null(diagnostics$jags_diagnostics)) { if (diagnostics$jags_diagnostics$rhat_ok) { cat('\nRhat looks reasonable for all parameters\n') } else { cat( '\nRhats above 1.05 found for', diagnostics$jags_diagnostics$n_high_rhat, 'parameters\n* Use pairs() to investigate\n' ) } } } } #' Format parameter header for display #' @param label Parameter label #' @return Formatted header string #' @noRd format_param_header <- function(label) { switch( label, "observation_error" = "Observation error parameter estimates", "log_observation_error" = "log(observation error) parameter estimates", "observation_df" = "Observation df parameter estimates", "observation_shape" = "Observation shape parameter estimates", "observation_precision" = "Observation precision parameter estimates", "observation_dispersion" = "Observation dispersion parameter estimates", label # fallback ) } #' Format trend parameter header for display #' @param param_name Parameter name #' @return Formatted header string #' @noRd format_trend_header <- function(param_name) { # This would need more sophisticated logic to match the original formatting # For now, return a simple transformation gsub("_", " ", param_name) } #' @rdname summary.mvgam #' #' @export summary.mvgam_prefit = function(object, ...) { if (!is.null(object$trend_call)) { cat("\nGAM observation formula:\n") print(object$call) cat("\nGAM process formula:\n") print(object$trend_call) } else { cat("\nGAM formula:\n") print(object$call) } cat("\n\nFamily:\n") cat(paste0(object$family, '\n')) cat("\nLink function:\n") cat(paste0(family_links(object$family), '\n')) if (!inherits(object, 'jsdgam')) { cat("\nTrend model:\n") if (inherits(object$trend_model, 'mvgam_trend')) { print(object$trend_model$label) cat('\n') } else { cat(paste0(object$trend_model, '\n')) } } if (object$use_lv) { if (!is.null(object$trend_call)) { cat("\nN process models:\n") cat(object$n_lv, '\n') } else { cat("\nN latent factors:\n") cat(object$n_lv, '\n') } } if (inherits(object, 'jsdgam')) { cat('\nN species:\n') cat(NCOL(object$ytimes), '\n') } else { cat('\nN series:\n') cat(NCOL(object$ytimes), '\n') } if (inherits(object, 'jsdgam')) { cat('\nN sites:\n') cat(NROW(object$ytimes), '\n') } else { cat('\nN timepoints:\n') cat(NROW(object$ytimes), '\n') } cat('\nStatus:') cat('Not fitted', '\n') } #' @rdname summary.mvgam #' #' @export #' #' @title Extract mvgam beta coefficients from the GAM component #' #' @param object \code{list} object returned from \code{mvgam} #' #' @param summarise \code{logical}. 
Summaries of coefficients will be returned #' if \code{TRUE}. Otherwise the full posterior distribution will be returned #' #' @method coef mvgam #' #' @export coef.mvgam = function(object, summarise = TRUE, ...) { coef_names <- names(object$mgcv_model$coefficients) if (summarise) { mvgam_coefs <- mcmc_summary(object$model_output, 'b')[, c(3:7)] rownames(mvgam_coefs) <- coef_names } else { mvgam_coefs <- mcmc_chains(object$model_output, 'b') colnames(mvgam_coefs) <- coef_names } return(mvgam_coefs) } #' Extract a clean mcmc_summary table of params #' @param object An `mvgam` or `jsdgam` object #' @param params A string of parameters to extract #' @param digits The number of significant digits for printing out the summary #' @param variational Logical indicating whether a variational approximation was used #' @noRd clean_summary_table = function( object, params, digits = 2, variational = FALSE ) { mcmc_summary( object$model_output, params, ISB = TRUE, digits = digits, variational = variational )[, c(3:7)] } #' Calculate and return summary table for GP parameters #' @param object An `mvgam` or `jsdgam` object #' @param mgcv_model A `gam` object containing GP effects #' @param trend_effects Logical indicating whether this is a trend_mgcv_model #' @param digits The number of significant digits for printing out the summary #' @param variational Logical indicating whether a variational approximation was used #' @noRd gp_param_summary = function( object, mgcv_model, trend_effects = FALSE, digits = 2, variational = FALSE ) { # Extract GP name and isotropic information gp_names <- unlist( purrr::map(attr(mgcv_model, 'gp_att_table'), 'name'), use.names = FALSE ) gp_isos <- unlist( purrr::map(attr(mgcv_model, 'gp_att_table'), 'iso'), use.names = FALSE ) gp_dims <- unlist( purrr::map(attr(mgcv_model, 'gp_att_table'), 'dim'), use.names = FALSE ) # Create full list of rho parameter names full_names <- vector(mode = 'list', length = length(gp_names)) for (i in seq_len(length(gp_names))) { if (gp_isos[i]) { full_names[[i]] <- gp_names[i] } else { full_names[[i]] <- paste0(gp_names[i], '[', 1:gp_dims[i], ']') } } full_names <- unlist(full_names, use.names = FALSE) # Determine which parameters to extract if (trend_effects) { alpha_params <- gsub( 'gp_', 'gp_trend_', gsub( 'series', 'trend', paste0('alpha_', clean_gpnames(gp_names)), fixed = TRUE ), fixed = TRUE ) rho_params <- gsub( 'gp_', 'gp_trend_', gsub( 'series', 'trend', paste0('rho_', clean_gpnames(gp_names)), fixed = TRUE ), fixed = TRUE ) } else { alpha_params <- paste0('alpha_', clean_gpnames(gp_names)) rho_params <- paste0('rho_', clean_gpnames(gp_names)) } # Create summary tables alpha_summary <- clean_summary_table( object = object, params = alpha_params, digits = digits, variational = variational ) rownames(alpha_summary) <- paste0('alpha_', gp_names) rho_summary <- clean_summary_table( object = object, params = rho_params, digits = digits, variational = variational ) rownames(rho_summary) <- paste0('rho_', full_names) # Return as a list return(list(alpha_summary = alpha_summary, rho_summary = rho_summary)) } #' Extract model specification information from mvgam object #' @param object An mvgam object #' @return List containing model specification details #' @noRd extract_model_spec <- function(object) { # Extract formulas - always use same structure formulas <- list( observation = object$call, process = if (!is.null(object$trend_call)) object$trend_call else NULL ) # Extract trend model information if (!inherits(object, 'jsdgam')) { if 
(inherits(object$trend_model, 'mvgam_trend')) { trend_model <- object$trend_model$label } else { trend_model <- object$trend_model } } else { trend_model <- NULL } # Extract dimensions and counts if (object$use_lv) { if (!is.null(object$trend_call)) { lv_info <- list( type = "process_models", count = object$n_lv ) } else { lv_info <- list( type = "latent_factors", count = object$n_lv ) } } else { lv_info <- NULL } # Extract series/species and timepoints/sites information if (inherits(object, 'jsdgam')) { dimensions <- list( n_species = NCOL(object$ytimes), n_sites = NROW(object$ytimes) ) } else { dimensions <- list( n_series = NCOL(object$ytimes), n_timepoints = NROW(object$ytimes) ) } # Compile model specification model_spec <- list( formulas = formulas, family = object$family, link = family_links(object$family), trend_model = trend_model, upper_bounds = object$upper_bounds, latent_variables = lv_info, dimensions = dimensions, is_jsdgam = inherits(object, 'jsdgam') ) return(model_spec) } #' Extract sampling information from mvgam object #' @param object An mvgam object #' @return List containing sampling details #' @noRd extract_sampling_info <- function(object) { sampling_info <- list( fit_engine = object$fit_engine, algorithm = object$algorithm ) if (object$fit_engine == 'stan') { n_kept <- object$model_output@sim$n_save - object$model_output@sim$warmup2 sampling_info$chains <- object$model_output@sim$chains sampling_info$iter <- object$model_output@sim$iter sampling_info$warmup <- object$model_output@sim$warmup sampling_info$thin <- object$model_output@sim$thin sampling_info$total_draws <- sum(n_kept) if (object$algorithm == 'sampling') { sampler <- attr(object$model_output@sim$samples[[1]], "args")$sampler_t if (sampler == "NUTS(diag_e)") { sampler <- 'sampling(hmc)' } sampling_info$sampler <- sampler } } return(sampling_info) } #' Extract diagnostic information from mvgam object #' @param object An mvgam object #' @param digits Number of digits for summary statistics #' @param variational Logical indicating if variational approximation was used #' @return List containing diagnostic information #' @noRd extract_diagnostics <- function(object, digits = 2, variational = FALSE) { diagnostics <- list( fit_engine = object$fit_engine, algorithm = object$algorithm ) if (object$fit_engine == 'stan' & object$algorithm == 'sampling') { diagnostics$stan_diagnostics_available <- TRUE diagnostics$max_treedepth <- object$max_treedepth diagnostics$ignore_b_trend <- inherits(object, 'jsdgam') # Get sampler information for message sampler <- attr(object$model_output@sim$samples[[1]], "args")$sampler_t if (sampler == "NUTS(diag_e)") { sampler <- 'sampling(hmc)' } # Capture Stan diagnostic messages diag_output <- utils::capture.output({ check_all_diagnostics( object$model_output, max_treedepth = object$max_treedepth, ignore_b_trend = diagnostics$ignore_b_trend ) }) diagnostics$sampler_message <- c( diag_output, paste0( "\n", "Samples were drawn using ", sampler, ". 
For each parameter, n_eff is", " a crude measure of effective", " sample size, and Rhat is the", " potential scale reduction factor", " on split MCMC chains (at", " convergence, Rhat = 1)" ) ) } else if (object$algorithm != 'sampling') { diagnostics$message <- "Posterior approximation used: no diagnostics to compute" } else if (object$fit_engine == 'jags') { # Extract JAGS diagnostics rhats <- mcmc_summary( object$model_output, digits = digits, variational = variational )[, 6] diagnostics$jags_diagnostics <- list( rhats = rhats, n_high_rhat = length(which(rhats > 1.05)), rhat_ok = !any(rhats > 1.05) ) } return(diagnostics) } #' Helper function to extract parameter summary with consistent columns #' @param model_output MCMC output object #' @param param_name Parameter name to extract #' @param digits Number of digits #' @param variational Logical for variational approximation #' @param ISB Logical for ISB parameter (used for group-level effects) #' @return Parameter summary matrix #' @noRd extract_param_summary <- function( model_output, param_name, digits = 2, variational = FALSE, ISB = TRUE ) { mcmc_summary( model_output, param_name, ISB = ISB, digits = digits, variational = variational )[, c(3:7)] } #' Extract family-specific parameters #' @param object An mvgam object #' @param digits Number of digits #' @param variational Logical for variational approximation #' @return List of family parameter summaries #' @noRd extract_family_parameters <- function(object, digits = 2, variational = FALSE) { family_info <- family_param_info(object$family) family_params <- list() if (length(family_info$param_names) > 0) { for (i in seq_along(family_info$param_names)) { param_name <- family_info$param_names[i] param_label <- family_info$labels[i] family_params[[param_label]] <- extract_param_summary( object$model_output, param_name, digits, variational ) } } return(family_params) } #' Extract GAM coefficient parameters #' @param object An mvgam object #' @param include_betas Logical to include all coefficients #' @param digits Number of digits #' @param variational Logical for variational approximation #' @return List of coefficient summaries #' @noRd extract_gam_coefficients <- function( object, include_betas = TRUE, digits = 2, variational = FALSE ) { gam_params <- list() # Determine coefficient subset if (include_betas) { coef_indices <- seq_along(object$mgcv_model$coefficients) } else { coef_indices <- if (object$mgcv_model$nsdf > 0) { 1:object$mgcv_model$nsdf } else { integer(0) } } if (length(coef_indices) > 0) { coef_names <- names(object$mgcv_model$coefficients)[coef_indices] mvgam_coefs <- extract_param_summary( object$model_output, 'b', digits, variational ) if (nrow(mvgam_coefs) >= max(coef_indices)) { mvgam_coefs <- mvgam_coefs[coef_indices, , drop = FALSE] rownames(mvgam_coefs) <- coef_names # Choose appropriate label based on model structure coef_label <- if (!is.null(object$trend_call)) { "gam_obs_coefficients" } else { "gam_coefficients" } gam_params[[coef_label]] <- mvgam_coefs } } return(gam_params) } #' Extract parameter estimates from mvgam object #' @param object An mvgam object #' @param include_betas Logical, include all beta coefficients #' @param smooth_test Logical, compute significance tests for smooths #' @param digits Number of digits for summaries #' @param variational Logical indicating if variational approximation was used #' @return List containing all parameter estimates #' @noRd extract_parameters <- function( object, include_betas = TRUE, smooth_test = TRUE, digits = 
2, variational = FALSE ) { parameters <- list() # Extract family-specific parameters parameters <- c( parameters, extract_family_parameters(object, digits, variational) ) # Extract GAM coefficients parameters <- c( parameters, extract_gam_coefficients(object, include_betas, digits, variational) ) # Extract remaining parameter types parameters <- c( parameters, extract_group_level_parameters(object, digits, variational) ) parameters <- c( parameters, extract_gp_parameters(object, digits, variational) ) parameters <- c(parameters, extract_smooth_tests(object, smooth_test, digits)) parameters <- c( parameters, extract_trend_parameters( object, include_betas, smooth_test, digits, variational ) ) return(parameters) } #' Extract group-level (random effect) parameters #' @param object An mvgam object #' @param digits Number of digits #' @param variational Logical for variational approximation #' @return List of group-level parameter summaries #' @noRd extract_group_level_parameters <- function( object, digits = 2, variational = FALSE ) { group_params <- list() if (!all(is.na(object$sp_names))) { has_random_effects <- any(unlist(purrr::map( object$mgcv_model$smooth, inherits, 'random.effect' ))) if (has_random_effects) { re_labs <- unlist(lapply( purrr::map(object$mgcv_model$smooth, 'label'), paste, collapse = ',' ))[unlist(purrr::map( object$mgcv_model$smooth, inherits, 'random.effect' ))] re_sds <- extract_param_summary( object$model_output, 'sigma_raw', digits = digits, variational = variational, ISB = TRUE ) re_mus <- extract_param_summary( object$model_output, 'mu_raw', digits = digits, variational = variational, ISB = TRUE ) rownames(re_sds) <- paste0('sd(', re_labs, ')') rownames(re_mus) <- paste0('mean(', re_labs, ')') param_label <- if (!is.null(object$trend_call)) { "gam_obs_group_level" } else { "gam_group_level" } group_params[[param_label]] <- rbind(re_mus, re_sds) } } return(group_params) } #' Extract Gaussian Process parameters #' @param object An mvgam object #' @param digits Number of digits #' @param variational Logical for variational approximation #' @return List of GP parameter summaries #' @noRd extract_gp_parameters <- function(object, digits = 2, variational = FALSE) { gp_params <- list() if (!is.null(attr(object$mgcv_model, 'gp_att_table'))) { gp_summaries <- gp_param_summary( object = object, mgcv_model = object$mgcv_model, digits = digits, variational = variational ) param_label <- if (!is.null(object$trend_call)) { "gam_obs_gp_parameters" } else { "gam_gp_parameters" } gp_params[[param_label]] <- rbind( gp_summaries$alpha_summary, gp_summaries$rho_summary ) } return(gp_params) } #' Extract smooth significance tests #' @param object An mvgam object #' @param smooth_test Logical to compute tests #' @param digits Number of digits #' @return List with smooth test results #' @noRd extract_smooth_tests <- function(object, smooth_test = TRUE, digits = 2) { smooth_params <- list() if (any(!is.na(object$sp_names)) & smooth_test) { gam_sig_table <- try( suppressWarnings(summary(object$mgcv_model)$s.table[, c(1, 2, 3, 4), drop = FALSE ]), silent = TRUE ) if (inherits(gam_sig_table, 'try-error')) { object$mgcv_model$R <- NULL gam_sig_table <- suppressWarnings(summary(object$mgcv_model)$s.table[, c(1, 2, 3, 4), drop = FALSE ]) gam_sig_table[, 2] <- unlist( purrr::map(object$mgcv_model$smooth, 'df'), use.names = FALSE ) } # Handle GP terms if (!is.null(attr(object$mgcv_model, 'gp_att_table'))) { gp_names <- unlist(purrr::map( attr(object$mgcv_model, 'gp_att_table'), 'name' )) if ( 
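# gp() terms are summarised separately (via the GP parameter tables), so any
# rows they contribute to mgcv's s.table are dropped from the smooth tests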
!all( rownames(gam_sig_table) %in% gsub('gp(', 's(', gp_names, fixed = TRUE) ) ) { gam_sig_table <- gam_sig_table[ !rownames(gam_sig_table) %in% gsub('gp(', 's(', gp_names, fixed = TRUE), , drop = FALSE ] } else { gam_sig_table <- NULL } } if (!is.null(gam_sig_table) && nrow(gam_sig_table) > 0) { param_label <- if (!is.null(object$trend_call)) { "gam_obs_smooth_tests" } else { "gam_smooth_tests" } smooth_params[[param_label]] <- gam_sig_table } } return(smooth_params) } #' Extract trend model parameters using param_info from trend objects #' @param object An mvgam object #' @param include_betas Logical to include all coefficients #' @param smooth_test Logical to compute smooth tests #' @param digits Number of digits #' @param variational Logical for variational approximation #' @return List of trend parameter summaries #' @noRd extract_trend_parameters <- function( object, include_betas = TRUE, smooth_test = TRUE, digits = 2, variational = FALSE ) { trend_params <- list() # Get trend model information if (inherits(object$trend_model, 'mvgam_trend')) { trend_info <- attr(object$trend_model, 'param_info') if (!is.null(trend_info)) { # Extract parameters that are available in the model available_params <- get_available_trend_params( object, trend_info$param_names ) if (length(available_params) > 0) { for (i in seq_along(available_params)) { param_name <- available_params[i] param_label <- trend_info$labels[match( param_name, trend_info$param_names )] # Skip trend estimates as they're not summary statistics if (param_name == 'trend') { next } tryCatch( { param_summary <- extract_param_summary( object$model_output, param_name, digits, variational ) if (!is.null(param_summary) && nrow(param_summary) > 0) { trend_params[[param_label]] <- param_summary } }, error = function(e) { # Parameter not available in this model, skip silently } ) } } } } # Handle hierarchical correlation parameters if (grepl('hiercor', validate_trend_model(object$trend_model))) { tryCatch( { trend_params[["hierarchical_correlation"]] <- extract_param_summary( object$model_output, 'alpha_cor', digits, variational ) }, error = function(e) { # Parameter not available, skip } ) } # Process model coefficients (trend_call section) if (!is.null(object$trend_call) && !inherits(object, 'jsdgam')) { trend_params <- c( trend_params, extract_trend_gam_parameters( object, include_betas, smooth_test, digits, variational ) ) } return(trend_params) } #' Get available trend parameters for a given model #' @param object An mvgam object #' @param param_names Vector of parameter names from trend param_info #' @return Vector of available parameter names #' @noRd get_available_trend_params <- function(object, param_names) { available <- character(0) # Check which parameters are likely to exist based on model characteristics trend_model_attr <- attr(object$model_data, 'trend_model') for (param in param_names) { # Always include basic parameters that most trend models have if (param %in% c('sigma', 'tau', 'theta', 'Sigma', 'drift')) { available <- c(available, param) } # AR parameters - check if AR model if (grepl('^ar[0-9]', param) && grepl('^AR', trend_model_attr)) { available <- c(available, param) } # VAR parameters if (param == 'A' && trend_model_attr == 'VAR1') { available <- c(available, param) } # GP parameters if ( param %in% c('alpha_gp', 'rho_gp', 'b_gp') && trend_model_attr == 'GP' ) { available <- c(available, param) } # Piecewise parameters if ( param %in% c('k_trend', 'm_trend', 'delta_trend') && trend_model_attr %in% c('PWlinear', 
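# (piecewise trends only: 'k_trend' = initial rate, 'm_trend' = initial
#  offset, 'delta_trend' = rate changes at each changepoint; the
#  prophet-style parameterisation used by piecewise_linear() in R/trends.R)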
'PWlogistic') ) { available <- c(available, param) } # Other specific parameters can be added as needed } return(available) } #' Extract trend GAM parameters (coefficients, group-level, GP, smooth tests) #' @param object An mvgam object #' @param include_betas Logical to include all coefficients #' @param smooth_test Logical to compute smooth tests #' @param digits Number of digits #' @param variational Logical for variational approximation #' @return List of trend GAM parameter summaries #' @noRd extract_trend_gam_parameters <- function( object, include_betas = TRUE, smooth_test = TRUE, digits = 2, variational = FALSE ) { trend_gam_params <- list() # Extract trend GAM coefficients if (include_betas) { coef_names <- paste0(names(object$trend_mgcv_model$coefficients), '_trend') mvgam_coefs <- extract_param_summary( object$model_output, 'b_trend', digits, variational ) rownames(mvgam_coefs) <- gsub('series', 'trend', coef_names, fixed = TRUE) trend_gam_params[["gam_process_coefficients"]] <- mvgam_coefs } else { if (object$trend_mgcv_model$nsdf > 0) { coefs_include <- 1:object$trend_mgcv_model$nsdf coef_names <- paste0( names(object$trend_mgcv_model$coefficients), '_trend' )[coefs_include] mvgam_coefs <- extract_param_summary( object$model_output, 'b_trend', digits, variational )[coefs_include, , drop = FALSE] rownames(mvgam_coefs) <- gsub('series', 'trend', coef_names, fixed = TRUE) trend_gam_params[["gam_process_coefficients"]] <- mvgam_coefs } } # Extract trend group-level parameters if (!all(is.na(object$trend_sp_names))) { has_random_effects <- any(unlist(purrr::map( object$trend_mgcv_model$smooth, inherits, 'random.effect' ))) if (has_random_effects) { re_labs <- unlist(lapply( purrr::map(object$trend_mgcv_model$smooth, 'label'), paste, collapse = ',' ))[unlist(purrr::map( object$trend_mgcv_model$smooth, inherits, 'random.effect' ))] re_labs <- gsub('series', 'trend', re_labs) re_sds <- extract_param_summary( object$model_output, 'sigma_raw_trend', digits = digits, variational = variational, ISB = TRUE ) re_mus <- extract_param_summary( object$model_output, 'mu_raw_trend', digits = digits, variational = variational, ISB = TRUE ) rownames(re_sds) <- paste0('sd(', re_labs, ')_trend') rownames(re_mus) <- paste0('mean(', re_labs, ')_trend') trend_gam_params[["gam_process_group_level"]] <- rbind(re_mus, re_sds) } } # Extract trend GP parameters if (!is.null(attr(object$trend_mgcv_model, 'gp_att_table'))) { gp_summaries <- gp_param_summary( object = object, mgcv_model = object$trend_mgcv_model, trend_effects = TRUE, digits = digits, variational = variational ) trend_gam_params[["gam_process_gp_parameters"]] <- rbind( gp_summaries$alpha_summary, gp_summaries$rho_summary ) } # Extract trend smooth tests if (any(!is.na(object$trend_sp_names)) && smooth_test) { gam_sig_table <- try( suppressWarnings(summary(object$trend_mgcv_model)$s.table[, c(1, 2, 3, 4), drop = FALSE ]), silent = TRUE ) if (inherits(gam_sig_table, 'try-error')) { object$trend_mgcv_model$R <- NULL gam_sig_table <- suppressWarnings(summary( object$trend_mgcv_model )$s.table[, c(1, 2, 3, 4), drop = FALSE]) gam_sig_table[, 2] <- unlist( purrr::map(object$trend_mgcv_model$smooth, 'df'), use.names = FALSE ) } # Handle trend GP terms if (!is.null(attr(object$trend_mgcv_model, 'gp_att_table'))) { gp_names <- unlist(purrr::map( attr(object$trend_mgcv_model, 'gp_att_table'), 'name' )) if ( !all( rownames(gam_sig_table) %in% gsub('gp(', 's(', gp_names, fixed = TRUE) ) ) { gam_sig_table <- gam_sig_table[ !rownames(gam_sig_table) %in% 
gsub('gp(', 's(', gp_names, fixed = TRUE),
          ,
          drop = FALSE
        ]
      } else {
        gam_sig_table <- NULL
      }
    }
    if (!is.null(gam_sig_table) && nrow(gam_sig_table) > 0) {
      trend_gam_params[["gam_process_smooth_tests"]] <- gam_sig_table
    }
  }

  return(trend_gam_params)
}


================================================
FILE: R/tidier_methods.R
================================================
#' @importFrom generics tidy
#' @export
generics::tidy

#' @importFrom generics augment
#' @export
generics::augment

#' Tidy an `mvgam` object's parameter posteriors
#'
#' Get parameters' posterior statistics, implementing the generic `tidy` from
#' the package \pkg{broom}.
#'
#' The parameters are categorized by the column "type". For instance, the
#' intercept of the observation model (i.e. the "formula" arg to `mvgam()`) has
#' the "type" "observation_beta". The possible "type"s are:
#'
#' * observation_family_extra_param: any extra parameters for your observation
#' model, e.g. sigma for a gaussian observation model. These parameters are
#' not directly derived from the latent trend components (contrast to mu).
#'
#' * observation_beta: betas from your observation model, excluding any
#' smooths. If your formula was `y ~ x1 + s(x2, bs='cr')`, then your
#' intercept and `x1`'s beta would be categorized as this.
#'
#' * random_effect_group_level: Group-level random effects parameters, i.e.
#' the mean and sd of the distribution from which the specific random
#' intercepts/slopes are considered to be drawn.
#'
#' * random_effect_beta: betas for the individual random intercepts/slopes.
#'
#' * trend_model_param: parameters from your `trend_model`.
#'
#' * trend_beta: analog of "observation_beta", but for any `trend_formula`.
#'
#' * trend_random_effect_group_level: analog of
#' "random_effect_group_level", but for any `trend_formula`.
#'
#' * trend_random_effect_beta: analog of "random_effect_beta", but for any
#' `trend_formula`.
#'
#' Additionally, GP terms can be incorporated in several ways, leading to
#' different "type"s (or absence!):
#'
#' * `s(bs = "gp")`: No parameters returned.
#'
#' * `gp()` in `formula`: "type" of "observation_param".
#'
#' * `gp()` in `trend_formula`: "type" of "trend_formula_param".
#'
#' * `GP()` in `trend_model`: "type" of "trend_model_param".
#'
#' @param x An object of class `mvgam`.
#'
#' @param probs The desired probability levels of the parameters' posteriors.
#' Defaults to `c(0.025, 0.5, 0.975)`, i.e. 2.5%, 50%, and 97.5%.
#'
#' @param ... Unused, included for generic consistency only.
#'
#' @returns A `tibble` containing:
#'
#' * "parameter": The parameter in question.
#'
#' * "type": The component of the model that the parameter belongs to (see
#' details).
#'
#' * "mean": The posterior mean.
#'
#' * "sd": The posterior standard deviation.
#'
#' * percentile(s): Any percentiles of interest from these posteriors.
#'
#' @family tidiers
#'
#' @examples
#' \dontrun{
#' set.seed(0)
#' simdat <- sim_mvgam(
#'   T = 100,
#'   n_series = 3,
#'   trend_model = AR(),
#'   prop_trend = 0.75,
#'   family = gaussian()
#' )
#'
#' simdat$data_train$x <- rnorm(nrow(simdat$data_train))
#' simdat$data_train$year_fac <- factor(simdat$data_train$year)
#'
#' mod <- mvgam(
#'   y ~ -1 + s(time, by = series, bs = 'cr', k = 20) + x,
#'   trend_formula = ~ s(year_fac, bs = 're') - 1,
#'   trend_model = AR(cor = TRUE),
#'   family = gaussian(),
#'   data = simdat$data_train,
#'   silent = 2
#' )
#'
#' tidy(mod, probs = c(0.2, 0.5, 0.8))
#' }
#'
#' @export
tidy.mvgam <- function(x, probs = c(0.025, 0.5, 0.975), ...)
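# Shape of the returned tibble with the default probs (values illustrative
# only, not from a real fit):
#   parameter    type                            mean    sd `2.5%` `50%` `97.5%`
#   sigma[1]     observation_family_extra_param  0.25  0.03   0.20  0.25   0.31
#   (Intercept)  observation_beta                1.10  0.15   0.81  1.10   1.39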
{ object <- x obj_vars <- variables(object) digits <- 2 # TODO: Let user change? partialized_mcmc_summary <- purrr::partial( mcmc_summary, object$model_output, ... = , ISB = FALSE, # Matches `x[i]`'s rather than `x`. probs = probs, digits = digits, Rhat = FALSE, n.eff = FALSE ) out <- tibble::tibble() # Helper to add parameter column from row names before tibble operations add_param_col <- function(df, aliases = NULL) { df$parameter <- if (!is.null(aliases)) aliases else row.names(df) row.names(df) <- NULL df } # Observation family extra parameters -------- xp_names_all <- obj_vars$observation_pars$orig_name # no matches -> length(xp_names) == 0, even if xp_names_all is NULL xp_names <- grep("vec", xp_names_all, value = TRUE, invert = TRUE) if (length(xp_names) > 0) { extra_params_out <- partialized_mcmc_summary(params = xp_names) extra_params_out <- add_param_col(extra_params_out) extra_params_out <- tibble::add_column( extra_params_out, type = "observation_family_extra_param", .before = 1 ) out <- dplyr::bind_rows(out, extra_params_out) } # END Observation family extra parameters # obs non-smoother betas -------- if (object$mgcv_model$nsdf > 0) { obs_beta_name_map <- dplyr::slice_head( obj_vars$observation_betas, n = object$mgcv_model$nsdf ) # df("orig_name", "alias") obs_betas_out <- partialized_mcmc_summary( params = obs_beta_name_map$orig_name ) obs_betas_out <- add_param_col(obs_betas_out, obs_beta_name_map$alias) obs_betas_out <- tibble::add_column( obs_betas_out, type = "observation_beta", .before = 1 ) out <- dplyr::bind_rows(out, obs_betas_out) } # END obs non-smoother betas # random effects -------- # TODO: names for random slopes re_param_name_map <- obj_vars$observation_re_params if (!is.null(re_param_name_map)) { re_params_out <- partialized_mcmc_summary( params = re_param_name_map$orig_name ) re_params_out <- add_param_col(re_params_out, re_param_name_map$alias) re_params_out <- tibble::add_column( re_params_out, type = "random_effect_group_level", .before = 1 ) out <- dplyr::bind_rows(out, re_params_out) # specific betas for (sp in object$mgcv_model$smooth) { if (inherits(sp, "random.effect")) { re_label <- sp$label betas_all <- obj_vars$observation_betas re_beta_idxs <- grep(re_label, betas_all$alias, fixed = TRUE) re_beta_name_map <- dplyr::slice(betas_all, re_beta_idxs) re_betas_out <- partialized_mcmc_summary( params = re_beta_name_map$orig_name ) re_betas_out <- add_param_col(re_betas_out, re_beta_name_map$alias) re_betas_out <- tibble::add_column( re_betas_out, type = "random_effect_beta", .before = 1 ) out <- dplyr::bind_rows(out, re_betas_out) } } } # END random effects # GPs -------- if (!is.null(obj_vars$trend_pars)) { tm_param_names_all <- obj_vars$trend_pars$orig_name gp_param_names <- grep( "^alpha_gp|^rho_gp", tm_param_names_all, value = TRUE ) if (length(gp_param_names) > 0) { gp_params_out <- partialized_mcmc_summary(params = gp_param_names) gp_params_out <- add_param_col(gp_params_out) # where is GP? can be in formula, trend_formula, or trend_model if (grepl("^(alpha|rho)_gp_trend", gp_param_names[[1]])) { param_type <- "trend_formula_param" } else if (grepl("^(alpha|rho)_gp_", gp_param_names[[1]])) { # hmph. param_type <- "observation_param" } else { param_type <- "trend_model_param" } gp_params_out <- tibble::add_column( gp_params_out, type = param_type, .before = 1 ) out <- dplyr::bind_rows(out, gp_params_out) } } # END GPs # RW, AR, CAR, VAR, ZMVN -------- # TODO: split out Sigma for heircor? 
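# Illustrative downstream use of the "type" column (not run; `fit` is a
# hypothetical fitted mvgam object):
#   tidy(fit) |> dplyr::filter(type == "trend_model_param")
# would return only the trend model parameters assembled below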
trend_model_name <- ifelse(
  inherits(object$trend_model, "mvgam_trend"),
  object$trend_model$trend_model,
  object$trend_model
) # trend_model may reach mvgam() as a string or as a constructor object

if (grepl("^VAR|^CAR|^AR|^RW|^ZMVN", trend_model_name)) {
  # theta = MA terms
  # alpha_cor = hierarchical corr term
  # A = VAR auto-regressive matrix
  # Sigma = correlated errors matrix
  # sigma = errors

  # setting up the params to extract
  if (trend_model_name == "VAR") {
    trend_model_params <- c("^A\\[", "^alpha_cor", "^theta", "^Sigma")
  } else if (grepl("^CAR|^AR|^RW", trend_model_name)) {
    cor <- inherits(object$trend_model, "mvgam_trend") &&
      object$trend_model$cor
    sigma_name <- ifelse(cor, "^Sigma", "^sigma")
    trend_model_params <- c("^ar", "^alpha_cor", "^theta", sigma_name)
  } else if (grepl("^ZMVN", trend_model_name)) {
    trend_model_params <- c("^alpha_cor", "^Sigma")
  }

  # extracting the params
  trend_model_params <- paste(trend_model_params, collapse = "|")
  tm_param_names_all <- obj_vars$trend_pars$orig_name
  tm_param_names <- grep(trend_model_params, tm_param_names_all, value = TRUE)
  tm_params_out <- partialized_mcmc_summary(params = tm_param_names)
  tm_params_out <- add_param_col(tm_params_out)
  tm_params_out <- tibble::add_column(
    tm_params_out,
    type = "trend_model_param",
    .before = 1
  )
  out <- dplyr::bind_rows(out, tm_params_out)
} # END RW, AR, CAR, VAR, ZMVN

# 'None' trend_model with a trend_formula --------
if (trend_model_name == "None" && !is.null(object$trend_call)) {
  trend_pars_names_all <- obj_vars$trend_pars$orig_name
  trend_pars_names <- grep("sigma", trend_pars_names_all, value = TRUE)
  if (length(trend_pars_names) > 0) {
    trend_params_out <- partialized_mcmc_summary(params = trend_pars_names)
    trend_params_out <- add_param_col(trend_params_out)
    trend_params_out <- tibble::add_column(
      trend_params_out,
      type = "trend_model_param",
      .before = 1
    )
    out <- dplyr::bind_rows(out, trend_params_out)
  }
} # END 'None' trend_model with a trend_formula

# Piecewise --------
# TODO: potentially lump into AR section, above; how to handle change points?
# to lump in, just add an # `else if (grepl("^PW", trend_model_name)`, then # `trend_model_params <- c("^k_trend", "^m_trend", "^delta_trend")` # and change initial grep(ar car var) call if (grepl("^PW", trend_model_name)) { trend_model_params <- "^k_trend|^m_trend|^delta_trend" tm_param_names_all <- obj_vars$trend_pars$orig_name tm_param_names <- grep(trend_model_params, tm_param_names_all, value = TRUE) tm_params_out <- partialized_mcmc_summary(params = tm_param_names) tm_params_out <- add_param_col(tm_params_out) tm_params_out <- tibble::add_column( tm_params_out, type = "trend_model_param", .before = 1 ) out <- dplyr::bind_rows(out, tm_params_out) } # END Piecewise # Trend formula betas -------- if (!is.null(object$trend_call) && object$trend_mgcv_model$nsdf > 0) { trend_beta_name_map <- dplyr::slice_head( obj_vars$trend_betas, n = object$trend_mgcv_model$nsdf ) # df("orig_name", "alias") trend_betas_out <- partialized_mcmc_summary( params = trend_beta_name_map$orig_name ) trend_betas_out <- add_param_col( trend_betas_out, trend_beta_name_map$alias ) trend_betas_out <- tibble::add_column( trend_betas_out, type = "trend_beta", .before = 1 ) out <- dplyr::bind_rows(out, trend_betas_out) } # END Trend formula betas # trend random effects -------- trend_re_param_name_map <- obj_vars$trend_re_params if (!is.null(trend_re_param_name_map)) { trend_re_params_out <- partialized_mcmc_summary( params = trend_re_param_name_map$orig_name ) trend_re_params_out <- add_param_col( trend_re_params_out, trend_re_param_name_map$alias ) trend_re_params_out <- tibble::add_column( trend_re_params_out, type = "trend_random_effect_group_level", .before = 1 ) out <- dplyr::bind_rows(out, trend_re_params_out) # specific betas for (sp in object$trend_mgcv_model$smooth) { if (inherits(sp, "random.effect")) { trend_re_label <- sp$label trend_betas_all <- obj_vars$trend_betas trend_re_beta_idxs <- grep( trend_re_label, trend_betas_all$alias, fixed = TRUE ) trend_re_beta_name_map <- dplyr::slice( trend_betas_all, trend_re_beta_idxs ) trend_re_betas_out <- partialized_mcmc_summary( params = trend_re_beta_name_map$orig_name ) trend_re_betas_out <- add_param_col( trend_re_betas_out, trend_re_beta_name_map$alias ) trend_re_betas_out <- tibble::add_column( trend_re_betas_out, type = "trend_random_effect_beta", .before = 1 ) out <- dplyr::bind_rows(out, trend_re_betas_out) } } } # END trend random effects # Cleanup output -------- # Reorder columns to put parameter first out <- out[c("parameter", setdiff(names(out), "parameter"))] # Split Sigma in case of hierarchical residual correlations alpha_cor_matches <- grep("alpha_cor", out$parameter, fixed = TRUE) if (length(alpha_cor_matches) > 0) { out <- split_hier_Sigma(object, out) } # END Cleanup output out } #' Helper function to split apart Sigma into its constituent sub-matrixes in #' the case of a hierarchical latent process. #' #' The default MCMC output has dummy parameters filling out Sigma to make it #' an nxn matrix. This removes those, and renames the remaining sub-matrixes #' to align with the `gr` and `subgr` sizes from `mvgam()`'s `trend_model` argument. #' #' @param object An object of class `mvgam`. #' #' @param params `tibble` The parameters that are going to be returned by #' `tidy.mvgam()`. Assumed that the columns match what `tidy.mvgam()` will return. #' Specifically, that there is a "parameter" column. #' #' @returns `tibble` The `params`, but with the Sigma parameters split up by `gr`. 
#'
#' @noRd
split_hier_Sigma <- function(object, params) {
  params_nonSigma <- dplyr::filter(params, !grepl("^Sigma", parameter))
  params_Sigma <- dplyr::filter(params, grepl("^Sigma", parameter))

  gr <- object$trend_model$gr
  subgr <- object$trend_model$subgr
  gr_levels <- levels(object$obs_data[[gr]])
  subgr_levels <- levels(object$obs_data[[subgr]])
  n_gr <- length(gr_levels)
  n_subgr <- length(subgr_levels)

  # anything besides the dummy params should have non-zero sd
  params_Sigma <- dplyr::filter(params_Sigma, mean != 0, sd != 0)
  index_strs <- sub("Sigma", "", params_Sigma$parameter)[1:(n_subgr**2)]

  # new names
  new_names <- paste0(
    "Sigma_",
    rep(seq_len(n_gr), each = n_subgr**2),
    index_strs
  )
  params_Sigma["parameter"] <- new_names

  dplyr::bind_rows(params_nonSigma, params_Sigma)
}

#' Augment an `mvgam` object's data
#'
#' Add fits and residuals to the data, implementing the generic `augment` from
#' the package \pkg{broom}.
#'
#' A `list` is returned if `class(x$obs_data) == 'list'`, otherwise a `tibble`
#' is returned, but the contents of either object are the same.
#'
#' The arguments `robust` and `probs` are applied to both the fit and residuals
#' calls (see [fitted.mvgam()] and [residuals.mvgam()] for details).
#'
#' @importFrom stats residuals
#'
#' @param x An object of class `mvgam`.
#'
#' @param robust If `FALSE` (the default) the mean is used as the measure of
#' central tendency and the standard deviation as the measure of variability.
#' If `TRUE`, the median and the median absolute deviation (MAD) are applied
#' instead.
#'
#' @param probs The percentiles to be computed by the quantile function.
#'
#' @param ... Unused, included for generic consistency only.
#'
#' @returns A `list` or `tibble` (see details) combining:
#'
#' * The data supplied to `mvgam()`.
#'
#' * The outcome variable, named as `.observed`.
#'
#' * The fitted backcasts, along with their variability and credible bounds.
#'
#' * The residuals, along with their variability and credible bounds.
#'
#' @seealso
#' \code{\link{residuals.mvgam}},
#' \code{\link{fitted.mvgam}}
#'
#' @family tidiers
#'
#' @examples
#' \dontrun{
#' set.seed(0)
#' dat <- sim_mvgam(
#'   T = 80,
#'   n_series = 3,
#'   mu = 2,
#'   trend_model = AR(p = 1),
#'   prop_missing = 0.1,
#'   prop_trend = 0.6
#' )
#'
#' mod1 <- mvgam(
#'   formula = y ~ s(season, bs = 'cc', k = 6),
#'   data = dat$data_train,
#'   trend_model = AR(),
#'   family = poisson(),
#'   noncentred = TRUE,
#'   chains = 2,
#'   silent = 2
#' )
#'
#' augment(mod1, robust = TRUE, probs = c(0.25, 0.75))
#' }
#'
#' @export
augment.mvgam <- function(x, robust = FALSE, probs = c(0.025, 0.975), ...)
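# Columns added below: `.observed`, then `.fitted` / `.fit.variability` /
# `.fit.cred.low` / `.fit.cred.high` from fitted(), and `.resid` /
# `.resid.variability` / `.resid.cred.low` / `.resid.cred.high` from
# residuals(). Illustrative access (not run; `fit` is hypothetical):
#   aug <- augment(fit)
#   dplyr::select(aug, .observed, .fitted, .resid)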
{ obs_data <- x$obs_data obs_data$.observed <- obs_data$y obs_data <- purrr::discard_at( obs_data, c("index..orig..order", "index..time..index") ) resids <- residuals(x, robust = robust, probs = probs) %>% tibble::as_tibble() fits <- fitted(x, robust = robust, probs = probs) %>% tibble::as_tibble() hc_fits <- fits %>% dplyr::slice_head(n = NROW(resids)) # fits can include fcs colnames(resids) <- c( ".resid", ".resid.variability", ".resid.cred.low", ".resid.cred.high" ) colnames(hc_fits) <- c( ".fitted", ".fit.variability", ".fit.cred.low", ".fit.cred.high" ) augmented <- c(obs_data, hc_fits, resids) # coerces to list if (!identical(class(x$obs_data), "list")) { # data.frame augmented <- tibble::as_tibble(augmented) } augmented } ================================================ FILE: R/trends.R ================================================ #' Supported latent trend models in \pkg{mvgam} #' #' @importFrom utils tail #' @importFrom stats rnorm #' #' @details #' \code{mvgam} currently supports the following dynamic trend models: #' \itemize{ #' \item `None` (no latent trend component; i.e. the GAM component is all #' that contributes to the linear predictor, and the observation process is #' the only source of error; similar to what is estimated by #' \code{\link[mgcv]{gam}}) #' \item `ZMVN()` (zero-mean correlated errors, useful for modelling time #' series where no autoregressive terms are needed or for modelling data #' that are not sampled as time series) #' \item `RW()` #' \item `AR(p = 1, 2, or 3)` #' \item `CAR(p = 1)` (continuous time autoregressive trends; only available #' in \code{Stan}) #' \item `VAR()` (only available in \code{Stan}) #' \item `PW()` (piecewise linear or logistic trends; only available in #' \code{Stan}) #' \item `GP()` (Gaussian Process with squared exponential kernel; only #' available in \code{Stan}) #' } #' #' For most dynamic trend types available in `mvgam` (see argument #' `trend_model`), time should be measured in discrete, regularly spaced #' intervals (i.e. `c(1, 2, 3, ...)`). However, you can use irregularly spaced #' intervals if using `trend_model = CAR(1)`, though note that any temporal #' intervals that are exactly `0` will be adjusted to a very small number #' (`1e-12`) to prevent sampling errors. #' #' For all autoregressive trend types apart from `CAR()`, moving average and/or #' correlated process error terms can also be estimated (for example, #' `RW(cor = TRUE)` will set up a multivariate Random Walk if `data` contains #' `>1` series). Hierarchical process error correlations can also be handled if #' the data contain relevant observation units that are nested into relevant #' grouping and subgrouping levels (i.e. using #' `AR(gr = region, subgr = species)`). #' #' Note that only `RW`, `AR1`, `AR2` and `AR3` are available if using `JAGS`. #' All trend models are supported if using `Stan`. #' #' Dynamic factor models can be used in which the latent factors evolve as #' either `RW`, `AR1-3`, `VAR` or `GP`. For `VAR` models (i.e. `VAR` and #' `VARcor` models), users can either fix the trend error covariances to be `0` #' (using `VAR`) or estimate them and potentially allow for contemporaneously #' correlated errors using `VARcor`. #' #' For all `VAR` models, stationarity of the latent process is enforced through #' the prior using the parameterisation given by Heaps (2022). 
Stationarity is #' not enforced when using `AR1`, `AR2` or `AR3` models, though this can be #' changed by the user by specifying lower and upper bounds on autoregressive #' parameters using functionality in [get_mvgam_priors] and the `priors` #' argument in [mvgam]. #' #' Piecewise trends follow the formulation in the popular `prophet` package #' produced by `Facebook`, where users can allow for changepoints to control #' the potential flexibility of the trend. See Taylor and Letham (2018) for #' details. #' #' @seealso #' \code{\link{RW}}, #' \code{\link{AR}}, #' \code{\link{CAR}}, #' \code{\link{VAR}}, #' \code{\link{PW}}, #' \code{\link{GP}}, #' \code{\link{ZMVN}} #' #' @references #' Sarah E. Heaps (2022) Enforcing stationarity through the prior in Vector #' Autoregressions. Journal of Computational and Graphical Statistics. 32:1, #' 1–10. #' #' Sean J. Taylor and Benjamin Letham (2018) Forecasting at scale. The American #' Statistician 72.1, 37–45. #' #' @name mvgam_trends NULL #### Generic trend information #### #' @noRd trend_model_choices = function() { # Will make the commented out versions available soon c( "RW", "RWMA", "RWcor", "RWhiercor", "RWMAcor", "GP", 'AR1', 'AR1MA', 'AR1cor', 'AR1hiercor', 'AR1MAcor', 'AR2', 'AR2MA', 'AR2cor', 'AR2hiercor', 'AR2MAcor', 'AR3', 'AR3MA', 'AR3cor', 'AR3hiercor', 'AR3MAcor', 'CAR1', 'VAR', 'VARcor', 'VARhiercor', 'VAR1', 'VAR1cor', 'VAR1hiercor', 'VARMA', 'VARMAcor', 'VARMA1,1cor', 'PWlinear', 'PWlogistic', 'ZMVN', 'ZMVNcor', 'ZMVNhiercor', 'None' ) } # Additions needed for adding moving average / correlated process errors #' @noRd ma_cor_additions = function(trend_model) { use_var1 <- use_var1cor <- add_ma <- add_cor <- FALSE if (grepl('MA', trend_model, fixed = TRUE)) { add_ma <- TRUE } if (trend_model == 'RWMA') { trend_model <- 'RW' } if (trend_model == 'AR1MA') { trend_model <- 'AR1' } if (trend_model == 'AR2MA') { trend_model <- 'AR2' } if (trend_model == 'AR3MA') { trend_model <- 'AR3' } if (trend_model %in% c('RWcor', 'RWhiercor', 'RWMAcor')) { add_cor <- TRUE trend_model <- 'RW' } if (trend_model %in% c('ZMVNcor', 'ZMVNhiercor')) { add_cor <- TRUE trend_model <- 'ZMVN' } if (trend_model %in% c('AR1cor', 'AR1hiercor', 'AR1MAcor')) { add_cor <- TRUE trend_model <- 'AR1' } if (trend_model %in% c('AR2cor', 'AR2hiercor', 'AR2MAcor')) { add_cor <- TRUE trend_model <- 'AR2' } if (trend_model %in% c('AR3cor', 'AR3hiercor', 'AR3MAcor')) { add_cor <- TRUE trend_model <- 'AR3' } if (trend_model == 'VAR1') { use_var1 <- TRUE } if (trend_model %in% c('VAR1cor', 'VAR1hiercor', 'VARMA1,1cor')) { use_var1cor <- TRUE if (trend_model == 'VAR1hiercor') { add_cor <- TRUE } trend_model <- 'VAR1' } return(list( trend_model = trend_model, use_var1 = use_var1, use_var1cor = use_var1cor, add_ma = add_ma, add_cor = add_cor )) } #' Evaluate the piecewise linear function #' This code is borrowed from the {prophet} R package #' All credit goes directly to the prophet development team #' https://github.com/facebook/prophet/blob/main/R/R/prophet.R #' #' @param t Vector of times on which the function is evaluated #' #' @param deltas Vector of rate changes at each changepoint #' #' @param k Float initial rate #' #' @param m Float initial offset #' #' @param changepoint_ts Vector of changepoint times #' #' @return Vector y(t) #' #' @noRd piecewise_linear <- function(t, deltas, k, m = 0, changepoint_ts) { # Intercept changes gammas <- -changepoint_ts * deltas # Get cumulative slope and intercept at each t k_t <- rep(k, length(t)) m_t <- rep(m, length(t)) for (s in 
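# worked check of the intercept changes above: with one changepoint at
# t = 5 and delta = 0.2, gamma = -5 * 0.2 = -1, so for t >= 5 the line
# k * t + m becomes (k + 0.2) * t + (m - 1); both pieces equal
# 5 * k + m at t = 5, so the trend is continuous at the changepoint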
1:length(changepoint_ts)) { indx <- t >= changepoint_ts[s] k_t[indx] <- k_t[indx] + deltas[s] m_t[indx] <- m_t[indx] + gammas[s] } y <- k_t * t + m_t return(y) } #' Evaluate the piecewise logistic function. #' This code is borrowed from the {prophet} R package #' All credit goes directly to the prophet development team #' https://github.com/facebook/prophet/blob/main/R/R/prophet.R #' @param t Vector of times on which the function is evaluated. #' @param cap Vector of capacities at each t. #' @param deltas Vector of rate changes at each changepoint. #' @param k Float initial rate. #' @param m Float initial offset. #' @param changepoint_ts Vector of changepoint times. #' #' @return Vector y(t). #' #' @noRd piecewise_logistic <- function(t, cap, deltas, k, m, changepoint_ts) { # Compute offset changes k.cum <- c(k, cumsum(deltas) + k) gammas <- rep(0, length(changepoint_ts)) for (i in 1:length(changepoint_ts)) { gammas[i] <- ((changepoint_ts[i] - m - sum(gammas)) * (1 - k.cum[i] / k.cum[i + 1])) } # Get cumulative rate and offset at each t k_t <- rep(k, length(t)) m_t <- rep(m, length(t)) for (s in 1:length(changepoint_ts)) { indx <- t >= changepoint_ts[s] k_t[indx] <- k_t[indx] + deltas[s] m_t[indx] <- m_t[indx] + gammas[s] } y <- cap / (1 + exp(-k_t * (t - m_t))) return(y) } #' Squared exponential GP simulation function #' @param last_trends Vector of trend estimates leading up to the current timepoint #' @param h \code{integer} specifying the forecast horizon #' @param rho_gp length scale parameter #' @param alpha_gp marginal variation parameter #' @noRd sim_gp = function(last_trends, h, rho_gp, alpha_gp) { t <- as.numeric(1:length(last_trends)) t_new <- as.numeric(1:(length(last_trends) + h)) Sigma_new <- alpha_gp^2 * exp(-0.5 * ((outer(t, t_new, "-") / rho_gp)^2)) Sigma_star <- alpha_gp^2 * exp(-0.5 * ((outer(t_new, t_new, "-") / rho_gp)^2)) + diag(1e-4, length(t_new)) Sigma <- alpha_gp^2 * exp(-0.5 * ((outer(t, t, "-") / rho_gp)^2)) + diag(1e-4, length(t)) as.vector( tail(t(Sigma_new) %*% solve(Sigma, last_trends), h) + tail( mvnfast::rmvn( 1, mu = rep(0, length(t_new)), sigma = Sigma_star - t(Sigma_new) %*% solve(Sigma, Sigma_new) )[1, ], h ) ) } #' AR3 simulation function #' @param last_trends Vector of trend estimates leading up to the current timepoint #' @param h \code{integer} specifying the forecast horizon #' @param drift drift parameter #' @param ar1 AR1 parameter #' @param ar2 AR2 parameter #' @param ar3 AR3 parameter #' @param tau precision parameter #' @param Xp_trend optional linear predictor matrix #' @param betas_trend optional coefficients associated with lp matrix #' @noRd sim_ar3 = function( drift = 0, ar1 = 1, ar2 = 0, ar3 = 0, tau = 1, Xp_trend = NULL, betas_trend = NULL, last_trends = rnorm(3), h = 50 ) { # Draw errors errors <- rnorm(h + 3, sd = sqrt(1 / tau)) # Prepare linear predictors (if necessary) if (!is.null(Xp_trend)) { linpreds <- c( rep(0, 3), as.vector( ((matrix(Xp_trend, ncol = NCOL(Xp_trend)) %*% betas_trend)) + attr(Xp_trend, 'model.offset') ) ) } else { linpreds <- rep(0, h + 3) } # Propagate the process ar3_recursC( drift = drift, ar1 = ar1, ar2 = ar2, ar3 = ar3, linpreds = linpreds, h = h, errors = errors, last_trends = tail(last_trends, 3) ) } #' VAR1 simulation function #' @noRd sim_var1 = function( drift, A, Sigma, last_trends, Xp_trend = NULL, betas_trend = NULL, h ) { if (NCOL(A) != NCOL(Sigma)) { stop( 'VAR coefficient matrix "A" and error matrix "Sigma" must have equal dimensions', call. 
= FALSE ) } if (NROW(A) != NROW(Sigma)) { stop( 'VAR coefficient matrix "A" and error matrix "Sigma" must have equal dimensions', call. = FALSE ) } if (missing(drift)) { drift <- rep(0, NROW(A)) } if (length(drift) != NROW(A)) { stop( 'Number of drift parameters must match number of rows in VAR coefficient matrix "A"', call. = FALSE ) } # Linear predictor, if supplied if (!is.null(Xp_trend)) { linpreds <- as.vector( ((matrix(Xp_trend, ncol = NCOL(Xp_trend)) %*% betas_trend)) + attr(Xp_trend, 'model.offset') ) linpreds <- matrix(linpreds, ncol = NROW(A), byrow = TRUE) linpreds <- rbind(rep(0, NROW(A)), linpreds) } else { linpreds <- matrix(0, nrow = h + 1, ncol = NROW(A)) } # Draw errors errors <- mvnfast::rmvn(h + 1, mu = rep(0, NROW(A)), sigma = Sigma) # Stochastic realisations var1_recursC( A = A, drift = drift, linpreds = linpreds, errors = errors, last_trends = last_trends, h = h ) } #' VARMA(1-3, 0-1) simulation function #' @noRd sim_varma = function( drift, A, A2, A3, theta, Sigma, last_trends, last_errors, Xp_trend = NULL, betas_trend = NULL, h ) { # Validate dimensions validate_equaldims(A, Sigma) validate_equaldims(A2, Sigma) validate_equaldims(A3, Sigma) validate_equaldims(theta, Sigma) if (NROW(last_trends) != 3) { stop('Last 3 state estimates are required, in matrix form', call. = FALSE) } if (NROW(last_errors) != 3) { stop('Last 3 error estimates are required, in matrix form', call. = FALSE) } if (missing(drift)) { drift <- rep(0, NROW(A)) } if (length(drift) != NROW(A)) { stop( 'Number of drift parameters must match number of rows in VAR coefficient matrix "A"', call. = FALSE ) } # Linear predictor, if supplied if (!is.null(Xp_trend)) { linpreds <- as.vector( ((matrix(Xp_trend, ncol = NCOL(Xp_trend)) %*% betas_trend)) + attr(Xp_trend, 'model.offset') ) linpreds <- matrix(linpreds, ncol = NROW(A), byrow = TRUE) if (NROW(linpreds) != h + 3) { stop( 'trend linear predictor matrix should be h + 3 rows in dimension', call. = FALSE ) } } else { linpreds <- matrix(0, nrow = h + 3, ncol = NROW(A)) } # Draw forecast errors errors <- rbind( last_errors, mvnfast::rmvn(h, mu = rep(0, NROW(A)), sigma = Sigma) ) # Stochastic realisations varma_recursC( A = A, A2 = A2, A3 = A3, theta = theta, drift = drift, linpreds = linpreds, errors = errors, last_trends = last_trends, h = h ) } #' Continuous time AR1 simulation function #' @noRd sim_corcar1 = function( drift, A, A2, A3, theta, Sigma, last_trends, last_errors, Xp_trend = NULL, betas_trend = NULL, h, time_dis ) { # Validate dimensions validate_equaldims(A, Sigma) validate_equaldims(A2, Sigma) validate_equaldims(A3, Sigma) validate_equaldims(theta, Sigma) if (NROW(last_trends) != 3) { stop('Last 3 state estimates are required, in matrix form', call. = FALSE) } if (NROW(last_errors) != 3) { stop('Last 3 error estimates are required, in matrix form', call. = FALSE) } if (missing(drift)) { drift <- rep(0, NROW(A)) } if (length(drift) != NROW(A)) { stop( 'Number of drift parameters must match number of rows in VAR coefficient matrix "A"', call. = FALSE ) } # Linear predictor, if supplied if (!is.null(Xp_trend)) { linpreds <- as.vector( ((matrix(Xp_trend, ncol = NCOL(Xp_trend)) %*% betas_trend)) + attr(Xp_trend, 'model.offset') ) linpreds <- matrix(linpreds, ncol = NROW(A), byrow = TRUE) if (NROW(linpreds) != h + 3) { stop( 'trend linear predictor matrix should be h + 3 rows in dimension', call. 
= FALSE ) } } else { linpreds <- matrix(0, nrow = h + 3, ncol = NROW(A)) } # Draw forecast errors errors <- rbind( last_errors, mvnfast::rmvn(h, mu = rep(0, NROW(A)), sigma = Sigma) ) # Stochastic realisations (will move to c++ eventually) d_A <- diag(A) states <- matrix(NA, nrow = h + 3, ncol = NCOL(A)) states[1, ] <- last_trends[1, ] states[2, ] <- last_trends[2, ] states[3, ] <- last_trends[3, ] for (t in 4:NROW(states)) { states[t, ] <- # autoregressive means (states[t - 1, ] - linpreds[t - 1, ]) %*% (A^time_dis[t - 3, ]) + # linear predictor contributions linpreds[t, ] + # drift terms drift + # stochastic errors errors[t, ] * (1 - d_A^(2 * time_dis[t - 3, ])) / (1 - d_A^2) } states[4:NROW(states), ] } #' Generic function to take outputs from different trend models #' and prepare the objects needed to propagate VARMA(3,1) processes #' so that only a single Rcpp function is needed for propagating #' autoregressive trends #' @noRd prep_varma_params = function( ar1, ar2, ar3, A, A2, A3, drift, theta, Sigma, tau, Xp_trend, betas_trend, last_trends, last_errors, h ) { # Construct Autoregressive matrices if (missing(A) & missing(A2) & missing(A3)) { A <- A2 <- A3 <- matrix(0, ncol = length(ar1), nrow = length(ar1)) diag(A) <- ar1 diag(A2) <- ar2 diag(A3) <- ar3 } if (missing(A2) & !missing(A)) { A2 <- matrix(0, ncol = NROW(A), nrow = NROW(A)) } if (missing(A3) & !missing(A)) { A3 <- matrix(0, ncol = NROW(A), nrow = NROW(A)) } # Drift terms if (missing(drift)) { drift <- rep(0, NROW(A)) } # Construct moving average matrix if (missing(theta)) { theta <- matrix(0, ncol = NROW(A), nrow = NROW(A)) } if (!is.matrix(theta)) { theta2 <- matrix(0, ncol = NROW(A), nrow = NROW(A)) diag(theta2) <- theta theta <- theta2 } # Construct covariance matrix if (missing(Sigma)) { Sigma <- matrix(0, ncol = NROW(A), nrow = NROW(A)) diag(Sigma) <- 1 / tau } # Construct last trend estimates if (!is.matrix(last_trends) | NROW(last_trends) == 1) { last_trends <- matrix(last_trends) } # Construct last error estimates if (missing(last_errors)) { last_errors <- matrix(0, ncol = NROW(A), nrow = 3) } if (!is.matrix(last_errors) | NROW(last_errors) == 1) { last_errors <- matrix(last_errors) } return(list( A = A, A2 = A2, A3 = A3, drift = drift, theta = theta, Sigma = Sigma, last_trends = last_trends, last_errors = last_errors, Xp_trend = Xp_trend, betas_trend = betas_trend, h = h )) } #' Simulate stationary VAR(p) phi matrices using the algorithm proposed by #' Ansley and Kohn (1986) #' @noRd stationary_VAR_phi <- function(p = 1, n_series = 3, ar_scale = 1) { stopifnot(ar_scale > 0) Id <- diag(nrow = n_series) all_P <- array(dim = c(n_series, n_series, p)) for (i in 1:p) { A <- matrix(rnorm(n_series * n_series, sd = ar_scale), nrow = n_series) # Enforce diagonal AR terms to be positive if this is # the first phi matrix if (i == 1) { diag(A) <- abs(diag(A)) } B <- t(chol(Id + tcrossprod(A, A))) all_P[,, i] <- solve(B, A) } all_phi <- array(dim = c(n_series, n_series, p, p)) all_phi_star <- array(dim = c(n_series, n_series, p, p)) # Set initial values L <- L_star <- Sigma <- Sigma_star <- Gamma <- Id # Recursion algorithm (Ansley and Kohn 1986, lemma 2.1) for (s in 0:(p - 1)) { all_phi[,, s + 1, s + 1] <- L %*% all_P[,, s + 1] %*% solve(L_star) all_phi_star[,, s + 1, s + 1] <- tcrossprod(L_star, all_P[,, s + 1]) %*% solve(L) if (s >= 1) { for (k in 1:s) { all_phi[,, s + 1, k] <- all_phi[,, s, k] - all_phi[,, s + 1, s + 1] %*% all_phi_star[,, s, s - k + 1] all_phi_star[,, s + 1, k] <- all_phi_star[,, s, k] - all_phi_star[,, s 
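# each pass updates the forward (all_phi) and reverse (all_phi_star)
# coefficient arrays together, following lemma 2.1 of Ansley & Kohn
# (1986); this maps the unconstrained random P matrices drawn above onto
# the coefficients of a stationary VAR(p)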
+ 1, s + 1] %*% all_phi[,, s, s - k + 1] } } if (s < p - 1) { Sigma_next <- Sigma - all_phi[,, s + 1, s + 1] %*% tcrossprod(Sigma_star, all_phi[,, s + 1, s + 1]) if (s < p + 1) { Sigma_star <- Sigma_star - all_phi_star[,, s + 1, s + 1] %*% tcrossprod(Sigma, all_phi_star[,, s + 1, s + 1]) L_star <- t(chol(Sigma_star)) } Sigma <- Sigma_next L <- t(chol(Sigma)) } } out <- vector(mode = 'list') for (i in 1:p) { out[[i]] <- all_phi[,, i, i] } return(out) } #' Parameters to monitor / extract #' @noRd trend_par_names = function( trend_model, trend_map, use_lv = FALSE, drift = FALSE ) { # Check arguments trend_model <- validate_trend_model(trend_model, drift = drift, warn = FALSE) if (use_lv) { if (grepl('ZMVN', trend_model)) { param <- c( 'trend', 'LV', 'penalty', 'lv_coefs', 'theta', 'Sigma', 'error' ) } if (grepl('RW', trend_model)) { param <- c( 'trend', 'LV', 'penalty', 'lv_coefs', 'theta', 'Sigma', 'error' ) } if (grepl('AR1', trend_model)) { param <- c( 'trend', 'ar1', 'LV', 'penalty', 'lv_coefs', 'theta', 'Sigma', 'error' ) } if (grepl('AR2', trend_model)) { param <- c( 'trend', 'ar1', 'ar2', 'LV', 'penalty', 'lv_coefs', 'theta', 'Sigma', 'error' ) } if (grepl('AR3', trend_model)) { param <- c( 'trend', 'ar1', 'ar2', 'ar3', 'LV', 'penalty', 'lv_coefs', 'theta', 'Sigma', 'error' ) } if (trend_model == 'CAR1') { param <- c('trend', 'ar1', 'LV', 'penalty', 'lv_coefs', 'Sigma') } if (trend_model == 'GP') { param <- c('trend', 'alpha_gp', 'rho_gp', 'LV', 'lv_coefs', 'b_gp') } if (grepl('VAR', trend_model)) { param <- c( 'trend', 'A', 'Sigma', 'lv_coefs', 'LV', 'P_real', 'sigma', 'theta', 'error' ) } } if (!use_lv) { if (grepl('ZMVN', trend_model)) { param <- c('trend', 'tau', 'sigma', 'theta', 'Sigma', 'error') } if (grepl('RW', trend_model)) { param <- c('trend', 'tau', 'sigma', 'theta', 'Sigma', 'error') } if (grepl('AR1', trend_model)) { param <- c('trend', 'tau', 'sigma', 'ar1', 'theta', 'Sigma', 'error') } if (grepl('AR2', trend_model)) { param <- c( 'trend', 'tau', 'sigma', 'ar1', 'ar2', 'theta', 'Sigma', 'error' ) } if (grepl('AR3', trend_model)) { param <- c( 'trend', 'tau', 'sigma', 'ar1', 'ar2', 'ar3', 'theta', 'Sigma', 'error' ) } if (trend_model == 'CAR1') { param <- c('trend', 'tau', 'sigma', 'ar1', 'Sigma') } if (trend_model == 'GP') { param <- c('trend', 'alpha_gp', 'rho_gp', 'b_gp') } if (grepl('VAR', trend_model)) { param <- c('trend', 'A', 'Sigma', 'P_real', 'sigma', 'theta', 'error') } if (trend_model %in% c('PWlinear', 'PWlogistic')) { param <- c('trend', 'delta_trend', 'k_trend', 'm_trend') } } if (trend_model != 'None') { if (drift) { param <- c(param, 'drift') } } if (grepl('hiercor', trend_model)) { param <- c(param, 'alpha_cor') } if (trend_model == 'None') { param <- NULL } param } #' Extraction of particular parameters #' @noRd extract_trend_pars = function( object, keep_all_estimates = TRUE, ending_time = NULL ) { # Get names of parameters to extract pars_to_extract <- trend_par_names( trend_model = attr(object$model_data, 'trend_model'), trend_map = object$trend_map, use_lv = object$use_lv, drift = object$drift ) # Extract into a named list if (length(pars_to_extract) > 0) { out <- vector(mode = 'list') included <- vector(length = length(pars_to_extract)) for (i in 1:length(pars_to_extract)) { # Check if it can be extracted first suppressWarnings( estimates <- try( mcmc_chains(object$model_output, params = pars_to_extract[i]), silent = TRUE ) ) if (inherits(estimates, 'try-error')) { included[i] <- FALSE } else { included[i] <- TRUE out[[i]] <- estimates } } out <- 
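# drop any monitored parameters that could not be extracted from
# this particular fitted model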
out[included] names(out) <- pars_to_extract[included] } else { out <- list() } # delta params for piecewise trends if ( attr(object$model_data, 'trend_model') %in% c('PWlinear', 'PWlogistic') ) { out$delta_trend <- lapply( seq_along(levels(object$obs_data$series)), function(series) { if (object$fit_engine == 'stan') { delta_estimates <- mcmc_chains( object$model_output, 'delta_trend' )[, seq( series, dim(mcmc_chains(object$model_output, 'delta_trend'))[2], by = NCOL(object$ytimes) )] } else { delta_estimates <- mcmc_chains(object$model_output, 'delta_trend')[, starts[series]:ends[series] ] } } ) if (attr(object$model_data, 'trend_model') == 'PWlogistic') { out$cap <- lapply( seq_along(levels(object$obs_data$series)), function(series) { t(replicate( NROW(out$delta_trend[[1]]), object$trend_model$cap[, series] )) } ) } out$changepoints <- t(replicate( NROW(out$delta_trend[[1]]), object$trend_model$changepoints )) out$change_freq <- replicate( NROW(out$delta_trend[[1]]), object$trend_model$change_freq ) out$change_scale <- replicate( NROW(out$delta_trend[[1]]), object$trend_model$changepoint_scale ) } # Latent trend loadings for dynamic factor models if (object$use_lv) { if ( attr(object$model_data, 'trend_model') %in% c('RW', 'AR1', 'AR2', 'AR3', 'CAR1', 'ZMVN') ) { # Just due to legacy reasons from working in JAGS, the simulation # functions use precision (tau) rather than SD (sigma) out$tau <- mcmc_chains(object$model_output, 'penalty') out$penalty <- NULL } n_series <- NCOL(object$ytimes) n_lv <- object$n_lv out$lv_coefs <- lapply(seq_len(n_series), function(series) { if (object$fit_engine == 'stan') { coef_start <- min(which(sort(rep(1:n_series, n_lv)) == series)) coef_end <- coef_start + n_lv - 1 as.matrix(mcmc_chains(object$model_output, 'lv_coefs')[, coef_start:coef_end ]) } else { lv_indices <- seq(1, n_series * n_lv, by = n_series) + (series - 1) as.matrix(mcmc_chains(object$model_output, 'lv_coefs')[, lv_indices]) } }) } else { if ( attr(object$model_data, 'trend_model') %in% c('RW', 'AR1', 'AR2', 'AR3', 'CAR1') ) { out$sigma <- NULL } } if (!keep_all_estimates) { #### Extract last xxx timepoints of latent trends for propagating forecasts # forward #### # Latent trend estimates for dynamic factor models if (object$use_lv) { n_lv <- object$n_lv if (object$fit_engine == 'stan') { out$last_lvs <- lapply(seq_len(n_lv), function(lv) { inds_lv <- seq(lv, dim(out$LV)[2], by = n_lv) lv_estimates <- out$LV[, inds_lv] # Need to only use estimates from the training period if (inherits(object$obs_data, 'list')) { end_train <- data.frame( y = object$obs_data$y, series = object$obs_data$series, time = object$obs_data$time ) %>% dplyr::filter(series == !!(levels(object$obs_data$series)[1])) %>% NROW() } else { end_train <- object$obs_data %>% dplyr::filter(series == !!(levels(object$obs_data$series)[1])) %>% NROW() } if (attr(object$model_data, 'trend_model') == 'GP') { if (!is.null(ending_time)) { lv_estimates <- lv_estimates[, 1:ending_time] } else { lv_estimates <- lv_estimates[, 1:end_train] } } else { if (!is.null(ending_time)) { lv_estimates <- lv_estimates[, 1:ending_time] } else { lv_estimates <- lv_estimates[, (NCOL(lv_estimates) - 2):(NCOL(lv_estimates)) ] } } lv_estimates }) } else { ends <- seq(0, dim(out$LV)[2], length.out = n_lv + 1) starts <- ends + 1 starts <- c(1, starts[-c(1, (n_lv + 1))]) ends <- ends[-1] out$last_lvs <- lapply(seq_len(n_lv), function(lv) { lv_estimates <- out$LV[, starts[lv]:ends[lv]] # Need to only use estimates from the training period if 
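# (end_train below is computed from the first series' rows; the same
# training length is assumed to apply to every latent factor)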
(class(object$obs_data)[1] == 'list') { end_train <- data.frame( y = object$obs_data$y, series = object$obs_data$series, time = object$obs_data$time ) %>% dplyr::filter(series == !!(levels(object$obs_data$series)[1])) %>% NROW() } else { end_train <- object$obs_data %>% dplyr::filter(series == !!(levels(object$obs_data$series)[1])) %>% NROW() } # GP models not available in JAGS if (!is.null(ending_time)) { lv_estimates <- lv_estimates[, 1:ending_time] } else { lv_estimates <- lv_estimates[, (NCOL(lv_estimates) - 2):(NCOL(lv_estimates)) ] } }) } # Get rid of the large posterior arrays for trend and LV estimates; # they won't be needed for propagating the trends forward out$LV <- NULL out$trend <- NULL } if (!object$use_lv) { if (attr(object$model_data, 'trend_model') != 'None') { out$last_trends <- lapply( seq_along(levels(object$obs_data$series)), function(series) { if (object$fit_engine == 'stan') { trend_estimates <- mcmc_chains( object$model_output, 'trend' )[, seq( series, dim(mcmc_chains(object$model_output, 'trend'))[2], by = NCOL(object$ytimes) )] } else { trend_estimates <- mcmc_chains(object$model_output, 'trend')[, starts[series]:ends[series] ] } # Need to only use estimates from the training period if (class(object$obs_data)[1] == 'list') { end_train <- data.frame( y = object$obs_data$y, series = object$obs_data$series, time = object$obs_data$time ) %>% dplyr::filter( series == !!(levels(object$obs_data$series)[series]) ) %>% NROW() } else { end_train <- object$obs_data %>% dplyr::filter( series == !!(levels(object$obs_data$series)[series]) ) %>% NROW() } trend_estimates <- trend_estimates[, 1:end_train] # Only need last 3 timesteps if this is not a GP trend model if (attr(object$model_data, 'trend_model') == 'GP') { if (!is.null(ending_time)) { trend_estimates <- trend_estimates[, 1:ending_time] } else { trend_estimates <- trend_estimates } } else { if (!is.null(ending_time)) { trend_estimates <- trend_estimates[, 1:ending_time] } else { trend_estimates <- trend_estimates[, (NCOL(trend_estimates) - 2):(NCOL(trend_estimates)) ] } } trend_estimates } ) out$trend <- NULL if (attr(object$model_data, 'trend_model') == 'VAR1') { # Need to ensure all series' trends are retained when subsampling # to produce draw-specific forecasts from VAR models out$last_lvs <- out$last_trends out$last_trends <- NULL } } } } # Extract centred training times and number of GP basis functions # if this is a GP model if (attr(object$model_data, 'trend_model') == 'GP') { num_basis_line <- object$model_file[grep( 'num_gp_basis = ', object$model_file )] out$num_gp_basis <- as.numeric(unlist(regmatches( num_basis_line, gregexpr("[[:digit:]]+", num_basis_line) ))) out$mean_time <- mean(unique(object$obs_data$time)) out$time_cent <- unique(object$obs_data$time) - mean(unique(object$obs_data$time)) # Get the basis coefficients in the correct format n_series <- NCOL(object$ytimes) if (object$use_lv) { n_lv <- object$n_lv all_bgps <- out$b_gp out$b_gp <- lapply(seq_len(n_lv), function(lv) { all_bgps[, seq(lv, NCOL(all_bgps), by = NCOL(out$alpha_gp))] }) } else { all_bgps <- out$b_gp out$b_gp <- lapply(seq_len(n_series), function(series) { all_bgps[, seq(series, NCOL(all_bgps), by = NCOL(out$alpha_gp))] }) } } if (attr(object$model_data, 'trend_model') == 'ZMVN') { out$ar1 <- rep(0, NROW(out$Sigma)) } # Return list of extracted posterior parameter samples out } #' Function for extracting a single draw of trend parameters for use #' in many of the forecasting / evaluation functions #' @noRd 
extract_general_trend_pars = function(samp_index, trend_pars) { general_trend_pars <- lapply(seq_along(trend_pars), function(x) { if ( names(trend_pars)[x] %in% c( 'last_lvs', 'lv_coefs', 'last_trends', 'A', 'Sigma', 'theta', 'b_gp', 'error', 'delta_trend', 'cap' ) ) { if ( names(trend_pars)[x] %in% c('last_lvs', 'lv_coefs', 'last_trends', 'b_gp', 'delta_trend', 'cap') ) { out <- unname(lapply(trend_pars[[x]], `[`, samp_index, )) } if (names(trend_pars)[x] %in% c('A', 'Sigma', 'theta', 'error')) { out <- unname(trend_pars[[x]][samp_index, ]) } } else if (names(trend_pars)[x] %in% c('time_cent', 'mean_time')) { out <- trend_pars[[x]] } else { if (is.matrix(trend_pars[[x]])) { out <- unname(trend_pars[[x]][samp_index, ]) } else { out <- unname(trend_pars[[x]][samp_index]) } } out }) names(general_trend_pars) <- names(trend_pars) return(general_trend_pars) } #' Function for extracting a single draw of trend parameters for a single series; #' deprecated as all forecasting / prediction functions now operate jointly on all #' series at once #' @noRd extract_series_trend_pars = function( series, samp_index, trend_pars, use_lv = FALSE ) { trend_extracts <- lapply(seq_along(trend_pars), function(x) { if ( names(trend_pars)[x] %in% c( 'last_lvs', 'lv_coefs', 'last_trends', 'A', 'Sigma', 'theta', 'b_gp', 'error' ) ) { if (!use_lv & names(trend_pars)[x] == 'b_gp') { out <- trend_pars[[x]][[series]][samp_index, ] } if (use_lv & names(trend_pars)[x] == 'b_gp') { out <- lapply(trend_pars[[x]], `[`, samp_index, ) } if (names(trend_pars)[x] %in% c('last_trends', 'lv_coefs')) { out <- trend_pars[[x]][[series]][samp_index, ] } if (names(trend_pars)[x] %in% c('last_lvs')) { out <- lapply(trend_pars[[x]], `[`, samp_index, ) } if (names(trend_pars)[x] %in% c('A', 'Sigma', 'theta', 'error')) { out <- trend_pars[[x]][samp_index, ] } } else if (names(trend_pars)[x] %in% c('time_cent', 'mean_time')) { out <- trend_pars[[x]] } else { if (is.matrix(trend_pars[[x]])) { if (use_lv) { out <- trend_pars[[x]][samp_index, ] } else { out <- trend_pars[[x]][samp_index, series] } } else { out <- trend_pars[[x]][samp_index] } } out }) names(trend_extracts) <- names(trend_pars) return(trend_extracts) } #' Wrapper function to forecast trends #' @noRd forecast_trend = function( trend_model, use_lv, trend_pars, Xp_trend = NULL, betas_trend = NULL, h = 1, time = NULL, cap = NULL, time_dis = NULL ) { # Propagate dynamic factors forward if (use_lv) { n_lv <- length(trend_pars$last_lvs) if (trend_model == 'CAR1') { ar1 <- trend_pars$ar1 Sigma <- rlang::missing_arg() if ('drift' %in% names(trend_pars)) { drift <- trend_pars$drift } else { drift <- rep(0, length(ar1)) } varma_params <- prep_varma_params( drift = drift, ar1 = ar1, ar2 = 0, ar3 = 0, theta = 0, Sigma = Sigma, tau = trend_pars$tau, Xp_trend = Xp_trend, betas_trend = betas_trend, last_trends = do.call( cbind, (lapply(trend_pars$last_lvs, function(x) tail(x, 3))) ), h = h ) # Propagate forward next_lvs <- sim_corcar1( A = varma_params$A, A2 = varma_params$A2, A3 = varma_params$A3, drift = varma_params$drift, theta = varma_params$theta, Sigma = varma_params$Sigma, last_trends = varma_params$last_trends, last_errors = varma_params$last_errors, Xp_trend = varma_params$Xp_trend, betas_trend = varma_params$betas_trend, h = varma_params$h, time_dis = time_dis ) } if (trend_model == 'GP') { next_lvs <- do.call( cbind, lapply(seq_len(n_lv), function(lv) { sim_gp( alpha_gp = trend_pars$alpha_gp[lv], rho_gp = trend_pars$rho_gp[lv], last_trends = trend_pars$last_lvs[[lv]], h = h ) }) ) 
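# sim_gp() (defined earlier in this file) conditions each factor's
# forecast on its training trajectory: with squared exponential
# covariances K (train, train), K_new (train, new) and K_star (new, new),
# draws come from N(K_new' K^-1 f, K_star - K_new' K^-1 K_new), keeping
# the final h points. Toy call (not run, illustrative values):
#   sim_gp(last_trends = rnorm(20), h = 5, rho_gp = 4, alpha_gp = 0.6)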
} if (trend_model == 'VAR1') { # Reconstruct the A and Sigma matrices if ('A' %in% names(trend_pars)) { Amat <- matrix( trend_pars$A, nrow = length(trend_pars$last_lvs), ncol = length(trend_pars$last_lvs), byrow = TRUE ) ar1 <- rlang::missing_arg() } else if ('ar1' %in% names(trend_pars)) { if (trend_pars$ar1[1] == 0) { ar1 <- rep(0, length(trend_pars$last_lvs)) } else { ar1 <- trend_pars$ar1 } Amat <- rlang::missing_arg() } else { ar1 <- rep(1, length(trend_pars$last_lvs)) Amat <- rlang::missing_arg() } if ('ar2' %in% names(trend_pars)) { ar2 <- trend_pars$ar2 } else { ar2 <- rep(0, length(trend_pars$last_lvs)) } if ('ar3' %in% names(trend_pars)) { ar3 <- trend_pars$ar3 } else { ar3 <- rep(0, length(trend_pars$last_lvs)) } if ('Sigma' %in% names(trend_pars)) { Sigmamat <- matrix( trend_pars$Sigma, nrow = length(trend_pars$last_lvs), ncol = length(trend_pars$last_lvs), byrow = TRUE ) } else if ('sigma' %in% names(trend_pars)) { Sigmamat <- matrix( 0, nrow = length(trend_pars$last_lvs), ncol = length(trend_pars$last_lvs), byrow = TRUE ) diag(Sigmamat) <- trend_pars$sigma } else { Sigmamat <- matrix( 0, nrow = length(trend_pars$last_lvs), ncol = length(trend_pars$last_lvs), byrow = TRUE ) diag(Sigmamat) <- 1 / trend_pars$tau } # Reconstruct the last trend matrix last_trendmat <- do.call( cbind, (lapply(trend_pars$last_lvs, function(x) tail(x, 3))) ) # If this is a moving average model, reconstruct theta matrix and # last error matrix if ('theta' %in% names(trend_pars)) { thetamat <- matrix( trend_pars$theta, nrow = length(trend_pars$last_lvs), ncol = length(trend_pars$last_lvs), byrow = TRUE ) errormat <- rbind( rep(0, length(trend_pars$last_lvs)), rep(0, length(trend_pars$last_lvs)), tail(trend_pars$error, length(trend_pars$last_lvs)) ) } else { thetamat <- rlang::missing_arg() errormat <- rlang::missing_arg() } # Prep VARMA parameters varma_params <- prep_varma_params( A = Amat, ar1 = ar1, ar2 = ar2, ar3 = ar3, Sigma = Sigmamat, last_trends = last_trendmat, last_errors = errormat, theta = thetamat, Xp_trend = Xp_trend, betas_trend = betas_trend, h = h ) next_lvs <- sim_varma( A = varma_params$A, A2 = varma_params$A2, A3 = varma_params$A3, drift = varma_params$drift, theta = varma_params$theta, Sigma = varma_params$Sigma, last_trends = varma_params$last_trends, last_errors = varma_params$last_errors, Xp_trend = varma_params$Xp_trend, betas_trend = varma_params$betas_trend, h = varma_params$h ) } trend_fc <- next_lvs } # Simpler if not using dynamic factors if (!use_lv) { if (trend_model == 'CAR1') { ar1 <- trend_pars$ar1 Sigma <- rlang::missing_arg() if ('drift' %in% names(trend_pars)) { drift <- trend_pars$drift } else { drift <- rep(0, length(ar1)) } varma_params <- prep_varma_params( drift = drift, ar1 = ar1, ar2 = 0, ar3 = 0, theta = 0, Sigma = Sigma, tau = trend_pars$tau, Xp_trend = Xp_trend, betas_trend = betas_trend, last_trends = do.call( cbind, (lapply(trend_pars$last_lvs, function(x) tail(x, 3))) ), h = h ) # Propagate forward trend_fc <- sim_corcar1( A = varma_params$A, A2 = varma_params$A2, A3 = varma_params$A3, drift = varma_params$drift, theta = varma_params$theta, Sigma = varma_params$Sigma, last_trends = varma_params$last_trends, last_errors = varma_params$last_errors, Xp_trend = varma_params$Xp_trend, betas_trend = varma_params$betas_trend, h = varma_params$h, time_dis = time_dis ) } if (trend_model == 'GP') { trend_fc <- sim_hilbert_gp( alpha_gp = trend_pars$alpha_gp, rho_gp = trend_pars$rho_gp, b_gp = trend_pars$b_gp, last_trends = trend_pars$last_trends, fc_times = time, 
train_times = trend_pars$time_cent, mean_train_times = trend_pars$mean_time ) } if (trend_model == 'VAR1') { # Reconstruct the A and Sigma matrices if ('A' %in% names(trend_pars)) { Amat <- matrix( trend_pars$A, nrow = length(trend_pars$last_lvs), ncol = length(trend_pars$last_lvs), byrow = TRUE ) ar1 <- rlang::missing_arg() } else if ('ar1' %in% names(trend_pars)) { if (trend_pars$ar1[1] == 0) { ar1 <- rep(0, length(trend_pars$last_lvs)) } else { ar1 <- trend_pars$ar1 } Amat <- rlang::missing_arg() } else { ar1 <- rep(1, length(trend_pars$last_lvs)) Amat <- rlang::missing_arg() } if ('ar2' %in% names(trend_pars)) { ar2 <- trend_pars$ar2 } else { ar2 <- rep(0, length(trend_pars$last_lvs)) } if ('ar3' %in% names(trend_pars)) { ar3 <- trend_pars$ar3 } else { ar3 <- rep(0, length(trend_pars$last_lvs)) } if ('Sigma' %in% names(trend_pars)) { Sigmamat <- matrix( trend_pars$Sigma, nrow = length(trend_pars$last_lvs), ncol = length(trend_pars$last_lvs), byrow = TRUE ) } else if ('sigma' %in% names(trend_pars)) { Sigmamat <- matrix( 0, nrow = length(trend_pars$last_lvs), ncol = length(trend_pars$last_lvs), byrow = TRUE ) diag(Sigmamat) <- trend_pars$sigma } else { Sigmamat <- matrix( 0, nrow = length(trend_pars$last_lvs), ncol = length(trend_pars$last_lvs), byrow = TRUE ) diag(Sigmamat) <- 1 / trend_pars$tau } # Reconstruct the last trend matrix last_trendmat <- do.call( cbind, (lapply(trend_pars$last_lvs, function(x) tail(x, 3))) ) # If this is a moving average model, reconstruct theta matrix and # last error matrix if ('theta' %in% names(trend_pars)) { thetamat <- matrix( trend_pars$theta, nrow = length(trend_pars$last_lvs), ncol = length(trend_pars$last_lvs), byrow = TRUE ) errormat <- rbind( rep(0, length(trend_pars$last_lvs)), rep(0, length(trend_pars$last_lvs)), tail(trend_pars$error, length(trend_pars$last_lvs)) ) } else { thetamat <- rlang::missing_arg() errormat <- rlang::missing_arg() } # Prep VARMA parameters varma_params <- prep_varma_params( A = Amat, ar1 = ar1, ar2 = ar2, ar3 = ar3, Sigma = Sigmamat, last_trends = last_trendmat, last_errors = errormat, theta = thetamat, Xp_trend = Xp_trend, betas_trend = betas_trend, h = h ) # Propagate forward trend_fc <- sim_varma( A = varma_params$A, A2 = varma_params$A2, A3 = varma_params$A3, drift = varma_params$drift, theta = varma_params$theta, Sigma = varma_params$Sigma, last_trends = varma_params$last_trends, last_errors = varma_params$last_errors, Xp_trend = varma_params$Xp_trend, betas_trend = varma_params$betas_trend, h = varma_params$h ) } if (trend_model == 'PWlinear') { insight::check_if_installed( "extraDistr", reason = 'to simulate from piecewise trends' ) trend_fc <- do.call( cbind, lapply(seq_along(trend_pars$delta_trend), function(x) { # Sample forecast horizon changepoints n_changes <- stats::rpois( 1, (trend_pars$change_freq * (max(time) - min(time))) ) # Sample deltas lambda <- median(abs(c(trend_pars$delta_trend[[x]]))) + 1e-8 deltas_new <- extraDistr::rlaplace(n_changes, mu = 0, sigma = lambda) # Spread changepoints evenly across the forecast horizon t_change_new <- unique(sample( min(time):max(time), n_changes, replace = TRUE )) # Combine with changepoints from the history deltas <- c(trend_pars$delta_trend[[x]], deltas_new) changepoint_ts <- sort(c(trend_pars$changepoints, t_change_new)) # Generate a trend draw draw <- suppressWarnings(piecewise_linear( t = 1:max(time), deltas = deltas, k = trend_pars$k_trend[x], m = trend_pars$m_trend[x], changepoint_ts = changepoint_ts )) # Keep only the forecast horizon estimates 
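# Note: the simulated draw covers the full history (t = 1:max(time)),
# so its final max(time) - min(time) + 1 values line up with the
# forecast timepoints min(time):max(time)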
tail(draw, max(time) - min(time) + 1) }) ) } if (trend_model == 'PWlogistic') { insight::check_if_installed( "extraDistr", reason = 'to simulate from piecewise trends' ) trend_fc <- do.call( cbind, lapply(seq_along(trend_pars$delta_trend), function(x) { # Sample forecast horizon changepoints n_changes <- stats::rpois( 1, (trend_pars$change_freq * (max(time) - min(time))) ) # Sample deltas lambda <- median(abs(c(trend_pars$delta_trend[[x]]))) + 1e-8 deltas_new <- extraDistr::rlaplace(n_changes, mu = 0, sigma = lambda) # Spread changepoints evenly across the forecast horizon t_change_new <- unique(sample( min(time):max(time), n_changes, replace = TRUE )) # Combine with changepoints from the history deltas <- c(trend_pars$delta_trend[[x]], deltas_new) changepoint_ts <- sort(c(trend_pars$changepoints, t_change_new)) # Get historical capacities oldcaps <- trend_pars$cap[[x]] # And forecast capacities s_name <- levels(cap$series)[x] newcaps = cap %>% dplyr::filter(series == s_name) %>% dplyr::arrange(time) %>% dplyr::pull(cap) caps <- c(oldcaps, newcaps) # Generate a trend draw draw <- piecewise_logistic( t = 1:max(time), cap = caps, deltas = deltas, k = trend_pars$k_trend[x], m = trend_pars$m_trend[x], changepoint_ts = changepoint_ts ) # Keep only the forecast horizon estimates tail(draw, max(time) - min(time) + 1) }) ) } } return(trend_fc) } ================================================ FILE: R/update.mvgam.R ================================================ #' Update an existing \pkg{mvgam} model object #' #' This function allows a previously fitted \pkg{mvgam} model to be updated. #' #' @name update.mvgam #' #' @importFrom mgcv nb betar #' @importFrom rlang missing_arg #' #' @inheritParams mvgam #' #' @param object \code{list} object returned from \code{mvgam}. See [mvgam()] #' #' @param formula Optional new `formula` object. Note, `mvgam` currently does #' not support dynamic formula updates such as removal of specific terms with #' `- term`. When updating, the entire formula needs to be supplied. #' #' @param ... Other arguments to be passed to \code{\link{mvgam}} or #' \code{\link{jsdgam}} #' #' @return A \code{list} object of class \code{mvgam} containing model output, #' the text representation of the model file, the mgcv model output (for #' easily generating simulations at unsampled covariate values), Dunn-Smyth #' residuals for each outcome variable and key information needed for other #' functions in the package. See \code{\link{mvgam-class}} for details. Use #' `methods(class = "mvgam")` for an overview on available methods. 
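#'
#' @details Arguments that are not supplied in the call to
#' \code{update.mvgam} are recycled from the original model object where
#' possible (for example the trend model, priors, knots, \code{chains},
#' \code{burnin} and \code{samples}), so only the components that should
#' change need to be supplied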
#' #' @author Nicholas J Clark #' #' @examples #' \dontrun{ #' # Simulate some data and fit a Poisson AR1 model #' simdat <- sim_mvgam(n_series = 1, trend_model = AR()) #' #' mod <- mvgam( #' y ~ s(season, bs = 'cc'), #' trend_model = AR(), #' noncentred = TRUE, #' data = simdat$data_train, #' chains = 2 #' ) #' #' summary(mod) #' conditional_effects(mod, type = 'link') #' #' # Update to an AR2 model #' updated_mod <- update( #' mod, #' trend_model = AR(p = 2), #' noncentred = TRUE #' ) #' #' summary(updated_mod) #' conditional_effects(updated_mod, type = 'link') #' #' # Now update to a Binomial AR1 by adding information on trials #' # requires that we supply newdata that contains the 'trials' variable #' simdat$data_train$trials <- max(simdat$data_train$y) + 15 #' #' updated_mod <- update( #' mod, #' formula = cbind(y, trials) ~ s(season, bs = 'cc'), #' noncentred = TRUE, #' data = simdat$data_train, #' family = binomial() #' ) #' #' summary(updated_mod) #' conditional_effects(updated_mod, type = 'link') #' } #' #' @export update.mvgam = function( object, formula, trend_formula, knots, trend_knots, trend_model, family, share_obs_params, data, newdata, trend_map, use_lv, n_lv, priors, chains, burnin, samples, threads, algorithm, lfo = FALSE, ... ) { if (missing(chains)) { chains <- object$model_output@sim$chains } if (missing(burnin)) { burnin <- object$model_output@sim$warmup if (is.null(burnin)) burnin <- 0 } if (missing(samples)) { samples <- object$model_output@sim$iter - burnin } if (missing(threads)) { threads <- attr(object$model_data, 'threads') if (is.null(threads)) threads <- 1 } if (missing(algorithm)) { algorithm <- object$algorithm } if (!algorithm %in% 'sampling') { burnin <- 1 } if (missing(formula)) { formula <- object$call if (attr(object$mgcv_model, 'drop_obs_intercept')) { formula <- update(formula, ~ . 
- 1) } } if (missing(knots)) { if (is.null(attr(object$mgcv_model, 'knots'))) { knots <- missing_arg() } else { knots <- attr(object$mgcv_model, 'knots') } } if (missing(share_obs_params)) { share_obs_params <- object$share_obs_params } if (missing(trend_formula)) { if (is.null(object$trend_call)) { trend_formula <- missing_arg() } else { trend_formula <- object$trend_call } } if (!missing(trend_formula)) { if (missing(trend_knots)) { if (is.null(attr(object$trend_mgcv_model, 'knots'))) { trend_knots <- missing_arg() } else { trend_knots <- attr(object$trend_mgcv_model, 'knots') } } } if (missing(trend_map)) { if (is.null(object$trend_map)) { trend_map <- missing_arg() } else { trend_map <- object$trend_map } } if (missing(data)) { data_train <- object$obs_data } else { data_train <- data } if (missing(priors)) { if (!is.null(object$priors)) { priors <- object$priors } else { priors <- rlang::missing_arg() } } if (!missing(newdata)) { # If new testing data supplied, include as the test data data_test <- newdata include_fc <- TRUE } else if (!is.null(object$test_data)) { # only include test data when no new training data is supplied if (missing(data)) { include_fc <- TRUE data_test <- object$test_data } else { include_fc <- FALSE } } else { include_fc <- FALSE } if (missing(trend_model)) { trend_model <- object$trend_model } if (missing(use_lv)) { use_lv <- object$use_lv } if (missing(n_lv)) { n_lv <- object$n_lv } if (missing(family)) { family_char <- object$family family <- family_char if (family_char == 'negative binomial') { family <- nb() } if (family_char == 'beta') { family <- betar() } if (family_char == 'student') { family <- student_t() } } if (include_fc) { updated_mod <- mvgam( formula = formula, trend_formula = trend_formula, trend_map = trend_map, knots = knots, trend_knots = trend_knots, data = data_train, newdata = data_test, trend_model = trend_model, use_lv = use_lv, n_lv = n_lv, family = family, share_obs_params = share_obs_params, refit = TRUE, lfo = lfo, use_stan = ifelse(object$fit_engine == 'stan', TRUE, FALSE), priors = priors, chains = chains, burnin = burnin, samples = samples, algorithm = algorithm, threads = threads, ... ) } else { updated_mod <- mvgam( formula = formula, trend_formula = trend_formula, trend_map = trend_map, knots = knots, trend_knots = trend_knots, data = data_train, trend_model = trend_model, use_lv = use_lv, n_lv = n_lv, family = family, share_obs_params = share_obs_params, refit = TRUE, lfo = lfo, use_stan = ifelse(object$fit_engine == 'stan', TRUE, FALSE), priors = priors, chains = chains, burnin = burnin, samples = samples, algorithm = algorithm, threads = threads, ... ) } return(updated_mod) } #' @rdname update.mvgam #' #' @inheritParams jsdgam #' @inheritParams update.mvgam #' #' @param formula Optional new `formula` object. Note, `mvgam` currently does #' not support dynamic formula updates such as removal of specific terms with #' `- term`. When updating, the entire formula needs to be supplied. #' #' @param factor_formula Optional new `formula` object for the factor linear #' predictors #' #' @return A \code{list} object of class \code{mvgam} containing model output, #' the text representation of the model file, the mgcv model output (for #' easily generating simulations at unsampled covariate values), Dunn-Smyth #' residuals for each series and key information needed for other functions in #' the package. See \code{\link{mvgam-class}} for details. Use #' `methods(class = "mvgam")` for an overview on available methods. 
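#'
#' @examples
#' \dontrun{
#' # A minimal usage sketch: 'fit' stands in for a model previously
#' # fitted with jsdgam(), and the updated factor count is illustrative only
#' updated_fit <- update(fit, n_lv = 3)
#' summary(updated_fit)
#' }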
#'
#' @author Nicholas J Clark
#'
#' @export
update.jsdgam = function(
  object,
  formula,
  factor_formula,
  knots,
  factor_knots,
  data,
  newdata,
  n_lv,
  family,
  share_obs_params,
  priors,
  chains,
  burnin,
  samples,
  threads,
  algorithm,
  lfo = FALSE,
  ...
) {
  if (missing(chains)) {
    chains <- object$model_output@sim$chains
  }
  if (missing(burnin)) {
    burnin <- object$model_output@sim$warmup
    if (is.null(burnin)) burnin <- 0
  }
  if (missing(samples)) {
    samples <- object$model_output@sim$iter - burnin
  }
  if (missing(threads)) {
    threads <- attr(object$model_data, 'threads')
    if (is.null(threads)) threads <- 1
  }
  if (missing(algorithm)) {
    algorithm <- object$algorithm
  }
  if (!algorithm %in% 'sampling') {
    burnin <- 1
  }
  if (missing(formula)) {
    formula <- object$call
    if (attr(object$mgcv_model, 'drop_obs_intercept')) {
      formula <- update(formula, ~ . - 1)
    }
  }
  if (missing(factor_formula)) {
    factor_formula <- object$trend_call
  }
  if (missing(knots)) {
    if (is.null(attr(object$mgcv_model, 'knots'))) {
      knots <- missing_arg()
    } else {
      knots <- attr(object$mgcv_model, 'knots')
    }
  }
  if (missing(factor_knots)) {
    if (is.null(attr(object$trend_mgcv_model, 'knots'))) {
      factor_knots <- missing_arg()
    } else {
      factor_knots <- attr(object$trend_mgcv_model, 'knots')
    }
  }
  if (missing(share_obs_params)) {
    share_obs_params <- object$share_obs_params
  }
  if (missing(data)) {
    data_train <- object$obs_data
  } else {
    data_train <- data
  }
  if (missing(priors)) {
    if (!is.null(object$priors)) {
      priors <- object$priors
    } else {
      priors <- rlang::missing_arg()
    }
  }
  if (!missing(newdata)) {
    # If new testing data supplied, include as the test data
    data_test <- newdata
    include_fc <- TRUE
  } else if (!is.null(object$test_data)) {
    # only include test data when no new training data is supplied
    if (missing(data)) {
      include_fc <- TRUE
      data_test <- object$test_data
    } else {
      include_fc <- FALSE
    }
  } else {
    include_fc <- FALSE
  }
  if (missing(n_lv)) {
    n_lv <- object$n_lv
  }
  if (missing(family)) {
    family_char <- object$family
    family <- family_char
    if (family_char == 'negative binomial') {
      family <- nb()
    }
    if (family_char == 'beta') {
      family <- betar()
    }
    if (family_char == 'student') {
      family <- student_t()
    }
  }
  if (include_fc) {
    updated_mod <- jsdgam(
      formula = formula,
      factor_formula = factor_formula,
      knots = knots,
      factor_knots = factor_knots,
      data = data_train,
      newdata = data_test,
      n_lv = n_lv,
      family = family,
      share_obs_params = share_obs_params,
      refit = TRUE,
      lfo = lfo,
      use_stan = ifelse(object$fit_engine == 'stan', TRUE, FALSE),
      priors = priors,
      chains = chains,
      burnin = burnin,
      samples = samples,
      algorithm = algorithm,
      threads = threads,
      ...
    )
  } else {
    # Refit via jsdgam() using only the training data
    updated_mod <- jsdgam(
      formula = formula,
      factor_formula = factor_formula,
      knots = knots,
      factor_knots = factor_knots,
      data = data_train,
      n_lv = n_lv,
      family = family,
      share_obs_params = share_obs_params,
      refit = TRUE,
      lfo = lfo,
      use_stan = ifelse(object$fit_engine == 'stan', TRUE, FALSE),
      priors = priors,
      chains = chains,
      burnin = burnin,
      samples = samples,
      algorithm = algorithm,
      threads = threads,
      ...
    )
  }
  return(updated_mod)
}

================================================
FILE: R/update_priors.R
================================================

#' Update priors for a JAGS or Stan model file
#'
#'
#' @param model_file Prepared mvgam model file
#' @param priors \code{data.frame} with prior definitions (in JAGS or Stan syntax)
#' @param use_stan \code{logical}.
Only Stan models can have parameter bounds edited #' @return A `character string` containing the updated model file #' @noRd update_priors = function(model_file, priors, use_stan) { # Check the prior df structure if (!any(class(priors) == 'data.frame')) { stop( 'priors must be a data.frame with at least the colnames: param_name, prior' ) } if (!'prior' %in% names(priors)) { stop( 'priors must be a data.frame with at least the colnames: param_name, prior' ) } if (!'param_name' %in% names(priors)) { stop( 'priors must be a data.frame with at least the colnames: param_name, prior' ) } # Replace any call to 'Intercept' with '(Intercept)' to match mgcv style priors[] <- lapply( priors, function(x) gsub("Intercept(?!.*[^()]*\\))", "(Intercept)", x, perl = TRUE) ) if (!is.null(attr(priors, 'posterior_to_prior'))) { model_file <- posterior_to_prior(model_file, priors) } else { # Modify the file to update the prior definitions for (i in 1:NROW(priors)) { # gp() terms can be supplied using more mgcv-like syntax; replace # with the uglier syntax that is used in the Stan code so the prior # can be correctly updated if ( grepl('gp(', priors$prior[i], fixed = TRUE) | grepl('gp_trend', priors$prior[i], fixed = TRUE) ) { priors$prior[i] <- paste( clean_gp_priorname(trimws(strsplit(priors$prior[i], "[~]")[[1]][1])), '~', trimws(strsplit(priors$prior[i], "[~]")[[1]][2]) ) } if ( !any(grepl( paste(trimws(strsplit(priors$prior[i], "[~]")[[1]][1]), '~'), model_file, fixed = TRUE )) ) { # Updating parametric effects if ( any(grepl( paste0('// prior for ', priors$param_name[i], '...'), model_file, fixed = TRUE )) ) { header_line <- grep( paste0('// prior for ', priors$param_name[i], '...'), model_file, fixed = TRUE ) newprior <- paste(trimws(strsplit(priors$prior[i], "[~]")[[1]][2])) model_file[header_line + 1] <- paste( trimws(strsplit(model_file[header_line + 1], "[~]")[[1]][1]), '~', newprior ) } else if (grepl('num_gp_basis', priors$prior[i])) { model_file[grep( 'num_gp_basis = min(20, n);', model_file, fixed = TRUE )] <- priors$prior[i] } else if (grepl('=', priors$prior[i])) { tomatch <- trimws(strsplit( paste0( '\\b', gsub( ']', '\\]', gsub('[', '\\[', priors$prior[i], fixed = TRUE), fixed = TRUE ) ), "[=]" )[[1]][1]) model_file[grep(tomatch, model_file, fixed = TRUE)] <- priors$prior[i] } else { warning( 'no match found in model_file for parameter: ', trimws(strsplit(priors$prior[i], "[~]")[[1]][1]), call. = FALSE ) } } else { model_file[grep( paste(trimws(strsplit(priors$prior[i], "[~]")[[1]][1]), '~'), model_file, fixed = TRUE )] <- priors$prior[i] } } # Modify the file to update any bounds on parameters if (use_stan) { if (any(!is.na(c(priors$new_lowerbound, priors$new_upperbound)))) { for (i in 1:NROW(priors)) { # Not currently possible to include new bounds on parametric effect # priors if (grepl('fixed effect|Intercept', priors$param_info[i])) { if ( !is.na(priors$new_lowerbound)[i] | !is.na(priors$new_upperbound)[i] ) { warning( 'not currently possible to place bounds on fixed effect priors: ', trimws(strsplit(priors$prior[i], "[~]")[[1]][1]), call. 
= FALSE ) } } else { # gp() terms can be supplied using more mgcv-like syntax; replace # with the uglier syntax that is used in the Stan code so the prior # can be correctly updated if ( grepl('gp(', priors$param_name[i], fixed = TRUE) | grepl('gp_trend', priors$param_name[i], fixed = TRUE) ) { priors$param_name[i] <- gsub( '(', '_', priors$param_name[i], fixed = TRUE ) priors$param_name[i] <- gsub( ')', '_', priors$param_name[i], fixed = TRUE ) priors$param_name[i] <- gsub( ':', 'by', priors$param_name[i], fixed = TRUE ) } # Create boundary text strings if (!is.na(priors$new_lowerbound[i])) { change_lower <- TRUE lower_text <- paste0('lower=', priors$new_lowerbound[i]) } else { if (grepl('lower=', priors$param_name[i])) { change_lower <- TRUE lower_text <- paste0( 'lower=', regmatches( priors$param_name[i], regexpr( "lower=.*?\\K-?\\d+", priors$param_name[i], perl = TRUE ) ) ) } else { change_lower <- FALSE } } if (!is.na(priors$new_upperbound[i])) { change_upper <- TRUE upper_text <- paste0('upper=', priors$new_upperbound[i]) } else { if (grepl('upper=', priors$param_name[i])) { change_upper <- TRUE upper_text <- paste0( 'upper=', regmatches( priors$param_name[i], regexpr( "upper=.*?\\K-?\\d+", priors$param_name[i], perl = TRUE ) ) ) } else { change_upper <- FALSE } } # Insert changes if (change_lower & change_upper) { model_file[grep( trimws(priors$param_name[i]), model_file, fixed = TRUE )] <- ifelse( !grepl('<', priors$param_name[i]), sub( '\\[', paste0('<', lower_text, ',', upper_text, '>\\['), priors$param_name[i] ), sub( "<[^\\)]+>", paste0('<', lower_text, ',', upper_text, '>'), priors$param_name[i] ) ) } if (change_lower & !change_upper) { model_file[grep( trimws(priors$param_name[i]), model_file, fixed = TRUE )] <- ifelse( !grepl('<', priors$param_name[i]), sub( '\\[', paste0('<', lower_text, '>\\['), priors$param_name[i] ), sub( "<[^\\)]+>", paste0('<', lower_text, '>'), priors$param_name[i] ) ) } if (change_upper & !change_lower) { model_file[grep( trimws(priors$param_name[i]), model_file, fixed = TRUE )] <- ifelse( !grepl('<', priors$param_name[i]), sub( '\\[', paste0('<', upper_text, '>\\['), priors$param_name[i] ), sub( "<[^\\)]+>", paste0('<', upper_text, '>'), priors$param_name[i] ) ) } } change_lower <- FALSE change_upper <- FALSE } } } } return(model_file) } #' Make detailed changes to allow a prior model to as closely match a posterior #' from a previous model as possible #' @noRd posterior_to_prior = function(model_file, priors) { # parametric terms para_terms <- priors$group[which(priors$parametric == TRUE)] para_priors <- priors$prior[which(priors$parametric == TRUE)] para_lowers <- priors$lb[which(priors$parametric == TRUE)] para_uppers <- priors$ub[which(priors$parametric == TRUE)] if (length(para_terms) > 0) { for (i in 1:length(para_terms)) { header_line <- grep( paste0(para_terms[i], '...'), model_file, fixed = TRUE ) model_file[header_line + 1] <- paste0( trimws(strsplit(model_file[header_line + 1], "[~]")[[1]][1]), ' ~ ', para_priors[i], ';' ) } } # Other lines to modify mainlines_to_modify <- unique(priors$group[which(priors$parametric == FALSE)]) for (i in 1:length(mainlines_to_modify)) { priors %>% dplyr::filter(group == mainlines_to_modify[i]) -> group_priors replace_line <- c() for (j in 1:NROW(group_priors)) { replace_line <- c( replace_line, paste0(group_priors$class[j], ' ~ ', group_priors$prior[j]) ) } replace_line <- paste0(paste(replace_line, collapse = ';\n'), ';\n') orig_line <- grep( paste(trimws(strsplit(mainlines_to_modify[i], "[~]")[[1]][1]), 
'~'), model_file, fixed = TRUE ) model_file[orig_line] <- replace_line } model_file <- readLines(textConnection(model_file), n = -1) if ('P_real' %in% mainlines_to_modify) { priors %>% dplyr::filter(group == 'P_real') -> group_priors replace_line <- c() for (j in 1:NROW(group_priors)) { replace_line <- c( replace_line, paste0(group_priors$class[j], ' ~ ', group_priors$prior[j]) ) } replace_line <- paste0(paste(replace_line, collapse = ';\n'), ';\n') remove_start <- grep( '// partial autocorrelation hyperpriors', model_file, fixed = TRUE ) + 1 remove_end <- grep( 'P_real[i, j] ~ normal(Pmu[2], 1 / sqrt(Pomega[2]));', model_file, fixed = TRUE ) + 2 model_file <- model_file[-c(remove_start:remove_end)] model_file[grep( '// partial autocorrelation hyperpriors', model_file, fixed = TRUE )] <- paste0(' // partial autocorrelation hyperpriors\n', replace_line) model_file <- readLines(textConnection(model_file), n = -1) } return(model_file) } #' Allow brmsprior objects to be supplied to mvgam() #' @noRd adapt_brms_priors = function( priors, formula, trend_formula, data, family = 'poisson', use_lv = FALSE, n_lv, trend_model = 'None', trend_map, drift = FALSE, warnings = FALSE, knots ) { # Replace any call to 'Intercept' with '(Intercept)' to match mgcv style priors[] <- lapply( priors, function(x) gsub("Intercept(?!.*[^()]*\\))", "(Intercept)", x, perl = TRUE) ) # Get priors that are able to be updated priors_df <- get_mvgam_priors( formula = formula, trend_formula = trend_formula, data = data, family = family, use_lv = use_lv, n_lv = n_lv, trend_model = trend_model, trend_map = trend_map, knots = knots ) if ( any( grepl('_gp', priors$class, fixed = TRUE) & (!is.na(priors$ub) | !is.na(priors$lb)) ) ) { warning( 'bounds cannot currently be changed for gp parameters', call. 
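# (Priors arrive here as a brmsprior data.frame, e.g. from
# brms::prior(normal(0, 2), class = b), with 'class', 'coef',
# 'prior', 'lb' and 'ub' columns that are matched against the
# mvgam prior definitions below)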
= FALSE ) } # Update using priors from the brmsprior object for (i in 1:NROW(priors)) { newclass <- ifelse( grepl('_gp(', priors$class[i], fixed = TRUE), clean_gp_priorname(priors$class[i]), priors$class[i] ) newcoef <- ifelse( grepl('_gp(', priors$coef[i], fixed = TRUE), clean_gp_priorname(priors$coef[i]), priors$coef[i] ) if ( any(grepl(paste0(priors$class[i], ' ~ '), priors_df$prior, fixed = TRUE)) ) { # Update the prior distribution priors_df$prior[grepl( paste0(priors$class[i], ' ~ '), priors_df$prior, fixed = TRUE )] <- paste0(newclass, ' ~ ', priors$prior[i], ';') # Now update bounds priors_df$new_lowerbound[grepl( paste0(priors$class[i], ' ~ '), priors_df$prior, fixed = TRUE )] <- priors$lb[i] priors_df$new_upperbound[grepl( paste0(priors$class[i], ' ~ '), priors_df$prior, fixed = TRUE )] <- priors$ub[i] } else if ( priors$coef[i] != '' & any(grepl(paste0(priors$coef[i], ' ~ '), priors_df$prior, fixed = TRUE)) ) { # Update the prior distribution priors_df$prior[grepl( paste0(priors$coef[i], ' ~ '), priors_df$prior, fixed = TRUE )] <- paste0(newcoef, ' ~ ', priors$prior[i], ';') # Now update bounds priors_df$new_lowerbound[grepl( paste0(priors$coef[i], ' ~ '), priors_df$prior, fixed = TRUE )] <- priors$lb[i] priors_df$new_upperbound[grepl( paste0(priors$coef[i], ' ~ '), priors_df$prior, fixed = TRUE )] <- priors$ub[i] } else if (priors$class[i] == 'b') { # Update all fixed effect priors if (any(grepl('fixed effect', priors_df$param_info))) { for (j in 1:NROW(priors_df)) { if (grepl('fixed effect', priors_df$param_info[j])) { priors_df$prior[j] <- paste0( paste( trimws( strsplit(priors_df$prior[j], "[~]")[[1]][1] ), '~ ' ), priors$prior[i], ';' ) } } } } else { if (warnings) { warning( paste0( 'no match found in model_file for parameter: ', paste0(priors$class[i], ' ', priors$coef[i]) ), call. = FALSE ) } } } return(priors_df) } #'@noRd clean_gp_priorname = function(prior) { if (grepl('[', prior, fixed = TRUE)) { newlhs <- trimws(strsplit(prior, "\\[")[[1]][1]) if (grepl('[1][', prior, fixed = TRUE)) { index <- paste0('[1][', trimws(strsplit(prior, "\\[")[[1]][3])) } else { index <- '[1]' } out <- paste0(clean_gpnames(newlhs), index) } else { out <- clean_gpnames(prior) } out } ================================================ FILE: R/utils-pipe.R ================================================ #' Pipe operator #' #' See \code{magrittr::\link[magrittr:pipe]{\%>\%}} for details. #' #' @name %>% #' @rdname pipe #' @keywords internal #' @export #' @importFrom magrittr %>% #' @usage lhs \%>\% rhs #' @param lhs A value or the magrittr placeholder. #' @param rhs A function call using the magrittr semantics. #' @return The result of calling `rhs(lhs)`. 
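#'
#' @examples
#' # A small usage sketch of the re-exported pipe
#' mtcars %>% head()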
NULL ================================================ FILE: R/validations.R ================================================ #'Argument validation functions #'@param data Data to be validated (list or data.frame) #'@noRd validate_series_time = function( data, name = 'data', trend_model, check_levels = TRUE, check_times = TRUE ) { # First validation requires the full trend_model object if (!inherits(trend_model, 'mvgam_trend')) { trend_model <- list( trend_model = trend_model, unit = 'time', gr = 'NA', subgr = 'series' ) } # Respect any original additions of implicit_vars implicit_series <- implicit_time <- FALSE if (!is.null(attr(data, 'implicit_vars'))) { implicit_series <- 'series' %in% attr(data, 'implicit_vars') implicit_time <- 'time' %in% attr(data, 'implicit_vars') } # Validate any grouping structure and update the data accordingly data <- validate_series_groups( data = data, trend_model = trend_model, name = name ) if (!is.null(attr(data, 'implicit_vars'))) { implicit_series <- 'series' %in% attr(data, 'implicit_vars') implicit_time <- 'time' %in% attr(data, 'implicit_vars') } # Now we only need the character trend_model string trend_model <- trend_model$trend_model # Series label must be present as a factor and # must contain a time variable if (inherits(data, 'data.frame')) { data %>% dplyr::ungroup() -> data if (!'series' %in% colnames(data)) { data$series <- factor('series1') implicit_series <- TRUE } # Series factor must have all unique levels present if (!is.factor(data$series)) { stop('Variable "series" must be a factor type', call. = FALSE) } if (!'time' %in% colnames(data)) { if (trend_model == 'None') { # Add a time indicator if missing data %>% dplyr::group_by(series) %>% dplyr::mutate(time = dplyr::row_number()) %>% dplyr::ungroup() -> data implicit_time <- TRUE } else { stop(name, " does not contain a 'time' variable", call. = FALSE) } } } if (inherits(data, 'list')) { if (!'series' %in% names(data)) { data$series <- factor('series1') implicit_series <- TRUE } # Series factor must have all unique levels present if (!is.factor(data$series)) { stop('Variable "series" must be a factor type', call. = FALSE) } if (!'time' %in% names(data)) { if (trend_model == 'None') { # Add a time indicator if missing data.frame(series = data$series) %>% dplyr::group_by(series) %>% dplyr::mutate(time = dplyr::row_number()) %>% dplyr::pull(time) -> times implicit_time <- TRUE data$time <- times } else { stop(name, " does not contain a 'time' variable", call. = FALSE) } } } # Add an identifier so post-processing functions know # what the original supplied data ordering was; this is needed # for ensuring that functions such as fitted() and # residuals() return objects that match the original order that # the user supplied, if no newdata are given data$index..orig..order <- 1:length(data$time) # Add a new 'time' variable that will be useful for rearranging data for # modeling, in case 'time' is also supplied as a covariate or if this is # a continuous time model (CAR1) data$index..time..index <- data$time # Use the data ordering to set the index of time for CAR1 if (trend_model == 'CAR1') { data.frame(series = data$series, time = data$time) %>% dplyr::mutate(orig_rows = dplyr::row_number()) %>% dplyr::group_by(series) %>% dplyr::mutate(idx = dplyr::row_number()) %>% dplyr::arrange(time) %>% dplyr::mutate(time. = dplyr::row_number()) %>% dplyr::ungroup() %>% dplyr::arrange(orig_rows) %>% dplyr::pull(time.) 
-> times data$index..time..index <- times } # Series factor must have all unique levels present if this is a # forecast check if (check_levels) { if (!all(levels(data$series) %in% unique(data$series))) { stop( paste0( 'Mismatch between factor levels of "series" and unique values of "series"', '\n', 'Use\n `setdiff(levels(data$series), unique(data$series))` \nand', '\n', ' `intersect(levels(data$series), unique(data$series))`\nfor guidance' ), call. = FALSE ) } } # Ensure each series has an observation, even if NA, for each # unique timepoint (only for trend models that require discrete time with # regularly spaced sampling intervals) if (check_times) { all_times_avail = function(time, min_time, max_time) { identical( as.numeric(sort(time)), as.numeric(seq.int(from = min_time, to = max_time)) ) } min_time <- as.numeric(min(data$index..time..index)) max_time <- as.numeric(max(data$index..time..index)) data.frame(series = data$series, time = data$index..time..index) %>% dplyr::group_by(series) %>% dplyr::summarise( all_there = all_times_avail(time, min_time, max_time) ) -> checked_times if (any(checked_times$all_there == FALSE)) { stop( "One or more series in ", name, " is missing observations for one or more timepoints", call. = FALSE ) } } if (implicit_series & implicit_time) { attr(data, 'implicit_vars') <- c('series', 'time') } if (implicit_series & !implicit_time) { attr(data, 'implicit_vars') <- 'series' } if (implicit_time & !implicit_series) { attr(data, 'implicit_vars') <- 'time' } if (!implicit_time & !implicit_series) { attr(data, 'implicit_vars') <- NULL } return(data) } # Function to ensure units of analysis, groups and subgroups are arranged # and formatted properly for mvgam processing and modelling #'@noRd validate_series_groups = function(data, trend_model, name = 'data') { implicit_series <- implicit_time <- FALSE # Checks only needed if trend_model isn't 'None' if (trend_model$trend_model != 'None') { # Check that unit and subgr exist in data and are the correct type if (is.null(trend_model$gr)) { trend_model$gr <- 'NA' } if (is.null(trend_model$unit)) { trend_model$unit <- 'time' } if (is.null(trend_model$subgr)) { trend_model$subgr <- 'series' } if ( trend_model$gr == 'NA' & trend_model$unit == 'time' & trend_model$subgr == 'series' ) { if (!'series' %in% names(data)) { data$series <- factor('series1') implicit_series <- TRUE } if (!'time' %in% names(data)) { if (trend_model$trend_model %in% c('ZMVNcor', 'ZMVNhiercor')) { # Add a time indicator if missing data.frame(series = data$series) %>% dplyr::group_by(series) %>% dplyr::mutate(time = dplyr::row_number()) %>% dplyr::pull(time) -> times implicit_time <- TRUE data$time <- times } else { stop(name, " does not contain a 'time' variable", call. = FALSE) } } } validate_var_exists( data = data, variable = trend_model$unit, type = 'num/int', name = name, trend_char = trend_model$trend_model ) validate_var_exists( data = data, variable = trend_model$subgr, type = 'factor', name = name, trend_char = trend_model$trend_model ) # If gr is supplied, check it exists and is the correct type if (trend_model$gr != 'NA') { implicit_series <- implicit_time <- TRUE validate_var_exists( data = data, variable = trend_model$gr, type = 'factor', name = name, trend_char = trend_model$trend_model ) if (trend_model$subgr == 'series') { stop( 'argument "subgr" cannot be set to "series" if "gr" is also supplied', call. 
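# (When gr is supplied, each series is constructed further below as
# the gr-by-subgr interaction, so subgr must name a separate factor)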
= FALSE ) } # Add necessary 'time' variable gr_dat <- data.frame( time = data[[trend_model$unit]], gr = data[[trend_model$gr]], subgr = data[[trend_model$subgr]] ) # Check that each level of gr contains all possible levels of subgr gr_total_levels <- gr_dat %>% dplyr::group_by(gr) %>% dplyr::summarise(tot_subgrs = length(unique(subgr))) %>% dplyr::ungroup() %>% dplyr::pull(tot_subgrs) if (length(gr_total_levels) > 1) { if (stats::var(gr_total_levels) != 0) { stop( paste0( 'Some levels of "', trend_model$gr, '" do not contain all\n', 'unique levels of "', trend_model$subgr, '"', " in ", name ), call. = FALSE ) } } gr_dat %>% dplyr::mutate( series = interaction( gr, subgr, drop = TRUE, sep = '_', lex.order = TRUE ) ) -> gr_dat } else { if (trend_model$unit != 'time') { implicit_time <- TRUE } if (trend_model$subgr != 'series') { implicit_series <- TRUE } gr_dat <- data.frame( time = data[[trend_model$unit]], subgr = data[[trend_model$subgr]] ) %>% dplyr::mutate(series = as.factor(subgr)) } # Add the possibly new 'series' and 'time' variables to # the data and return data$series <- gr_dat$series data$time <- gr_dat$time } if (implicit_series & implicit_time) { attr(data, 'implicit_vars') <- c('series', 'time') } if (implicit_series & !implicit_time) { attr(data, 'implicit_vars') <- 'series' } if (implicit_time & !implicit_series) { attr(data, 'implicit_vars') <- 'time' } if (!implicit_time & !implicit_series) { attr(data, 'implicit_vars') <- NULL } return(data) } #'@noRd validate_var_exists = function( data, variable, type = 'factor', name = 'data', trend_char ) { if (trend_char != 'None') { if (!exists(variable, data)) { stop( paste0('Variable "', variable, '" not found in ', name), call. = FALSE ) } if (type == 'num/int') { if (!is.numeric(data[[variable]])) { stop( paste0( 'Variable "', variable, '" must be either numeric or integer type' ), call. = FALSE ) } } if (type == 'factor') { if (!is.factor(data[[variable]])) { stop( paste0('Variable "', variable, '" must be a factor type'), call. = FALSE ) } } } } #'@noRd deparse_variable = function(...) { deparse0(substitute(...)) } #'@noRd as_one_logical = function(x, allow_na = FALSE) { s <- substitute(x) x <- as.logical(x) if (length(x) != 1L || anyNA(x) && !allow_na) { s <- deparse0(s, max_char = 100L) stop("Cannot coerce '", s, "' to a single logical value.", call. = FALSE) } x } #'@noRd as_one_integer <- function(x, allow_na = FALSE) { s <- substitute(x) x <- suppressWarnings(as.integer(x)) if (length(x) != 1L || anyNA(x) && !allow_na) { s <- deparse0(s, max_char = 100L) stop("Cannot coerce '", s, "' to a single integer value.", call. = FALSE) } x } #'@noRd deparse0 <- function(x, max_char = NULL, ...) { out <- collapse(deparse(x, ...)) if (isTRUE(max_char > 0)) { out <- substr(out, 1L, max_char) } out } #'@noRd collapse <- function(..., sep = "") { paste(..., sep = sep, collapse = "") } #'@noRd validate_silent <- function(silent) { silent <- as_one_integer(silent) if (silent < 0 || silent > 2) { stop("'silent' must be between 0 and 2.", call. = FALSE) } silent } #'@importFrom rlang warn #'@noRd validate_family = function(family, use_stan = TRUE) { if (is.character(family)) { if (family == 'beta') { family <- betar() } family <- try(eval(parse(text = family)), silent = TRUE) if (inherits(family, 'try-error')) { stop("family not recognized", call. = FALSE) } } if (is.function(family)) { family <- family() } if (is.null(family$family)) { stop("family not recognized", call. 
= FALSE) } if (!inherits(family, 'family')) { stop('family not recognized', call. = FALSE) } if (family$family == 'Beta regression') { family$family <- 'beta' } if (family$family == 'tweedie') { insight::check_if_installed( "tweedie", reason = 'to simulate from Tweedie distributions' ) } if ( !family$family %in% c('poisson', 'negative binomial', 'tweedie') & !use_stan ) { stop( 'JAGS only supports poisson(), nb() or tweedie() families', call. = FALSE ) } # Stan cannot support Tweedie if (use_stan & family$family == 'tweedie') { stop('Tweedie family not supported for stan', call. = FALSE) } if (family$family %in% c('binomial', 'beta_binomial')) { rlang::warn( paste0( "Binomial and Beta-binomial families require cbind(n_successes, n_trials)\n", "in the formula left-hand side. Do not use cbind(n_successes, n_failures)!" ), .frequency = "once", .frequency_id = 'cbind_binomials' ) } return(family) } #'@noRd validate_family_restrictions = function(response, family) { response <- response[!is.na(response)] # 0s and 1s only for Bernoulli if (family$family == 'bernoulli') { y <- response nobs <- length(response) weights <- rep(1, length(response)) eval(binomial()$initialize) } # 0s and 1s not allowed for Beta if (family$family == 'beta') { if (any(response <= 0)) { stop('Values <= 0 not allowed for beta responses', call. = FALSE) } if (any(response >= 1)) { stop('Values >= 1 not allowed for beta responses', call. = FALSE) } } # negatives not allowed for several families if ( family$family %in% c('poisson', 'negative binomial', 'tweedie', 'binomial', 'beta_binomial') ) { if (any(response < 0)) { stop( paste0('Values < 0 not allowed for count family responses'), call. = FALSE ) } } # negatives and/or zeros not allowed for several families if (family$family %in% c('lognormal', 'Gamma')) { if (any(response <= 0)) { stop( paste0('Values <= 0 not allowed for ', family$family, ' responses'), call. = FALSE ) } } } #'@noRd validate_trend_model = function( trend_model, drift = FALSE, noncentred = FALSE, warn = TRUE ) { if (inherits(trend_model, 'mvgam_trend')) { ma_term <- if (trend_model$ma) { 'MA' } else { NULL } cor_term <- if (trend_model$cor) { 'cor' } else { NULL } if (is.null(trend_model$gr)) { trend_model$gr <- 'NA' } if (trend_model$gr != 'NA') { gr_term <- 'hier' } else { gr_term <- NULL } trend_model <- paste0( trend_model$trend_model, trend_model$p, ma_term, gr_term, cor_term ) } else { if (trend_model != 'None' & warn) { rlang::warn( paste0( "Supplying trend_model as a character string is deprecated\n", "Please use the dedicated functions (i.e. RW() or ZMVN()) instead" ), .frequency = "once", .frequency_id = 'trend_characters' ) } } trend_model <- match.arg(arg = trend_model, choices = trend_model_choices()) if (trend_model == 'VAR') { trend_model <- 'VAR1' } if (trend_model == 'VARcor') { trend_model <- 'VAR1cor' } if (trend_model == 'VARhiercor') { trend_model <- 'VAR1hiercor' } if (trend_model %in% c('VARMA', 'VARMAcor')) { trend_model <- 'VARMA1,1cor' } if ( !trend_model %in% c('None', 'RW', 'AR1', 'AR2', 'AR3', 'CAR1') & noncentred ) { message('Non-centering of trends currently not available for this model') } if (trend_model %in% c('PWlinear', 'PWlogistic')) { insight::check_if_installed( "extraDistr", reason = 'to simulate from piecewise trends' ) } return(trend_model) } #'@noRd validate_obs_formula = function(formula, data, refit = FALSE) { if (attr(terms(formula), "response") == 0L) { stop('response variable is missing from formula', call. 
= FALSE) } # Check that response terms are in the data; account for possible # 'cbind' in there if this is a binomial model resp_terms <- as.character(terms(formula(formula))[[2]]) if (length(resp_terms) == 1) { out_name <- as.character(terms(formula(formula))[[2]]) if (!as.character(terms(formula(formula))[[2]]) %in% names(data)) { stop( paste0('variable ', terms(formula(formula))[[2]], ' not found in data'), call. = FALSE ) } } else { if (any(grepl('cbind', resp_terms))) { resp_terms <- resp_terms[-grepl('cbind', resp_terms)] out_name <- resp_terms[1] for (i in 1:length(resp_terms)) { if (!resp_terms[i] %in% names(data)) { stop( paste0('variable ', resp_terms[i], ' not found in data'), call. = FALSE ) } } } else { stop( 'Not sure how to deal with this response variable specification', call. = FALSE ) } } if (any(attr(terms(formula), 'term.labels') %in% 'y')) { stop( 'due to internal data processing, "y" should not be used as the name of a predictor in mvgam', call. = FALSE ) } # Add a y outcome for sending to the modelling backend data$y <- data[[out_name]] return(data) } #'@noRd validate_trend_formula = function(formula) { if (!is.null(rlang::f_lhs(formula))) { stop( 'Argument "trend_formula" should not have a left-hand side', call. = FALSE ) } if (any(grepl('series', as.character(formula)))) { stop( 'Argument "trend_formula" should not have the identifier "series" in it.\nUse "trend" instead for varying effects', call. = FALSE ) } if (!is.null(attr(terms(formula(formula)), 'offset'))) { stop('Offsets not allowed in argument "trend_formula"', call. = FALSE) } } #'@noRd validate_gr_subgr = function(gr, subgr, cor) { gr <- deparse0(substitute(gr)) subgr <- deparse0(substitute(subgr)) if (gr != 'NA') { if (subgr == 'NA') { stop( 'argument "subgr" must be supplied if "gr" is also supplied', call. = FALSE ) } } if (subgr != 'NA') { if (gr == 'NA') { stop( 'argument "gr" must be supplied if "subgr" is also supplied', call. = FALSE ) } else { cor <- TRUE } } list(.group = gr, .subgroup = subgr, .cor = cor) } #'@noRd validate_proportional = function(x) { s <- substitute(x) x <- base::suppressWarnings(as.numeric(x)) if (length(x) != 1L || anyNA(x)) { stop("Argument '", s, "' must be a single numeric value", call. = FALSE) } if (x < 0 || x > 1) { stop( "Argument '", s, "' must be a proportion ranging from 0 to 1, inclusive", call. = FALSE ) } } #'@noRd validate_equaldims = function(x, y) { s <- substitute(x) q <- substitute(y) if (NCOL(x) != NCOL(y)) { stop( "Argument '", s, "' and argument '", q, "' must have equal dimensions", call. = FALSE ) } if (NROW(x) != NROW(y)) { stop( "Argument '", s, "' and argument '", q, "' must have equal dimensions", call. = FALSE ) } } #'@noRd validate_pos_integer = function(x) { s <- substitute(x) x <- base::suppressWarnings(as.numeric(x)) if (length(x) != 1L || anyNA(x)) { stop("Argument '", s, "' must be a single numeric value", call. = FALSE) } if (sign(x) != 1) { stop("Argument '", s, "' must be a positive integer", call. = FALSE) } else { if (x %% 1 != 0) { stop("Argument '", s, "' must be a positive integer", call. = FALSE) } } } #'@noRd validate_pos_integers = function(x) { s <- substitute(x) val_pos = function(y, s) { y <- base::suppressWarnings(as.numeric(y)) if (sign(y) != 1) { stop("Negative values in ", s, " detected", call. = FALSE) } else { if (y %% 1 != 0) { stop("Non-integer values in ", s, " detected", call. 
= FALSE) } } } res <- lapply(seq_along(x), function(i) val_pos(x[i], s)) } #'@noRd validate_predictors = function(object, newdata) { # Check names of supplied variables against those required # for prediction required_vars <- insight::find_predictors( object, component = "all" )$conditional required_vars <- setdiff( required_vars, attr(object$obs_data, 'implicit_vars') ) # If time and / or series were in original data but not used, # there is no need for them in the newdata for models with # no trend_model if (!inherits(object$trend_model, 'mvgam_trend')) { if (object$trend_model == 'None') { # Find all predictor terms that were used by the model obs_preds <- insight::find_predictors( object$mgcv_model, effects = 'all', component = 'all', flatten = TRUE ) trend_preds <- vector() if (!is.null(object$trend_call)) { trend_preds <- insight::find_predictors( object$trend_mgcv_model, effects = 'all', component = 'all', flatten = TRUE ) preds <- unique(c(obs_preds, trend_preds)) } else { preds <- obs_preds } # If time and series not used as predictors, no need for them # to be in required_vars if ('time' %in% required_vars & !'time' %in% preds) { required_vars <- setdiff(required_vars, 'time') } if ( 'series' %in% required_vars & !'series' %in% preds & (!object$use_lv | is.null(object$trend_call)) ) { required_vars <- setdiff(required_vars, 'series') } } } if (length(required_vars)) { if (any(required_vars %in% names(newdata) == FALSE)) { stop( paste0( 'the following required variables are missing from newdata:\n ', paste( required_vars[which(!required_vars %in% names(newdata))], collapse = ', ' ) ), call. = FALSE ) } } } #'@noRd validate_even <- function(x) { s <- substitute(x) x <- base::suppressWarnings(as.numeric(x)) if (x %% 2 != 0) { stop("Argument '", s, "' must be an even integer", call. = FALSE) } } #'@noRd validate_pos_real = function(x) { s <- substitute(x) x <- base::suppressWarnings(as.numeric(x)) if (length(x) != 1L || anyNA(x)) { stop("Argument '", s, "' must be a single numeric value", call. = FALSE) } if (sign(x) != 1) { stop("Argument '", s, "' must be a positive real value", call. = FALSE) } } #'@noRd validate_trendmap = function(trend_map, data_train, trend_model, use_stan) { # Trend mapping not supported by JAGS if (!use_stan) { stop('trend mapping not available for JAGS', call. = FALSE) } # trend_map must have an entry for each unique time series if (!all(sort(trend_map$series) == sort(unique(data_train$series)))) { stop( 'Argument "trend_map" must have an entry for every unique time series in "data"', call. = FALSE ) } # trend_map must not specify a greater number of trends than there are series if (max(trend_map$trend) > length(unique(data_train$series))) { stop( 'Argument "trend_map" specifies more latent trends than there are series in "data"', call. = FALSE ) } # trend_map must not skip any trends, but can have zeros for some entries drop_zero = function(x) { x[x != 0] } if ( !all( drop_zero(sort(unique(trend_map$trend))) == seq(1:max(trend_map$trend)) ) ) { stop( 'Argument "trend_map" must link at least one series to each latent trend', call. = FALSE ) } # series variable must be a factor with same levels as the series variable # in the data if (!is.factor(trend_map$series)) { stop( 'trend_map$series must be a factor with levels matching levels of data$series', call. = FALSE ) } if (!all(levels(trend_map$series) == levels(data_train$series))) { stop( 'trend_map$series must be a factor with levels matching levels of data$series', call. 
= FALSE ) } } #'@noRd validate_trend_restrictions = function( trend_model, formula, trend_formula, trend_map, drift = FALSE, drop_obs_intercept = FALSE, use_lv = FALSE, n_lv, data_train, use_stan = TRUE, priors = FALSE ) { # Assess whether additional moving average or correlated errors are needed ma_cor_adds <- ma_cor_additions(trend_model) list2env(ma_cor_adds, envir = environment()) if (length(unique(data_train$series)) == 1 & add_cor) { warning( 'Correlated process errors not possible with only 1 series', call. = FALSE ) add_cor <- FALSE } # Some checks on general trend setup restrictions if (!priors) { if (trend_model %in% c('PWlinear', 'PWlogistic')) { if (attr(terms(formula), 'intercept') == 1 & !drop_obs_intercept) { warning( paste0( 'It is difficult / impossible to estimate intercepts\n', 'and piecewise trend offset parameters. You may want to\n', 'consider dropping the intercept from the formula' ), call. = FALSE ) } if (use_lv) { stop( 'Cannot estimate piecewise trends using dynamic factors', call. = FALSE ) } } } if (use_lv & (add_ma | add_cor) & missing(trend_formula)) { stop( 'Cannot estimate moving averages or correlated errors for dynamic factors', call. = FALSE ) } if (use_lv & drift) { warning( 'Cannot identify drift terms for this model\ninclude "time" as a fixed effect instead', call. = FALSE ) drift <- FALSE } if (drift && trend_model == 'CAR1') { warning( 'Cannot identify drift terms for CAR models; setting "drift = FALSE"', call. = FALSE ) drift <- FALSE } if ( trend_model %in% c('VAR', 'VAR1', 'VAR1cor', 'VARMA1,1cor', 'GP') & drift ) { warning( 'Cannot identify drift terms for VAR or GP models; setting "drift = FALSE"', call. = FALSE ) drift <- FALSE } if (use_lv & trend_model == 'VAR1' & missing(trend_formula)) { stop( 'Cannot identify dynamic factor models that evolve as VAR processes', call. = FALSE ) } if (!use_stan & trend_model %in% c('GP', 'VAR1', 'PWlinear', 'PWlogistic')) { stop( 'Gaussian Process, VAR and piecewise trends not supported for JAGS', call. = FALSE ) } # Check trend formula and create the trend_map if missing if (!missing(trend_formula)) { validate_trend_formula(trend_formula) if (missing(trend_map)) { trend_map <- data.frame( series = factor( levels(data_train$series), levels = levels(data_train$series) ), trend = 1:length(unique(data_train$series)) ) } if ( !trend_model %in% c('None', 'RW', 'AR1', 'AR2', 'AR3', 'VAR1', 'CAR1', 'ZMVN') ) { stop( 'only None, ZMVN, RW, AR1, AR2, AR3, CAR1 and VAR trends currently supported for trend predictor models', call. 
= FALSE ) } } # Check trend_map is correctly specified if (!missing(trend_map)) { validate_trendmap( trend_map = trend_map, data_train = data_train, trend_model = trend_model, use_stan = use_stan ) # If trend_map correctly specified, set use_lv to TRUE for # most models (but not yet for VAR models, which require additional # modifications) if (trend_model == 'VAR1') { use_lv <- FALSE } else { use_lv <- TRUE } n_lv <- max(trend_map$trend) } # Number of latent variables cannot be greater than number of series if (use_lv) { if (missing(n_lv)) { n_lv <- min(2, floor(length(unique(data_train$series)) / 2)) } if (n_lv > length(unique(data_train$series))) { stop('number of latent variables cannot be greater than number of series') } } if (missing(trend_map)) { trend_map <- NULL } if (missing(n_lv)) { n_lv <- NULL } return(list( trend_model = trend_model, add_cor = add_cor, add_ma = add_ma, use_var1 = use_var1, use_var1cor = use_var1cor, use_lv = use_lv, n_lv = n_lv, trend_map = trend_map, drift = drift )) } #'@noRd check_priorsim = function(prior_simulation, data_train, orig_y, formula) { # Fill y with NAs if this is a simulation from the priors if (prior_simulation) { data_train$y <- rep(NA, length(data_train$y)) } else { data_train$y <- orig_y } # Fill response variable with original supplied values resp_terms <- as.character(terms(formula(formula))[[2]]) if (length(resp_terms) == 1) { out_name <- as.character(terms(formula(formula))[[2]]) } else { if (any(grepl('cbind', resp_terms))) { resp_terms <- resp_terms[-grepl('cbind', resp_terms)] out_name <- resp_terms[1] } } data_train[[out_name]] <- orig_y return(data_train) } #'@noRd check_gp_terms = function(formula, data_train, family) { # Check for proper binomial specification if (!missing(family)) { if (is.character(family)) { if (family == 'beta') { family <- betar() } family <- try(eval(parse(text = family)), silent = TRUE) if (inherits(family, 'try-error')) { stop("family not recognized", call. = FALSE) } } if (is.function(family)) { family <- family() } if (family$family %in% c('binomial', 'beta_binomial')) { # Check that response terms use the cbind() syntax resp_terms <- as.character(terms(formula(formula))[[2]]) if (length(resp_terms) == 1) { stop( 'Binomial family requires cbind() syntax in the formula left-hand side', call. = FALSE ) } else { if (any(grepl('cbind', resp_terms))) {} else { stop( 'Binomial family requires cbind() syntax in the formula left-hand side', call. = FALSE ) } } } } # Check for gp terms in the validated formula orig_formula <- gp_terms <- gp_details <- NULL if (any(grepl('gp(', attr(terms(formula), 'term.labels'), fixed = TRUE))) { formula <- interpret_mvgam( formula, N = max(data_train$time), family = family ) orig_formula <- formula # Keep intercept? keep_intercept <- attr(terms(formula), 'intercept') == 1 # Indices of gp() terms in formula gp_terms <- which_are_gp(formula) # Extract attributes gp_details <- get_gp_attributes(formula, data_train, family) # Replace with s() terms so the correct terms are included # in the model.frame formula <- gp_to_s(formula, data_train, family) if (!keep_intercept) formula <- update(formula, . ~ . - 1) } return(list( orig_formula = orig_formula, gp_terms = gp_terms, formula = formula, gp_details = gp_details )) } #'@noRd check_obs_intercept = function(formula, orig_formula) { # Check for missing rhs in formula # If there are no terms in the observation formula (i.e. 
y ~ -1), # we will use an intercept-only observation formula and fix # the intercept coefficient at zero drop_obs_intercept <- FALSE if ( length(attr(terms(formula), 'term.labels')) == 0 & !attr(terms(formula), 'intercept') == 1 ) { formula_envir <- attr(formula, '.Environment') if (length(attr(terms(formula), 'factors')) == 0) { resp <- as.character(attr(terms(formula), 'variables'))[2] } else { resp <- dimnames(attr(terms(formula), 'factors'))[[1]][1] } if (!is.null(attr(terms(formula(formula)), 'offset'))) { formula <- formula(paste0( resp, ' ~ ', paste(gsub(' - 1', ' + 1', rlang::f_text(formula))) )) } else { formula <- formula(paste(resp, '~ 1')) } attr(formula, '.Environment') <- formula_envir drop_obs_intercept <- TRUE } if (is.null(orig_formula)) { orig_formula <- formula } return(list( orig_formula = orig_formula, formula = formula, drop_obs_intercept = drop_obs_intercept )) } #'@noRd check_nmix = function( family, family_char, trend_formula, trend_model, trend_map, data_train, priors = FALSE ) { # Check for N-mixture modifications add_nmix <- FALSE nmix_trendmap <- TRUE if (family_char == 'nmix') { if (!(exists('cap', where = data_train))) { stop( 'Max abundances must be supplied as a variable named "cap" for N-mixture models', call. = FALSE ) } add_nmix <- TRUE if (!priors) { family <- poisson() family_char <- 'poisson' if (missing(trend_formula)) { stop('Argument "trend_formula" required for nmix models', call. = FALSE) } } if (!missing(trend_map)) { nmix_trendmap <- TRUE } use_lv <- TRUE if (trend_model == 'None') { trend_model <- 'RW' } } return(list( trend_model = trend_model, add_nmix = add_nmix, nmix_trendmap = nmix_trendmap, family = family, family_char = family_char )) } #'@noRd validate_threads = function(family_char, threads) { if ( threads > 1 & !family_char %in% c( 'poisson', 'negative binomial', 'gaussian', 'lognormal', 'beta', 'student', 'Gamma' ) ) { warning( 'multithreading not yet supported for this family; setting threads = 1' ) threads <- 1 } return(threads) } #'@noRd find_jags = function(jags_path) { if (!requireNamespace('runjags', quietly = TRUE)) { stop('runjags library is required but not found', call. = FALSE) } if (missing(jags_path)) { requireNamespace('runjags', quietly = TRUE) jags_path <- runjags::findjags() } # Code borrowed from the runjags package jags_status <- runjags::testjags(jags_path, silent = TRUE) if (!jags_status$JAGS.available) { if (jags_status$os == "windows") { Sys.sleep(0.2) jags_status <- runjags::testjags(jags_path, silent = TRUE) } if (!jags_status$JAGS.available) { cat( "Unable to call JAGS using '", jags_path, "'\nTry specifying the path to the JAGS binary as jags_path argument, or re-installing the rjags package.\nUse the runjags::testjags() function for more detailed diagnostics.\n", sep = "" ) stop( "Unable to call JAGS.\nEither use the Stan backend or follow examples in ?mvgam to generate data / model files and run outside of mvgam", call. = FALSE ) } } } #'@noRd find_stan = function() { if (!requireNamespace('rstan', quietly = TRUE)) { warning('rstan library not found; checking for cmdstanr library') if (!requireNamespace('cmdstanr', quietly = TRUE)) { stop('cmdstanr library not found', call. = FALSE) } } } #'@noRd as_one_character <- function(x, allow_na = FALSE) { s <- substitute(x) x <- as.character(x) if (length(x) != 1L || anyNA(x) && !allow_na) { s <- deparse0(s, max_char = 100L) stop("Cannot coerce '", s, "' to a single character value.", call. 
= FALSE) } x }

================================================
FILE: README.Rmd
================================================

---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  dev = "png",
  dpi = 150,
  fig.height = 6,
  fig.width = 9,
  out.width = "100%"
)
```

[mvgam R package logo] [Stan Logo](https://mc-stan.org/)

# mvgam

> **M**ulti**V**ariate (Dynamic) **G**eneralized **A**dditive **M**odels

[![R-CMD-check](https://github.com/nicholasjclark/mvgam/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/nicholasjclark/mvgam/actions/)
[![Coverage status](https://codecov.io/gh/nicholasjclark/mvgam/graph/badge.svg?token=RCJ2B7S0BL)](https://app.codecov.io/gh/nicholasjclark/mvgam)
[![Documentation](https://img.shields.io/badge/documentation-mvgam-orange.svg?colorB=brightgreen)](https://nicholasjclark.github.io/mvgam/)
[![Methods in Ecology & Evolution](https://img.shields.io/badge/Methods%20in%20Ecology%20&%20Evolution-14,%20771–784-blue.svg)](https://doi.org/10.1111/2041-210X.13974)
[![CRAN Version](https://www.r-pkg.org/badges/version/mvgam)](https://cran.r-project.org/package=mvgam)
[![CRAN Downloads](https://cranlogs.r-pkg.org/badges/grand-total/mvgam?color=brightgreen)](https://cran.r-project.org/package=mvgam)

The `mvgam` 📦 fits Bayesian Dynamic Generalized Additive Models (DGAMs) that can include highly flexible nonlinear predictor effects, latent variables and multivariate time series models. The package does this by relying on functionalities from the impressive [`brms`](https://paulbuerkner.com/brms/){target="_blank"} and [`mgcv`](https://cran.r-project.org/package=mgcv){target="_blank"} packages. Parameters are estimated using the probabilistic programming language [`Stan`](https://mc-stan.org/), giving users access to the most advanced Bayesian inference algorithms available. This allows `mvgam` to fit a very wide range of models, including:

* [Multivariate State-Space Time Series Models](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html){target="_blank"}
* [Continuous-Time Autoregressive Time Series Models](https://nicholasjclark.github.io/mvgam/reference/RW.html#ref-examples){target="_blank"}
* [Shared Signal Time Series Models](https://nicholasjclark.github.io/mvgam/articles/shared_states.html){target="_blank"}
* [Dynamic Factor Models](https://nicholasjclark.github.io/mvgam/reference/lv_correlations.html){target="_blank"}
* [Hierarchical N-mixture Models](https://nicholasjclark.github.io/mvgam/articles/nmixtures.html){target="_blank"}
* [Hierarchical Generalized Additive Models](https://www.youtube.com/watch?v=2POK_FVwCHk){target="_blank"}
* [Joint Species Distribution Models](https://nicholasjclark.github.io/mvgam/reference/jsdgam.html){target="_blank"}

## Installation

You can install the stable package version from `CRAN` using: `install.packages('mvgam')`, or install the latest development version using: `devtools::install_github("nicholasjclark/mvgam")`. You will also need a working version of `Stan` installed (along with either `rstan` and/or `cmdstanr`). Please refer to installation links for `Stan` with `rstan` [here](https://mc-stan.org/users/interfaces/rstan){target="_blank"}, or for `Stan` with `cmdstanr` [here](https://mc-stan.org/cmdstanr/){target="_blank"}.
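If you plan to use the `cmdstanr` backend, it can help to verify that a working `CmdStan` toolchain is available before fitting models. A minimal sketch (assuming `cmdstanr` is already installed; see its documentation for full installation details):

```{r, eval=FALSE}
# Check that a suitable C++ toolchain is available, then install CmdStan itself
cmdstanr::check_cmdstan_toolchain()
cmdstanr::install_cmdstan()
```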
## Cheatsheet

[![`mvgam` usage cheatsheet](https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.png)](https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.pdf)

## A simple example

We can explore the package’s primary functions using one of its built-in datasets. Use `plot_mvgam_series()` to inspect features for time series from [the Portal Project](https://portal.weecology.org/){target="_blank"}, which represent counts of baited captures for four desert rodent species over time (see `?portal_data` for more details about the dataset).

```{r include = FALSE}
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12))
```

```{r echo = FALSE}
library(mvgam)
```

```{r, fig.alt = "Visualizing multivariate time series in R using mvgam", warning=FALSE}
data(portal_data)
plot_mvgam_series(
  data = portal_data,
  y = 'captures',
  series = 'all'
)
plot_mvgam_series(
  data = portal_data,
  y = 'captures',
  series = 1
)
plot_mvgam_series(
  data = portal_data,
  y = 'captures',
  series = 4
)
```

These plots show that the time series are count responses, with missing data, many zeroes, seasonality and temporal autocorrelation all present. These features make time series analysis and forecasting very difficult using conventional software. But `mvgam` shines in these tasks. For most forecasting exercises, we'll want to split the data into training and testing folds:

```{r}
data_train <- portal_data %>%
  dplyr::filter(time <= 60)
data_test <- portal_data %>%
  dplyr::filter(time > 60 & time <= 65)
```

Formulate an `mvgam` model; this model fits a State-Space GAM in which each species has its own intercept, linear association with `ndvi_ma12` and potentially nonlinear association with `mintemp`. These effects are estimated jointly with a full time series model for the temporal dynamics (in this case a Vector Autoregressive process). We assume the outcome follows a Poisson distribution and will condition the model in `Stan` using MCMC sampling with `Cmdstan`:

```{r, include=FALSE}
mod <- mvgam(
  # Observation model is empty as we don't have any
  # covariates that impact observation error
  formula = captures ~ 0,

  # Process model contains varying intercepts,
  # varying slopes of ndvi_ma12 and varying smooths
  # of mintemp for each series.
  # Temporal dynamics are modelled with a Vector
  # Autoregression (VAR(1))
  trend_formula = ~ trend +
    s(trend, bs = 're', by = ndvi_ma12) +
    s(mintemp, bs = 'bs', by = trend) - 1,
  trend_model = VAR(cor = TRUE),

  # Observations are conditionally Poisson
  family = poisson(),
  priors = c(
    prior(normal(0, 2), class = b),
    prior(exponential(2.5), class = sigma)
  ),

  # Condition on the training data
  data = data_train,
  control = list(adapt_delta = 0.99),
  burnin = 1500
)
```

```{r, eval=FALSE}
mod <- mvgam(
  # Observation model is empty as we don't have any
  # covariates that impact observation error
  formula = captures ~ 0,

  # Process model contains varying intercepts,
  # varying slopes of ndvi_ma12 and varying smooths
  # of mintemp for each series.
  # Temporal dynamics are modelled with a Vector
  # Autoregression (VAR(1))
  trend_formula = ~ trend +
    s(trend, bs = 're', by = ndvi_ma12) +
    s(mintemp, bs = 'bs', by = trend) - 1,
  trend_model = VAR(cor = TRUE),

  # Observations are conditionally Poisson
  family = poisson(),

  # Condition on the training data
  data = data_train,
  backend = 'cmdstanr'
)
```

Using `print()` returns a quick summary of the object:

```{r}
mod
```

Split Rhat and Effective Sample Size diagnostics show good convergence of model estimates

```{r, fig.alt = "Rhats of parameters estimated with Stan in mvgam", warning=FALSE}
mcmc_plot(mod, type = 'rhat_hist')
```

```{r, fig.alt = "Effective sample sizes of parameters estimated with Stan in mvgam", warning=FALSE}
mcmc_plot(mod, type = 'neff_hist')
```

Use `conditional_effects()` for a quick visualisation of the main terms in model formulae

```{r, fig.alt = "Plotting GAM effects in mvgam and R"}
conditional_effects(mod, type = 'link')
```

If you have the `gratia` package installed, it can also be used to plot partial effects of smooths

```{r, fig.alt = "Plotting GAM smooth functions in mvgam using gratia", message=FALSE}
require(gratia)
draw(mod, trend_effects = TRUE)
```

Or design more targeted plots using `plot_predictions()` from the `marginaleffects` package

```{r, fig.alt = "Using marginaleffects and mvgam to plot GAM smooth functions in R"}
plot_predictions(
  mod,
  condition = c('ndvi_ma12', 'series', 'series'),
  type = 'link'
)
```

```{r, fig.alt = "Using marginaleffects and mvgam to plot GAM smooth functions in R"}
plot_predictions(
  mod,
  condition = c('mintemp', 'series', 'series'),
  type = 'link'
)
```

We can also view the model's posterior predictions for the entire series (testing and training). Forecasts can be scored using a range of proper scoring rules. See `?score.mvgam_forecast` for more details

```{r, fig.alt = "Plotting forecast distributions using mvgam in R", warning=FALSE}
fcs <- forecast(mod, newdata = data_test)
plot(fcs, series = 1) +
  plot(fcs, series = 2) +
  plot(fcs, series = 3) +
  plot(fcs, series = 4)
```

For Vector Autoregressions fit in `mvgam`, we can inspect [impulse response functions and forecast error variance decompositions](https://ecogambler.netlify.app/blog/vector-autoregressions/#impulse-response-functions){target="_blank"}. The `irf()` function runs an Impulse Response Function (IRF) simulation whereby a positive “shock” is generated for a target process at time `t = 0`. All else remaining stable, it then monitors how each of the remaining processes in the latent VAR would be expected to respond over the forecast horizon `h`. The function computes impulse responses for all processes in the object and returns them in an array that can be plotted using the S3 `plot()` function. Here we will use the generalized IRF, which makes no assumptions about the order in which the series appear in the VAR process, and inspect how each process is expected to respond to a sudden, positive pulse from the other processes over a horizon of 12 timepoints.

```{r, fig.alt = "Impulse response functions computed using mvgam in R"}
irfs <- irf(mod, h = 12, orthogonal = FALSE)
plot(irfs, series = 1)
plot(irfs, series = 3)
```

Using the same logic as above, we can inspect forecast error variance decompositions (FEVDs) for each process using `fevd()`. This type of analysis asks how orthogonal shocks to all processes in the system contribute to the variance of forecast uncertainty for a focal process over increasing horizons.
In other words, it asks what proportion of the forecast variance of each latent time series can be attributed to the effects of the other series in the VAR process. FEVDs are useful because some shocks may not be expected to cause variations in the short term but may cause longer-term fluctuations.

```{r, fig.alt = "Forecast error variance decompositions computed using mvgam in R"}
fevds <- fevd(mod, h = 12)
plot(fevds)
```

This plot shows that the variance of forecast uncertainty for each process is initially dominated by contributions from that same process (i.e. self-dependent effects) but that effects from other processes become more important over increasing forecast horizons. Given what we saw from the IRF plots above, these long-term contributions from interactions among the processes make sense.

Plotting randomized quantile residuals over `time` for each series can give useful information about what might be missing from the model. We can use the highly versatile `pp_check()` function to plot these:

```{r, warning=FALSE}
pp_check(
  mod,
  type = 'resid_ribbon_grouped',
  group = 'series',
  x = 'time',
  ndraws = 200
)
```

When describing the model, it can be helpful to use the `how_to_cite()` function to generate a scaffold for describing the model and sampling details in scientific communications

```{r}
description <- how_to_cite(mod)
```

```{r, eval = FALSE}
description
```

```{r, echo=FALSE}
cat("Methods text skeleton\n")
cat(insight::format_message(description$methods_text))
```

```{r echo=FALSE}
cat("\nPrimary references\n")
for (i in seq_along(description$citations)) {
  cat(insight::format_message(description$citations[[i]]))
  cat('\n')
}
cat("\nOther useful references\n")
for (i in seq_along(description$other_citations)) {
  cat(insight::format_message(description$other_citations[[i]]))
  cat('\n')
}
```

The post-processing methods we have shown above are just the tip of the iceberg. For a full list of methods to apply on fitted model objects, type `methods(class = "mvgam")`.

## Extended observation families

`mvgam` was originally designed to analyse and forecast non-negative integer-valued data. But further development of `mvgam` has resulted in support for a growing number of observation families. Currently, the package can handle data for the following:

* `gaussian()` for real-valued data
* `student_t()` for heavy-tailed real-valued data
* `lognormal()` for non-negative real-valued data
* `Gamma()` for non-negative real-valued data
* `betar()` for proportional data on `(0,1)`
* `bernoulli()` for binary data
* `poisson()` for count data
* `nb()` for overdispersed count data
* `binomial()` for count data with known number of trials
* `beta_binomial()` for overdispersed count data with known number of trials
* `nmix()` for count data with imperfect detection (unknown number of trials)

See `??mvgam_families` for more information.
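As a minimal sketch of how these families are selected (not taken from the example above; `dat` is a hypothetical data frame with columns `y`, `series` and `time`), only the `family` argument needs to change while the rest of the model specification stays the same:

```{r, eval = FALSE}
# Sketch only (not run): swapping to a negative binomial observation
# family for overdispersed counts; all other arguments are unchanged
mod_nb <- mvgam(
  y ~ s(time, by = series),
  trend_model = RW(),
  family = nb(),
  data = dat
)
```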
Below is a simple example for simulating and modelling proportional data with `Beta` observations over a set of seasonal series with independent Gaussian Process dynamic trends:

```{r beta_sim, message=FALSE, warning=FALSE}
set.seed(100)
data <- sim_mvgam(
  family = betar(),
  T = 80,
  trend_model = GP(),
  prop_trend = 0.5,
  seasonality = "shared"
)
plot_mvgam_series(
  data = data$data_train,
  series = "all"
)
```

```{r, include=FALSE}
mod <- mvgam(
  y ~ s(season, bs = "cc", k = 7) +
    s(season, by = series, m = 1, k = 5),
  trend_model = GP(),
  data = data$data_train,
  newdata = data$data_test,
  family = betar()
)
```

```{r, eval=FALSE}
mod <- mvgam(
  y ~ s(season, bs = "cc", k = 7) +
    s(season, by = series, m = 1, k = 5),
  trend_model = GP(),
  data = data$data_train,
  newdata = data$data_test,
  family = betar()
)
```

Inspect the summary to see that the posterior now also contains estimates for the `Beta` precision parameters $\phi$.

```{r}
summary(mod, include_betas = FALSE)
```

Plot the hindcast and forecast distributions for each series

```{r eval=FALSE}
library(patchwork)
fc <- forecast(mod)
wrap_plots(
  plot(fc, series = 1),
  plot(fc, series = 2),
  plot(fc, series = 3),
  ncol = 2
)
```

```{r beta_fc, echo=FALSE, message=FALSE}
library(patchwork)
fc <- forecast(mod)
wrap_plots(
  plot(fc, series = 1),
  plot(fc, series = 2),
  plot(fc, series = 3),
  ncol = 2
)
```

There are many more extended uses of `mvgam`, including the ability to fit hierarchical State-Space GAMs that include dynamic and spatially varying coefficient models, dynamic factors, Joint Species Distribution Models and much more. See the [package documentation](https://nicholasjclark.github.io/mvgam/){target="_blank"} for more details. `mvgam` can also be used to generate all of the data structures and modelling code necessary to fit DGAMs using `Stan`. This can be helpful if users wish to make changes to the model to better suit their own bespoke research / analysis goals. The [`Stan` Discourse](https://discourse.mc-stan.org/){target="_blank"} is a helpful place to troubleshoot.

## Citing `mvgam` and related software

When using any software, please make sure to appropriately acknowledge the hard work that developers and maintainers put into making these packages available. Citations are currently the best way to formally acknowledge this work (but feel free to ⭐ this repo as well). When using `mvgam`, please cite the following:

> Clark, N.J. and Wells, K. (2023). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. *Methods in Ecology and Evolution*. DOI: https://doi.org/10.1111/2041-210X.13974

As `mvgam` acts as an interface to `Stan`, please additionally cite:

> Carpenter B., Gelman A., Hoffman M. D., Lee D., Goodrich B., Betancourt M., Brubaker M., Guo J., Li P., and Riddell A. (2017). Stan: A probabilistic programming language. *Journal of Statistical Software*. 76(1). DOI: https://doi.org/10.18637/jss.v076.i01

`mvgam` relies on several other `R` packages and, of course, on `R` itself. Use `how_to_cite()` to simplify the process of finding appropriate citations for your software setup.

## Getting help

If you encounter a clear bug, please file an issue with a minimal reproducible example on [GitHub](https://github.com/nicholasjclark/mvgam/issues).
Please also feel free to use the [`mvgam` Discussion Board](https://github.com/nicholasjclark/mvgam/discussions) to hunt for or post other discussion topics related to the package, and do check out the [`mvgam` Changelog](https://nicholasjclark.github.io/mvgam/news/index.html) for any updates about recent upgrades that the package has incorporated.

## Other resources

A series of [vignettes cover data formatting, forecasting and several extended case studies of DGAMs](https://nicholasjclark.github.io/mvgam/){target="_blank"}. A number of other examples, including some step-by-step introductory webinars, have also been compiled:

* [Time series in R and Stan using the `mvgam` package](https://www.youtube.com/playlist?list=PLzFHNoUxkCvsFIg6zqogylUfPpaxau_a3){target="_blank"}
* [Ecological Forecasting with Dynamic Generalized Additive Models](https://www.youtube.com/watch?v=0zZopLlomsQ){target="_blank"}
* [Distributed lags (and hierarchical distributed lags) using `mgcv` and `mvgam`](https://ecogambler.netlify.app/blog/distributed-lags-mgcv/){target="_blank"}
* [State-Space Vector Autoregressions in `mvgam`](https://ecogambler.netlify.app/blog/vector-autoregressions/){target="_blank"}
* [Ecological Forecasting with Dynamic GAMs; a tutorial and detailed case study](https://www.youtube.com/watch?v=RwllLjgPUmM){target="_blank"}
* [Incorporating time-varying seasonality in forecast models](https://ecogambler.netlify.app/blog/time-varying-seasonality/){target="_blank"}

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please reach out if you are interested (n.clark'at'uq.edu.au). Other contributions are also very welcome, but please see [The Contributor Instructions](https://github.com/nicholasjclark/mvgam/blob/master/.github/CONTRIBUTING.md) for general guidelines. Note that by participating in this project you agree to abide by the terms of its [Contributor Code of Conduct](https://dplyr.tidyverse.org/CODE_OF_CONDUCT).

## License

The `mvgam` project is licensed under an `MIT` open source license.

================================================
FILE: README.md
================================================

mvgam R package logo[Stan Logo](https://mc-stan.org/)

# mvgam

> **M**ulti**V**ariate (Dynamic) **G**eneralized **A**dditive **M**odels

[![R-CMD-check](https://github.com/nicholasjclark/mvgam/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/nicholasjclark/mvgam/actions/) [![Coverage status](https://codecov.io/gh/nicholasjclark/mvgam/graph/badge.svg?token=RCJ2B7S0BL)](https://app.codecov.io/gh/nicholasjclark/mvgam) [![Documentation](https://img.shields.io/badge/documentation-mvgam-orange.svg?colorB=brightgreen)](https://nicholasjclark.github.io/mvgam/) [![Methods in Ecology & Evolution](https://img.shields.io/badge/Methods%20in%20Ecology%20&%20Evolution-14,%20771–784-blue.svg)](https://doi.org/10.1111/2041-210X.13974) [![CRAN Version](https://www.r-pkg.org/badges/version/mvgam)](https://cran.r-project.org/package=mvgam) [![CRAN Downloads](https://cranlogs.r-pkg.org/badges/grand-total/mvgam?color=brightgreen)](https://cran.r-project.org/package=mvgam)

The `mvgam` 📦 fits Bayesian Dynamic Generalized Additive Models (DGAMs) that can include highly flexible nonlinear predictor effects, latent variables and multivariate time series models. The package does this by relying on functionalities from the impressive brms and mgcv packages.
Parameters are estimated using the probabilistic programming language [`Stan`](https://mc-stan.org/), giving users access to the most advanced Bayesian inference algorithms available. This allows `mvgam` to fit a very wide range of models, including:

- Multivariate State-Space Time Series Models
- Continuous-Time Autoregressive Time Series Models
- Shared Signal Time Series Models
- Dynamic Factor Models
- Hierarchical N-mixture Models
- Hierarchical Generalized Additive Models
- Joint Species Distribution Models

## Installation

You can install the stable package version from `CRAN` using `install.packages('mvgam')`, or install the latest development version using `devtools::install_github("nicholasjclark/mvgam")`. You will also need a working version of `Stan` installed (along with either `rstan` or `cmdstanr`, or both). Please refer to installation links for `Stan` with `rstan` [here](https://mc-stan.org/users/interfaces/rstan), or for `Stan` with `cmdstanr` [here](https://mc-stan.org/cmdstanr/).

## Cheatsheet

[![`mvgam` usage cheatsheet](https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.png)](https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.pdf)

## A simple example

We can explore the package’s primary functions using one of its built-in datasets. Use `plot_mvgam_series()` to inspect features for time series from the Portal Project, which represent counts of baited captures for four desert rodent species over time (see `?portal_data` for more details about the dataset).

``` r
data(portal_data)
plot_mvgam_series(
  data = portal_data,
  y = 'captures',
  series = 'all'
)
```

Visualizing multivariate time series in R using mvgam

``` r
plot_mvgam_series(
  data = portal_data,
  y = 'captures',
  series = 1
)
```

Visualizing multivariate time series in R using mvgam

``` r
plot_mvgam_series(
  data = portal_data,
  y = 'captures',
  series = 4
)
```

Visualizing multivariate time series in R using mvgam

These plots show that the time series are count responses, with missing data, many zeroes, seasonality and temporal autocorrelation all present. These features make time series analysis and forecasting very difficult using conventional software. But `mvgam` shines in these tasks. For most forecasting exercises, we’ll want to split the data into training and testing folds:

``` r
data_train <- portal_data %>%
  dplyr::filter(time <= 60)
data_test <- portal_data %>%
  dplyr::filter(time > 60 & time <= 65)
```

Formulate an `mvgam` model; this model fits a State-Space GAM in which each species has its own intercept, linear association with `ndvi_ma12` and potentially nonlinear association with `mintemp`. These effects are estimated jointly with a full time series model for the temporal dynamics (in this case a Vector Autoregressive process). We assume the outcome follows a Poisson distribution and will condition the model in `Stan` using MCMC sampling with `Cmdstan`:

``` r
mod <- mvgam(
  # Observation model is empty as we don't have any
  # covariates that impact observation error
  formula = captures ~ 0,

  # Process model contains varying intercepts,
  # varying slopes of ndvi_ma12 and varying smooths
  # of mintemp for each series.
  # Temporal dynamics are modelled with a Vector
  # Autoregression (VAR(1))
  trend_formula = ~ trend +
    s(trend, bs = 're', by = ndvi_ma12) +
    s(mintemp, bs = 'bs', by = trend) - 1,
  trend_model = VAR(cor = TRUE),

  # Observations are conditionally Poisson
  family = poisson(),

  # Condition on the training data
  data = data_train,
  backend = 'cmdstanr'
)
```

Using `print()` returns a quick summary of the object:

``` r
mod
#> GAM observation formula:
#> captures ~ 1
#> 
#> GAM process formula:
#> ~trend + s(trend, bs = "re", by = ndvi_ma12) + s(mintemp, bs = "bs",
#> by = trend) - 1
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> VAR(cor = TRUE)
#> 
#> 
#> N latent factors:
#> 4
#> 
#> N series:
#> 4
#> 
#> N timepoints:
#> 60
#> 
#> Status:
#> Fitted using Stan
#> 4 chains, each with iter = 2000; warmup = 1500; thin = 1
#> Total post-warmup draws = 2000
```

Split Rhat and Effective Sample Size diagnostics show good convergence of model estimates

``` r
mcmc_plot(mod, type = 'rhat_hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

Rhats of parameters estimated with Stan in mvgam

``` r
mcmc_plot(mod, type = 'neff_hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

Effective sample sizes of parameters estimated with Stan in mvgam

Use `conditional_effects()` for a quick visualisation of the main terms in model formulae

``` r
conditional_effects(mod, type = 'link')
```

Plotting GAM effects in mvgam and R
Plotting GAM effects in mvgam and R
Plotting GAM effects in mvgam and R

If you have the `gratia` package installed, it can also be used to plot partial effects of smooths

``` r
require(gratia)
draw(mod, trend_effects = TRUE)
```

Plotting GAM smooth functions in mvgam using gratia

Or design more targeted plots using `plot_predictions()` from the `marginaleffects` package

``` r
plot_predictions(
  mod,
  condition = c('ndvi_ma12', 'series', 'series'),
  type = 'link'
)
```

Using marginaleffects and mvgam to plot GAM smooth functions in R

``` r
plot_predictions(
  mod,
  condition = c('mintemp', 'series', 'series'),
  type = 'link'
)
```

Using marginaleffects and mvgam to plot GAM smooth functions in R

We can also view the model’s posterior predictions for the entire series (testing and training). Forecasts can be scored using a range of proper scoring rules. See `?score.mvgam_forecast` for more details

``` r
fcs <- forecast(mod, newdata = data_test)
plot(fcs, series = 1) +
  plot(fcs, series = 2) +
  plot(fcs, series = 3) +
  plot(fcs, series = 4)
#> Out of sample DRPS:
#> 8.451467
#> Out of sample DRPS:
#> 5.168817
#> Out of sample DRPS:
#> 8.52922325
#> Out of sample DRPS:
#> 3.60317975
```

Plotting forecast distributions using mvgam in R

For Vector Autoregressions fit in `mvgam`, we can inspect impulse response functions and forecast error variance decompositions. The `irf()` function runs an Impulse Response Function (IRF) simulation whereby a positive “shock” is generated for a target process at time `t = 0`. All else remaining stable, it then monitors how each of the remaining processes in the latent VAR would be expected to respond over the forecast horizon `h`. The function computes impulse responses for all processes in the object and returns them in an array that can be plotted using the S3 `plot()` function. Here we will use the generalized IRF, which makes no assumptions about the order in which the series appear in the VAR process, and inspect how each process is expected to respond to a sudden, positive pulse from the other processes over a horizon of 12 timepoints.
``` r
irfs <- irf(mod, h = 12, orthogonal = FALSE)
plot(irfs, series = 1)
```

Impulse response functions computed using mvgam in R

``` r
plot(irfs, series = 3)
```

Impulse response functions computed using mvgam in R

Using the same logic as above, we can inspect forecast error variance decompositions (FEVDs) for each process using `fevd()`. This type of analysis asks how orthogonal shocks to all processes in the system contribute to the variance of forecast uncertainty for a focal process over increasing horizons. In other words, it asks what proportion of the forecast variance of each latent time series can be attributed to the effects of the other series in the VAR process. FEVDs are useful because some shocks may not be expected to cause variations in the short term but may cause longer-term fluctuations

``` r
fevds <- fevd(mod, h = 12)
plot(fevds)
```

Forecast error variance decompositions computed using mvgam in R

This plot shows that the variance of forecast uncertainty for each process is initially dominated by contributions from that same process (i.e. self-dependent effects) but that effects from other processes become more important over increasing forecast horizons. Given what we saw from the IRF plots above, these long-term contributions from interactions among the processes make sense.

Plotting randomized quantile residuals over `time` for each series can give useful information about what might be missing from the model. We can use the highly versatile `pp_check()` function to plot these:

``` r
pp_check(
  mod,
  type = 'resid_ribbon_grouped',
  group = 'series',
  x = 'time',
  ndraws = 200
)
```

When describing the model, it can be helpful to use the `how_to_cite()` function to generate a scaffold for describing the model and sampling details in scientific communications

``` r
description <- how_to_cite(mod)
description
#> Methods text skeleton
#> We used the R package mvgam (version 1.1.594; Clark & Wells, 2023) to
#> construct, fit and interrogate the model. mvgam fits Bayesian
#> State-Space models that can include flexible predictor effects in both
#> the process and observation components by incorporating functionalities
#> from the brms (Burkner 2017), mgcv (Wood 2017) and splines2 (Wang & Yan,
#> 2023) packages. To encourage stability and prevent forecast variance
#> from increasing indefinitely, we enforced stationarity of the Vector
#> Autoregressive process following methods described by Heaps (2023) and
#> Clark et al. (2025). The mvgam-constructed model and observed data were
#> passed to the probabilistic programming environment Stan (version
#> 2.38.0; Carpenter et al. 2017, Stan Development Team 2026), specifically
#> through the cmdstanr interface (Gabry & Cesnovar, 2021). We ran 4
#> Hamiltonian Monte Carlo chains for 1500 warmup iterations and 500
#> sampling iterations for joint posterior estimation. Rank normalized
#> split Rhat (Vehtari et al. 2021) and effective sample sizes were used to
#> monitor convergence.
#> 
#> Primary references
#> Clark, NJ and Wells K (2023). Dynamic Generalized Additive Models
#> (DGAMs) for forecasting discrete ecological time series. Methods in
#> Ecology and Evolution, 14, 771-784. doi.org/10.1111/2041-210X.13974
#> Burkner, PC (2017). brms: An R Package for Bayesian Multilevel Models
#> Using Stan. Journal of Statistical Software, 80(1), 1-28.
#> doi:10.18637/jss.v080.i01
#> Wood, SN (2017). Generalized Additive Models: An Introduction with R
#> (2nd edition). Chapman and Hall/CRC.
#> Wang W and Yan J (2021). Shape-Restricted Regression Splines with R
#> Package splines2.
#> Journal of Data Science, 19(3), 498-517.
#> doi:10.6339/21-JDS1020 https://doi.org/10.6339/21-JDS1020.
#> Heaps, SE (2023). Enforcing stationarity through the prior in vector
#> autoregressions. Journal of Computational and Graphical Statistics 32,
#> 74-83.
#> Clark NJ, Ernest SKM, Senyondo H, Simonis J, White EP, Yenni GM,
#> Karunarathna KANK (2025). Beyond single-species models: leveraging
#> multispecies forecasts to navigate the dynamics of ecological
#> predictability. PeerJ 13:e18929.
#> Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M,
#> Brubaker M, Guo J, Li P and Riddell A (2017). Stan: A probabilistic
#> programming language. Journal of Statistical Software 76.
#> Gabry J, Cesnovar R, Johnson A, and Bronder S (2026). cmdstanr: R
#> Interface to 'CmdStan'. https://mc-stan.org/cmdstanr/,
#> https://discourse.mc-stan.org.
#> Vehtari A, Gelman A, Simpson D, Carpenter B, and Burkner P (2021).
#> Rank-normalization, folding, and localization: An improved Rhat for
#> assessing convergence of MCMC (with discussion). Bayesian Analysis 16(2)
#> 667-718. https://doi.org/10.1214/20-BA1221.
#> 
#> Other useful references
#> Arel-Bundock V, Greifer N, and Heiss A (2024). How to interpret
#> statistical models using marginaleffects for R and Python. Journal of
#> Statistical Software, 111(9), 1-32.
#> https://doi.org/10.18637/jss.v111.i09
#> Gabry J, Simpson D, Vehtari A, Betancourt M, and Gelman A (2019).
#> Visualization in Bayesian workflow. Journal of the Royal Statatistical
#> Society A, 182, 389-402. doi:10.1111/rssa.12378.
#> Vehtari A, Gelman A, and Gabry J (2017). Practical Bayesian model
#> evaluation using leave-one-out cross-validation and WAIC. Statistics and
#> Computing, 27, 1413-1432. doi:10.1007/s11222-016-9696-4.
#> Burkner PC, Gabry J, and Vehtari A. (2020). Approximate leave-future-out
#> cross-validation for Bayesian time series models. Journal of Statistical
#> Computation and Simulation, 90(14), 2499-2523.
#> https://doi.org/10.1080/00949655.2020.1783262
```

The post-processing methods we have shown above are just the tip of the iceberg. For a full list of methods to apply on fitted model objects, type `methods(class = "mvgam")`.

## Extended observation families

`mvgam` was originally designed to analyse and forecast non-negative integer-valued data. But further development of `mvgam` has resulted in support for a growing number of observation families. Currently, the package can handle data for the following:

- `gaussian()` for real-valued data
- `student_t()` for heavy-tailed real-valued data
- `lognormal()` for non-negative real-valued data
- `Gamma()` for non-negative real-valued data
- `betar()` for proportional data on `(0,1)`
- `bernoulli()` for binary data
- `poisson()` for count data
- `nb()` for overdispersed count data
- `binomial()` for count data with known number of trials
- `beta_binomial()` for overdispersed count data with known number of trials
- `nmix()` for count data with imperfect detection (unknown number of trials)

See `??mvgam_families` for more information.
Below is a simple example for simulating and modelling proportional data with `Beta` observations over a set of seasonal series with independent Gaussian Process dynamic trends:

``` r
set.seed(100)
data <- sim_mvgam(
  family = betar(),
  T = 80,
  trend_model = GP(),
  prop_trend = 0.5,
  seasonality = "shared"
)
plot_mvgam_series(
  data = data$data_train,
  series = "all"
)
```

``` r
mod <- mvgam(
  y ~ s(season, bs = "cc", k = 7) +
    s(season, by = series, m = 1, k = 5),
  trend_model = GP(),
  data = data$data_train,
  newdata = data$data_test,
  family = betar()
)
```

Inspect the summary to see that the posterior now also contains estimates for the `Beta` precision parameters *ϕ*.

``` r
summary(mod, include_betas = FALSE)
#> GAM formula:
#> y ~ s(season, bs = "cc", k = 7) + s(season, by = series, m = 1,
#> k = 5)
#> 
#> Family:
#> beta
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> GP()
#> 
#> N series:
#> 3
#> 
#> N timepoints:
#> 80
#> 
#> Status:
#> Fitted using Stan
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1
#> Total post-warmup draws = 2000
#> 
#> Observation precision parameter estimates:
#>        2.5%  50% 97.5% Rhat n_eff
#> phi[1]  7.8 12.0  18.0    1  1829
#> phi[2]  5.6  8.6  13.0    1  1023
#> phi[3]  4.1  6.0   8.7    1  1404
#> 
#> GAM coefficient (beta) estimates:
#>             2.5%  50% 97.5% Rhat n_eff
#> (Intercept) 0.11 0.45   0.7 1.01   602
#> 
#> Approximate significance of GAM smooths:
#>                             edf Ref.df Chi.sq p-value
#> s(season)                3.9071      5  9.792  0.0653 .
#> s(season):seriesseries_1 1.0934      4 11.307  0.2695
#> s(season):seriesseries_2 2.5629      4  2.227  0.4544
#> s(season):seriesseries_3 0.8565      4  6.556  0.5358
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> marginal deviation:
#>              2.5%  50% 97.5% Rhat n_eff
#> alpha_gp[1] 0.150 0.40  0.88 1.00   828
#> alpha_gp[2] 0.570 0.93  1.50 1.00  1018
#> alpha_gp[3] 0.052 0.40  0.92 1.01   672
#> 
#> length scale:
#>           2.5%  50% 97.5% Rhat n_eff
#> rho_gp[1]  1.2  3.6    11 1.00  1482
#> rho_gp[2]  3.0 12.0    30 1.01   367
#> rho_gp[3]  1.3  4.9    26 1.00   532
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#> crude measure of effective sample size, and Rhat is the potential scale
#> reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model
```

Plot the hindcast and forecast distributions for each series

``` r
library(patchwork)
fc <- forecast(mod)
wrap_plots(
  plot(fc, series = 1),
  plot(fc, series = 2),
  plot(fc, series = 3),
  ncol = 2
)
```

There are many more extended uses of `mvgam`, including the ability to fit hierarchical State-Space GAMs that include dynamic and spatially varying coefficient models, dynamic factors, Joint Species Distribution Models and much more. See the package documentation for more details. `mvgam` can also be used to generate all of the data structures and modelling code necessary to fit DGAMs using `Stan`. This can be helpful if users wish to make changes to the model to better suit their own bespoke research / analysis goals. The Stan Discourse is a helpful place to troubleshoot.

## Citing `mvgam` and related software

When using any software, please make sure to appropriately acknowledge the hard work that developers and maintainers put into making these packages available. Citations are currently the best way to formally acknowledge this work (but feel free to ⭐ this repo as well).
When using `mvgam`, please cite the following:

> Clark, N.J. and Wells, K. (2023). Dynamic Generalized Additive Models
> (DGAMs) for forecasting discrete ecological time series. *Methods in
> Ecology and Evolution*. DOI: <https://doi.org/10.1111/2041-210X.13974>

As `mvgam` acts as an interface to `Stan`, please additionally cite:

> Carpenter B., Gelman A., Hoffman M. D., Lee D., Goodrich B.,
> Betancourt M., Brubaker M., Guo J., Li P., and Riddell A. (2017).
> Stan: A probabilistic programming language. *Journal of Statistical
> Software*. 76(1). DOI: <https://doi.org/10.18637/jss.v076.i01>

`mvgam` relies on several other `R` packages and, of course, on `R` itself. Use `how_to_cite()` to simplify the process of finding appropriate citations for your software setup.

## Getting help

If you encounter a clear bug, please file an issue with a minimal reproducible example on [GitHub](https://github.com/nicholasjclark/mvgam/issues). Please also feel free to use the [`mvgam` Discussion Board](https://github.com/nicholasjclark/mvgam/discussions) to hunt for or post other discussion topics related to the package, and do check out the [`mvgam` Changelog](https://nicholasjclark.github.io/mvgam/news/index.html) for any updates about recent upgrades that the package has incorporated.

## Other resources

A series of vignettes cover data formatting, forecasting and several extended case studies of DGAMs. A number of other examples, including some step-by-step introductory webinars, have also been compiled:

- Time series in R and Stan using the mvgam package
- Ecological Forecasting with Dynamic Generalized Additive Models
- Distributed lags (and hierarchical distributed lags) using mgcv and mvgam
- State-Space Vector Autoregressions in mvgam
- Ecological Forecasting with Dynamic GAMs; a tutorial and detailed case study
- Incorporating time-varying seasonality in forecast models

## Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please reach out if you are interested (n.clark’at’uq.edu.au). Other contributions are also very welcome, but please see [The Contributor Instructions](https://github.com/nicholasjclark/mvgam/blob/master/.github/CONTRIBUTING.md) for general guidelines. Note that by participating in this project you agree to abide by the terms of its [Contributor Code of Conduct](https://dplyr.tidyverse.org/CODE_OF_CONDUCT).
## License

The `mvgam` project is licensed under an `MIT` open source license.

================================================
FILE: build_vignettes_CRAN.R
================================================
# Vignette names
vignettes <- list.files('./vignettes', pattern = '.Rmd')

# Generate R script versions of vignettes
purl_vignettes <- function(x) {
  # Generate the R script version
  knitr::purl(
    input = paste0('vignettes/', x),
    output = paste0('doc/', sub('.Rmd', '.R', x))
  )

  # Copy this version to inst/doc
  file.copy(
    from = paste0('doc/', sub('.Rmd', '.R', x)),
    to = paste0('inst/doc/', sub('.Rmd', '.R', x)),
    overwrite = TRUE
  )
}

# Build vignette htmls
build_vignettes = function(x) {
  # Build the vignette html file
  devtools::build_rmd(
    paste0('vignettes/', x)
  )

  # Copy the .Rmd to doc
  file.copy(
    from = paste0('vignettes/', x),
    to = paste0('doc/', x),
    overwrite = TRUE
  )

  # Copy the .Rmd to inst/doc
  file.copy(
    from = paste0('vignettes/', x),
    to = paste0('inst/doc/', x),
    overwrite = TRUE
  )

  # Copy the .html to inst/doc
  file.copy(
    from = sub('.Rmd', '.html', paste0('vignettes/', x)),
    to = paste0('inst/doc/', sub('.Rmd', '.html', x)),
    overwrite = TRUE
  )

  # Copy the .html to doc
  file.copy(
    from = sub('.Rmd', '.html', paste0('vignettes/', x)),
    to = paste0('doc/', sub('.Rmd', '.html', x)),
    overwrite = TRUE
  )

  # Remove the .html
  file.remove(sub('.Rmd', '.html', paste0('vignettes/', x)))
}

# Apply these functions to all vignette .Rmds
lapply(vignettes, purl_vignettes)
lapply(vignettes, build_vignettes)

================================================
FILE: cran-comments.md
================================================
## Version 1.1.594

## Summary of changes

This version is a minor patch update to fix a test that spawned more than two cores. It also brings several cosmetic updates to improve the way summaries are printed and stored. There are no major structural changes or modifications that would break pre-existing workflows.

## Test environments

* Windows install: R 4.4.3
* win-builder: R-devel
* win-builder: R-release
* ubuntu-latest: R-release
* ubuntu-latest: R-devel
* macOS-latest: R-release

## R CMD check results

* There were no ERRORs or WARNINGs. There were 2 NOTEs due to listing 'cmdstanr' in Suggests. This package is not a dependency but provides an additional backend option for users to select when fitting 'Stan' models, if they wish. A similar package that has been available on CRAN for quite some time ('brms') uses the same convention. I have included the `Additional_repositories` field in the DESCRIPTION to appropriately tell users where they can find this package.
* There is one compilation WARNING about RcppArmadillo fallback compilation that appears in the CRAN check system. This warning originates from the RcppArmadillo system headers (suggesting to define -DARMA_USE_CURRENT) and is not related to any code in the mvgam package. It is a system-level compilation flag recommendation that would typically be addressed by system administrators or RcppArmadillo maintainers.
## `valgrind` memory check results

* Running all examples using `--run-donttest`, and all package tests (including those skipped on CRAN) with `R -d "valgrind --tool=memcheck --leak-check=full"`, resulted in no WARNINGs or ERRORs.

Maintainer: 'Nicholas J Clark '

================================================
FILE: doc/data_in_mvgam.R
================================================
params <- list(EVAL = TRUE)

## ----echo = FALSE----------------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)

## ----setup, include=FALSE--------------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))

## --------------------------------------------------------------------------------
simdat <- sim_mvgam(
  n_series = 4,
  T = 24,
  prop_missing = 0.2
)
head(simdat$data_train, 16)

## --------------------------------------------------------------------------------
class(simdat$data_train$series)
levels(simdat$data_train$series)

## --------------------------------------------------------------------------------
all(
  levels(simdat$data_train$series) %in% unique(simdat$data_train$series)
)

## --------------------------------------------------------------------------------
summary(glm(
  y ~ series + time,
  data = simdat$data_train,
  family = poisson()
))

## --------------------------------------------------------------------------------
summary(mgcv::gam(
  y ~ series + s(time, by = series),
  data = simdat$data_train,
  family = poisson()
))

## --------------------------------------------------------------------------------
gauss_dat <- data.frame(
  outcome = rnorm(10),
  series = factor("series1", levels = "series1"),
  time = 1:10
)
gauss_dat

## --------------------------------------------------------------------------------
mgcv::gam(outcome ~ time, family = betar(), data = gauss_dat)

## ----error=TRUE------------------------------------------------------------------
try({
  mvgam(outcome ~ time, family = betar(), data = gauss_dat)
})

## --------------------------------------------------------------------------------
# A function to ensure all timepoints within a sequence are identical
all_times_avail <- function(time, min_time, max_time) {
  identical(
    as.numeric(sort(time)),
    as.numeric(seq.int(from = min_time, to = max_time))
  )
}

# Get min and max times from the data
min_time <- min(simdat$data_train$time)
max_time <- max(simdat$data_train$time)

# Check that all times are recorded for each series
data.frame(
  series = simdat$data_train$series,
  time = simdat$data_train$time
) %>%
  dplyr::group_by(series) %>%
  dplyr::summarise(
    all_there = all_times_avail(
      time,
      min_time,
      max_time
    )
  ) -> checked_times
if (any(checked_times$all_there == FALSE)) {
  warning(
    "One or more series is missing observations for one or more timepoints"
  )
} else {
  cat("All series have observations at all timepoints :)")
}

## --------------------------------------------------------------------------------
bad_times <- data.frame(
  time = seq(1, 16, by = 2),
  series = factor("series_1"),
  outcome = rnorm(8)
)
bad_times

## ----error = TRUE----------------------------------------------------------------
try({
  get_mvgam_priors(outcome ~ 1, data = bad_times, family = gaussian())
})
## --------------------------------------------------------------------------------
bad_times %>%
  dplyr::right_join(expand.grid(
    time = seq(
      min(bad_times$time),
      max(bad_times$time)
    ),
    series = factor(unique(bad_times$series), levels = levels(bad_times$series))
  )) %>%
  dplyr::arrange(time) -> good_times
good_times

## ----error = TRUE----------------------------------------------------------------
try({
  get_mvgam_priors(outcome ~ 1, data = good_times, family = gaussian())
})

## --------------------------------------------------------------------------------
bad_levels <- data.frame(
  time = 1:8,
  series = factor(
    "series_1",
    levels = c(
      "series_1",
      "series_2"
    )
  ),
  outcome = rnorm(8)
)
levels(bad_levels$series)

## ----error = TRUE----------------------------------------------------------------
try({
  get_mvgam_priors(outcome ~ 1, data = bad_levels, family = gaussian())
})

## --------------------------------------------------------------------------------
setdiff(levels(bad_levels$series), unique(bad_levels$series))

## --------------------------------------------------------------------------------
bad_levels %>%
  dplyr::mutate(series = droplevels(series)) -> good_levels
levels(good_levels$series)

## ----error = TRUE----------------------------------------------------------------
try({
  get_mvgam_priors(
    outcome ~ 1,
    data = good_levels,
    family = gaussian()
  )
})

## --------------------------------------------------------------------------------
miss_dat <- data.frame(
  outcome = rnorm(10),
  cov = c(NA, rnorm(9)),
  series = factor("series1", levels = "series1"),
  time = 1:10
)
miss_dat

## ----error = TRUE----------------------------------------------------------------
try({
  get_mvgam_priors(
    outcome ~ cov,
    data = miss_dat,
    family = gaussian()
  )
})

## --------------------------------------------------------------------------------
miss_dat <- list(
  outcome = rnorm(10),
  series = factor("series1", levels = "series1"),
  time = 1:10
)
miss_dat$cov <- matrix(rnorm(50), ncol = 5, nrow = 10)
miss_dat$cov[2, 3] <- NA

## ----error=TRUE------------------------------------------------------------------
try({
  get_mvgam_priors(
    outcome ~ cov,
    data = miss_dat,
    family = gaussian()
  )
})

## ----fig.alt = "Plotting time series features for GAM models in mvgam"-----------
plot_mvgam_series(
  data = simdat$data_train,
  y = "y",
  series = "all"
)

## ----fig.alt = "Plotting time series features for GAM models in mvgam"-----------
plot_mvgam_series(
  data = simdat$data_train,
  y = "y",
  series = 1
)

## ----fig.alt = "Plotting time series features for GAM models in mvgam"-----------
plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  y = "y",
  series = 1
)

## --------------------------------------------------------------------------------
data("all_neon_tick_data")
str(dplyr::ungroup(all_neon_tick_data))

## --------------------------------------------------------------------------------
plotIDs <- c(
  "SCBI_013",
  "SCBI_002",
  "SERC_001",
  "SERC_005",
  "SERC_006",
  "SERC_012",
  "BLAN_012",
  "BLAN_005"
)

## --------------------------------------------------------------------------------
model_dat <- all_neon_tick_data %>%
  dplyr::ungroup() %>%
  dplyr::mutate(target = ixodes_scapularis) %>%
  dplyr::filter(plotID %in% plotIDs) %>%
  dplyr::select(Year, epiWeek, plotID, target) %>%
  dplyr::mutate(epiWeek = as.numeric(epiWeek))

## --------------------------------------------------------------------------------
model_dat %>%
  # Create all possible combos of plotID, Year and epiWeek;
  # missing outcomes will be filled in as NA
  dplyr::full_join(expand.grid(
    plotID = unique(model_dat$plotID),
    Year = unique(model_dat$Year),
    epiWeek = seq(1, 52)
  )) %>%
  # left_join back to original data so plotID and siteID will
  # match up, in case you need the siteID for anything else later on
  dplyr::left_join(
    all_neon_tick_data %>%
      dplyr::select(siteID, plotID) %>%
      dplyr::distinct()
  ) -> model_dat

## --------------------------------------------------------------------------------
model_dat %>%
  dplyr::mutate(
    series = plotID,
    y = target
  ) %>%
  dplyr::mutate(
    siteID = factor(siteID),
    series = factor(series)
  ) %>%
  dplyr::select(-target, -plotID) %>%
  dplyr::arrange(Year, epiWeek, series) -> model_dat

## --------------------------------------------------------------------------------
model_dat %>%
  dplyr::ungroup() %>%
  dplyr::group_by(series) %>%
  dplyr::arrange(Year, epiWeek) %>%
  dplyr::mutate(time = seq(1, dplyr::n())) %>%
  dplyr::ungroup() -> model_dat

## --------------------------------------------------------------------------------
levels(model_dat$series)

## ----error=TRUE------------------------------------------------------------------
try({
  get_mvgam_priors(
    y ~ 1,
    data = model_dat,
    family = poisson()
  )
})

## --------------------------------------------------------------------------------
testmod <- mvgam(
  y ~ s(epiWeek, by = series, bs = "cc") +
    s(series, bs = "re"),
  trend_model = AR(),
  data = model_dat,
  backend = "cmdstanr",
  run_model = FALSE
)

## --------------------------------------------------------------------------------
str(testmod$model_data)

## --------------------------------------------------------------------------------
stancode(testmod)

================================================
FILE: doc/data_in_mvgam.Rmd
================================================
---
title: "Formatting data for use in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Formatting data for use in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

This vignette gives an example of how to take raw data and format it for use in `mvgam`. This is not an exhaustive example, as data can be recorded and stored in a variety of ways, which requires different approaches to wrangle the data into the necessary format for `mvgam`. For full details on the basic `mvgam` functionality, please see [the introductory vignette](https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html) and [the growing set of walk-through video tutorials on `mvgam` applications](https://www.youtube.com/playlist?list=PLzFHNoUxkCvsFIg6zqogylUfPpaxau_a3&si=lyg7qUrMLbD-tHCB).

## Required *tidy* data format

Manipulating the data into a 'long' format (i.e. *tidy* format) is necessary for modelling in `mvgam`. By 'long' format, we mean that each `series x time` observation needs to have its own entry in the `dataframe` or `list` object that we wish to pass as data to the two primary modelling functions, `mvgam()` and `jsdgam()`.
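If your raw data are instead stored in 'wide' format (one column per series), a minimal sketch of the conversion could use `tidyr::pivot_longer()`. Here `wide_dat`, `species_A` and `species_B` are hypothetical names used purely for illustration:

```{r, eval = FALSE}
# Hypothetical wide data: one column per series
wide_dat <- data.frame(
  time = 1:10,
  species_A = rpois(10, 5),
  species_B = rpois(10, 10)
)

# Convert to the long format that mvgam expects, ensuring
# the series indicator is stored as a factor
wide_dat %>%
  tidyr::pivot_longer(
    cols = c(species_A, species_B),
    names_to = 'series',
    values_to = 'y'
  ) %>%
  dplyr::mutate(series = factor(series)) %>%
  dplyr::arrange(series, time) -> long_dat
```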
A simple example can be viewed by simulating data using the `sim_mvgam()` function. See `?sim_mvgam` for more details

```{r}
simdat <- sim_mvgam(
  n_series = 4,
  T = 24,
  prop_missing = 0.2
)
head(simdat$data_train, 16)
```

### `series` as a `factor` variable

Notice how we have four different time series in these simulated data, and we have identified the series-level indicator as a `factor` variable.

```{r}
class(simdat$data_train$series)
levels(simdat$data_train$series)
```

It is important that the number of levels matches the number of unique series in the data to ensure indexing across series works properly in the underlying modelling functions. Several of the main workhorse functions in the package (including `mvgam()` and `get_mvgam_priors()`) will give an error if this is not the case, but it may be worth checking anyway:

```{r}
all(levels(simdat$data_train$series) %in% unique(simdat$data_train$series))
```

Note that you can technically supply data that does not have a `series` indicator, and the package will generally assume that you are only using a single time series. There are exceptions to this, for example if you have grouped data and would like to estimate hierarchical dependencies (see an example of hierarchical process error correlations in the `?AR` documentation) or if you would like to set up a Joint Species Distribution Model (JSDM) using a Zero-Mean Multivariate Gaussian distribution for the latent residuals (see examples in the `?ZMVN` documentation).

### A single outcome variable

You may also have noticed that we do not spread the `numeric / integer`-classed outcome variable into different columns. Rather, there is only a single column for the outcome variable, labelled `y` in these simulated data (though the outcome does not have to be labelled `y`). This is another important requirement in `mvgam`, but it shouldn't be too unfamiliar to `R` users who frequently use modelling packages such as `lme4`, `mgcv`, `brms` or the many other regression modelling packages out there. The advantage of this format is that it is now very easy to specify effects that vary among time series:

```{r}
summary(glm(
  y ~ series + time,
  data = simdat$data_train,
  family = poisson()
))
```

```{r}
summary(mgcv::gam(
  y ~ series + s(time, by = series),
  data = simdat$data_train,
  family = poisson()
))
```

Depending on the observation families you plan to use when building models, there may be some restrictions that need to be satisfied within the outcome variable. For example, a Beta regression can only handle proportional data, so values `>= 1` or `<= 0` are not allowed. Likewise, a Poisson regression can only handle non-negative integers. Most regression functions in `R` will assume the user knows all of this and so will not issue any warnings or errors if you choose the wrong distribution, but often this ends up leading to an unhelpful error from an optimizer that is difficult to interpret and diagnose. `mvgam` will attempt to provide informative errors if you do something that is simply not allowed.
For example, we can simulate data from a zero-centred Gaussian distribution (ensuring that some of our values will be `< 0`) and attempt a Beta regression in `mvgam` using the `betar` family:

```{r}
gauss_dat <- data.frame(
  outcome = rnorm(10),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
gauss_dat
```

A call to `gam()` using the `mgcv` package leads to a model that actually fits (though it does give an unhelpful warning message):

```{r}
mgcv::gam(outcome ~ time,
  family = betar(),
  data = gauss_dat
)
```

But the same call to `mvgam()` gives us something more useful:

```{r error=TRUE}
mvgam(outcome ~ time,
  family = betar(),
  data = gauss_dat
)
```

Please see `?mvgam_families` for more information on the types of responses that the package can handle and their restrictions

### A `time` variable

The other requirement for most models that can be fit in `mvgam` is a `numeric / integer`-classed variable labelled `time`. This ensures the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models. If you plan to use any of the autoregressive dynamic trend functions available in `mvgam` (see `?mvgam_trends` for details of available dynamic processes), you will need to ensure your time series are entered with a fixed sampling interval (i.e. the time between timesteps 1 and 2 should be the same as the time between timesteps 2 and 3, etc...). But note that you can have missing observations for some (or all) series. `mvgam()` will check this for you, but again it is useful to ensure you have no missing timepoint x series combinations in your data. You can generally do this with a simple `dplyr` call:

```{r}
# A function to ensure all timepoints within a sequence are identical
all_times_avail <- function(time, min_time, max_time) {
  identical(
    as.numeric(sort(time)),
    as.numeric(seq.int(from = min_time, to = max_time))
  )
}

# Get min and max times from the data
min_time <- min(simdat$data_train$time)
max_time <- max(simdat$data_train$time)

# Check that all times are recorded for each series
data.frame(
  series = simdat$data_train$series,
  time = simdat$data_train$time
) %>%
  dplyr::group_by(series) %>%
  dplyr::summarise(all_there = all_times_avail(
    time,
    min_time,
    max_time
  )) -> checked_times
if (any(checked_times$all_there == FALSE)) {
  warning("One or more series is missing observations for one or more timepoints")
} else {
  cat("All series have observations at all timepoints :)")
}
```

Note that models which use dynamic components will assume that smaller values of `time` are *older* (i.e. `time = 1` came *before* `time = 2`, etc...)

### Irregular sampling intervals?

Most `mvgam` dynamic trend models expect `time` to be measured in discrete, evenly-spaced intervals (i.e. one measurement per week, or one per year, for example; though missing values are allowed). But please note that irregularly sampled time intervals are allowed, in which case the `CAR()` trend model (continuous time autoregressive) is appropriate. You can see an example of this kind of model in the **Examples** section in `?CAR`. You can also use `trend_model = 'None'` (the default in `mvgam()`) and instead use a Gaussian Process to model temporal variation for irregularly-sampled time series. See `?brms::gp` for details.
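As a minimal sketch (not run; `dat` is a hypothetical data frame with a numeric, possibly irregularly spaced `time` variable plus `y` and `series` columns), a continuous-time AR(1) trend can be requested with `CAR()`:

```{r, eval = FALSE}
# Sketch only: continuous-time AR(1) dynamics can accommodate
# irregularly sampled observation times
mod <- mvgam(
  y ~ 1,
  trend_model = CAR(p = 1),
  data = dat,
  family = poisson()
)
```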
But to reiterate the point from above, if you do not have time series data (or don't want to estimate latent temporal dynamics) but you would like to estimate correlated latent residuals among multivariate outcomes, you can set up models that use `trend_model = ZMVN(...)` without the need for a `time` variable (see `?ZMVN` for details).

## Checking data with `get_mvgam_priors()`

The `get_mvgam_priors()` function is designed to return information about the parameters in a model whose prior distributions can be modified by the user. But in doing so, it will perform a series of checks to ensure the data are formatted properly. It can therefore be very useful to new users for ensuring there isn't anything strange going on in the data setup. For example, we can replicate the steps taken above (to check factor levels and timepoint x series combinations) with a single call to `get_mvgam_priors()`. Here we first simulate some data in which some of the timepoints in the `time` variable are not included in the data:

```{r}
bad_times <- data.frame(
  time = seq(1, 16, by = 2),
  series = factor("series_1"),
  outcome = rnorm(8)
)
bad_times
```

Next we call `get_mvgam_priors()` by simply specifying an intercept-only model, which is enough to trigger all the checks:

```{r error = TRUE}
get_mvgam_priors(outcome ~ 1,
  data = bad_times,
  family = gaussian()
)
```

This error is useful as it tells us where the problem is. There are many ways to fill in missing timepoints, so the correct way will have to be left up to the user. But if you don't have any covariates, it should be pretty easy using `expand.grid()`:

```{r}
bad_times %>%
  dplyr::right_join(expand.grid(
    time = seq(
      min(bad_times$time),
      max(bad_times$time)
    ),
    series = factor(unique(bad_times$series),
      levels = levels(bad_times$series)
    )
  )) %>%
  dplyr::arrange(time) -> good_times
good_times
```

Now the call to `get_mvgam_priors()`, using our filled in data, should work:

```{r error = TRUE}
get_mvgam_priors(outcome ~ 1,
  data = good_times,
  family = gaussian()
)
```

This function should also pick up on misaligned factor levels for the `series` variable. We can check this by again simulating, this time adding an additional factor level that is not included in the data:

```{r}
bad_levels <- data.frame(
  time = 1:8,
  series = factor("series_1",
    levels = c(
      "series_1",
      "series_2"
    )
  ),
  outcome = rnorm(8)
)
levels(bad_levels$series)
```

Another call to `get_mvgam_priors()` brings up a useful error:

```{r error = TRUE}
get_mvgam_priors(outcome ~ 1,
  data = bad_levels,
  family = gaussian()
)
```

Following the message's advice tells us there is a level for `series_2` in the `series` variable, but there are no observations for this series in the data:

```{r}
setdiff(levels(bad_levels$series), unique(bad_levels$series))
```

Re-assigning the levels fixes the issue:

```{r}
bad_levels %>%
  dplyr::mutate(series = droplevels(series)) -> good_levels
levels(good_levels$series)
```

```{r error = TRUE}
get_mvgam_priors(
  outcome ~ 1,
  data = good_levels,
  family = gaussian()
)
```

### Covariates with no `NA`s

Covariates can be used in models just as you would when using `mgcv` (see `?formula.gam` for details of the formula syntax). But although the outcome variable can have `NA`s, covariates cannot. Most regression software will silently drop any rows in the model matrix that have `NA`s, which is not helpful when debugging.
Both the `mvgam()` and `get_mvgam_priors()` functions will run some simple checks for you, and hopefully will return useful errors if they find any missing values:

```{r}
miss_dat <- data.frame(
  outcome = rnorm(10),
  cov = c(NA, rnorm(9)),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
miss_dat
```

```{r error = TRUE}
get_mvgam_priors(
  outcome ~ cov,
  data = miss_dat,
  family = gaussian()
)
```

Just like with the `mgcv` package, `mvgam` can also accept data as a `list` object. This is useful if you want to set up linear functional predictors or even distributed lag predictors. The checks run by `mvgam` should still work on these data. Here we change the `cov` predictor to be a `matrix`:

```{r}
miss_dat <- list(
  outcome = rnorm(10),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
miss_dat$cov <- matrix(rnorm(50), ncol = 5, nrow = 10)
miss_dat$cov[2, 3] <- NA
```

A call to `get_mvgam_priors()` runs the same checks on these data:

```{r error=TRUE}
get_mvgam_priors(
  outcome ~ cov,
  data = miss_dat,
  family = gaussian()
)
```

## Plotting with `plot_mvgam_series()`

Plotting the data is a useful way to ensure everything looks ok, once you've gone through the above checks on factor levels and timepoint x series combinations. The `plot_mvgam_series()` function will take supplied data and plot either a series of line plots (if you choose `series = 'all'`) or a set of plots to describe the distribution for a single time series. For example, to plot all of the time series in our data, and highlight a single series in each plot, we can use:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(
  data = simdat$data_train,
  y = "y",
  series = "all"
)
```

Or we can look more closely at the distribution for the first time series:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(
  data = simdat$data_train,
  y = "y",
  series = 1
)
```

If you have split your data into training and testing folds (i.e. for forecast evaluation), you can include the test data in your plots:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  y = "y",
  series = 1
)
```

## Example with NEON tick data

To give one example of how data can be reformatted for `mvgam` modelling, we will use observations from the National Ecological Observatory Network (NEON) tick drag cloth samples. *Ixodes scapularis* is a widespread tick species capable of transmitting a diversity of parasites to animals and humans, many of which are zoonotic. Due to the medical and ecological importance of this tick species, a common goal is to understand factors that influence their abundances. The NEON field team carries out standardised [long-term monitoring of tick abundances as well as other important indicators of ecological change](https://www.neonscience.org/data-collection/ticks){target="_blank"}. Nymphal abundance of *I. scapularis* is routinely recorded across NEON plots using a field sampling method called drag cloth sampling, which is a common method for sampling ticks in the landscape. Field researchers sample ticks by dragging a large cloth behind themselves through terrain that is suspected of harbouring ticks, usually working in a grid-like pattern. The sites have been sampled since 2014, resulting in a rich dataset of nymph abundance time series.
These tick time series show strong seasonality and incorporate many of the challenging features associated with ecological data including overdispersion, high proportions of missingness and irregular sampling in time, making them useful for exploring the utility of dynamic GAMs.

We begin by loading NEON tick data for the years 2014 - 2021, which were downloaded from NEON and prepared as described in [Clark & Wells 2022](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13974){target="_blank"}. You can read a bit about the data using the call `?all_neon_tick_data`.

```{r}
data("all_neon_tick_data")
str(dplyr::ungroup(all_neon_tick_data))
```

For this exercise, we will use the `epiWeek` variable as an index of seasonality, and we will only work with observations from a few sampling plots (labelled in the `plotID` column):

```{r}
plotIDs <- c(
  "SCBI_013", "SCBI_002",
  "SERC_001", "SERC_005",
  "SERC_006", "SERC_012",
  "BLAN_012", "BLAN_005"
)
```

Now we can select the target species we want (*I. scapularis*), filter to the correct plot IDs and convert the `epiWeek` variable from `character` to `numeric`:

```{r}
model_dat <- all_neon_tick_data %>%
  dplyr::ungroup() %>%
  dplyr::mutate(target = ixodes_scapularis) %>%
  dplyr::filter(plotID %in% plotIDs) %>%
  dplyr::select(Year, epiWeek, plotID, target) %>%
  dplyr::mutate(epiWeek = as.numeric(epiWeek))
```

Now is the tricky part: we need to fill in missing observations with `NA`s. The tick data are sparse in that field observers do not go out and sample in each possible `epiWeek`, so there are many weeks with no observations in the data. But we can use `expand.grid()` again to take care of this:

```{r}
model_dat %>%
  # Create all possible combos of plotID, Year and epiWeek;
  # missing outcomes will be filled in as NA
  dplyr::full_join(expand.grid(
    plotID = unique(model_dat$plotID),
    Year = unique(model_dat$Year),
    epiWeek = seq(1, 52)
  )) %>%
  # left_join back to original data so plotID and siteID will
  # match up, in case you need the siteID for anything else later on
  dplyr::left_join(all_neon_tick_data %>%
    dplyr::select(siteID, plotID) %>%
    dplyr::distinct()) -> model_dat
```

Create the `series` variable needed for `mvgam` modelling:

```{r}
model_dat %>%
  dplyr::mutate(
    series = plotID,
    y = target
  ) %>%
  dplyr::mutate(
    siteID = factor(siteID),
    series = factor(series)
  ) %>%
  dplyr::select(-target, -plotID) %>%
  dplyr::arrange(Year, epiWeek, series) -> model_dat
```

Now create the `time` variable, which needs to track `Year` and `epiWeek` for each unique series. The `n()` function from `dplyr` is often useful when generating a `time` index for grouped dataframes:

```{r}
model_dat %>%
  dplyr::ungroup() %>%
  dplyr::group_by(series) %>%
  dplyr::arrange(Year, epiWeek) %>%
  dplyr::mutate(time = seq(1, dplyr::n())) %>%
  dplyr::ungroup() -> model_dat
```

Check factor levels for the `series`:

```{r}
levels(model_dat$series)
```

This looks good, as does a more rigorous check using `get_mvgam_priors()`:

```{r error=TRUE}
get_mvgam_priors(
  y ~ 1,
  data = model_dat,
  family = poisson()
)
```

We can also set up a model in `mvgam()` but use `run_model = FALSE` to further ensure all of the necessary steps for creating the modelling code and objects will run.
It is recommended that you use the `cmdstanr` backend if possible, as the auto-formatting options available in `cmdstanr` are very useful for checking the `Stan` code that `mvgam` generates, making it easier to spot inefficiencies that can be fixed to improve sampling performance:

```{r}
testmod <- mvgam(
  y ~ s(epiWeek, by = series, bs = "cc") +
    s(series, bs = "re"),
  trend_model = AR(),
  data = model_dat,
  backend = "cmdstanr",
  run_model = FALSE
)
```

This call runs without issue, and the resulting object now contains the model code and data objects that are needed to initiate sampling:

```{r}
str(testmod$model_data)
```

```{r}
stancode(testmod)
```

## Further reading

The following papers and resources offer useful material about Dynamic GAMs and how they can be applied in practice:

Clark, Nicholas J. and Wells, K. [Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series](https://doi.org/10.1111/2041-210X.13974). *Methods in Ecology and Evolution*. (2023): 14, 771-784.

Clark, Nicholas J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ*. (2025): 13:e18929

de Sousa, Heitor C., et al. [Severe fire regimes decrease resilience of ectothermic populations](https://doi.org/10.1111/1365-2656.14188). *Journal of Animal Ecology* (2024): 93(11), 1656-1669.

Hannaford, Naomi E., et al. [A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant](https://doi.org/10.1016/j.csda.2022.107659). *Computational Statistics & Data Analysis* (2023): 179, 107659.

Karunarathna, K.A.N.K., et al. [Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models](https://doi.org/10.1016/j.ecolmodel.2024.110648). *Ecological Modelling* (2024): 490, 110648.

Zhu, L., et al. [Responses of a widespread pest insect to extreme high temperatures are stage-dependent and divergent among seasonal cohorts](https://doi.org/10.1111/1365-2435.14711). *Functional Ecology* (2025): 39, 165–180.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au)

================================================
FILE: doc/data_in_mvgam.html
================================================

Formatting data for use in mvgam

Nicholas J Clark

2026-01-19

This vignette gives an example of how to take raw data and format it for use in mvgam. This is not an exhaustive example, as data can be recorded and stored in a variety of ways, each requiring a different approach to wrangle the data into the necessary format for mvgam. For full details on the basic mvgam functionality, please see the introductory vignette and the growing set of walkthrough video tutorials on mvgam applications.

Required tidy data format

Manipulating the data into a ‘long’ format (i.e. tidy format) is necessary for modelling in mvgam. By ‘long’ format, we mean that each series x time observation needs to have its own entry in the dataframe or list object that we wish to pass as data to the two primary modelling functions, mvgam() and jsdgam(). A simple example can be viewed by simulating data using the sim_mvgam() function. See ?sim_mvgam for more details.

simdat <- sim_mvgam(
  n_series = 4, 
  T = 24, 
  prop_missing = 0.2
)
head(simdat$data_train, 16)
#>     y season year   series time
#> 1   3      1    1 series_1    1
#> 2   2      1    1 series_2    1
#> 3   2      1    1 series_3    1
#> 4   7      1    1 series_4    1
#> 5   1      2    1 series_1    2
#> 6   3      2    1 series_2    2
#> 7   3      2    1 series_3    2
#> 8   1      2    1 series_4    2
#> 9   1      3    1 series_1    3
#> 10  4      3    1 series_2    3
#> 11  4      3    1 series_3    3
#> 12 NA      3    1 series_4    3
#> 13 NA      4    1 series_1    4
#> 14  2      4    1 series_2    4
#> 15  2      4    1 series_3    4
#> 16  5      4    1 series_4    4
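
If your own data are instead stored in ‘wide’ format (i.e. one column per series), they will need to be reshaped into this long format before modelling. Below is a minimal sketch using tidyr::pivot_longer(); the wide_dat object and its column names are hypothetical stand-ins for your own data:

# A hypothetical wide dataset with one column per series
wide_dat <- data.frame(
  time = 1:24,
  series_1 = rpois(24, lambda = 5),
  series_2 = rpois(24, lambda = 5)
)

# Reshape so each series x time observation has its own row
wide_dat %>%
  tidyr::pivot_longer(
    cols = c(series_1, series_2),
    names_to = "series",
    values_to = "y"
  ) %>%
  dplyr::mutate(series = factor(series)) %>%
  dplyr::arrange(series, time) -> long_dat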

series as a factor variable

Notice how we have four different time series in these simulated data, and we have identified the series-level indicator as a factor variable.

class(simdat$data_train$series)
#> [1] "factor"
levels(simdat$data_train$series)
#> [1] "series_1" "series_2" "series_3" "series_4"

It is important that the number of levels matches the number of unique series in the data to ensure indexing across series works properly in the underlying modelling functions. Several of the main workhorse functions in the package (including mvgam() and get_mvgam_priors()) will give an error if this is not the case, but it may be worth checking anyway:

all(levels(simdat$data_train$series) %in% 
      unique(simdat$data_train$series))
#> [1] TRUE
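
When building your own data, a simple way to guarantee this alignment is to construct the factor directly from the observed identifiers. A minimal sketch, where dat and its series_id column are hypothetical stand-ins for your own dataframe:

# Levels built from the observed identifiers will always match the data
dat$series <- factor(dat$series_id,
  levels = sort(unique(dat$series_id))
)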

Note that you can technically supply data that does not have a series indicator, and the package will generally assume that you are only using a single time series. There are exceptions to this, for example if you have grouped data and would like to estimate hierarchical dependencies (see an example of hierarchical process error correlations in the ?AR documentation) or if you would like to set up a Joint Species Distribution Model (JSDM) using a Zero-Mean Multivariate Gaussian distribution for the latent residuals (see examples in the ?ZMVN documentation).

A single outcome variable

You may also have noticed that we do not spread the numeric / integer-classed outcome variable into different columns. Rather, there is only a single column for the outcome variable, labelled y in these simulated data (though the outcome does not have to be labelled y). This is another important requirement in mvgam, but it shouldn’t be too unfamiliar to R users who frequently use modelling packages such as lme4, mgcv, brms or the many other regression modelling packages out there. The advantage of this format is that it is now very easy to specify effects that vary among time series:

summary(glm(
  y ~ series + time,
  data = simdat$data_train,
  family = poisson()
))
#> 
#> Call:
#> glm(formula = y ~ series + time, family = poisson(), data = simdat$data_train)
#> 
#> Coefficients:
#>                Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)    -0.31987    0.37647  -0.850 0.395515    
#> seriesseries_2  1.28070    0.37645   3.402 0.000669 ***
#> seriesseries_3  1.18080    0.38064   3.102 0.001921 ** 
#> seriesseries_4  1.17583    0.38161   3.081 0.002061 ** 
#> time           -0.01996    0.01888  -1.057 0.290507    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 115.3  on 59  degrees of freedom
#> Residual deviance:  96.7  on 55  degrees of freedom
#>   (12 observations deleted due to missingness)
#> AIC: 214.96
#> 
#> Number of Fisher Scoring iterations: 5
summary(mgcv::gam(
  y ~ series + s(time, by = series),
  data = simdat$data_train,
  family = poisson()
))
#> 
#> Family: poisson 
#> Link function: log 
#> 
#> Formula:
#> y ~ series + s(time, by = series)
#> 
#> Parametric coefficients:
#>                Estimate Std. Error z value Pr(>|z|)   
#> (Intercept)     -0.8004     0.4355  -1.838  0.06608 . 
#> seriesseries_2   0.9043     0.5742   1.575  0.11526   
#> seriesseries_3   1.4777     0.4741   3.117  0.00183 **
#> seriesseries_4   1.3673     0.4806   2.845  0.00445 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>                          edf Ref.df Chi.sq p-value  
#> s(time):seriesseries_1 1.000  1.000  4.784  0.0287 *
#> s(time):seriesseries_2 5.767  6.810 15.826  0.0213 *
#> s(time):seriesseries_3 1.000  1.000  0.214  0.6433  
#> s(time):seriesseries_4 3.589  4.434 10.772  0.0395 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.541   Deviance explained = 60.4%
#> UBRE = 0.27372  Scale est. = 1         n = 60

Depending on the observation families you plan to use when building models, there may be some restrictions that need to be satisfied within the outcome variable. For example, a Beta regression can only handle proportional data, so values >= 1 or <= 0 are not allowed. Likewise, a Poisson regression can only handle non-negative integers. Most regression functions in R will assume the user knows all of this and so will not issue any warnings or errors if you choose the wrong distribution, but often this ends up leading to some unhelpful error from an optimizer that is difficult to interpret and diagnose. mvgam will attempt to provide some errors if you do something that is simply not allowed. For example, we can simulate data from a zero-centred Gaussian distribution (ensuring that some of our values will be < 0) and attempt a Beta regression in mvgam using the betar family:

gauss_dat <- data.frame(
  outcome = rnorm(10),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
gauss_dat
#>        outcome  series time
#> 1  -0.57990666 series1    1
#> 2  -0.86642679 series1    2
#> 3   0.20127362 series1    3
#> 4   1.36763744 series1    4
#> 5  -0.03516434 series1    5
#> 6   0.23979092 series1    6
#> 7   0.01013158 series1    7
#> 8  -0.54771525 series1    8
#> 9  -0.48140890 series1    9
#> 10 -1.20075974 series1   10

A call to gam() using the mgcv package leads to a model that actually fits (though it does give an unhelpful warning message):

mgcv::gam(outcome ~ time,
  family = betar(),
  data = gauss_dat
)
#> 
#> Family: Beta regression(0.124) 
#> Link function: logit 
#> 
#> Formula:
#> outcome ~ time
#> Total model degrees of freedom 2 
#> 
#> REML score: -180.7085

But the same call to mvgam() gives us something more useful:

mvgam(outcome ~ time,
  family = betar(),
  data = gauss_dat
)
#> Error: Values <= 0 not allowed for beta responses

Please see ?mvgam_families for more information on the types of responses that the package can handle and their restrictions.

A time variable

The other requirement for most models that can be fit in mvgam is a numeric / integer-classed variable labelled time. This ensures the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models. If you plan to use any of the autoregressive dynamic trend functions available in mvgam (see ?mvgam_trends for details of available dynamic processes), you will need to ensure your time series are entered with a fixed sampling interval (i.e. the time between timesteps 1 and 2 should be the same as the time between timesteps 2 and 3, etc…). But note that you can have missing observations for some (or all) series. mvgam() will check this for you, but again it is useful to ensure you have no missing timepoint x series combinations in your data. You can generally do this with a simple dplyr call:

# A function to check that all timepoints within a sequence are present
all_times_avail <- function(time, min_time, max_time) {
  identical(
    as.numeric(sort(time)),
    as.numeric(seq.int(from = min_time, to = max_time))
  )
}

# Get min and max times from the data
min_time <- min(simdat$data_train$time)
max_time <- max(simdat$data_train$time)

# Check that all times are recorded for each series
data.frame(
  series = simdat$data_train$series,
  time = simdat$data_train$time
) %>%
  dplyr::group_by(series) %>%
  dplyr::summarise(all_there = all_times_avail(
    time,
    min_time,
    max_time
  )) -> checked_times
if (any(checked_times$all_there == FALSE)) {
  warning("One or more series in is missing observations for one or more timepoints")
} else {
  cat("All series have observations at all timepoints :)")
}
#> All series have observations at all timepoints :)

Note that models which use dynamic components will assume that smaller values of time are older (i.e. time = 1 came before time = 2, etc…)

Irregular sampling intervals?

Most mvgam dynamic trend models expect time to be measured in discrete, evenly-spaced intervals (i.e. one measurement per week, or one per year, for example; though missing values are allowed). But please note that irregularly sampled time intervals are allowed, in which case the CAR() trend model (continuous time autoregressive) is appropriate. You can see an example of this kind of model in the Examples section in ?CAR. You can also use trend_model = 'None' (the default in mvgam()) and instead use a Gaussian Process to model temporal variation for irregularly-sampled time series. See ?brms::gp for details. But to reiterate the point from above, if you do not have time series data (or don’t want to estimate latent temporal dynamics) but you would like to estimate correlated latent residuals among multivariate outcomes, you can set up models that use trend_model = ZMVN(...) without the need for a time variable (see ?ZMVN for details).
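
As a rough sketch of how the continuous-time option mentioned above can be specified, the following sets up (but, via run_model = FALSE, does not sample) a CAR(1) trend for a single series observed at irregular, numeric time points; the irregular_dat object is hypothetical:

# Hypothetical data observed at irregular, numeric time points
irregular_dat <- data.frame(
  time = cumsum(runif(20, min = 0.5, max = 3)),
  series = factor("series_1"),
  outcome = rnorm(20)
)

mod <- mvgam(outcome ~ 1,
  trend_model = CAR(),
  data = irregular_dat,
  family = gaussian(),
  run_model = FALSE
)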

Checking data with get_mvgam_priors()

The get_mvgam_priors() function is designed to return information about the parameters in a model whose prior distributions can be modified by the user. But in doing so, it will perform a series of checks to ensure the data are formatted properly. It can therefore be very useful to new users for ensuring there isn’t anything strange going on in the data setup. For example, we can replicate the steps taken above (to check factor levels and timepoint x series combinations) with a single call to get_mvgam_priors(). Here we first simulate some data in which some of the timepoints in the time variable are not included in the data:

bad_times <- data.frame(
  time = seq(1, 16, by = 2),
  series = factor("series_1"),
  outcome = rnorm(8)
)
bad_times
#>   time   series    outcome
#> 1    1 series_1  1.6357848
#> 2    3 series_1 -0.3858940
#> 3    5 series_1  1.7655861
#> 4    7 series_1 -1.4477319
#> 5    9 series_1 -1.0557525
#> 6   11 series_1  0.4308398
#> 7   13 series_1  1.9072537
#> 8   15 series_1  0.1525545

Next we call get_mvgam_priors() by simply specifying an intercept-only model, which is enough to trigger all the checks:

get_mvgam_priors(outcome ~ 1,
  data = bad_times,
  family = gaussian()
)
#> Error: One or more series in data is missing observations for one or more timepoints

This error is useful as it tells us where the problem is. There are many ways to fill in missing timepoints, so the correct way will have to be left up to the user. But if you don’t have any covariates, it should be pretty easy using expand.grid():

bad_times %>%
  dplyr::right_join(expand.grid(
    time = seq(
      min(bad_times$time),
      max(bad_times$time)
    ),
    series = factor(unique(bad_times$series),
      levels = levels(bad_times$series)
    )
  )) %>%
  dplyr::arrange(time) -> good_times
good_times
#>    time   series    outcome
#> 1     1 series_1  1.6357848
#> 2     2 series_1         NA
#> 3     3 series_1 -0.3858940
#> 4     4 series_1         NA
#> 5     5 series_1  1.7655861
#> 6     6 series_1         NA
#> 7     7 series_1 -1.4477319
#> 8     8 series_1         NA
#> 9     9 series_1 -1.0557525
#> 10   10 series_1         NA
#> 11   11 series_1  0.4308398
#> 12   12 series_1         NA
#> 13   13 series_1  1.9072537
#> 14   14 series_1         NA
#> 15   15 series_1  0.1525545
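
The same fill-in can also be achieved a bit more compactly with tidyr::complete(), which may be convenient if you are already working in a tidyverse pipeline. A sketch that should give the same result as the right_join() approach above:

# Expand to the full time sequence; missing outcomes become NA
bad_times %>%
  tidyr::complete(
    time = tidyr::full_seq(time, period = 1),
    series
  ) %>%
  dplyr::arrange(time) -> good_times_alt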

Now the call to get_mvgam_priors(), using our filled in data, should work:

get_mvgam_priors(outcome ~ 1,
  data = good_times,
  family = gaussian()
)
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                   prior                   example_change
#> 1 (Intercept) ~ student_t(3, 0.3, 2.5);      (Intercept) ~ normal(0, 1);
#> 2  sigma_obs ~ inv_gamma(1.418, 0.452); sigma_obs ~ normal(-0.76, 0.83);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA

This function should also pick up on misaligned factor levels for the series variable. We can check this by again simulating, this time adding an additional factor level that is not included in the data:

bad_levels <- data.frame(
  time = 1:8,
  series = factor("series_1",
    levels = c(
      "series_1",
      "series_2"
    )
  ),
  outcome = rnorm(8)
)

levels(bad_levels$series)
#> [1] "series_1" "series_2"

Another call to get_mvgam_priors() brings up a useful error:

get_mvgam_priors(outcome ~ 1,
  data = bad_levels,
  family = gaussian()
)
#> Error: Mismatch between factor levels of "series" and unique values of "series"
#> Use
#>   `setdiff(levels(data$series), unique(data$series))` 
#> and
#>   `intersect(levels(data$series), unique(data$series))`
#> for guidance

Following the message’s advice tells us there is a level for series_2 in the series variable, but there are no observations for this series in the data:

setdiff(levels(bad_levels$series), 
        unique(bad_levels$series))
#> [1] "series_2"

Re-assigning the levels fixes the issue:

bad_levels %>%
  dplyr::mutate(series = droplevels(series)) -> good_levels
levels(good_levels$series)
#> [1] "series_1"
get_mvgam_priors(
  outcome ~ 1,
  data = good_levels,
  family = gaussian()
)
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                  prior                  example_change
#> 1  (Intercept) ~ student_t(3, 0, 2.5);     (Intercept) ~ normal(0, 1);
#> 2 sigma_obs ~ inv_gamma(1.418, 0.452); sigma_obs ~ normal(0.46, 0.96);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA

Covariates with no NAs

Covariates can be used in models just as you would when using mgcv (see ?formula.gam for details of the formula syntax). But although the outcome variable can have NAs, covariates cannot. Most regression software will silently drop any rows in the model matrix that have NAs, which is not helpful when debugging. Both the mvgam() and get_mvgam_priors() functions will run some simple checks for you, and hopefully will return useful errors if they find any missing values:

miss_dat <- data.frame(
  outcome = rnorm(10),
  cov = c(NA, rnorm(9)),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
miss_dat
#>       outcome          cov  series time
#> 1  -0.5965054           NA series1    1
#> 2   0.2126416  0.154650377 series1    2
#> 3   0.9601485  1.553717403 series1    3
#> 4  -0.8857684 -0.507988552 series1    4
#> 5  -0.4037936  0.245187700 series1    5
#> 6  -0.4738641 -0.009847922 series1    6
#> 7  -1.2390329  0.342620485 series1    7
#> 8   1.9631220 -0.642393988 series1    8
#> 9  -1.6783068 -1.335488789 series1    9
#> 10 -1.3909946 -0.254555529 series1   10
get_mvgam_priors(
  outcome ~ cov,
  data = miss_dat,
  family = gaussian()
)
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2                                  cov            1     cov fixed effect
#> 3 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                    prior                   example_change
#> 1 (Intercept) ~ student_t(3, -0.5, 2.5);      (Intercept) ~ normal(0, 1);
#> 2              cov ~ student_t(3, 0, 2);              cov ~ normal(0, 1);
#> 3   sigma_obs ~ inv_gamma(1.418, 0.452); sigma_obs ~ normal(-0.43, 0.49);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA
#> 3             NA             NA

Just like with the mgcv package, mvgam can also accept data as a list object. This is useful if you want to set up linear functional predictors or even distributed lag predictors. The checks run by mvgam should still work on these data. Here we change the cov predictor to be a matrix:

miss_dat <- list(
  outcome = rnorm(10),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
miss_dat$cov <- matrix(rnorm(50), ncol = 5, nrow = 10)
miss_dat$cov[2, 3] <- NA

A call to get_mvgam_priors() runs the same checks on these data:

get_mvgam_priors(
  outcome ~ cov,
  data = miss_dat,
  family = gaussian()
)
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2                                 cov1            1    cov1 fixed effect
#> 3                                 cov2            1    cov2 fixed effect
#> 4                                 cov3            1    cov3 fixed effect
#> 5                                 cov4            1    cov4 fixed effect
#> 6                                 cov5            1    cov5 fixed effect
#> 7 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                   prior                   example_change
#> 1 (Intercept) ~ student_t(3, 0.2, 2.5);      (Intercept) ~ normal(0, 1);
#> 2            cov1 ~ student_t(3, 0, 2);             cov1 ~ normal(0, 1);
#> 3            cov2 ~ student_t(3, 0, 2);             cov2 ~ normal(0, 1);
#> 4            cov3 ~ student_t(3, 0, 2);             cov3 ~ normal(0, 1);
#> 5            cov4 ~ student_t(3, 0, 2);             cov4 ~ normal(0, 1);
#> 6            cov5 ~ student_t(3, 0, 2);             cov5 ~ normal(0, 1);
#> 7  sigma_obs ~ inv_gamma(1.418, 0.452); sigma_obs ~ normal(-0.86, 0.33);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA
#> 3             NA             NA
#> 4             NA             NA
#> 5             NA             NA
#> 6             NA             NA
#> 7             NA             NA
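
If a missing value does turn up in a matrix-valued covariate, the offending entries can be located with a standard base R call; here this returns the [2, 3] position we introduced above:

which(is.na(miss_dat$cov), arr.ind = TRUE)
#>      row col
#> [1,]   2   3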

Plotting with plot_mvgam_series()

Plotting the data is a useful way to ensure everything looks ok, once you’ve gone through the above checks on factor levels and timepoint x series combinations. The plot_mvgam_series() function will take supplied data and plot either a series of line plots (if you choose series = 'all') or a set of plots to describe the distribution for a single time series. For example, to plot all of the time series in our data, and highlight a single series in each plot, we can use:

plot_mvgam_series(
  data = simdat$data_train,
  y = "y",
  series = "all"
)

Plotting time series features for GAM models in mvgam

Or we can look more closely at the distribution for the first time series:

plot_mvgam_series(
  data = simdat$data_train,
  y = "y",
  series = 1
)

Plotting time series features for GAM models in mvgam

If you have split your data into training and testing folds (i.e. for forecast evaluation), you can include the test data in your plots:

plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  y = "y",
  series = 1
)

Plotting time series features for GAM models in mvgam

Example with NEON tick data

To give one example of how data can be reformatted for mvgam modelling, we will use observations from the National Ecological Observatory Network (NEON) tick drag cloth samples. Ixodes scapularis is a widespread tick species capable of transmitting a diversity of parasites to animals and humans, many of which are zoonotic. Due to the medical and ecological importance of this tick species, a common goal is to understand factors that influence their abundances. The NEON field team carries out standardised long-term monitoring of tick abundances as well as other important indicators of ecological change. Nymphal abundance of I. scapularis is routinely recorded across NEON plots using a field sampling method called drag cloth sampling, which is a common method for sampling ticks in the landscape. Field researchers sample ticks by dragging a large cloth behind themselves through terrain that is suspected of harbouring ticks, usually working in a grid-like pattern. The sites have been sampled since 2014, resulting in a rich dataset of nymph abundance time series. These tick time series show strong seasonality and incorporate many of the challenging features associated with ecological data including overdispersion, high proportions of missingness and irregular sampling in time, making them useful for exploring the utility of dynamic GAMs.

We begin by loading NEON tick data for the years 2014 - 2021, which were downloaded from NEON and prepared as described in Clark & Wells 2022. You can read a bit about the data using the call ?all_neon_tick_data.

data("all_neon_tick_data")
str(dplyr::ungroup(all_neon_tick_data))
#> tibble [3,505 × 24] (S3: tbl_df/tbl/data.frame)
#>  $ Year                : num [1:3505] 2015 2015 2015 2015 2015 ...
#>  $ epiWeek             : chr [1:3505] "37" "38" "39" "40" ...
#>  $ yearWeek            : chr [1:3505] "201537" "201538" "201539" "201540" ...
#>  $ plotID              : chr [1:3505] "BLAN_005" "BLAN_005" "BLAN_005" "BLAN_005" ...
#>  $ siteID              : chr [1:3505] "BLAN" "BLAN" "BLAN" "BLAN" ...
#>  $ nlcdClass           : chr [1:3505] "deciduousForest" "deciduousForest" "deciduousForest" "deciduousForest" ...
#>  $ decimalLatitude     : num [1:3505] 39.1 39.1 39.1 39.1 39.1 ...
#>  $ decimalLongitude    : num [1:3505] -78 -78 -78 -78 -78 ...
#>  $ elevation           : num [1:3505] 168 168 168 168 168 ...
#>  $ totalSampledArea    : num [1:3505] 162 NA NA NA 162 NA NA NA NA 164 ...
#>  $ amblyomma_americanum: num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ ixodes_scapularis   : num [1:3505] 2 NA NA NA 0 NA NA NA NA 0 ...
#>  $ time                : Date[1:3505], format: "2015-09-13" "2015-09-20" ...
#>  $ RHMin_precent       : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ RHMin_variance      : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ RHMax_precent       : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ RHMax_variance      : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMin_degC     : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMin_variance : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMax_degC     : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMax_variance : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ soi                 : num [1:3505] -18.4 -17.9 -23.5 -28.4 -25.9 ...
#>  $ cum_sdd             : num [1:3505] 173 173 173 173 173 ...
#>  $ cum_gdd             : num [1:3505] 1129 1129 1129 1129 1129 ...

For this exercise, we will use the epiWeek variable as an index of seasonality, and we will only work with observations from a few sampling plots (labelled in the plotID column):

plotIDs <- c(
  "SCBI_013", "SCBI_002",
  "SERC_001", "SERC_005",
  "SERC_006", "SERC_012",
  "BLAN_012", "BLAN_005"
)

Now we can select the target species we want (I. scapularis), filter to the correct plot IDs and convert the epiWeek variable from character to numeric:

model_dat <- all_neon_tick_data %>%
  dplyr::ungroup() %>%
  dplyr::mutate(target = ixodes_scapularis) %>%
  dplyr::filter(plotID %in% plotIDs) %>%
  dplyr::select(Year, epiWeek, plotID, target) %>%
  dplyr::mutate(epiWeek = as.numeric(epiWeek))

Now is the tricky part: we need to fill in missing observations with NAs. The tick data are sparse in that field observers do not go out and sample in each possible epiWeek, so there are many weeks with no observations in the data. But we can use expand.grid() again to take care of this:

model_dat %>%
  # Create all possible combos of plotID, Year and epiWeek;
  # missing outcomes will be filled in as NA
  dplyr::full_join(expand.grid(
    plotID = unique(model_dat$plotID),
    Year = unique(model_dat$Year),
    epiWeek = seq(1, 52)
  )) %>%
  # left_join back to original data so plotID and siteID will
  # match up, in case you need the siteID for anything else later on
  dplyr::left_join(all_neon_tick_data %>%
    dplyr::select(siteID, plotID) %>%
    dplyr::distinct()) -> model_dat

Create the series variable needed for mvgam modelling:

model_dat %>%
  dplyr::mutate(
    series = plotID,
    y = target
  ) %>%
  dplyr::mutate(
    siteID = factor(siteID),
    series = factor(series)
  ) %>%
  dplyr::select(-target, -plotID) %>%
  dplyr::arrange(Year, epiWeek, series) -> model_dat

Now create the time variable, which needs to track Year and epiWeek for each unique series. The n() function from dplyr is often useful when generating a time index for grouped dataframes:

model_dat %>%
  dplyr::ungroup() %>%
  dplyr::group_by(series) %>%
  dplyr::arrange(Year, epiWeek) %>%
  dplyr::mutate(time = seq(1, dplyr::n())) %>%
  dplyr::ungroup() -> model_dat

Check factor levels for the series:

levels(model_dat$series)
#> [1] "BLAN_005" "BLAN_012" "SCBI_002" "SCBI_013" "SERC_001" "SERC_005" "SERC_006"
#> [8] "SERC_012"

This looks good, as does a more rigorous check using get_mvgam_priors():

get_mvgam_priors(
  y ~ 1,
  data = model_dat,
  family = poisson()
)
#>    param_name param_length  param_info                                  prior
#> 1 (Intercept)            1 (Intercept) (Intercept) ~ student_t(3, -2.3, 2.5);
#>                example_change new_lowerbound new_upperbound
#> 1 (Intercept) ~ normal(0, 1);             NA             NA

We can also set up a model in mvgam() but use run_model = FALSE to further ensure all of the necessary steps for creating the modelling code and objects will run. It is recommended that you use the cmdstanr backend if possible, as the auto-formatting options available in cmdstanr are very useful for checking the Stan code that mvgam generates, making it easier to spot inefficiencies that can be fixed to improve sampling performance:

testmod <- mvgam(
  y ~ s(epiWeek, by = series, bs = "cc") +
    s(series, bs = "re"),
  trend_model = AR(),
  data = model_dat,
  backend = "cmdstanr",
  run_model = FALSE
)

This call runs without issue, and the resulting object now contains the model code and data objects that are needed to initiate sampling:

str(testmod$model_data)
#> List of 25
#>  $ y           : num [1:416, 1:8] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
#>  $ n           : int 416
#>  $ X           : num [1:3328, 1:73] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:3328] "1" "2" "3" "4" ...
#>   .. ..$ : chr [1:73] "X.Intercept." "V2" "V3" "V4" ...
#>  $ S1          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ zero        : num [1:73] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ S2          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S3          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S4          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S5          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S6          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S7          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S8          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ p_coefs     : Named num 0
#>   ..- attr(*, "names")= chr "(Intercept)"
#>  $ p_taus      : num 1.02
#>  $ ytimes      : int [1:416, 1:8] 1 9 17 25 33 41 49 57 65 73 ...
#>  $ n_series    : int 8
#>  $ sp          : Named num [1:9] 0.368 0.368 0.368 0.368 0.368 ...
#>   ..- attr(*, "names")= chr [1:9] "s(epiWeek):seriesBLAN_005" "s(epiWeek):seriesBLAN_012" "s(epiWeek):seriesSCBI_002" "s(epiWeek):seriesSCBI_013" ...
#>  $ y_observed  : num [1:416, 1:8] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ total_obs   : int 3328
#>  $ num_basis   : int 73
#>  $ n_sp        : num 9
#>  $ n_nonmissing: int 400
#>  $ obs_ind     : int [1:400] 89 93 98 101 115 118 121 124 127 130 ...
#>  $ flat_ys     : num [1:400] 2 0 0 0 0 0 0 25 36 14 ...
#>  $ flat_xs     : num [1:400, 1:73] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:400] "705" "737" "777" "801" ...
#>   .. ..$ : chr [1:73] "X.Intercept." "V2" "V3" "V4" ...
#>  - attr(*, "trend_model")= chr "AR1"
stancode(testmod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[8, 8] S1; // mgcv smooth penalty matrix S1
#>   matrix[8, 8] S2; // mgcv smooth penalty matrix S2
#>   matrix[8, 8] S3; // mgcv smooth penalty matrix S3
#>   matrix[8, 8] S4; // mgcv smooth penalty matrix S4
#>   matrix[8, 8] S5; // mgcv smooth penalty matrix S5
#>   matrix[8, 8] S6; // mgcv smooth penalty matrix S6
#>   matrix[8, 8] S7; // mgcv smooth penalty matrix S7
#>   matrix[8, 8] S8; // mgcv smooth penalty matrix S8
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#>   
#>   // latent trend AR1 terms
#>   vector<lower=-1, upper=1>[n_series] ar1;
#>   
#>   // latent trend variance parameters
#>   vector<lower=0>[n_series] sigma;
#>   
#>   // latent trends
#>   matrix[n, n_series] trend;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 65] = b_raw[1 : 65];
#>   b[66 : 73] = mu_raw[1] + b_raw[66 : 73] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ inv_gamma(1.418, 0.452);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ std_normal();
#>   
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, -2.3, 2.5);
#>   
#>   // prior for s(epiWeek):seriesBLAN_005...
#>   b_raw[2 : 9] ~ multi_normal_prec(zero[2 : 9], S1[1 : 8, 1 : 8] * lambda[1]);
#>   
#>   // prior for s(epiWeek):seriesBLAN_012...
#>   b_raw[10 : 17] ~ multi_normal_prec(zero[10 : 17],
#>                                      S2[1 : 8, 1 : 8] * lambda[2]);
#>   
#>   // prior for s(epiWeek):seriesSCBI_002...
#>   b_raw[18 : 25] ~ multi_normal_prec(zero[18 : 25],
#>                                      S3[1 : 8, 1 : 8] * lambda[3]);
#>   
#>   // prior for s(epiWeek):seriesSCBI_013...
#>   b_raw[26 : 33] ~ multi_normal_prec(zero[26 : 33],
#>                                      S4[1 : 8, 1 : 8] * lambda[4]);
#>   
#>   // prior for s(epiWeek):seriesSERC_001...
#>   b_raw[34 : 41] ~ multi_normal_prec(zero[34 : 41],
#>                                      S5[1 : 8, 1 : 8] * lambda[5]);
#>   
#>   // prior for s(epiWeek):seriesSERC_005...
#>   b_raw[42 : 49] ~ multi_normal_prec(zero[42 : 49],
#>                                      S6[1 : 8, 1 : 8] * lambda[6]);
#>   
#>   // prior for s(epiWeek):seriesSERC_006...
#>   b_raw[50 : 57] ~ multi_normal_prec(zero[50 : 57],
#>                                      S7[1 : 8, 1 : 8] * lambda[7]);
#>   
#>   // prior for s(epiWeek):seriesSERC_012...
#>   b_raw[58 : 65] ~ multi_normal_prec(zero[58 : 65],
#>                                      S8[1 : 8, 1 : 8] * lambda[8]);
#>   
#>   // prior (non-centred) for s(series)...
#>   b_raw[66 : 73] ~ std_normal();
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for latent trend variance parameters
#>   sigma ~ inv_gamma(1.418, 0.452);
#>   
#>   // trend estimates
#>   trend[1, 1 : n_series] ~ normal(0, sigma);
#>   for (s in 1 : n_series) {
#>     trend[2 : n, s] ~ normal(ar1[s] * trend[1 : (n - 1), s], sigma[s]);
#>   }
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_series] tau;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   for (s in 1 : n_series) {
#>     tau[s] = pow(sigma[s], -2.0);
#>   }
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }
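
Once you are satisfied with the generated code and data objects, sampling can be initiated by simply leaving run_model at its default (TRUE). A sketch of that final call (not run here, as sampling this model takes some time):

fitmod <- mvgam(
  y ~ s(epiWeek, by = series, bs = "cc") +
    s(series, bs = "re"),
  trend_model = AR(),
  data = model_dat,
  backend = "cmdstanr"
)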

Further reading

The following papers and resources offer useful material about Dynamic GAMs and how they can be applied in practice:

Clark, Nicholas J. and Wells, K. Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. (2023): 14, 771-784.

Clark, Nicholas J., et al. Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ. (2025): 13:e18929

de Sousa, Heitor C., et al. Severe fire regimes decrease resilience of ectothermic populations. Journal of Animal Ecology (2024): 93(11), 1656-1669.

Hannaford, Naomi E., et al. A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant. Computational Statistics & Data Analysis (2023): 179, 107659.

Karunarathna, K.A.N.K., et al. Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models. Ecological Modelling (2024): 490, 110648.

Zhu, L., et al. Responses of a widespread pest insect to extreme high temperatures are stage-dependent and divergent among seasonal cohorts. Functional Ecology (2025): 39, 165–180. https://doi.org/10.1111/1365-2435.14711

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au)

================================================
FILE: doc/forecast_evaluation.R
================================================
params <- list(EVAL = TRUE)

## ----echo = FALSE----------------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)

## ----setup, include=FALSE--------------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))

## --------------------------------------------------------------------------------
set.seed(1)
simdat <- sim_mvgam(
  T = 100,
  n_series = 3,
  mu = 2,
  trend_model = GP(),
  prop_trend = 0.75,
  family = poisson(),
  prop_missing = 0.10
)

## --------------------------------------------------------------------------------
str(simdat)

## ----fig.alt = "Plotting time series features for GAM models in mvgam"-----------
plot_mvgam_series(
  data = simdat$data_train,
  series = "all"
)

## ----fig.alt = "Plotting time series features for GAM models in mvgam"-----------
plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  series = 1
)

## ----include=FALSE---------------------------------------------------------------
mod1 <- mvgam(
  y ~ s(season, bs = "cc", k = 8) +
    s(time, by = series, k = 20),
  knots = list(season = c(0.5, 12.5)),
  trend_model = "None",
  data = simdat$data_train,
  newdata = simdat$data_test
)

## ----eval=FALSE------------------------------------------------------------------
# mod1 <- mvgam(
#   y ~ s(season, bs = "cc", k = 8) +
#     s(time, by = series, bs = "cr", k = 20),
#   knots = list(season = c(0.5, 12.5)),
#   trend_model = "None",
#   data = simdat$data_train,
#   silent = 2
# )

## --------------------------------------------------------------------------------
summary(mod1, include_betas = FALSE)

## ----fig.alt = "Plotting GAM smooth functions using mvgam"-----------------------
conditional_effects(mod1, type = "link")

## ----include=FALSE, message=FALSE------------------------------------------------
mod2 <- mvgam(
  y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  silent = 1
)

## ----eval=FALSE------------------------------------------------------------------
# mod2 <- mvgam(y ~ 1,
#   trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
#   trend_knots = list(season = c(0.5, 12.5)),
#   trend_model = AR(cor = TRUE),
#   noncentred = TRUE,
#   data = simdat$data_train,
#   silent = 1
# )

## --------------------------------------------------------------------------------
summary(mod2, include_betas = FALSE)

## ----fig.alt = "Summarising latent Gaussian Process parameters in mvgam"---------
mcmc_plot(mod2, variable = "ar", regex = TRUE, type = "areas")

## ----fig.alt = "Summarising latent Gaussian Process parameters in mvgam"---------
mcmc_plot(mod2, variable = "sigma", regex = TRUE, type = "areas")

## ----fig.alt = "Plotting latent Gaussian Process effects in mvgam and marginaleffects"----
conditional_effects(mod2, type = "link")

## --------------------------------------------------------------------------------
fc_mod1 <- forecast(mod1, newdata = simdat$data_test)
fc_mod2 <- forecast(mod2, newdata = simdat$data_test)

## --------------------------------------------------------------------------------
str(fc_mod1)

## --------------------------------------------------------------------------------
plot(fc_mod1, series = 1)
plot(fc_mod2, series = 1)
plot(fc_mod1, series = 2)
plot(fc_mod2, series = 2)

## ----include=FALSE---------------------------------------------------------------
mod2 <- mvgam(
  y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  newdata = simdat$data_test,
  silent = 2
)

## ----eval=FALSE------------------------------------------------------------------
# mod2 <- mvgam(y ~ 1,
#   trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
#   trend_knots = list(season = c(0.5, 12.5)),
#   trend_model = AR(cor = TRUE),
#   noncentred = TRUE,
#   data = simdat$data_train,
#   newdata = simdat$data_test,
#   silent = 2
# )

## --------------------------------------------------------------------------------
fc_mod2 <- forecast(mod2)

## ----warning=FALSE, fig.alt = "Plotting posterior forecast distributions using mvgam and R"----
plot(fc_mod2, series = 1)

## ----warning=FALSE---------------------------------------------------------------
crps_mod1 <- score(fc_mod1, score = "crps")
str(crps_mod1)
crps_mod1$series_1

## ----warning=FALSE---------------------------------------------------------------
crps_mod1 <- score(fc_mod1, score = "crps", interval_width = 0.6)
crps_mod1$series_1

## --------------------------------------------------------------------------------
link_mod1 <- forecast(mod1, newdata = simdat$data_test, type = "link")
score(link_mod1, score = "elpd")$series_1

## --------------------------------------------------------------------------------
energy_mod2 <- score(fc_mod2, score = "energy")
str(energy_mod2)

## --------------------------------------------------------------------------------
energy_mod2$all_series

## --------------------------------------------------------------------------------
crps_mod1 <- score(fc_mod1, score = "crps")
crps_mod2 <- score(fc_mod2, score = "crps")

diff_scores <- crps_mod2$series_1$score - crps_mod1$series_1$score
plot(
  diff_scores,
  pch = 16,
  cex = 1.25,
  col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(
  main = paste0(
    "AR(1) better in ", ar1_better, " of ", length(diff_scores),
    " evaluations",
    "\nMean difference = ",
    round(mean(diff_scores, na.rm = TRUE), 2)
  )
)

diff_scores <- crps_mod2$series_2$score - crps_mod1$series_2$score
plot(
  diff_scores,
  pch = 16,
  cex = 1.25,
  col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(
  main = paste0(
    "AR(1) better in ", ar1_better, " of ", length(diff_scores),
    " evaluations",
    "\nMean difference = ",
    round(mean(diff_scores, na.rm = TRUE), 2)
  )
)

diff_scores <- crps_mod2$series_3$score - crps_mod1$series_3$score
plot(
  diff_scores,
  pch = 16,
  cex = 1.25,
  col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(
  main = paste0(
    "AR(1) better in ", ar1_better, " of ", length(diff_scores),
    " evaluations",
    "\nMean difference = ",
    round(mean(diff_scores, na.rm = TRUE), 2)
  )
)

================================================
FILE: doc/forecast_evaluation.Rmd
================================================
---
title: "Forecasting and forecast evaluation in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Forecasting and forecast evaluation in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

The purpose of this vignette is to show how the `mvgam` package can be used to produce probabilistic forecasts and to evaluate those forecasts using a variety of proper scoring rules.

## Simulating discrete time series

We begin by simulating some data to show how forecasts are computed and evaluated in `mvgam`. The `sim_mvgam()` function can be used to simulate series that come from a variety of response distributions as well as seasonal patterns and/or dynamic temporal patterns. Here we simulate a collection of three count-valued time series. These series all share the same seasonal pattern but have different temporal dynamics. By setting `trend_model = GP()` and `prop_trend = 0.75`, we are generating time series that have smooth underlying temporal trends (evolving as Gaussian Processes with squared exponential kernel) and moderate seasonal patterns. The observations are Poisson-distributed and we allow 10% of observations to be missing.

```{r}
set.seed(1)
simdat <- sim_mvgam(
  T = 100,
  n_series = 3,
  mu = 2,
  trend_model = GP(),
  prop_trend = 0.75,
  family = poisson(),
  prop_missing = 0.10
)
```

The returned object is a `list` containing training and testing data (`sim_mvgam()` automatically splits the data into these folds for us) together with some other information about the data generating process that was used to simulate the data:

```{r}
str(simdat)
```

Each series in this case has a shared seasonal pattern. The resulting time series are similar to what we might encounter when dealing with count-valued data that can take small counts:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(
  data = simdat$data_train,
  series = "all"
)
```

For individual series, we can plot the training and testing data, as well as some more specific features of the observed data:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  series = 1
)
```

### Modelling dynamics with splines

The first model we will fit uses a shared cyclic spline to capture the repeated seasonality, as well as series-specific splines of time to capture the long-term dynamics.
We allow the temporal splines to be fairly complex so they can capture as much of the temporal variation as possible:

```{r include=FALSE}
mod1 <- mvgam(
  y ~ s(season, bs = "cc", k = 8) +
    s(time, by = series, k = 20),
  knots = list(season = c(0.5, 12.5)),
  trend_model = "None",
  data = simdat$data_train,
  newdata = simdat$data_test
)
```

```{r eval=FALSE}
mod1 <- mvgam(
  y ~ s(season, bs = "cc", k = 8) +
    s(time, by = series, bs = "cr", k = 20),
  knots = list(season = c(0.5, 12.5)),
  trend_model = "None",
  data = simdat$data_train,
  silent = 2
)
```

The model fits without issue:

```{r}
summary(mod1, include_betas = FALSE)
```

And we can plot the conditional effects of the splines (on the link scale) to see that they are estimated to be highly nonlinear:

```{r, fig.alt = "Plotting GAM smooth functions using mvgam"}
conditional_effects(mod1, type = "link")
```

### Modelling dynamics with a correlated AR1

Before showing how to produce and evaluate forecasts, we will fit a second model to these data so the two models can be compared. This model is equivalent to the above, except we now use a correlated AR(1) process to model series-specific dynamics. See `?AR` for more details.

```{r include=FALSE, message=FALSE}
mod2 <- mvgam(
  y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  silent = 1
)
```

```{r eval=FALSE}
mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  silent = 1
)
```

The summary for this model now contains information on the autoregressive and process error parameters for each time series:

```{r}
summary(mod2, include_betas = FALSE)
```

We can plot the posteriors for these parameters, and for any other parameter for that matter, using `bayesplot` routines. First the autoregressive parameters:

```{r, fig.alt = "Summarising latent Gaussian Process parameters in mvgam"}
mcmc_plot(mod2, variable = "ar", regex = TRUE, type = "areas")
```

And now the variance ($\sigma$) parameters:

```{r, fig.alt = "Summarising latent Gaussian Process parameters in mvgam"}
mcmc_plot(mod2, variable = "sigma", regex = TRUE, type = "areas")
```

We can again plot the conditional seasonal effect:

```{r, fig.alt = "Plotting latent Gaussian Process effects in mvgam and marginaleffects"}
conditional_effects(mod2, type = "link")
```

The estimates for the seasonal component are fairly similar for the two models, but below we will see if they produce similar forecasts.

## Forecasting with the `forecast()` function

Probabilistic forecasts can be computed in two main ways in `mvgam`. The first is to take a model that was fit only to training data (as we did above in the two example models) and produce temporal predictions from the posterior predictive distribution by feeding `newdata` to the `forecast()` function. It is crucial that any `newdata` fed to the `forecast()` function follows on sequentially from the data that was used to fit the model (this is not internally checked by the package because it might be a headache to do so when data are not supplied in a specific time-order). When calling the `forecast()` function, you have the option to generate different kinds of predictions (i.e. predicting on the link scale, on the response scale, or producing expectations; see `?forecast.mvgam` for details).
We will use the default and produce forecasts on the response scale, which is the most common way to evaluate forecast distributions ```{r} fc_mod1 <- forecast(mod1, newdata = simdat$data_test) fc_mod2 <- forecast(mod2, newdata = simdat$data_test) ``` The objects we have created are of class `mvgam_forecast`, which contain information on hindcast distributions, forecast distributions and true observations for each series in the data: ```{r} str(fc_mod1) ``` We can plot the forecasts for some series from each model using the `S3 plot` method for objects of this class: ```{r} plot(fc_mod1, series = 1) plot(fc_mod2, series = 1) plot(fc_mod1, series = 2) plot(fc_mod2, series = 2) ``` Clearly the two models do not produce equivalent forecasts. We will come back to scoring these forecasts in a moment. ## Forecasting with `newdata` in `mvgam()` The second way we can produce forecasts in `mvgam` is to feed the testing data directly to the `mvgam()` function as `newdata`. This will include the testing data as missing observations so that they are automatically predicted from the posterior predictive distribution using the `generated quantities` block in `Stan`. As an example, we can refit `mod2` but include the testing data for automatic forecasts: ```{r include=FALSE} mod2 <- mvgam(y ~ 1, trend_formula = ~ s(season, bs = "cc", k = 8) - 1, trend_knots = list(season = c(0.5, 12.5)), trend_model = AR(cor = TRUE), noncentred = TRUE, data = simdat$data_train, newdata = simdat$data_test, silent = 2 ) ``` ```{r eval=FALSE} mod2 <- mvgam(y ~ 1, trend_formula = ~ s(season, bs = "cc", k = 8) - 1, trend_knots = list(season = c(0.5, 12.5)), trend_model = AR(cor = TRUE), noncentred = TRUE, data = simdat$data_train, newdata = simdat$data_test, silent = 2 ) ``` Because the model already contains a forecast distribution, we do not need to feed `newdata` to the `forecast()` function: ```{r} fc_mod2 <- forecast(mod2) ``` The forecasts will be nearly identical to those calculated previously: ```{r warning=FALSE, fig.alt = "Plotting posterior forecast distributions using mvgam and R"} plot(fc_mod2, series = 1) ``` ## Scoring forecast distributions A primary purpose of the `mvgam_forecast` class is to readily allow forecast evaluations for each series in the data, using a variety of possible scoring functions. See `?mvgam::score.mvgam_forecast` to view the types of scores that are available. A useful scoring metric is the Continuous Rank Probability Score (CRPS). A CRPS value is similar to what we might get if we calculated a weighted absolute error using the full forecast distribution. ```{r warning=FALSE} crps_mod1 <- score(fc_mod1, score = "crps") str(crps_mod1) crps_mod1$series_1 ``` The returned list contains a `data.frame` for each series in the data that shows the CRPS score for each evaluation in the testing data, along with some other useful information about the fit of the forecast distribution. In particular, we are given a logical value (1s and 0s) telling us whether the true value was within a pre-specified credible interval (i.e. the coverage of the forecast distribution). The default interval width is 0.9, so we would hope that the values in the `in_interval` column take a 1 approximately 90% of the time. 
This value can be changed if you wish to compute different coverages, say using a 60% interval: ```{r warning=FALSE} crps_mod1 <- score(fc_mod1, score = "crps", interval_width = 0.6) crps_mod1$series_1 ``` We can also compare forecasts against out of sample observations using the [Expected Log Predictive Density (ELPD; also known as the log score)](https://link.springer.com/article/10.1007/s11222-016-9696-4){target="_blank"}. The ELPD is a strictly proper scoring rule that can be applied to any distributional forecast, but to compute it we need predictions on the link scale rather than on the outcome scale. This is where it is advantageous to change the type of prediction we can get using the `forecast()` function: ```{r} link_mod1 <- forecast(mod1, newdata = simdat$data_test, type = "link") score(link_mod1, score = "elpd")$series_1 ``` Finally, when we have multiple time series it may also make sense to use a multivariate proper scoring rule. `mvgam` offers two such options: the Energy score and the Variogram score. The first penalizes forecast distributions that are less well calibrated against the truth, while the second penalizes forecasts that do not capture the observed true correlation structure. Which score to use depends on your goals, but both are very easy to compute: ```{r} energy_mod2 <- score(fc_mod2, score = "energy") str(energy_mod2) ``` The returned object still provides information on interval coverage for each individual series, but there is only a single score per horizon now (which is provided in the `all_series` slot): ```{r} energy_mod2$all_series ``` You can use your score(s) of choice to compare different models. For example, we can compute and plot the difference in CRPS scores for each series in data. Here, a negative value means the AR(1) model (`mod2`) is better, while a positive value means the spline model (`mod1`) is better. 
```{r} crps_mod1 <- score(fc_mod1, score = "crps") crps_mod2 <- score(fc_mod2, score = "crps") diff_scores <- crps_mod2$series_1$score - crps_mod1$series_1$score plot(diff_scores, pch = 16, cex = 1.25, col = "darkred", ylim = c( -1 * max(abs(diff_scores), na.rm = TRUE), max(abs(diff_scores), na.rm = TRUE) ), bty = "l", xlab = "Forecast horizon", ylab = expression(CRPS[AR1] ~ -~ CRPS[spline]) ) abline(h = 0, lty = "dashed", lwd = 2) ar1_better <- length(which(diff_scores < 0)) title(main = paste0( "AR(1) better in ", ar1_better, " of ", length(diff_scores), " evaluations", "\nMean difference = ", round(mean(diff_scores, na.rm = TRUE), 2) )) diff_scores <- crps_mod2$series_2$score - crps_mod1$series_2$score plot(diff_scores, pch = 16, cex = 1.25, col = "darkred", ylim = c( -1 * max(abs(diff_scores), na.rm = TRUE), max(abs(diff_scores), na.rm = TRUE) ), bty = "l", xlab = "Forecast horizon", ylab = expression(CRPS[AR1] ~ -~ CRPS[spline]) ) abline(h = 0, lty = "dashed", lwd = 2) ar1_better <- length(which(diff_scores < 0)) title(main = paste0( "AR(1) better in ", ar1_better, " of ", length(diff_scores), " evaluations", "\nMean difference = ", round(mean(diff_scores, na.rm = TRUE), 2) )) diff_scores <- crps_mod2$series_3$score - crps_mod1$series_3$score plot(diff_scores, pch = 16, cex = 1.25, col = "darkred", ylim = c( -1 * max(abs(diff_scores), na.rm = TRUE), max(abs(diff_scores), na.rm = TRUE) ), bty = "l", xlab = "Forecast horizon", ylab = expression(CRPS[AR1] ~ -~ CRPS[spline]) ) abline(h = 0, lty = "dashed", lwd = 2) ar1_better <- length(which(diff_scores < 0)) title(main = paste0( "AR(1) better in ", ar1_better, " of ", length(diff_scores), " evaluations", "\nMean difference = ", round(mean(diff_scores, na.rm = TRUE), 2) )) ``` The correlated AR(1) model consistently gives better forecasts, and the difference between scores tends to grow as the forecast horizon increases. This is not unexpected given the way that splines linearly extrapolate outside the range of training data ## Further reading The following papers and resources offer useful material about Bayesian forecasting and proper scoring rules: Clark N.J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ* 13:e18929 (2025) https://doi.org/10.7717/peerj.18929 Hyndman, Rob J., and George Athanasopoulos. [Forecasting: principles and practice](https://otexts.com/fpp3/distaccuracy.html). *OTexts*, (2018). Gneiting, Tilmann, and Adrian E. Raftery. [Strictly proper scoring rules, prediction, and estimation](https://www.tandfonline.com/doi/abs/10.1198/016214506000001437) *Journal of the American statistical Association* 102.477 (2007) 359-378. Simonis, Juniper L., et al. [Evaluating probabilistic ecological forecasts](https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecy.3431) *Ecology* 102.8 (2021) e03431. ## Interested in contributing? I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au) ================================================ FILE: doc/forecast_evaluation.html ================================================ Forecasting and forecast evaluation in mvgam

Forecasting and forecast evaluation in mvgam

Nicholas J Clark

2026-01-19

The purpose of this vignette is to show how the mvgam package can be used to produce probabilistic forecasts and to evaluate those forecasts using a variety of proper scoring rules.

Simulating discrete time series

We begin by simulating some data to show how forecasts are computed and evaluated in mvgam. The sim_mvgam() function can be used to simulate series that come from a variety of response distributions as well as seasonal patterns and/or dynamic temporal patterns. Here we simulate a collection of three count-valued time series. These series all share the same seasonal pattern but have different temporal dynamics. By setting trend_model = GP() and prop_trend = 0.75, we are generating time series that have smooth underlying temporal trends (evolving as Gaussian Processes with squared exponential kernel) and moderate seasonal patterns. The observations are Poisson-distributed and we allow 10% of observations to be missing.

set.seed(1)
simdat <- sim_mvgam(
  T = 100,
  n_series = 3,
  mu = 2,
  trend_model = GP(),
  prop_trend = 0.75,
  family = poisson(),
  prop_missing = 0.10
)

The returned object is a list containing training and testing data (sim_mvgam() automatically splits the data into these folds for us), together with some other information about the data generating process that was used to simulate the data:

str(simdat)
#> List of 6
#>  $ data_train        :'data.frame':  225 obs. of  5 variables:
#>   ..$ y     : int [1:225] 6 NA 11 2 5 20 7 8 NA 11 ...
#>   ..$ season: int [1:225] 1 1 1 2 2 2 3 3 3 4 ...
#>   ..$ year  : int [1:225] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ series: Factor w/ 3 levels "series_1","series_2",..: 1 2 3 1 2 3 1 2 3 1 ...
#>   ..$ time  : int [1:225] 1 1 1 2 2 2 3 3 3 4 ...
#>  $ data_test         :'data.frame':  75 obs. of  5 variables:
#>   ..$ y     : int [1:75] 4 23 8 3 NA 3 1 20 8 3 ...
#>   ..$ season: int [1:75] 4 4 4 5 5 5 6 6 6 7 ...
#>   ..$ year  : int [1:75] 7 7 7 7 7 7 7 7 7 7 ...
#>   ..$ series: Factor w/ 3 levels "series_1","series_2",..: 1 2 3 1 2 3 1 2 3 1 ...
#>   ..$ time  : int [1:75] 76 76 76 77 77 77 78 78 78 79 ...
#>  $ true_corrs        : num [1:3, 1:3] 1 0.0861 0.1161 0.0861 1 ...
#>  $ true_trends       : num [1:100, 1:3] -0.851 -0.758 -0.664 -0.571 -0.48 ...
#>  $ global_seasonality: num [1:100] -0.966 -0.197 0.771 1.083 0.37 ...
#>  $ trend_params      :List of 2
#>   ..$ alpha: num [1:3] 0.883 0.936 1.036
#>   ..$ rho  : num [1:3] 7.54 4.01 7.49
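
Because the true generative components are returned by sim_mvgam(), the relative contributions of the latent trends and the shared seasonality can be inspected directly; a minimal sketch (not part of the original vignette analysis):

# compare variation in the true latent trend for series 1
# against variation in the shared seasonal pattern
var(simdat$true_trends[, 1])
var(simdat$global_seasonality)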

Each series in this case has a shared seasonal pattern. The resulting time series are similar to what we might encounter when dealing with count-valued data that can take small counts:

plot_mvgam_series(
  data = simdat$data_train,
  series = "all"
)

Plotting time series features for GAM models in mvgam

For individual series, we can plot the training and testing data, as well as some more specific features of the observed data:

plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  series = 1
)

Plotting time series features for GAM models in mvgam

Modelling dynamics with splines

The first model we will fit uses a shared cyclic spline to capture the repeated seasonality, as well as series-specific splines of time to capture the long-term dynamics. We allow the temporal splines to be fairly complex so they can capture as much of the temporal variation as possible:

mod1 <- mvgam(
  y ~ s(season, bs = "cc", k = 8) +
    s(time, by = series, bs = "cr", k = 20),
  knots = list(season = c(0.5, 12.5)),
  trend_model = "None",
  data = simdat$data_train,
  silent = 2
)

The model fits without issue:

summary(mod1, include_betas = FALSE)
#> GAM formula:
#> y ~ s(season, bs = "cc", k = 8) + s(time, by = series, k = 20)
#> <environment: 0x0000022e41d5a728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 100 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)  1.9 1.9     2    1   979
#> 
#> Approximate significance of GAM smooths:
#>                          edf Ref.df Chi.sq  p-value    
#> s(season)              3.418      6  22.91  < 2e-16 ***
#> s(time):seriesseries_1 8.763     19  30.85 4.25e-06 ***
#> s(time):seriesseries_2 9.635     19  41.49  < 2e-16 ***
#> s(time):seriesseries_3 6.676     19  56.28   0.0862 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✖ 51 of 2000 iterations saturated the maximum tree depth of 10 (2.55%)
#>     Try a larger max_treedepth to avoid saturation
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

And we can plot the conditional effects of the splines (on the link scale) to see that they are estimated to be highly nonlinear:

conditional_effects(mod1, type = "link")

Plotting GAM smooth functions using mvgam

Modelling dynamics with a correlated AR1

Before showing how to produce and evaluate forecasts, we will fit a second model to these data so the two models can be compared. This model is equivalent to the above, except we now use a correlated AR(1) process to model series-specific dynamics. See ?AR for more details.

mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  silent = 1
)

The summary for this model now contains information on the autoregressive and process error parameters for each time series:

summary(mod2, include_betas = FALSE)
#> GAM observation formula:
#> y ~ 1
#> <environment: 0x0000022e41d5a728>
#> 
#> GAM process formula:
#> ~s(season, bs = "cc", k = 8) - 1
#> <environment: 0x0000022e41d5a728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR(cor = TRUE)
#> 
#> N process models:
#> 3 
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)  1.8   2   2.4 1.01   512
#> 
#> standard deviation:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.23 0.32  0.44 1.01   318
#> sigma[2] 0.30 0.43  0.58 1.01   498
#> sigma[3] 0.18 0.25  0.36 1.01   329
#> 
#> autoregressive coef 1:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.75 0.89  0.99 1.01   438
#> ar1[2] 0.66 0.83  0.96 1.02   379
#> ar1[3] 0.87 0.96  1.00 1.01   478
#> 
#> Approximate significance of GAM process smooths:
#>             edf Ref.df Chi.sq  p-value    
#> s(season) 1.737      6  23.81 1.22e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

We can plot the posteriors for these parameters, and for any other parameter for that matter, using bayesplot routines. First the autoregressive parameters:

mcmc_plot(mod2, variable = "ar", regex = TRUE, type = "areas")

Summarising latent Gaussian Process parameters in mvgam

And now the variance ($\sigma$) parameters:

mcmc_plot(mod2, variable = "sigma", regex = TRUE, type = "areas")

Summarising latent Gaussian Process parameters in mvgam

We can again plot the conditional seasonal effect:

conditional_effects(mod2, type = "link")

Plotting latent Gaussian Process effects in mvgam and marginaleffects

The estimates for the seasonal component are fairly similar for the two models, but below we will see if they produce similar forecasts.

Forecasting with the forecast() function

Probabilistic forecasts can be computed in two main ways in mvgam. The first is to take a model that was fit only to training data (as we did above in the two example models) and produce temporal predictions from the posterior predictive distribution by feeding newdata to the forecast() function. It is crucial that any newdata fed to the forecast() function follows on sequentially from the data that was used to fit the model (the package does not check this internally, as doing so is unreliable when data are not supplied in a strict time order). When calling the forecast() function, you have the option to generate different kinds of predictions (i.e. predicting on the link scale, response scale or to produce expectations; see ?forecast.mvgam for details). We will use the default and produce forecasts on the response scale, which is the most common way to evaluate forecast distributions.
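
Since the sequential requirement is not checked internally, a quick manual sanity check can be worth running before forecasting; a minimal sketch using base R (these are not package functions):

# the first test timepoint should directly follow the last training timepoint
max(simdat$data_train$time) + 1 == min(simdat$data_test$time)

# and the test timepoints should be consecutive, with no gaps
all(diff(sort(unique(simdat$data_test$time))) == 1)

Both checks return TRUE for the sim_mvgam() split used here.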

fc_mod1 <- forecast(mod1, newdata = simdat$data_test)
fc_mod2 <- forecast(mod2, newdata = simdat$data_test)

The objects we have created are of class mvgam_forecast, which contain information on hindcast distributions, forecast distributions and true observations for each series in the data:

str(fc_mod1)
#> List of 16
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc", k = 8) + s(time, by = series, k = 20)
#>   .. ..- attr(*, ".Environment")=<environment: 0x0000022e41d5a728> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ family_pars       : NULL
#>  $ trend_model       : chr "None"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : Factor w/ 3 levels "series_1","series_2",..: 1 2 3
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:75] 6 2 7 11 8 6 9 11 7 4 ...
#>   ..$ series_2: int [1:75] NA 5 8 2 1 NA 2 4 0 2 ...
#>   ..$ series_3: int [1:75] 11 20 NA 36 44 34 57 50 26 28 ...
#>  $ train_times       :List of 3
#>   ..$ series_1: int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ series_2: int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ series_3: int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations :List of 3
#>   ..$ series_1: int [1:25] 4 3 1 3 1 NA NA 7 9 8 ...
#>   ..$ series_2: int [1:25] 23 NA 20 20 14 7 6 6 6 1 ...
#>   ..$ series_3: int [1:25] 8 3 8 3 NA 1 1 9 8 NA ...
#>  $ test_times        :List of 3
#>   ..$ series_1: int [1:25] 76 77 78 79 80 81 82 83 84 85 ...
#>   ..$ series_2: int [1:25] 76 77 78 79 80 81 82 83 84 85 ...
#>   ..$ series_3: int [1:25] 76 77 78 79 80 81 82 83 84 85 ...
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:2000, 1:75] 3 3 1 0 2 3 5 5 2 1 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>   ..$ series_2: num [1:2000, 1:75] 3 4 2 5 7 8 7 2 7 10 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,2]" "ypred[2,2]" "ypred[3,2]" "ypred[4,2]" ...
#>   ..$ series_3: num [1:2000, 1:75] 11 28 13 12 14 20 12 7 31 27 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,3]" "ypred[2,3]" "ypred[3,3]" "ypred[4,3]" ...
#>  $ forecasts         :List of 3
#>   ..$ series_1: num [1:2000, 1:25] 4 1 1 5 5 6 2 3 2 0 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:25] "ypred[76,1]" "ypred[77,1]" "ypred[78,1]" "ypred[79,1]" ...
#>   ..$ series_2: num [1:2000, 1:25] 26 33 21 34 12 28 33 16 23 39 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:25] "ypred[76,2]" "ypred[77,2]" "ypred[78,2]" "ypred[79,2]" ...
#>   ..$ series_3: num [1:2000, 1:25] 10 5 3 7 4 2 8 11 3 10 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:25] "ypred[76,3]" "ypred[77,3]" "ypred[78,3]" "ypred[79,3]" ...
#>  - attr(*, "class")= chr "mvgam_forecast"
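
Each hindcast or forecast element stores one row per posterior draw and one column per timepoint, so custom summaries are straightforward; a minimal sketch:

# posterior median and 90% interval for the first forecast horizon of series 1
quantile(fc_mod1$forecasts$series_1[, 1], probs = c(0.05, 0.5, 0.95))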

We can plot the forecasts for some series from each model using the S3 plot method for objects of this class:

plot(fc_mod1, series = 1)

plot(fc_mod2, series = 1)


plot(fc_mod1, series = 2)

plot(fc_mod2, series = 2)

Clearly the two models do not produce equivalent forecasts. We will come back to scoring these forecasts in a moment.

Forecasting with newdata in mvgam()

The second way we can produce forecasts in mvgam is to feed the testing data directly to the mvgam() function as newdata. This will include the testing data as missing observations so that they are automatically predicted from the posterior predictive distribution using the generated quantities block in Stan. As an example, we can refit mod2 but include the testing data for automatic forecasts:

mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  newdata = simdat$data_test,
  silent = 2
)

Because the model already contains a forecast distribution, we do not need to feed newdata to the forecast() function:

fc_mod2 <- forecast(mod2)

The forecasts will be nearly identical to those calculated previously:

plot(fc_mod2, series = 1)

Plotting posterior forecast distributions using mvgam and R

Scoring forecast distributions

A primary purpose of the mvgam_forecast class is to readily allow forecast evaluations for each series in the data, using a variety of possible scoring functions. See ?mvgam::score.mvgam_forecast to view the types of scores that are available. A useful scoring metric is the Continuous Ranked Probability Score (CRPS). A CRPS value is similar to what we might get if we calculated a weighted absolute error using the full forecast distribution.
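
To build intuition for what a single CRPS value represents, the score for one horizon can be approximated directly from the forecast draws using the sample estimator mean(|x - y|) - 0.5 * mean(|x - x'|), where x and x' are independent forecast draws and y is the realized observation. A minimal sketch (horizon 1 of series 1 is picked arbitrarily; this is not part of the package API):

draws <- fc_mod1$forecasts$series_1[, 1]
y_obs <- fc_mod1$test_observations$series_1[1]
mean(abs(draws - y_obs)) -
  0.5 * mean(abs(outer(draws, draws, "-")))

The result should closely match the first entry of the score() output below.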

crps_mod1 <- score(fc_mod1, score = "crps")
str(crps_mod1)
#> List of 4
#>  $ series_1  :'data.frame':  25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 0.993 0.817 0.334 0.998 0.277 ...
#>   ..$ in_interval   : num [1:25] 1 1 1 1 1 NA NA 0 0 0 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "crps" "crps" "crps" "crps" ...
#>  $ series_2  :'data.frame':  25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 2.01 NA 6.55 14.69 17.43 ...
#>   ..$ in_interval   : num [1:25] 1 NA 1 1 1 0 0 0 0 0 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "crps" "crps" "crps" "crps" ...
#>  $ series_3  :'data.frame':  25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 3.487 0.463 4.064 0.5 NA ...
#>   ..$ in_interval   : num [1:25] 0 1 0 1 NA 1 1 0 0 NA ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "crps" "crps" "crps" "crps" ...
#>  $ all_series:'data.frame':  25 obs. of  3 variables:
#>   ..$ score       : num [1:25] 6.49 NA 10.95 16.18 NA ...
#>   ..$ eval_horizon: int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type  : chr [1:25] "sum_crps" "sum_crps" "sum_crps" "sum_crps" ...
crps_mod1$series_1
#>         score in_interval interval_width eval_horizon score_type
#> 1   0.9932195           1            0.9            1       crps
#> 2   0.8173380           1            0.9            2       crps
#> 3   0.3338443           1            0.9            3       crps
#> 4   0.9980710           1            0.9            4       crps
#> 5   0.2773030           1            0.9            5       crps
#> 6          NA          NA            0.9            6       crps
#> 7          NA          NA            0.9            7       crps
#> 8   6.1295615           0            0.9            8       crps
#> 9   8.2855480           0            0.9            9       crps
#> 10  7.4110365           0            0.9           10       crps
#> 11 21.3898007           0            0.9           11       crps
#> 12 35.2857677           0            0.9           12       crps
#> 13 37.2882082           0            0.9           13       crps
#> 14 36.4251945           0            0.9           14       crps
#> 15 39.3858395           0            0.9           15       crps
#> 16 42.3677532           0            0.9           16       crps
#> 17 42.5461592           0            0.9           17       crps
#> 18 12.7316780           0            0.9           18       crps
#> 19 13.7700235           0            0.9           19       crps
#> 20  9.7282697           0            0.9           20       crps
#> 21  4.7711443           0            0.9           21       crps
#> 22  4.8054445           0            0.9           22       crps
#> 23  2.7825032           0            0.9           23       crps
#> 24  0.8591737           1            0.9           24       crps
#> 25  3.7808390           0            0.9           25       crps

The returned list contains a data.frame for each series in the data that shows the CRPS score for each evaluation in the testing data, along with some other useful information about the fit of the forecast distribution. In particular, we are given a binary indicator (1s and 0s) telling us whether the true value was within a pre-specified credible interval (i.e. the coverage of the forecast distribution). The default interval width is 0.9, so we would hope that the values in the in_interval column equal 1 approximately 90% of the time. This value can be changed if you wish to compute different coverages, say using a 60% interval:

crps_mod1 <- score(fc_mod1, score = "crps", interval_width = 0.6)
crps_mod1$series_1
#>         score in_interval interval_width eval_horizon score_type
#> 1   0.9932195           1            0.6            1       crps
#> 2   0.8173380           1            0.6            2       crps
#> 3   0.3338443           1            0.6            3       crps
#> 4   0.9980710           1            0.6            4       crps
#> 5   0.2773030           1            0.6            5       crps
#> 6          NA          NA            0.6            6       crps
#> 7          NA          NA            0.6            7       crps
#> 8   6.1295615           0            0.6            8       crps
#> 9   8.2855480           0            0.6            9       crps
#> 10  7.4110365           0            0.6           10       crps
#> 11 21.3898007           0            0.6           11       crps
#> 12 35.2857677           0            0.6           12       crps
#> 13 37.2882082           0            0.6           13       crps
#> 14 36.4251945           0            0.6           14       crps
#> 15 39.3858395           0            0.6           15       crps
#> 16 42.3677532           0            0.6           16       crps
#> 17 42.5461592           0            0.6           17       crps
#> 18 12.7316780           0            0.6           18       crps
#> 19 13.7700235           0            0.6           19       crps
#> 20  9.7282697           0            0.6           20       crps
#> 21  4.7711443           0            0.6           21       crps
#> 22  4.8054445           0            0.6           22       crps
#> 23  2.7825032           0            0.6           23       crps
#> 24  0.8591737           0            0.6           24       crps
#> 25  3.7808390           0            0.6           25       crps
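
As a quick calibration check, the empirical coverage can be compared against the nominal interval width; a minimal sketch using the scores computed above:

# proportion of scored evaluations where the true value fell in the interval
mean(crps_mod1$series_1$in_interval, na.rm = TRUE)

Coverage well below the nominal width (0.6 here) indicates that this model's forecast distribution is overconfident for series 1.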

We can also compare forecasts against out of sample observations using the Expected Log Predictive Density (ELPD; also known as the log score). The ELPD is a strictly proper scoring rule that can be applied to any distributional forecast, but to compute it we need predictions on the link scale rather than on the outcome scale. This is where it is advantageous to change the type of prediction we can get using the forecast() function:

link_mod1 <- forecast(mod1, newdata = simdat$data_test, type = "link")
score(link_mod1, score = "elpd")$series_1
#>         score eval_horizon score_type
#> 1   -2.155156            1       elpd
#> 2   -1.956860            2       elpd
#> 3   -1.242909            3       elpd
#> 4   -2.208022            4       elpd
#> 5   -1.218081            5       elpd
#> 6          NA            6       elpd
#> 7          NA            7       elpd
#> 8   -7.113590            8       elpd
#> 9   -8.499192            9       elpd
#> 10  -7.975085           10       elpd
#> 11 -18.627673           11       elpd
#> 12 -30.187736           12       elpd
#> 13 -28.528770           13       elpd
#> 14 -27.474431           14       elpd
#> 15 -27.138400           15       elpd
#> 16 -24.018949           16       elpd
#> 17 -28.766709           17       elpd
#> 18  -9.455606           18       elpd
#> 19 -10.169118           19       elpd
#> 20  -7.741233           20       elpd
#> 21  -6.998068           21       elpd
#> 22  -7.030657           22       elpd
#> 23  -5.715523           23       elpd
#> 24  -3.015423           24       elpd
#> 25  -6.271717           25       elpd
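
Each ELPD value is the log of the expected predictive density for one out-of-sample observation. Under the Poisson observation model used here, a single value can be reproduced from the link-scale draws; a minimal sketch (horizon 1 of series 1; not part of the package API):

link_draws <- link_mod1$forecasts$series_1[, 1]
y_obs <- link_mod1$test_observations$series_1[1]
log(mean(dpois(y_obs, lambda = exp(link_draws))))

This should be close to the first entry of the elpd output above.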

Finally, when we have multiple time series it may also make sense to use a multivariate proper scoring rule. mvgam offers two such options: the Energy score and the Variogram score. The first penalizes forecast distributions that are less well calibrated against the truth, while the second penalizes forecasts that do not capture the observed true correlation structure. Which score to use depends on your goals, but both are very easy to compute:

energy_mod2 <- score(fc_mod2, score = "energy")
str(energy_mod2)
#> List of 4
#>  $ series_1  :'data.frame':  25 obs. of  3 variables:
#>   ..$ in_interval   : num [1:25] 1 1 1 1 1 NA NA 1 1 1 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ series_2  :'data.frame':  25 obs. of  3 variables:
#>   ..$ in_interval   : num [1:25] 1 NA 1 1 1 1 1 1 1 1 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ series_3  :'data.frame':  25 obs. of  3 variables:
#>   ..$ in_interval   : num [1:25] 1 1 1 1 NA 1 1 1 1 NA ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ all_series:'data.frame':  25 obs. of  3 variables:
#>   ..$ score       : num [1:25] 4.74 NA 5.03 5.36 NA ...
#>   ..$ eval_horizon: int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type  : chr [1:25] "energy" "energy" "energy" "energy" ...

The returned object still provides information on interval coverage for each individual series, but there is only a single score per horizon now (which is provided in the all_series slot):

energy_mod2$all_series
#>        score eval_horizon score_type
#> 1   4.736450            1     energy
#> 2         NA            2     energy
#> 3   5.025547            3     energy
#> 4   5.363993            4     energy
#> 5         NA            5     energy
#> 6         NA            6     energy
#> 7         NA            7     energy
#> 8   3.918395            8     energy
#> 9   4.113319            9     energy
#> 10        NA           10     energy
#> 11 13.149358           11     energy
#> 12 22.547040           12     energy
#> 13        NA           13     energy
#> 14 21.170257           14     energy
#> 15 24.184433           15     energy
#> 16 25.110374           16     energy
#> 17 27.945911           17     energy
#> 18  6.180386           18     energy
#> 19 10.674543           19     energy
#> 20  4.093666           20     energy
#> 21  2.870332           21     energy
#> 22  3.443291           22     energy
#> 23        NA           23     energy
#> 24  8.866093           24     energy
#> 25  7.883124           25     energy
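
If capturing the observed correlation structure is the priority, the Variogram score is requested in exactly the same way; a minimal sketch:

vario_mod2 <- score(fc_mod2, score = "variogram")
vario_mod2$all_series

As with the Energy score, the multivariate score is returned in the all_series slot.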

You can use your score(s) of choice to compare different models. For example, we can compute and plot the difference in CRPS scores for each series in the data. Here, a negative value means the AR(1) model (mod2) is better, while a positive value means the spline model (mod1) is better.

crps_mod1 <- score(fc_mod1, score = "crps")
crps_mod2 <- score(fc_mod2, score = "crps")

diff_scores <- crps_mod2$series_1$score -
  crps_mod1$series_1$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ",
  ar1_better,
  " of ",
  length(diff_scores),
  " evaluations",
  "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))



diff_scores <- crps_mod2$series_2$score -
  crps_mod1$series_2$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ",
  ar1_better,
  " of ",
  length(diff_scores),
  " evaluations",
  "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))


diff_scores <- crps_mod2$series_3$score -
  crps_mod1$series_3$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ",
  ar1_better,
  " of ",
  length(diff_scores),
  " evaluations",
  "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))

The correlated AR(1) model consistently gives better forecasts, and the difference between scores tends to grow as the forecast horizon increases. This is not unexpected given the way that splines linearly extrapolate outside the range of the training data.
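
If a single summary number per model is preferred, the per-series scores can also be aggregated over all horizons; a minimal sketch (simple sums, ignoring missing values):

# total CRPS for each model, summed over the three series
sum(sapply(crps_mod1[1:3], function(x) sum(x$score, na.rm = TRUE)))
sum(sapply(crps_mod2[1:3], function(x) sum(x$score, na.rm = TRUE)))

The model with the smaller total is preferred, though the horizon-level plots above are more informative about where the models differ.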

Further reading

The following papers and resources offer useful material about Bayesian forecasting and proper scoring rules:

Clark N.J., et al. Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ 13:e18929 (2025). https://doi.org/10.7717/peerj.18929

Hyndman, Rob J., and George Athanasopoulos. Forecasting: Principles and Practice. OTexts (2018).

Gneiting, Tilmann, and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102.477 (2007): 359-378.

Simonis, Juniper L., et al. Evaluating probabilistic ecological forecasts. Ecology 102.8 (2021): e03431.

Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark'at'uq.edu.au).

================================================
FILE: doc/mvgam_overview.R
================================================
params <- list(EVAL = TRUE)

## ----echo = FALSE-------------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)

## ----setup, include=FALSE-----------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))

## ----Access time series data--------------------------------------------------
data("portal_data")

## ----Inspect data format and structure----------------------------------------
head(portal_data)

## -----------------------------------------------------------------------------
dplyr::glimpse(portal_data)

## -----------------------------------------------------------------------------
data <- sim_mvgam(n_series = 4, T = 24)
head(data$data_train, 12)

## ----Wrangle data for modelling-----------------------------------------------
portal_data %>%
  # Filter the data to only contain captures of the 'PP'
  dplyr::filter(series == 'PP') %>%
  droplevels() %>%
  dplyr::mutate(count = captures) %>%
  # Add a 'year' variable
  dplyr::mutate(year = sort(rep(1:8, 12))[time]) %>%
  # Select the variables of interest to keep in the model_data
  dplyr::select(series, year, time, count, mintemp, ndvi_ma12) -> model_data

## -----------------------------------------------------------------------------
head(model_data)

## -----------------------------------------------------------------------------
dplyr::glimpse(model_data)

## ----Summarise variables------------------------------------------------------
summary(model_data)

## -----------------------------------------------------------------------------
plot_mvgam_series(data = model_data, series = 1, y = "count")

## -----------------------------------------------------------------------------
model_data %>%
  # Create a 'year_fac' factor version of 'year'
  dplyr::mutate(year_fac = factor(year)) -> model_data

## -----------------------------------------------------------------------------
dplyr::glimpse(model_data)
levels(model_data$year_fac)

## ----model1, include=FALSE, results='hide'-------------------------------------
model1 <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data,
  parallel = FALSE
)

## ----eval=FALSE-----------------------------------------------------------------
# model1 <- mvgam(
#   count ~ s(year_fac, bs = "re") - 1,
#   family = poisson(),
#   data = model_data
# )

## -----------------------------------------------------------------------------
get_mvgam_priors(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)

## -----------------------------------------------------------------------------
summary(model1)

## ----Extract coefficient posteriors--------------------------------------------
beta_post <- as.data.frame(model1, variable = "betas")
dplyr::glimpse(beta_post)

## -----------------------------------------------------------------------------
stancode(model1)

## ----Plot random effect estimates-----------------------------------------------
plot(model1, type = "re")

## -----------------------------------------------------------------------------
mcmc_plot(
  object = model1,
  variable = "betas",
  type = "areas"
)

## -----------------------------------------------------------------------------
pp_check(object = model1)

## ----Plot posterior hindcasts---------------------------------------------------
plot(model1, type = "forecast")

## ----Extract posterior hindcast-------------------------------------------------
hc <- hindcast(model1)
str(hc)

## ----Extract hindcasts on the linear predictor scale-----------------------------
hc <- hindcast(model1, type = "link")
range(hc$hindcasts$PP)

## ----Plot posterior residuals----------------------------------------------------
plot(model1, type = "residuals")

## -----------------------------------------------------------------------------
model_data %>%
  dplyr::filter(time <= 70) -> data_train
model_data %>%
  dplyr::filter(time > 70) -> data_test

## ----include=FALSE, message=FALSE, warning=FALSE---------------------------------
model1b <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)

## ----eval=FALSE-------------------------------------------------------------------
# model1b <- mvgam(
#   count ~ s(year_fac, bs = "re") - 1,
#   family = poisson(),
#   data = data_train,
#   newdata = data_test
# )

## ----Plotting predictions against test data----------------------------------------
plot(model1b, type = "forecast", newdata = data_test)

## ----Extract posterior forecasts----------------------------------------------------
fc <- forecast(model1b)
str(fc)

## ----model2, include=FALSE, message=FALSE, warning=FALSE-----------------------------
model2 <- mvgam(
  count ~ s(year_fac, bs = "re") +
    ndvi_ma12 - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)

## ----eval=FALSE-----------------------------------------------------------------------
# model2 <- mvgam(
#   count ~ s(year_fac, bs = "re") +
#     ndvi_ma12 - 1,
#   family = poisson(),
#   data = data_train,
#   newdata = data_test
# )

## ----class.output="scroll-300"--------------------------------------------------------
summary(model2)

## ----Posterior quantiles of model coefficients------------------------------------------
coef(model2)

## -----------------------------------------------------------------------------
beta_post <- as.data.frame(model2, variable = "betas")
dplyr::glimpse(beta_post)

## ----Histogram of NDVI effects---------------------------------------------------------
hist(
  beta_post$ndvi_ma12,
  xlim = c(
    -1 * max(abs(beta_post$ndvi_ma12)),
    max(abs(beta_post$ndvi_ma12))
  ),
  col = "darkred",
  border = "white",
  xlab = expression(beta[NDVI]),
  ylab = "",
  yaxt = "n",
  main = "",
  lwd = 2
)
abline(v = 0, lwd = 2.5)

## ----warning=FALSE---------------------------------------------------------------------
conditional_effects(model2)

## ----model3, include=FALSE, message=FALSE, warning=FALSE---------------------------------
model3 <- mvgam(
  count ~ s(time, bs = "bs", k = 15) +
    ndvi_ma12,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)

## ----eval=FALSE---------------------------------------------------------------------------
# model3 <- mvgam(
#   count ~ s(time, bs = "bs", k = 15) +
#     ndvi_ma12,
#   family = poisson(),
#   data = data_train,
#   newdata = data_test
# )

## -----------------------------------------------------------------------------
summary(model3)

## ----warning=FALSE-------------------------------------------------------------
conditional_effects(model3, type = "link")

## ----class.output="scroll-300"---------------------------------------------------
stancode(model3)

## -----------------------------------------------------------------------------
plot(model3, type = "forecast", newdata = data_test)

## ----Plot extrapolated temporal functions using newdata--------------------------
plot_mvgam_smooth(
  model3,
  smooth = "s(time)",
  # pass newdata to the plot function to generate
  # predictions of the temporal smooth to the end of the
  # testing period
  newdata = data.frame(
    time = 1:max(data_test$time),
    ndvi_ma12 = 0
  )
)
abline(v = max(data_train$time), lty = "dashed", lwd = 2)

## ----model4, include=FALSE-------------------------------------------------------
model4 <- mvgam(
  count ~ s(ndvi_ma12, k = 6),
  family = poisson(),
  data = data_train,
  newdata = data_test,
  trend_model = AR(),
  parallel = FALSE
)

## ----eval=FALSE-------------------------------------------------------------------
# model4 <- mvgam(
#   count ~ s(ndvi_ma12, k = 6),
#   family = poisson(),
#   data = data_train,
#   newdata = data_test,
#   trend_model = AR()
# )

## ----Summarise the mvgam autocorrelated error model, class.output="scroll-300"----
summary(model4)

## -----------------------------------------------------------------------------
plot(model4, type = "forecast", newdata = data_test)

## -----------------------------------------------------------------------------
plot(model4, type = "trend", newdata = data_test)

## -----------------------------------------------------------------------------
loo_compare(model3, model4)

## -----------------------------------------------------------------------------
fc_mod3 <- forecast(model3)
fc_mod4 <- forecast(model4)
score_mod3 <- score(fc_mod3, score = "drps")
score_mod4 <- score(fc_mod4, score = "drps")
sum(score_mod4$PP$score, na.rm = TRUE) -
  sum(score_mod3$PP$score, na.rm = TRUE)


================================================
FILE: doc/mvgam_overview.Rmd
================================================
---
title: "Overview of the mvgam package"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Overview of the mvgam package}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

The purpose of this vignette is to give a general overview of the `mvgam` package and its primary functions.

## Dynamic GAMs

`mvgam` is designed to propagate unobserved temporal processes to capture latent dynamics in the observed time series. This works in a state-space format, with the temporal *trend* evolving independently of the observation process. An introduction to the package and some worked examples are also shown in this seminar: [Ecological Forecasting with Dynamic Generalized Additive Models](https://www.youtube.com/watch?v=0zZopLlomsQ){target="_blank"}.

Briefly, assume $\tilde{\boldsymbol{y}}_{i,t}$ is the conditional expectation of response variable $\boldsymbol{i}$ at time $\boldsymbol{t}$.
Assuming $\boldsymbol{y_i}$ is drawn from an exponential family distribution with an invertible link function, the linear predictor for a multivariate Dynamic GAM can be written as:

$$for~i~in~1:N_{series}~...$$
$$for~t~in~1:N_{timepoints}~...$$

$$g^{-1}(\tilde{\boldsymbol{y}}_{i,t})=\alpha_{i}+\sum\limits_{j=1}^J\boldsymbol{s}_{i,j,t}\boldsymbol{x}_{j,t}+\boldsymbol{Z}\boldsymbol{z}_{k,t}\,,$$

Here $\alpha$ are the unknown intercepts, the $\boldsymbol{s}$'s are unknown smooth functions of covariates ($\boldsymbol{x}$'s), which can potentially vary among the response series, and $\boldsymbol{z}$ are dynamic latent processes. Each smooth function $\boldsymbol{s_j}$ is composed of basis expansions whose coefficients, which must be estimated, control the functional relationship between $\boldsymbol{x}_{j}$ and $g^{-1}(\tilde{\boldsymbol{y}})$. The size of the basis expansion limits the smooth's potential complexity, with a larger set of basis functions allowing greater flexibility. For more information on GAMs and how they can smooth through data, see [this blogpost on how to interpret nonlinear effects from Generalized Additive Models](https://ecogambler.netlify.app/blog/interpreting-gams/){target="_blank"}.

Latent processes are captured with $\boldsymbol{Z}\boldsymbol{z}_{k,t}$, where $\boldsymbol{Z}$ is an $i~by~k$ matrix of loading coefficients (which can be fixed or a combination of fixed and freely estimated parameters) and $\boldsymbol{z}_{k,t}$ are a set of $K$ latent factors that can also include their own GAM linear predictors (see the [State-Space models vignette](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html), the [N-mixtures vignette](https://nicholasjclark.github.io/mvgam/articles/nmixtures.html) and the example in [`jsdgam`](https://nicholasjclark.github.io/mvgam/reference/jsdgam.html) to get an idea of how flexible these processes can be).

Several advantages of GAMs are that they can model a diversity of response families, including discrete distributions (i.e. Poisson, Negative Binomial, Gamma) that accommodate common ecological features such as zero-inflation or overdispersion, and that they can be formulated to include hierarchical smoothing for multivariate responses.

`mvgam` supports a number of different observation families, which are summarized below:

## Supported observation families

| Distribution | Function | Support | Extra parameter(s) |
|:---:|:---:|:---:|:---:|
| Gaussian (identity link) | `gaussian()` | Real values in $(-\infty, \infty)$ | $\sigma$ |
| Student's T (identity link) | `student-t()` | Heavy-tailed real values in $(-\infty, \infty)$ | $\sigma$, $\nu$ |
| LogNormal (identity link) | `lognormal()` | Positive real values in $[0, \infty)$ | $\sigma$ |
| Gamma (log link) | `Gamma()` | Positive real values in $[0, \infty)$ | $\alpha$ |
| Beta (logit link) | `betar()` | Real values (proportional) in $[0,1]$ | $\phi$ |
| Bernoulli (logit link) | `bernoulli()` | Binary data in $\{0,1\}$ | - |
| Poisson (log link) | `poisson()` | Non-negative integers in $(0,1,2,...)$ | - |
| Negative Binomial2 (log link) | `nb()` | Non-negative integers in $(0,1,2,...)$ | $\phi$ |
| Binomial (logit link) | `binomial()` | Non-negative integers in $(0,1,2,...)$ | - |
| Beta-Binomial (logit link) | `beta_binomial()` | Non-negative integers in $(0,1,2,...)$ | $\phi$ |
| Poisson Binomial N-mixture (log link) | `nmix()` | Non-negative integers in $(0,1,2,...)$ | - |

For all supported observation families, any extra parameters that need to be estimated (i.e. the $\sigma$ in a Gaussian model or the $\phi$ in a Negative Binomial model) are by default estimated independently for each series. However, users can opt to force all series to share extra observation parameters using `share_obs_params = TRUE` in `mvgam()`. Note that default link functions cannot currently be changed.

## Supported temporal dynamic processes

As stated above, the latent processes can take a wide variety of forms, some of which can be multivariate to allow the different observational variables to interact or be correlated. When using the `mvgam()` function, the user chooses between different process models with the `trend_model` argument. Available process models are described in detail below; a combined example is shown after the Vector Autoregressive section.

### Correlated multivariate processes

If more than one observational unit (usually referred to as 'series') is included in `data` $(N_{series} > 1)$, use `trend_model = ZMVN()` to set up a model where the outcomes for different observational units may be correlated according to:

\begin{align*}
z_{t} & \sim \text{MVNormal}(0, \Sigma)
\end{align*}

The covariance matrix $\Sigma$ will capture potentially correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances $\sigma$ and on the strength of correlations using `Stan`'s `lkj_corr_cholesky` distribution. Note that this `trend_model` does not assume that measurements occur over *time*, as users can specify what variable in the `data` represents the unit of analysis (i.e. outcomes could be counts of different *species* across different *sites* or *regions*, for example; see [`?ZMVN()`](https://nicholasjclark.github.io/mvgam/reference/ZMVN.html) for guidelines).

### Independent Random Walks

Use `trend_model = 'RW'` or `trend_model = RW()` to set up a model where each series in `data` has independent latent temporal dynamics of the form:

\begin{align*}
z_{i,t} & \sim \text{Normal}(z_{i,t-1}, \sigma_i)
\end{align*}

Process error parameters $\sigma$ are modeled independently for each series.
If a moving average process is required, use `trend_model = RW(ma = TRUE)` to set up the following:

\begin{align*}
z_{i,t} & = z_{i,t-1} + \theta_i * error_{i,t-1} + error_{i,t} \\
error_{i,t} & \sim \text{Normal}(0, \sigma_i)
\end{align*}

Moving average coefficients $\theta$ are independently estimated for each series and will be forced to be stationary by default $(abs(\theta)<1)$. Only moving averages of order $q=1$ are currently allowed.

### Multivariate Random Walks

If more than one series is included in `data` $(N_{series} > 1)$, a multivariate Random Walk can be set up using `trend_model = RW(cor = TRUE)`, resulting in the following:

\begin{align*}
z_{t} & \sim \text{MVNormal}(z_{t-1}, \Sigma)
\end{align*}

Where the latent process estimate $z_t$ now takes the form of a vector. The covariance matrix $\Sigma$ will capture contemporaneously correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances $\sigma$ and on the strength of correlations using `Stan`'s `lkj_corr_cholesky` distribution.

Moving average terms can also be included for multivariate random walks, in which case the moving average coefficients $\theta$ will be parameterised as an $N_{series} * N_{series}$ matrix.

### Autoregressive processes

Autoregressive models up to $p=3$, in which the autoregressive coefficients are estimated independently for each series, can be used by specifying `trend_model = 'AR1'`, `trend_model = 'AR2'`, `trend_model = 'AR3'`, or `trend_model = AR(p = 1, 2, or 3)`. For example, a univariate AR(1) model takes the form:

\begin{align*}
z_{i,t} & \sim \text{Normal}(ar1_i * z_{i,t-1}, \sigma_i)
\end{align*}

All options are the same as for Random Walks, but additional options will be available for placing priors on the autoregressive coefficients. By default, these coefficients will not be forced into stationarity, but users can impose this restriction by changing the upper and lower bounds on their priors. See `?get_mvgam_priors` for more details.

### Vector Autoregressive processes

A Vector Autoregression of order $p=1$ can be specified if $N_{series} > 1$ using `trend_model = 'VAR1'` or `trend_model = VAR()`. A VAR(1) model takes the form:

\begin{align*}
z_{t} & \sim \text{Normal}(A * z_{t-1}, \Sigma)
\end{align*}

Where $A$ is an $N_{series} * N_{series}$ matrix of autoregressive coefficients in which the diagonals capture lagged self-dependence (i.e. the effect of a process at time $t$ on its own estimate at time $t+1$), while off-diagonals capture lagged cross-dependence (i.e. the effect of a process at time $t$ on the process for another series at time $t+1$). By default, the covariance matrix $\Sigma$ will assume no process error covariance by fixing the off-diagonals to $0$. To allow for correlated errors, use `trend_model = 'VAR1cor'` or `trend_model = VAR(cor = TRUE)`. A moving average of order $q=1$ can also be included using `trend_model = VAR(ma = TRUE, cor = TRUE)`.

Note that for all VAR models, stationarity of the process is enforced with a structured prior distribution that is described in detail in [Heaps 2022](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648):

Heaps, Sarah E. "[Enforcing stationarity through the prior in vector autoregressions.](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648)" *Journal of Computational and Graphical Statistics* 32.1 (2023): 74-83.
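
To make the options above concrete, here is a minimal, hypothetical sketch (not taken from this vignette's worked analysis) showing how an observation family, shared observation parameters and one of the trend models described above combine in a single call; the simulated data setup assumes `sim_mvgam()` as used elsewhere in the package:

```{r eval=FALSE}
# Hypothetical example only: Negative Binomial observations in which all
# series share the overdispersion parameter (share_obs_params = TRUE),
# combined with correlated AR(1) latent trends
simdat <- sim_mvgam(n_series = 3, T = 80, family = nb())
mod_ar <- mvgam(
  y ~ s(season, bs = "cc", k = 8),
  family = nb(),
  share_obs_params = TRUE,
  trend_model = AR(p = 1, cor = TRUE),
  data = simdat$data_train
)
```

The trend constructors can also be evaluated on their own (e.g. `AR(p = 2)` or `RW(ma = TRUE, cor = TRUE)`) to inspect the dynamic process specification they encode before committing to a model fit.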
### Hierarchical processes Several of the above-mentioned `trend_model` options can be modified to account for grouping structures in `data` by setting up hierarchical latent processes. If an optional grouping variable (`gr`; which must be a `factor` in the supplied `data`) exists, users can model hierarchical residual correlation structures. where the residual correlations for a specific level of `gr` are modelled hierarchically: \begin{align*} \Omega_{group} & = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local} \end{align*} where $\Omega_{global}$ is a *global* correlation matrix, $\Omega_{group, local}$ is a *local deviation* correlation matrix and $\alpha_{cor}$ is a weighting parameter controlling how strongly the local correlation matrix $\Omega_{group}$ (i.e. the derived correlation matrix that will be used for each level of the grouping factor `gr`) is shrunk towards the global correlation matrix $\Omega_{global}$ (larger values of $\alpha_{cor}$ indicate a greater degree of shrinkage, i.e. a greater degree of partial pooling). This option is valuable for many types of designs where the same observational units (i.e. *financial assets* or *species*, for example) are measured in different strata (i.e. *regions*, *countries* or *experimental units*, for example). Currently hierarchical correlations can be included for `AR()`, `VAR()` or `ZMVN()` `trend_model` options. ### Gaussian Processes The final option for modelling temporal dynamics is to use a Gaussian Process with squared exponential kernel. These are set up independently for each series (there is currently no multivariate GP option), using `trend_model = 'GP'`. The dynamics for each latent process are modelled as: \begin{align*} z & \sim \text{MVNormal}(0, \Sigma_{error}) \\ \Sigma_{error}[t_i, t_j] & = \alpha^2 * exp(-0.5 * ((|t_i - t_j| / \rho))^2) \end{align*} The latent dynamic process evolves from a complex, high-dimensional Multivariate Normal distribution which depends on $\rho$ (often called the length scale parameter) to control how quickly the correlations between the model's errors decay as a function of time. For these models, covariance decays exponentially fast with the squared distance (in time) between the observations. The functions also depend on a parameter $\alpha$, which controls the marginal variability of the temporal function at all points; in other words it controls how much the GP term contributes to the linear predictor. `mvgam` capitalizes on some advances that allow GPs to be approximated using Hilbert space basis functions, which [considerably speed up computation at little cost to accuracy or prediction performance](https://link.springer.com/article/10.1007/s11222-022-10167-2){target="_blank"}. ### Piecewise logistic and linear trends Modeling growth for many types of time series is often similar to modeling population growth in natural ecosystems, where there series exhibits nonlinear growth that saturates at some particular carrying capacity. The logistic trend model available in {`mvgam`} allows for a time-varying capacity $C(t)$ as well as a non-constant growth rate. Changes in the base growth rate $k$ are incorporated by explicitly defining changepoints throughout the training period where the growth rate is allowed to vary. The changepoint vector $a$ is represented as a vector of `1`s and `0`s, and the rate of growth at time $t$ is represented as $k+a(t)^T\delta$. 
### Piecewise logistic and linear trends

Modeling growth for many types of time series is often similar to modeling population growth in natural ecosystems, where the series exhibits nonlinear growth that saturates at some particular carrying capacity. The logistic trend model available in {`mvgam`} allows for a time-varying capacity $C(t)$ as well as a non-constant growth rate. Changes in the base growth rate $k$ are incorporated by explicitly defining changepoints throughout the training period where the growth rate is allowed to vary. The changepoint vector $a$ is represented as a vector of `1`s and `0`s, and the rate of growth at time $t$ is represented as $k+a(t)^T\delta$. Potential changepoints are selected uniformly across the training period, and the number of changepoints, as well as the flexibility of the potential rate changes at these changepoints, can be controlled using `trend_model = PW()`. The full piecewise logistic growth model is then:

\begin{align*}
z_t & = \frac{C_t}{1 + \exp(-(k+a(t)^T\delta)(t-(m+a(t)^T\gamma)))}
\end{align*}

For time series that do not appear to exhibit saturating growth, a piecewise constant rate of growth can often provide a useful trend model. The piecewise linear trend is defined as:

\begin{align*}
z_t & = (k+a(t)^T\delta)t + (m+a(t)^T\gamma)
\end{align*}

In both trend models, $m$ is an offset parameter that controls the trend intercept. Because of this parameter, it is not recommended that you include an intercept in your observation formula, as it will not be identifiable. You can read the full description of piecewise linear and logistic trends [in this paper by Taylor and Letham](https://www.tandfonline.com/doi/abs/10.1080/00031305.2017.1380080){target="_blank"}:

Sean J. Taylor and Benjamin Letham. "[Forecasting at scale.](https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1380080)" *The American Statistician* 72.1 (2018): 37-45.

### Continuous time AR(1) processes

Most trend models in the `mvgam()` function expect time to be measured in regularly-spaced, discrete intervals (i.e. one measurement per week, or one per year, for example). But some time series are taken at irregular intervals, and we may still wish to model their autoregressive properties. Setting `trend_model = CAR()` can be useful for these models, which currently only support autoregressive processes of order `1`. The evolution of the latent dynamic process follows the form:

\begin{align*}
z_{i,t} & \sim \text{Normal}(ar1_i^{distance} * z_{i,t-1}, \sigma_i)
\end{align*}

Where $distance$ is a vector of non-negative measurements of the time differences between successive observations. These models are perhaps more widely known as Ornstein–Uhlenbeck processes. See the **Examples** section in `?CAR` for an illustration of how to set these models up.

## Regression formulae

`mvgam` supports an observation model regression formula, built off the `mgcv` package, as well as an optional process model regression formula. The formulae supplied to `mvgam()` are exactly like those supplied to `glm()` except that smooth terms, `s()`, `te()`, `ti()` and `t2()`, time-varying effects using `dynamic()`, monotonically increasing (using `s(x, bs = 'moi')`) or decreasing splines (using `s(x, bs = 'mod')`; see `?smooth.construct.moi.smooth.spec` for details), as well as Gaussian Process functions using `gp()`, can be added to the right hand side (and `.` is not supported in `mvgam` formulae). See `?mvgam_formulae` for more guidance.

For setting up State-Space models, the optional process model formula can be used (see [the State-Space model vignette](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html) and [the shared latent states vignette](https://nicholasjclark.github.io/mvgam/articles/shared_states.html) for guidance on using trend formulae).
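A hedged sketch of the kinds of terms that can appear on the right-hand side, and of supplying an optional process model formula through the `trend_formula` argument (not run; the predictor names `x1`, `x2` and `cov`, and the data object `dat`, are illustrative only):

```{r eval=FALSE}
# An observation formula mixing the term types described above
y ~ s(x1, k = 10) +        # penalized smooth
  te(x1, x2) +             # tensor product smooth
  dynamic(cov, k = 20) +   # time-varying effect of a covariate
  s(x2, bs = 'moi') +      # monotonically increasing smooth
  gp(x1)                   # approximate Gaussian Process term

# A State-Space model with a separate process model formula
mod_ss <- mvgam(
  formula = y ~ 1,
  trend_formula = ~ s(x1),
  trend_model = AR(),
  data = dat
)
```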
## Example time series data

The 'portal_data' object contains time series of rodent captures from the Portal Project, [a long-term monitoring study based near the town of Portal, Arizona](https://portal.weecology.org/){target="_blank"}. Researchers have been operating a standardized set of baited traps within 24 experimental plots at this site since the 1970's. Sampling follows the lunar monthly cycle, with observations occurring on average about 28 days apart. However, missing observations do occur due to difficulties accessing the site (weather events, COVID disruptions etc...). You can read about the full sampling protocol [in this preprint by Ernest et al on the Biorxiv](https://www.biorxiv.org/content/10.1101/332783v3.full){target="_blank"}.

```{r Access time series data}
data("portal_data")
```

As the data come pre-loaded with the `mvgam` package, you can read a little about it in the help page using `?portal_data`. Before working with data, it is important to inspect how the data are structured, first using `head()`:

```{r Inspect data format and structure}
head(portal_data)
```

But the `glimpse()` function in `dplyr` is also useful for understanding how variables are structured:

```{r}
dplyr::glimpse(portal_data)
```

We will focus analyses on the time series of captures for one specific rodent species, the Desert Pocket Mouse *Chaetodipus penicillatus*. This species is interesting in that it goes into a kind of "hibernation" during the colder months, leading to very low captures during the winter period.

## Manipulating data for modelling

Manipulating the data into a 'long' format is necessary for modelling in `mvgam`. By 'long' format, we mean that each `series x time` observation needs to have its own entry in the `dataframe` or `list` object that we wish to use as data for modelling. A simple example can be viewed by simulating data using the `sim_mvgam()` function. See `?sim_mvgam` for more details.

```{r}
data <- sim_mvgam(n_series = 4, T = 24)
head(data$data_train, 12)
```

Notice how we have four different time series in these simulated data, but we do not spread the outcome values into different columns. Rather, there is only a single column for the outcome variable, labelled `y` in these simulated data. We also must supply a variable labelled `time` to ensure the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models, as you can see in the [State-Space vignette](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html).
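If your own data are instead in 'wide' format (one column of outcomes per series), a hedged sketch of the conversion using `tidyr::pivot_longer()` follows; the column names here are invented for illustration:

```{r eval=FALSE}
# Hypothetical wide data: one column of counts per series
wide_data <- data.frame(
  time = 1:10,
  seriesA = rpois(10, lambda = 5),
  seriesB = rpois(10, lambda = 10)
)

wide_data %>%
  # Stack the outcome columns into a single 'y' column
  tidyr::pivot_longer(
    cols = c(seriesA, seriesB),
    names_to = 'series',
    values_to = 'y'
  ) %>%
  # 'series' should be a factor for modelling in mvgam
  dplyr::mutate(series = factor(series)) %>%
  dplyr::arrange(series, time) -> long_data
```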
Below are the steps needed to shape our `portal_data` object into the correct form. First, we filter the data to captures of our target species (`PP`), rename the response column to `count`, create a `year` variable, and select the variables we can use as predictors:

```{r Wrangle data for modelling}
portal_data %>%
  # Filter the data to only contain captures of the 'PP' species
  dplyr::filter(series == 'PP') %>%
  droplevels() %>%
  dplyr::mutate(count = captures) %>%
  # Add a 'year' variable
  dplyr::mutate(year = sort(rep(1:8, 12))[time]) %>%
  # Select the variables of interest to keep in the model_data
  dplyr::select(series, year, time, count, mintemp, ndvi_ma12) -> model_data
```

The data now contain six variables:

- `series`, a factor indexing which time series each observation belongs to
- `year`, the year of sampling
- `time`, the indicator of which time step each observation belongs to
- `count`, the response variable representing the number of captures of the species `PP` in each sampling observation
- `mintemp`, the monthly average minimum temperature at each time step
- `ndvi_ma12`, a 12-month moving average of the monthly Normalized Difference Vegetation Index at each time step

Now check the data structure again:

```{r}
head(model_data)
```

```{r}
dplyr::glimpse(model_data)
```

You can also summarize multiple variables, which is helpful for checking data ranges and identifying missing values:

```{r Summarise variables}
summary(model_data)
```

We have some `NA`s in our response variable `count`. These observations will generally be thrown out by most modelling packages in R. But as you will see when we work through the tutorials, `mvgam` keeps these in the data so that predictions can be automatically returned for the full dataset. The time series and some of its descriptive features can be plotted using `plot_mvgam_series()`:

```{r}
plot_mvgam_series(data = model_data, series = 1, y = "count")
```

## GLMs with temporal random effects

Our first task will be to fit a Generalized Linear Model (GLM) that can adequately capture the features of our `count` observations (integer data, lower bound at zero, missing values) while also attempting to model temporal variation. We are almost ready to fit our first model, which will be a GLM with Poisson observations, a log link function and random (hierarchical) intercepts for `year`. This will allow us to capture our prior belief that, although each year is unique, all years are connected by having been sampled from the same population of effects, and thus might contain valuable information about one another. This is achieved by capitalizing on the partial pooling properties of hierarchical models.

Hierarchical (also known as random) effects offer many advantages when modelling data with grouping structures (i.e. multiple species, locations, years etc...). The ability to incorporate these in time series models is a huge advantage over traditional models such as ARIMA or Exponential Smoothing. But before we fit the model, we will need to convert `year` to a factor so that we can use a random effect basis in `mvgam`. See `?smooth.terms` and `?smooth.construct.re.smooth.spec` for details about the `re` basis construction that is used by both `mvgam` and `mgcv`:

```{r}
model_data %>%
  # Create a 'year_fac' factor version of 'year'
  dplyr::mutate(year_fac = factor(year)) -> model_data
```

Preview the dataset to ensure `year_fac` is now a factor with a unique level for each year in the data:

```{r}
dplyr::glimpse(model_data)
levels(model_data$year_fac)
```
We are now ready for our first `mvgam` model. The syntax will be familiar to users who have previously built models with `mgcv`. But for a refresher, see `?formula.gam` and the examples in `?gam`. Random effects can be specified using the `s` wrapper with the `re` basis. Note that we can also suppress the primary intercept using the usual `R` formula syntax `- 1`. `mvgam` has a number of possible observation families that can be used; see `?mvgam_families` for more information. We will use `Stan` as the fitting engine, which deploys Hamiltonian Monte Carlo (HMC) for full Bayesian inference. By default, 4 HMC chains will be run using a warmup of 500 iterations and collecting 500 posterior samples from each chain. The package will also aim to use the `Cmdstan` backend when possible, so it is recommended that users have an up-to-date installation of `Cmdstan` and the associated `cmdstanr` interface on their machines (note that you can set the backend yourself using the `backend` argument; see `?mvgam` for details). Interested users should consult the [`Stan` user's guide](https://mc-stan.org/docs/stan-users-guide/index.html){target="_blank"} for more information about the software and the enormous variety of models that can be tackled with HMC.

```{r model1, include=FALSE, results='hide'}
model1 <- mvgam(count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data,
  parallel = FALSE
)
```

```{r eval=FALSE}
model1 <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)
```

The model can be described mathematically for each timepoint $t$ as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = \beta_{year[year_t]} \\
\beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year})
\end{align*}

Where the $\beta_{year}$ effects are drawn from a *population* distribution that is parameterized by a common mean $(\mu_{year})$ and variance $(\sigma_{year})$. Priors on most of the model parameters can be interrogated and changed using similar functionality to the options available in `brms`. For example, the default priors on $(\mu_{year})$ and $(\sigma_{year})$ can be viewed using the following code:

```{r}
get_mvgam_priors(count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)
```

See examples in `?get_mvgam_priors` to find out different ways that priors can be altered.
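As one hedged example of altering a prior (not run; it mirrors the `example_change` column that `get_mvgam_priors()` reports, and the exponential rate is an arbitrary illustrative choice), a `brms`-style prior can be passed through the `priors` argument:

```{r eval=FALSE}
# Swap the default inverse Gamma prior on the random effect
# population sd for an exponential prior
model1_newprior <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data,
  priors = brms::prior(exponential(0.15), class = sigma_raw)
)
```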
Once the model has finished, the first step is to inspect the `summary()` to ensure no major diagnostic warnings have been produced and to quickly summarise posterior distributions for key parameters:

```{r}
summary(model1)
```

The diagnostic messages at the bottom of the summary show that the HMC sampler did not encounter any problems or difficult posterior spaces. This is a good sign. Posterior distributions for model parameters can be extracted in any way that an object of class `brmsfit` can (see `?mvgam::mvgam_draws` for details). For example, we can extract the coefficients related to the GAM linear predictor (i.e. the $\beta$'s) into a `data.frame` using:

```{r Extract coefficient posteriors}
beta_post <- as.data.frame(model1, variable = "betas")
dplyr::glimpse(beta_post)
```

With any model fitted in `mvgam`, the underlying `Stan` code can be viewed using the `stancode()` function:

```{r}
stancode(model1)
```

### Plotting effects and residuals

Now for interrogating the model. We can get some sense of the variation in yearly intercepts from the summary above, but it is easier to understand them using targeted plots. Plot posterior distributions of the temporal random effects using `plot.mvgam()` with `type = 're'`. See `?plot.mvgam` for more details about the types of plots that can be produced from fitted `mvgam` objects:

```{r Plot random effect estimates}
plot(model1, type = "re")
```

### `bayesplot` support

We can also capitalize on most of the useful MCMC plotting functions from the `bayesplot` package to visualize posterior distributions and diagnostics (see `?mvgam::mcmc_plot.mvgam` for details):

```{r}
mcmc_plot(
  object = model1,
  variable = "betas",
  type = "areas"
)
```

We can also use the wide range of posterior checking functions available in `bayesplot` (see `?mvgam::pp_check.mvgam` for details):

```{r}
pp_check(object = model1)
```

There is clearly some variation in these yearly intercept estimates. But how do these translate into time-varying predictions? To understand this, we can plot posterior hindcasts from this model for the training period using `plot.mvgam()` with `type = 'forecast'`:

```{r Plot posterior hindcasts}
plot(model1, type = "forecast")
```

If you wish to extract these hindcasts for other downstream analyses, the `hindcast()` function can be used. This will return a list object of class `mvgam_forecast`. In the `hindcasts` slot, a matrix of posterior retrodictions will be returned for each series in the data (only one series in our example):

```{r Extract posterior hindcast}
hc <- hindcast(model1)
str(hc)
```

You can also extract these hindcasts on the linear predictor scale, which in this case is the log scale (our Poisson GLM used a log link function). Sometimes this can be useful for asking more targeted questions about drivers of variation:

```{r Extract hindcasts on the linear predictor scale}
hc <- hindcast(model1, type = "link")
range(hc$hindcasts$PP)
```

In any regression analysis, a key question is whether the residuals show any patterns that can be indicative of un-modelled sources of variation. For GLMs, we can use a modified residual called the [Dunn-Smyth, or randomized quantile, residual](https://www.jstor.org/stable/1390802){target="_blank"}. Inspect Dunn-Smyth residuals from the model using `plot.mvgam()` with `type = 'residuals'`:

```{r Plot posterior residuals}
plot(model1, type = "residuals")
```
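For a more targeted check, the sketch below (not run) inspects autocorrelation in the summarised residuals; it assumes `residuals()` returns a `brms`-style summary matrix with an `Estimate` column:

```{r eval=FALSE}
# Extract summarised Dunn-Smyth residuals and inspect their autocorrelation
resids <- residuals(model1)
acf(resids[, "Estimate"],
  na.action = na.pass,
  main = "ACF of median Dunn-Smyth residuals"
)
```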
## Automatic forecasting for new data

These temporal random effects do not have a sense of "time". Because of this, each yearly random intercept is not restricted in any way to be similar to the previous yearly intercept. This drawback becomes evident when we predict for a new year. To do this, we can repeat the exercise above, but this time we will split the data into training and testing sets before re-running the model. We can then supply the test set as `newdata`. For splitting, we will make use of the `filter()` function from `dplyr`:

```{r}
model_data %>%
  dplyr::filter(time <= 70) -> data_train
model_data %>%
  dplyr::filter(time > 70) -> data_test
```

```{r include=FALSE, message=FALSE, warning=FALSE}
model1b <- mvgam(count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)
```

```{r eval=FALSE}
model1b <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test
)
```

We can view the test data in the forecast plot to see that the predictions do not capture the temporal variation in the test set:

```{r Plotting predictions against test data}
plot(model1b, type = "forecast", newdata = data_test)
```

As with the `hindcast()` function, we can use the `forecast()` function to automatically extract the posterior distributions for these predictions. This also returns an object of class `mvgam_forecast`, but now it will contain both the hindcasts and forecasts for each series in the data:

```{r Extract posterior forecasts}
fc <- forecast(model1b)
str(fc)
```

## Adding predictors as "fixed" effects

Any users familiar with GLMs will know that we nearly always wish to include predictor variables that may explain some of the variation in our observations. Predictors are easily incorporated into GLMs / GAMs. Here, we will update the model from above by including a parametric (fixed) effect of `ndvi_ma12` as a linear predictor:

```{r model2, include=FALSE, message=FALSE, warning=FALSE}
model2 <- mvgam(
  count ~ s(year_fac, bs = "re") + ndvi_ma12 - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)
```

```{r eval=FALSE}
model2 <- mvgam(
  count ~ s(year_fac, bs = "re") + ndvi_ma12 - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test
)
```

The model can be described mathematically as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = \beta_{year[year_t]} + \beta_{ndvi} * \boldsymbol{ndvi}_t \\
\beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}) \\
\beta_{ndvi} & \sim \text{Normal}(0, 1)
\end{align*}

Where the $\beta_{year}$ effects are the same as before but we now have another predictor $(\beta_{ndvi})$ that applies to the `ndvi_ma12` value at each timepoint $t$. Inspect the summary of this model:

```{r, class.output="scroll-300"}
summary(model2)
```

Rather than printing the summary each time, we can also quickly look at the posterior empirical quantiles for the fixed effect of `ndvi_ma12` (and other linear predictor coefficients) using `coef()`:

```{r Posterior quantiles of model coefficients}
coef(model2)
```

Look at the estimated effect of `ndvi_ma12` using a histogram. This can be done by first extracting the posterior coefficients:

```{r}
beta_post <- as.data.frame(model2, variable = "betas")
dplyr::glimpse(beta_post)
```

The posterior distribution for the effect of `ndvi_ma12` is stored in the `ndvi_ma12` column. A quick histogram confirms our inference that `log(counts)` respond positively to increases in `ndvi_ma12`:

```{r Histogram of NDVI effects}
hist(beta_post$ndvi_ma12,
  xlim = c(
    -1 * max(abs(beta_post$ndvi_ma12)),
    max(abs(beta_post$ndvi_ma12))
  ),
  col = "darkred",
  border = "white",
  xlab = expression(beta[NDVI]),
  ylab = "",
  yaxt = "n",
  main = "",
  lwd = 2
)
abline(v = 0, lwd = 2.5)
```

### `marginaleffects` support

Given that our model used a nonlinear link function (a log link in this example), it can still be difficult to fully understand what relationship our model is estimating between a predictor and the response. Fortunately, the `marginaleffects` package makes this relatively straightforward. Objects of class `mvgam` can be used with `marginaleffects` to inspect contrasts, scenario-based predictions, conditional and marginal effects, all on the outcome scale. Like `brms`, `mvgam` has the simple `conditional_effects()` function to make quick and informative plots for main effects, which relies on `marginaleffects` support. This will likely be your go-to function for quickly understanding patterns from fitted `mvgam` models:

```{r warning=FALSE}
conditional_effects(model2)
```
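For finer control than `conditional_effects()`, a hedged sketch of calling `marginaleffects` functions directly (not run; `avg_slopes()` and `plot_predictions()` are standard `marginaleffects` utilities, and their use with `mvgam` objects follows from the support described above):

```{r eval=FALSE}
# Average slope of the ndvi_ma12 effect on the outcome scale
marginaleffects::avg_slopes(model2, variables = "ndvi_ma12")

# Scenario-based predictions across the observed range of ndvi_ma12
marginaleffects::plot_predictions(model2, condition = "ndvi_ma12")
```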
## Adding predictors as smooths

Smooth functions, using penalized splines, are a major feature of `mvgam`. Nonlinear splines are commonly viewed as variations of random effects in which the coefficients that control the shape of the spline are drawn from a joint, penalized distribution. This strategy is very often used in ecological time series analysis to capture smooth temporal variation in the processes we seek to study. When we construct smoothing splines, the workhorse package `mgcv` will calculate a set of basis functions that will collectively control the shape and complexity of the resulting spline. It is often helpful to visualize these basis functions to get a better sense of how splines work; as an example, we could create a set of 6 basis functions to represent possible variation in the effect of `time` on our outcome (see the sketch below). In addition to constructing the basis functions, `mgcv` also creates a penalty matrix $S$, which contains **known** coefficients that work to constrain the wiggliness of the resulting smooth function. When fitting a GAM to data, we must estimate the smoothing parameters ($\lambda$) that will penalize these matrices, resulting in constrained basis coefficients and smoother functions that are less likely to overfit the data. This is the key to fitting GAMs in a Bayesian framework, as we can jointly estimate the $\lambda$'s using informative priors to prevent overfitting and expand the complexity of models we can tackle.
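A hedged sketch of such a visualization (not run), using `mgcv::smoothCon()` to build the 6 b-spline basis functions directly; the evaluation grid and basis type are illustrative:

```{r eval=FALSE}
# Build 6 b-spline basis functions over the observed time points
times <- data.frame(time = 1:80)
sm <- mgcv::smoothCon(s(time, bs = "bs", k = 6), data = times)[[1]]

# Each column of the design matrix is one basis function
matplot(times$time, sm$X,
  type = "l", lty = 1,
  xlab = "time", ylab = "Basis function value"
)

# The (known) penalty matrix S that constrains wiggliness
round(sm$S[[1]], 2)
```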
To see this in practice, we can now fit a model that replaces the yearly random effects with a smooth function of `time`. We will need a reasonably complex function (large `k`) to try and accommodate the temporal variation in our observations. Following some [useful advice by Gavin Simpson](https://fromthebottomoftheheap.net/2020/06/03/extrapolating-with-gams/){target="_blank"}, we will use a b-spline basis for the temporal smooth. Because we no longer have intercepts for each year, we also retain the primary intercept term in this model (there is no `-1` in the formula now):

```{r model3, include=FALSE, message=FALSE, warning=FALSE}
model3 <- mvgam(
  count ~ s(time, bs = "bs", k = 15) + ndvi_ma12,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)
```

```{r eval=FALSE}
model3 <- mvgam(
  count ~ s(time, bs = "bs", k = 15) + ndvi_ma12,
  family = poisson(),
  data = data_train,
  newdata = data_test
)
```

The model can be described mathematically as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = f(\boldsymbol{time})_t + \beta_{ndvi} * \boldsymbol{ndvi}_t \\
f(\boldsymbol{time}) & = \sum_{k=1}^{K}b * \beta_{smooth} \\
\beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \\
\beta_{ndvi} & \sim \text{Normal}(0, 1)
\end{align*}

Where the smooth function $f(\boldsymbol{time})$ is built by summing across a set of weighted basis functions. The basis functions $(b)$ are constructed using a b-spline basis in `mgcv`. The weights $(\beta_{smooth})$ are drawn from a penalized multivariate normal distribution where the precision matrix $(\Omega)$ is multiplied by a smoothing penalty $(\lambda)$. If $\lambda$ becomes large, this acts to *squeeze* the covariances among the weights $(\beta_{smooth})$, leading to a less wiggly spline. Note that sometimes there are multiple smoothing penalties that contribute to the covariance matrix, but I am only showing one here for simplicity.

View the summary as before:

```{r}
summary(model3)
```

The summary above now contains posterior estimates for the smoothing parameters as well as the basis coefficients for the nonlinear effect of `time`. We can visualize `conditional_effects` as before:

```{r warning=FALSE}
conditional_effects(model3, type = "link")
```

Inspect the underlying `Stan` code to gain some idea of how the spline is being penalized:

```{r, class.output="scroll-300"}
stancode(model3)
```

The line below `// prior for s(time)...` shows how the spline basis coefficients are drawn from a zero-centred multivariate normal distribution. The precision matrix $S$ is penalized by two different smoothing parameters (the $\lambda$'s) to enforce smoothness and reduce overfitting.

## Latent dynamics in `mvgam`

Forecasts from the above model are not ideal:

```{r}
plot(model3, type = "forecast", newdata = data_test)
```

Why is this happening? The forecasts are driven almost entirely by variation in the temporal spline, which is extrapolating linearly *forever* beyond the edge of the training data. Any slight wiggles near the end of the training set will result in wildly different forecasts. To visualize this, we can plot the extrapolated temporal function into the out-of-sample test set. Here is the extrapolated function for `model3`, which used 15 basis functions:

```{r Plot extrapolated temporal functions using newdata}
plot_mvgam_smooth(
  model3,
  smooth = "s(time)",
  # pass newdata to the plot function to generate
  # predictions of the temporal smooth to the end of the
  # testing period
  newdata = data.frame(
    time = 1:max(data_test$time),
    ndvi_ma12 = 0
  )
)
abline(v = max(data_train$time), lty = "dashed", lwd = 2)
```

This model is not doing well. Clearly we need to account for the strong temporal autocorrelation in these data without relying on a smooth function of `time`. Now onto another prominent feature of `mvgam`: the ability to include (possibly latent) autocorrelated residuals in regression models. To do so, we use the `trend_model` argument (see `?mvgam_trends` for details of the different dynamic trend models that are supported). This model will use a separate sub-model for latent residuals that evolve as an AR1 process (i.e. the error in the current time point is a function of the error in the previous time point, plus some stochastic noise). We also include a smooth function of `ndvi_ma12` in this model, rather than the parametric term that was used above, to showcase that `mvgam` can include combinations of smooths and dynamic components:

```{r model4, include=FALSE}
model4 <- mvgam(count ~ s(ndvi_ma12, k = 6),
  family = poisson(),
  data = data_train,
  newdata = data_test,
  trend_model = AR(),
  parallel = FALSE
)
```

```{r eval=FALSE}
model4 <- mvgam(
  count ~ s(ndvi_ma12, k = 6),
  family = poisson(),
  data = data_train,
  newdata = data_test,
  trend_model = AR()
)
```

The model can be described mathematically as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = f(\boldsymbol{ndvi})_t + z_t \\
z_t & \sim \text{Normal}(ar1 * z_{t-1}, \sigma_{error}) \\
ar1 & \sim \text{Normal}(0, 1)[-1, 1] \\
\sigma_{error} & \sim \text{Exponential}(2) \\
f(\boldsymbol{ndvi}) & = \sum_{k=1}^{K}b * \beta_{smooth} \\
\beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1})
\end{align*}

Here the term $z_t$ captures autocorrelated latent residuals, which are modelled using an AR1 process. You will also notice that this model estimates autocorrelated errors for the full time period, even though some of these time points have missing observations. This is useful for getting more realistic estimates of the residual autocorrelation parameters.
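Before summarising, a hedged sketch of visualizing the posteriors of the dynamic parameters directly (not run; this assumes the AR and process error parameters are named `ar1` and `sigma`, as they appear in the summary below, and that `mcmc_plot()` accepts `brms`-style regex matching):

```{r eval=FALSE}
mcmc_plot(
  object = model4,
  variable = c("ar1", "sigma"),
  regex = TRUE,
  type = "areas"
)
```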
Summarise the model to see how it now returns posterior summaries for the latent AR1 process:

```{r Summarise the mvgam autocorrelated error model, class.output="scroll-300"}
summary(model4)
```

View posterior hindcasts / forecasts and compare against the out-of-sample test data:

```{r}
plot(model4, type = "forecast", newdata = data_test)
```

The trend is evolving as an AR1 process, which we can also view:

```{r}
plot(model4, type = "trend", newdata = data_test)
```

In-sample model performance can be interrogated using leave-one-out cross-validation utilities from the `loo` package (a higher value is preferred for this metric):

```{r}
loo_compare(model3, model4)
```

The higher expected log predictive density (ELPD) value for the dynamic model suggests it provides a better fit to the in-sample data.

Though it should be obvious that this model provides better forecasts, we can quantify forecast performance for models 3 and 4 using the `forecast()` and `score()` functions. Here we will compare models based on their Discrete Ranked Probability Scores (a lower value is preferred for this metric):

```{r}
fc_mod3 <- forecast(model3)
fc_mod4 <- forecast(model4)
score_mod3 <- score(fc_mod3, score = "drps")
score_mod4 <- score(fc_mod4, score = "drps")
sum(score_mod4$PP$score, na.rm = TRUE) -
  sum(score_mod3$PP$score, na.rm = TRUE)
```

A strongly negative value here suggests the score for the dynamic model (model 4) is much smaller than the score for the model with a smooth function of time (model 3).

## Further reading

The following papers and resources offer useful material about Dynamic GAMs and how they can be applied in practice:

Clark, Nicholas J. and Wells, K. [Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series](https://doi.org/10.1111/2041-210X.13974). *Methods in Ecology and Evolution*. (2023): 14, 771-784.

Clark, Nicholas J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ*. (2025): 13:e18929.

de Sousa, Heitor C., et al. [Severe fire regimes decrease resilience of ectothermic populations](https://doi.org/10.1111/1365-2656.14188). *Journal of Animal Ecology* (2024): 93(11), 1656-1669.

Hannaford, Naomi E., et al. [A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant](https://doi.org/10.1016/j.csda.2022.107659). *Computational Statistics & Data Analysis* (2023): 179, 107659.

Karunarathna, K.A.N.K., et al. [Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models](https://doi.org/10.1016/j.ecolmodel.2024.110648). *Ecological Modelling* (2024): 490, 110648.

Zhu, L., et al. [Responses of a widespread pest insect to extreme high temperatures are stage-dependent and divergent among seasonal cohorts](https://doi.org/10.1111/1365-2435.14711). *Functional Ecology* (2025): 39, 165-180.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`.
Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au).

================================================
FILE: doc/mvgam_overview.html
================================================

Overview of the mvgam package

Nicholas J Clark

2026-01-19

The purpose of this vignette is to give a general overview of the mvgam package and its primary functions.

Dynamic GAMs

mvgam is designed to propagate unobserved temporal processes to capture latent dynamics in the observed time series. This works in a state-space format, with the temporal trend evolving independently of the observation process. An introduction to the package and some worked examples are also shown in this seminar: Ecological Forecasting with Dynamic Generalized Additive Models. Briefly, assume \(\tilde{\boldsymbol{y}}_{i,t}\) is the conditional expectation of response variable \(\boldsymbol{i}\) at time \(\boldsymbol{t}\). Assuming \(\boldsymbol{y_i}\) is drawn from an exponential family distribution with an invertible link function, the linear predictor for a multivariate Dynamic GAM can be written as:

\[for~i~in~1:N_{series}~...\] \[for~t~in~1:N_{timepoints}~...\]

\[g^{-1}(\tilde{\boldsymbol{y}}_{i,t})=\alpha_{i}+\sum\limits_{j=1}^J\boldsymbol{s}_{i,j,t}\boldsymbol{x}_{j,t}+\boldsymbol{Z}\boldsymbol{z}_{k,t}\,,\] Here \(\alpha\) are the unknown intercepts, the \(\boldsymbol{s}\)’s are unknown smooth functions of covariates (\(\boldsymbol{x}\)’s), which can potentially vary among the response series, and \(\boldsymbol{z}\) are dynamic latent processes. Each smooth function \(\boldsymbol{s_j}\) is composed of basis expansions whose coefficients, which must be estimated, control the functional relationship between \(\boldsymbol{x}_{j}\) and \(g^{-1}(\tilde{\boldsymbol{y}})\). The size of the basis expansion limits the smooth’s potential complexity. A larger set of basis functions allows greater flexibility. For more information on GAMs and how they can smooth through data, see this blogpost on how to interpret nonlinear effects from Generalized Additive Models. Latent processes are captured with \(\boldsymbol{Z}\boldsymbol{z}_{k,t}\), where \(\boldsymbol{Z}\) is an \(i~by~k\) matrix of loading coefficients (which can be fixed or a combination of fixed and freely estimated parameters) and \(\boldsymbol{z}_{k,t}\) are a set of \(K\) latent factors that can also include their own GAM linear predictors (see the State-Space models vignette, the N-mixtures vignette and the example in jsdgam to get an idea of how flexible these processes can be).

Several advantages of GAMs are that they can model a diversity of response families, including discrete distributions (i.e. Poisson, Negative Binomial, Gamma) that accommodate common ecological features such as zero-inflation or overdispersion, and that they can be formulated to include hierarchical smoothing for multivariate responses. mvgam supports a number of different observation families, which are summarized below:

Supported observation families

| Distribution | Function | Support | Extra parameter(s) |
|---|---|---|---|
| Gaussian (identity link) | gaussian() | Real values in \((-\infty, \infty)\) | \(\sigma\) |
| Student’s T (identity link) | student-t() | Heavy-tailed real values in \((-\infty, \infty)\) | \(\sigma\), \(\nu\) |
| LogNormal (identity link) | lognormal() | Positive real values in \([0, \infty)\) | \(\sigma\) |
| Gamma (log link) | Gamma() | Positive real values in \([0, \infty)\) | \(\alpha\) |
| Beta (logit link) | betar() | Real values (proportional) in \([0,1]\) | \(\phi\) |
| Bernoulli (logit link) | bernoulli() | Binary data in \(\{0,1\}\) | - |
| Poisson (log link) | poisson() | Non-negative integers in \((0,1,2,...)\) | - |
| Negative Binomial2 (log link) | nb() | Non-negative integers in \((0,1,2,...)\) | \(\phi\) |
| Binomial (logit link) | binomial() | Non-negative integers in \((0,1,2,...)\) | - |
| Beta-Binomial (logit link) | beta_binomial() | Non-negative integers in \((0,1,2,...)\) | \(\phi\) |
| Poisson Binomial N-mixture (log link) | nmix() | Non-negative integers in \((0,1,2,...)\) | - |

For all supported observation families, any extra parameters that need to be estimated (i.e. the \(\sigma\) in a Gaussian model or the \(\phi\) in a Negative Binomial model) are by default estimated independently for each series. However, users can opt to force all series to share extra observation parameters using share_obs_params = TRUE in mvgam(). Note that default link functions cannot currently be changed.

Supported temporal dynamic processes

As stated above, the latent processes can take a wide variety of forms, some of which can be multivariate to allow the different observational variables to interact or be correlated. When using the mvgam() function, the user chooses between different process models with the trend_model argument. Available process models are described in detail below.

Correlated multivariate processes

If more than one observational unit (usually referred to as ‘series’) is included in data \((N_{series} > 1)\), use trend_model = ZMVN() to set up a model where the outcomes for different observational units may be correlated according to:

\[\begin{align*} z_{t} & \sim \text{MVNormal}(0, \Sigma) \end{align*}\]

The covariance matrix \(\Sigma\) will capture potentially correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances \(\sigma\) and on the strength of correlations using Stan’s lkj_corr_cholesky distribution. Note that this trend_model does not assume that measurements occur over time, as users can specify what variable in the data represents the unit of analysis (i.e. outcomes could be counts of different species across different sites or regions, for example; see ?ZMVN() for guidelines).

Independent Random Walks

Use trend_model = 'RW' or trend_model = RW() to set up a model where each series in data has independent latent temporal dynamics of the form:

\[\begin{align*} z_{i,t} & \sim \text{Normal}(z_{i,t-1}, \sigma_i) \end{align*}\]

Process error parameters \(\sigma\) are modeled independently for each series. If a moving average process is required, use trend_model = RW(ma = TRUE) to set up the following:

\[\begin{align*} z_{i,t} & = z_{i,t-1} + \theta_i * error_{i,t-1} + error_{i,t} \\ error_{i,t} & \sim \text{Normal}(0, \sigma_i) \end{align*}\]

Moving average coefficients \(\theta\) are independently estimated for each series and will be forced to be stationary by default \((abs(\theta)<1)\). Only moving averages of order \(q=1\) are currently allowed.

Multivariate Random Walks

If more than one series is included in data \((N_{series} > 1)\), a multivariate Random Walk can be set up using trend_model = RW(cor = TRUE), resulting in the following:

\[\begin{align*} z_{t} & \sim \text{MVNormal}(z_{t-1}, \Sigma) \end{align*}\]

Where the latent process estimate \(z_t\) now takes the form of a vector. The covariance matrix \(\Sigma\) will capture contemporaneously correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances \(\sigma\) and on the strength of correlations using Stan’s lkj_corr_cholesky distribution.

Moving average terms can also be included for multivariate random walks, in which case the moving average coefficients \(\theta\) will be parameterised as an \(N_{series} * N_{series}\) matrix.

Autoregressive processes

Autoregressive models up to \(p=3\), in which the autoregressive coefficients are estimated independently for each series, can be used by specifying trend_model = 'AR1', trend_model = 'AR2', trend_model = 'AR3', or trend_model = AR(p = 1, 2, or 3). For example, a univariate AR(1) model takes the form:

\[\begin{align*} z_{i,t} & \sim \text{Normal}(ar1_i * z_{i,t-1}, \sigma_i) \end{align*}\]

All options are the same as for Random Walks, but additional options will be available for placing priors on the autoregressive coefficients. By default, these coefficients will not be forced into stationarity, but users can impose this restriction by changing the upper and lower bounds on their priors. See ?get_mvgam_priors for more details.

Vector Autoregressive processes

A Vector Autoregression of order \(p=1\) can be specified if \(N_{series} > 1\) using trend_model = 'VAR1' or trend_model = VAR(). A VAR(1) model takes the form:

\[\begin{align*} z_{t} & \sim \text{MVNormal}(A * z_{t-1}, \Sigma) \end{align*}\]

Where \(A\) is an \(N_{series} * N_{series}\) matrix of autoregressive coefficients in which the diagonals capture lagged self-dependence (i.e. the effect of a process at time \(t\) on its own estimate at time \(t+1\)), while off-diagonals capture lagged cross-dependence (i.e. the effect of a process at time \(t\) on the process for another series at time \(t+1\)). By default, the covariance matrix \(\Sigma\) will assume no process error covariance by fixing the off-diagonals to \(0\). To allow for correlated errors, use trend_model = 'VAR1cor' or trend_model = VAR(cor = TRUE). A moving average of order \(q=1\) can also be included using trend_model = VAR(ma = TRUE, cor = TRUE).

Note that for all VAR models, stationarity of the process is enforced with a structured prior distribution that is described in detail in Heaps 2022

Heaps, Sarah E. “Enforcing stationarity through the prior in vector autoregressions.” Journal of Computational and Graphical Statistics 32.1 (2023): 74-83.

Hierarchical processes

Several of the above-mentioned trend_model options can be modified to account for grouping structures in data by setting up hierarchical latent processes. If an optional grouping variable (gr; which must be a factor in the supplied data) exists, users can model hierarchical residual correlation structures, where the residual correlations for a specific level of gr are modelled hierarchically:

\[\begin{align*} \Omega_{group} & = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local} \end{align*}\]

where \(\Omega_{global}\) is a global correlation matrix, \(\Omega_{group, local}\) is a local deviation correlation matrix and \(\alpha_{cor}\) is a weighting parameter controlling how strongly the local correlation matrix \(\Omega_{group}\) (i.e. the derived correlation matrix that will be used for each level of the grouping factor gr) is shrunk towards the global correlation matrix \(\Omega_{global}\) (larger values of \(\alpha_{cor}\) indicate a greater degree of shrinkage, i.e. a greater degree of partial pooling). This option is valuable for many types of designs where the same observational units (i.e. financial assets or species, for example) are measured in different strata (i.e. regions, countries or experimental units, for example). Currently hierarchical correlations can be included for AR(), VAR() or ZMVN() trend_model options.

Gaussian Processes

The final option for modelling temporal dynamics is to use a Gaussian Process with squared exponential kernel. These are set up independently for each series (there is currently no multivariate GP option), using trend_model = 'GP'. The dynamics for each latent process are modelled as:

\[\begin{align*} z & \sim \text{MVNormal}(0, \Sigma_{error}) \\ \Sigma_{error}[t_i, t_j] & = \alpha^2 * exp(-0.5 * ((|t_i - t_j| / \rho))^2) \end{align*}\]

The latent dynamic process evolves from a complex, high-dimensional Multivariate Normal distribution which depends on \(\rho\) (often called the length scale parameter) to control how quickly the correlations between the model’s errors decay as a function of time. For these models, covariance decays exponentially fast with the squared distance (in time) between the observations. The functions also depend on a parameter \(\alpha\), which controls the marginal variability of the temporal function at all points; in other words it controls how much the GP term contributes to the linear predictor. mvgam capitalizes on some advances that allow GPs to be approximated using Hilbert space basis functions, which considerably speed up computation at little cost to accuracy or prediction performance.

Continuous time AR(1) processes

Most trend models in the mvgam() function expect time to be measured in regularly-spaced, discrete intervals (i.e. one measurement per week, or one per year, for example). But some time series are taken at irregular intervals, and we may still wish to model their autoregressive properties. Setting trend_model = CAR() can be useful for these models, which currently only support autoregressive processes of order 1. The evolution of the latent dynamic process follows the form:

\[\begin{align*} z_{i,t} & \sim \text{Normal}(ar1_i^{distance} * z_{i,t-1}, \sigma_i) \end{align*}\]

Where \(distance\) is a vector of non-negative measurements of the time differences between successive observations. These models are perhaps more widely known as Ornstein–Uhlenbeck processes. See the Examples section in ?CAR for an illustration of how to set these models up.

Regression formulae

mvgam supports an observation model regression formula, built off the mgcv package, as well as an optional process model regression formula. The formulae supplied to mvgam() are exactly like those supplied to glm() except that smooth terms, s(), te(), ti() and t2(), time-varying effects using dynamic(), monotonically increasing (using s(x, bs = 'moi')) or decreasing splines (using s(x, bs = 'mod'); see ?smooth.construct.moi.smooth.spec for details), as well as Gaussian Process functions using gp(), can be added to the right hand side (and . is not supported in mvgam formulae). See ?mvgam_formulae for more guidance.

For setting up State-Space models, the optional process model formula can be used (see the State-Space model vignette and the shared latent states vignette for guidance on using trend formulae).

Example time series data

The ‘portal_data’ object contains time series of rodent captures from the Portal Project, a long-term monitoring study based near the town of Portal, Arizona. Researchers have been operating a standardized set of baited traps within 24 experimental plots at this site since the 1970’s. Sampling follows the lunar monthly cycle, with observations occurring on average about 28 days apart. However, missing observations do occur due to difficulties accessing the site (weather events, COVID disruptions etc…). You can read about the full sampling protocol in this preprint by Ernest et al on the Biorxiv.

data("portal_data")

As the data come pre-loaded with the mvgam package, you can read a little about it in the help page using ?portal_data. Before working with data, it is important to inspect how the data are structured, first using head():

head(portal_data)
#>   time series captures  ndvi_ma12    mintemp
#> 1    1     DM       20 -0.1721441 -0.7963381
#> 2    1     DO        2 -0.1721441 -0.7963381
#> 3    1     PB        0 -0.1721441 -0.7963381
#> 4    1     PP        0 -0.1721441 -0.7963381
#> 5    2     DM       NA -0.2373635 -1.3347160
#> 6    2     DO       NA -0.2373635 -1.3347160

But the glimpse() function in dplyr is also useful for understanding how variables are structured

dplyr::glimpse(portal_data)
#> Rows: 320
#> Columns: 5
#> $ time      <int> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, …
#> $ series    <fct> DM, DO, PB, PP, DM, DO, PB, PP, DM, DO, PB, PP, DM, DO, PB, …
#> $ captures  <int> 20, 2, 0, 0, NA, NA, NA, NA, 36, 5, 0, 0, 40, 3, 0, 1, 29, 3…
#> $ ndvi_ma12 <dbl> -0.172144125, -0.172144125, -0.172144125, -0.172144125, -0.2…
#> $ mintemp   <dbl> -0.79633807, -0.79633807, -0.79633807, -0.79633807, -1.33471…

We will focus analyses on the time series of captures for one specific rodent species, the Desert Pocket Mouse Chaetodipus penicillatus. This species is interesting in that it goes into a kind of “hibernation” during the colder months, leading to very low captures during the winter period.

Manipulating data for modelling

Manipulating the data into a ‘long’ format is necessary for modelling in mvgam. By ‘long’ format, we mean that each series x time observation needs to have its own entry in the dataframe or list object that we wish to use as data for modelling. A simple example can be viewed by simulating data using the sim_mvgam() function. See ?sim_mvgam for more details

data <- sim_mvgam(n_series = 4, T = 24)
head(data$data_train, 12)
#>    y season year   series time
#> 1  1      1    1 series_1    1
#> 2  0      1    1 series_2    1
#> 3  1      1    1 series_3    1
#> 4  3      1    1 series_4    1
#> 5  1      2    1 series_1    2
#> 6  1      2    1 series_2    2
#> 7  0      2    1 series_3    2
#> 8  5      2    1 series_4    2
#> 9  0      3    1 series_1    3
#> 10 1      3    1 series_2    3
#> 11 0      3    1 series_3    3
#> 12 3      3    1 series_4    3

Notice how we have four different time series in these simulated data, but we do not spread the outcome values into different columns. Rather, there is only a single column for the outcome variable, labelled y in these simulated data. We also must supply a variable labelled time to ensure the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models, as you can see in the State-Space vignette. Below are the steps needed to shape our portal_data object into the correct form. First, we filter the data to captures of our target species (PP), rename the response column to count, create a year variable, and select the variables we can use as predictors

portal_data %>%
  # Filter the data to only contain captures of the 'PP' 
  dplyr::filter(series == 'PP') %>%
  droplevels() %>%
  dplyr::mutate(count = captures) %>%
  # Add a 'year' variable
  dplyr::mutate(year = sort(rep(1:8, 12))[time]) %>%
  # Select the variables of interest to keep in the model_data
  dplyr::select(series, year, time, count, mintemp, ndvi_ma12) -> model_data

The data now contain six variables:
series, a factor indexing which time series each observation belongs to
year, the year of sampling
time, the indicator of which time step each observation belongs to
count, the response variable representing the number of captures of the species PP in each sampling observation
mintemp, the monthly average minimum temperature at each time step
ndvi_ma12, a 12-month moving average of the monthly Normalized Difference Vegetation Index at each time step

Now check the data structure again

head(model_data)
#>   series year time count     mintemp   ndvi_ma12
#> 1     PP    1    1     0 -0.79633807 -0.17214413
#> 2     PP    1    2    NA -1.33471597 -0.23736348
#> 3     PP    1    3     0 -1.24166462 -0.21212064
#> 4     PP    1    4     1 -1.08048145 -0.16043812
#> 5     PP    1    5     7 -0.42447625 -0.08267729
#> 6     PP    1    6     7  0.06532892 -0.03692877
dplyr::glimpse(model_data)
#> Rows: 80
#> Columns: 6
#> $ series    <fct> PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, …
#> $ year      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, …
#> $ time      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
#> $ count     <int> 0, NA, 0, 1, 7, 7, 8, 8, 4, NA, 0, 0, 0, 0, 0, 0, NA, 2, 4, …
#> $ mintemp   <dbl> -0.79633807, -1.33471597, -1.24166462, -1.08048145, -0.42447…
#> $ ndvi_ma12 <dbl> -0.172144125, -0.237363477, -0.212120638, -0.160438125, -0.0…

You can also summarize multiple variables, which is helpful to search for data ranges and identify missing values

summary(model_data)
#>  series       year           time           count           mintemp       
#>  PP:80   Min.   :1.00   Min.   : 1.00   Min.   : 0.000   Min.   :-2.0978  
#>          1st Qu.:2.00   1st Qu.:20.75   1st Qu.: 1.000   1st Qu.:-1.0808  
#>          Median :4.00   Median :40.50   Median : 5.000   Median :-0.4091  
#>          Mean   :3.85   Mean   :40.50   Mean   : 5.222   Mean   :-0.2151  
#>          3rd Qu.:5.25   3rd Qu.:60.25   3rd Qu.: 8.000   3rd Qu.: 0.6133  
#>          Max.   :7.00   Max.   :80.00   Max.   :21.000   Max.   : 1.4530  
#>                                         NA's   :17                        
#>    ndvi_ma12       
#>  Min.   :-0.66884  
#>  1st Qu.:-0.20869  
#>  Median :-0.16517  
#>  Mean   :-0.09501  
#>  3rd Qu.:-0.03440  
#>  Max.   : 0.74831  
#> 

We have some NAs in our response variable count. These observations will generally be thrown out by most modelling packages in R. But as you will see when we work through the tutorials, mvgam keeps these in the data so that predictions can be automatically returned for the full dataset. The time series and some of its descriptive features can be plotted using plot_mvgam_series():

plot_mvgam_series(data = model_data, series = 1, y = "count")

GLMs with temporal random effects

Our first task will be to fit a Generalized Linear Model (GLM) that can adequately capture the features of our count observations (integer data, lower bound at zero, missing values) while also attempting to model temporal variation. We are almost ready to fit our first model, which will be a GLM with Poisson observations, a log link function and random (hierarchical) intercepts for year. This will allow us to capture our prior belief that, although each year is unique, all years are connected by having been sampled from the same population of effects, and thus might contain valuable information about one another. This is achieved by capitalizing on the partial pooling properties of hierarchical models. Hierarchical (also known as random) effects offer many advantages when modelling data with grouping structures (i.e. multiple species, locations, years etc…). The ability to incorporate these in time series models is a huge advantage over traditional models such as ARIMA or Exponential Smoothing. But before we fit the model, we will need to convert year to a factor so that we can use a random effect basis in mvgam. See ?smooth.terms and ?smooth.construct.re.smooth.spec for details about the re basis construction that is used by both mvgam and mgcv.

model_data %>%
  # Create a 'year_fac' factor version of 'year'
  dplyr::mutate(year_fac = factor(year)) -> model_data

Preview the dataset to ensure year is now a factor with a unique factor level for each year in the data

dplyr::glimpse(model_data)
#> Rows: 80
#> Columns: 7
#> $ series    <fct> PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, …
#> $ year      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, …
#> $ time      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
#> $ count     <int> 0, NA, 0, 1, 7, 7, 8, 8, 4, NA, 0, 0, 0, 0, 0, 0, NA, 2, 4, …
#> $ mintemp   <dbl> -0.79633807, -1.33471597, -1.24166462, -1.08048145, -0.42447…
#> $ ndvi_ma12 <dbl> -0.172144125, -0.237363477, -0.212120638, -0.160438125, -0.0…
#> $ year_fac  <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, …
levels(model_data$year_fac)
#> [1] "1" "2" "3" "4" "5" "6" "7"

We are now ready for our first mvgam model. The syntax will be familiar to users who have previously built models with mgcv. But for a refresher, see ?formula.gam and the examples in ?gam. Random effects can be specified using the s wrapper with the re basis. Note that we can also suppress the primary intercept using the usual R formula syntax - 1. mvgam has a number of possible observation families that can be used, see ?mvgam_families for more information. We will use Stan as the fitting engine, which deploys Hamiltonian Monte Carlo (HMC) for full Bayesian inference. By default, 4 HMC chains will be run using a warmup of 500 iterations and collecting 500 posterior samples from each chain. The package will also aim to use the Cmdstan backend when possible, so it is recommended that users have an up-to-date installation of Cmdstan and the associated cmdstanr interface on their machines (note that you can set the backend yourself using the backend argument: see ?mvgam for details). Interested users should consult the Stan user’s guide for more information about the software and the enormous variety of models that can be tackled with HMC.

model1 <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)

The model can be described mathematically for each timepoint \(t\) as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = \beta_{year[year_t]} \\ \beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}) \end{align*}\]

Where the \(\beta_{year}\) effects are drawn from a population distribution that is parameterized by a common mean \((\mu_{year})\) and variance \((\sigma_{year})\). Priors on most of the model parameters can be interrogated and changed using similar functionality to the options available in brms. For example, the default priors on \((\mu_{year})\) and \((\sigma_{year})\) can be viewed using the following code:

get_mvgam_priors(count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)
#>                      param_name param_length           param_info
#> 1             vector[1] mu_raw;            1 s(year_fac) pop mean
#> 2 vector<lower=0>[1] sigma_raw;            1   s(year_fac) pop sd
#>                                  prior                 example_change
#> 1               mu_raw ~ std_normal();  mu_raw ~ normal(-0.88, 0.73);
#> 2 sigma_raw ~ inv_gamma(1.418, 0.452); sigma_raw ~ exponential(0.15);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA

See examples in ?get_mvgam_priors to find out different ways that priors can be altered. Once the model has finished, the first step is to inspect the summary() to ensure no major diagnostic warnings have been produced and to quickly summarise posterior distributions for key parameters

summary(model1)
#> GAM formula:
#> count ~ s(year_fac, bs = "re") - 1
#> <environment: 0x0000018e48f5c728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 80 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM coefficient (beta) estimates:
#>                2.5% 50% 97.5% Rhat n_eff
#> s(year_fac).1 0.930 1.3   1.6    1  2517
#> s(year_fac).2 0.870 1.2   1.5    1  2716
#> s(year_fac).3 0.085 0.6   1.1    1  2154
#> s(year_fac).4 2.000 2.3   2.5    1  2367
#> s(year_fac).5 1.100 1.5   1.8    1  2517
#> s(year_fac).6 1.500 1.8   2.1    1  2511
#> s(year_fac).7 1.800 2.1   2.3    1  2228
#> 
#> GAM group-level estimates:
#>                   2.5% 50% 97.5% Rhat n_eff
#> mean(s(year_fac)) 0.87 1.5   1.9 1.02   368
#> sd(s(year_fac))   0.35 0.6   1.2 1.01   345
#> 
#> Approximate significance of GAM smooths:
#>               edf Ref.df Chi.sq p-value    
#> s(year_fac) 6.095      7  234.5  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

The diagnostic messages at the bottom of the summary show that the HMC sampler did not encounter any problems or difficult posterior spaces. This is a good sign. Posterior distributions for model parameters can be extracted in the same ways as for objects of class brmsfit (see ?mvgam::mvgam_draws for details). For example, we can extract the coefficients related to the GAM linear predictor (i.e. the \(\beta\)’s) into a data.frame using:

beta_post <- as.data.frame(model1, variable = "betas")
dplyr::glimpse(beta_post)
#> Rows: 2,000
#> Columns: 7
#> $ `s(year_fac).1` <dbl> 1.42562, 1.13259, 1.60469, 1.05618, 1.30829, 1.36421, …
#> $ `s(year_fac).2` <dbl> 1.360710, 1.224610, 1.352340, 1.080130, 1.495370, 1.24…
#> $ `s(year_fac).3` <dbl> 0.726486, 0.540769, 0.706619, 0.477383, 0.872224, 0.77…
#> $ `s(year_fac).4` <dbl> 2.30283, 2.09318, 2.36101, 2.18330, 2.24543, 2.51212, …
#> $ `s(year_fac).5` <dbl> 1.338800, 0.903048, 1.296670, 1.423650, 1.654660, 1.51…
#> $ `s(year_fac).6` <dbl> 1.90255, 1.88174, 1.72255, 1.94652, 2.00091, 1.78989, …
#> $ `s(year_fac).7` <dbl> 2.26354, 2.15511, 2.05374, 2.10885, 2.23140, 2.23759, …
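
These draws can then be summarised with standard R tools. For example, a quick (purely illustrative) calculation of the posterior probability that the first yearly intercept exceeds 1 on the log scale:

# proportion of draws in which the year 1 intercept exceeds 1
mean(beta_post$`s(year_fac).1` > 1)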

With any model fitted in mvgam, the underlying Stan code can be viewed using the stancode() function:

stancode(model1)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 7] = mu_raw[1] + b_raw[1 : 7] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ inv_gamma(1.418, 0.452);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ std_normal();
#>   
#>   // prior (non-centred) for s(year_fac)...
#>   b_raw[1 : 7] ~ std_normal();
#>   {
#>     // likelihood functions
#>     flat_ys ~ poisson_log_glm(flat_xs, 0.0, b);
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   array[n, n_series] int ypred;
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }

Plotting effects and residuals

Now for interrogating the model. We can get some sense of the variation in yearly intercepts from the summary above, but it is easier to understand them using targeted plots. Plot posterior distributions of the temporal random effects using plot.mvgam() with type = 're' (see ?plot.mvgam for more details about the types of plots that can be produced from fitted mvgam objects):

plot(model1, type = "re")

bayesplot support

We can also capitalize on most of the useful MCMC plotting functions from the bayesplot package to visualize posterior distributions and diagnostics (see ?mvgam::mcmc_plot.mvgam for details):

mcmc_plot(
  object = model1,
  variable = "betas",
  type = "areas"
)

We can also use the wide range of posterior checking functions available in bayesplot (see ?mvgam::pp_check.mvgam for details):

pp_check(object = model1)
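
Other posterior predictive check types can be requested through the type argument, which follows the brms-style interface described in ?mvgam::pp_check.mvgam; a sketch (the type and ndraws values here are assumptions, not from the original document):

pp_check(model1, type = "dens_overlay", ndraws = 25)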

There is clearly some variation in these yearly intercept estimates. But how do these translate into time-varying predictions? To understand this, we can plot posterior hindcasts from this model for the training period using plot.mvgam() with type = 'forecast':

plot(model1, type = "forecast")

If you wish to extract these hindcasts for other downstream analyses, the hindcast() function can be used. This will return a list object of class mvgam_forecast. In the hindcasts slot, a matrix of posterior retrodictions will be returned for each series in the data (only one series in our example):

hc <- hindcast(model1)
str(hc)
#> List of 15
#>  $ call              :Class 'formula'  language count ~ s(year_fac, bs = "re") - 1
#>   .. ..- attr(*, ".Environment")=<environment: 0x0000018e48f5c728> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ trend_model       : chr "None"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : chr "PP"
#>  $ train_observations:List of 1
#>   ..$ PP: int [1:80] 0 NA 0 1 7 7 8 8 4 NA ...
#>  $ train_times       :List of 1
#>   ..$ PP: int [1:80] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations : NULL
#>  $ test_times        : NULL
#>  $ hindcasts         :List of 1
#>   ..$ PP: num [1:2000, 1:80] 7 5 6 4 4 8 0 4 5 4 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:80] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>  $ forecasts         : NULL
#>  - attr(*, "class")= chr "mvgam_forecast"

You can also extract these hindcasts on the linear predictor scale, which in this case is the log scale (our Poisson GLM used a log link function). Sometimes this can be useful for asking more targeted questions about drivers of variation:

hc <- hindcast(model1, type = "link")
range(hc$hindcasts$PP)
#> [1] -0.306975  2.594950
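
Because the link is the natural log, these values can be mapped back to the response scale with exp(), for example:

# back-transform the link-scale hindcast range to expected counts
range(exp(hc$hindcasts$PP))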

In any regression analysis, a key question is whether the residuals show any patterns that may indicate un-modelled sources of variation. For GLMs, we can use a modified residual called the Dunn-Smyth (randomized quantile) residual. Inspect Dunn-Smyth residuals from the model using plot.mvgam() with type = 'residuals':

plot(model1, type = "residuals")

Automatic forecasting for new data

These temporal random effects do not have a sense of “time”: nothing restricts a yearly random intercept to be similar to the intercept from the previous year. This drawback becomes evident when we predict for a new year. To see it, we can repeat the exercise above, but this time we will split the data into training and testing sets before re-running the model and then supply the test set as newdata. For splitting, we will make use of the filter() function from dplyr:

model_data %>%
  dplyr::filter(time <= 70) -> data_train
model_data %>%
  dplyr::filter(time > 70) -> data_test
model1b <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test
)

We can view the test data in the forecast plot to see that the predictions do not capture the temporal variation in the test set:

plot(model1b, type = "forecast", newdata = data_test)

As with the hindcast() function, we can use the forecast() function to automatically extract the posterior distributions for these predictions. This also returns an object of class mvgam_forecast, but now it will contain both the hindcasts and forecasts for each series in the data:

fc <- forecast(model1b)
str(fc)
#> List of 16
#>  $ call              :Class 'formula'  language count ~ s(year_fac, bs = "re") - 1
#>   .. ..- attr(*, ".Environment")=<environment: 0x0000018e48f5c728> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ family_pars       : NULL
#>  $ trend_model       : chr "None"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : Factor w/ 1 level "PP": 1
#>  $ train_observations:List of 1
#>   ..$ PP: int [1:70] 0 NA 0 1 7 7 8 8 4 NA ...
#>  $ train_times       :List of 1
#>   ..$ PP: int [1:70] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations :List of 1
#>   ..$ PP: int [1:10] NA 4 11 8 5 2 5 8 14 14
#>  $ test_times        :List of 1
#>   ..$ PP: int [1:10] 71 72 73 74 75 76 77 78 79 80
#>  $ hindcasts         :List of 1
#>   ..$ PP: num [1:2000, 1:70] 3 2 4 2 4 1 1 2 3 4 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:70] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>  $ forecasts         :List of 1
#>   ..$ PP: num [1:2000, 1:10] 6 7 6 1 5 4 2 5 7 6 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:10] "ypred[71,1]" "ypred[72,1]" "ypred[73,1]" "ypred[74,1]" ...
#>  - attr(*, "class")= chr "mvgam_forecast"
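
Objects of class mvgam_forecast also have their own plot method, so the forecast distribution can be visualised directly from this object (a sketch; see the plotting helpers referenced in ?mvgam_forecast-class for the available options):

plot(fc)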

Adding predictors as “fixed” effects

Any users familiar with GLMs will know that we nearly always wish to include predictor variables that may explain some of the variation in our observations. Predictors are easily incorporated into GLMs / GAMs. Here, we will update the model from above by including a parametric (fixed) effect of ndvi_ma12 as a linear predictor:

model2 <- mvgam(
  count ~ s(year_fac, bs = "re") +
    ndvi_ma12 - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test
)

The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = \beta_{year[year_t]} + \beta_{ndvi} * \boldsymbol{ndvi}_t \\ \beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}) \\ \beta_{ndvi} & \sim \text{Normal}(0, 1) \end{align*}\]

Where the \(\beta_{year}\) effects are the same as before, but we now have another predictor \((\beta_{ndvi})\) that applies to the ndvi_ma12 value at each timepoint \(t\). Inspect the summary of this model:

summary(model2)
#> GAM formula:
#> count ~ ndvi_ma12 + s(year_fac, bs = "re") - 1
#> <environment: 0x0000018e48f5c728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 80 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM coefficient (beta) estimates:
#>                 2.5%   50% 97.5% Rhat n_eff
#> ndvi_ma12     -0.390 0.045   0.5    1  1595
#> s(year_fac).1  0.900 1.300   1.6    1  2451
#> s(year_fac).2  0.870 1.200   1.5    1  2633
#> s(year_fac).3  0.083 0.590   1.0    1  2163
#> s(year_fac).4  2.000 2.300   2.5    1  1831
#> s(year_fac).5  1.100 1.500   1.8    1  2202
#> s(year_fac).6  1.600 1.800   2.1    1  3045
#> s(year_fac).7 -0.310 1.400   2.8    1  1313
#> 
#> GAM group-level estimates:
#>                   2.5% 50% 97.5% Rhat n_eff
#> mean(s(year_fac)) 0.72 1.3   1.8 1.01   485
#> sd(s(year_fac))   0.33 0.6   1.3 1.00   507
#> 
#> Approximate significance of GAM smooths:
#>               edf Ref.df Chi.sq p-value    
#> s(year_fac) 5.261      7  177.6  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

Rather than printing the summary each time, we can also quickly look at the posterior empirical quantiles for the fixed effect of ndvi_ma12 (and other linear predictor coefficients) using coef():

coef(model2)
#>                      2.5%       50%     97.5% Rhat n_eff
#> ndvi_ma12     -0.39239295 0.0454631 0.5002432    1  1595
#> s(year_fac).1  0.90208663 1.2598350 1.5620722    1  2451
#> s(year_fac).2  0.87452045 1.2055800 1.5018923    1  2633
#> s(year_fac).3  0.08311501 0.5945600 1.0277267    1  2163
#> s(year_fac).4  2.01673475 2.2694600 2.4863652    1  1831
#> s(year_fac).5  1.06976925 1.4577300 1.7888065    1  2202
#> s(year_fac).6  1.57686075 1.8460300 2.1110565    1  3045
#> s(year_fac).7 -0.30962118 1.3544650 2.7908662    1  1313

Look at the estimated effect of ndvi_ma12 using a histogram. This can be done by first extracting the posterior coefficients:

beta_post <- as.data.frame(model2, variable = "betas")
dplyr::glimpse(beta_post)
#> Rows: 2,000
#> Columns: 8
#> $ ndvi_ma12       <dbl> -0.59960500, 0.45922600, 0.55956400, 0.39627800, 0.178…
#> $ `s(year_fac).1` <dbl> 1.121480, 1.560650, 1.280340, 1.279720, 1.355980, 1.29…
#> $ `s(year_fac).2` <dbl> 1.10624, 1.37323, 1.15708, 1.02976, 1.20075, 1.17569, …
#> $ `s(year_fac).3` <dbl> 0.7412040, 0.8188330, 0.6476260, 0.4650750, 0.8031380,…
#> $ `s(year_fac).4` <dbl> 2.06531, 2.35775, 2.48328, 2.38348, 2.29324, 2.29980, …
#> $ `s(year_fac).5` <dbl> 1.80219, 1.27466, 1.17318, 1.19170, 1.29867, 1.46982, …
#> $ `s(year_fac).6` <dbl> 1.81798, 1.93282, 1.71232, 1.74702, 2.08540, 1.70745, …
#> $ `s(year_fac).7` <dbl> 1.530030, 2.455950, 1.508500, 1.704790, 2.151070, 1.56…

The posterior distribution for the effect of ndvi_ma12 is stored in the ndvi_ma12 column. A quick histogram shows the direction and magnitude of this effect; note that, in this model, much of the posterior mass sits near zero, so the evidence for a strong effect of ndvi_ma12 is weak:

hist(beta_post$ndvi_ma12,
  xlim = c(
    -1 * max(abs(beta_post$ndvi_ma12)),
    max(abs(beta_post$ndvi_ma12))
  ),
  col = "darkred",
  border = "white",
  xlab = expression(beta[NDVI]),
  ylab = "",
  yaxt = "n",
  main = "",
  lwd = 2
)
abline(v = 0, lwd = 2.5)

marginaleffects support

Because our model uses a nonlinear link function (the log link in this example), it can still be difficult to fully understand what relationship the model is estimating between a predictor and the response. Fortunately, the marginaleffects package makes this relatively straightforward. Objects of class mvgam can be used with marginaleffects to inspect contrasts, scenario-based predictions, and conditional and marginal effects, all on the outcome scale. Like brms, mvgam has the simple conditional_effects() function, built on marginaleffects support, to make quick and informative plots of main effects. This will likely be your go-to function for quickly understanding patterns in fitted mvgam models:

conditional_effects(model2)
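
For more targeted plots, functions from marginaleffects can be called on the fitted model directly. A sketch of plotting expected counts across the observed range of ndvi_ma12 (plot_predictions() and its condition argument come from marginaleffects; the "response" prediction type is an assumption here):

marginaleffects::plot_predictions(
  model2,
  condition = "ndvi_ma12",
  type = "response"
)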

Adding predictors as smooths

Smooth functions, using penalized splines, are a major feature of mvgam. Splines are commonly viewed as variations of random effects in which the coefficients that control the shape of the spline are drawn from a joint, penalized distribution. This strategy is very often used in ecological time series analysis to capture smooth temporal variation in the processes we seek to study. When we construct smoothing splines, the workhorse package mgcv will calculate a set of basis functions that will collectively control the shape and complexity of the resulting spline. It is often helpful to visualize these basis functions to get a better sense of how splines work, so below we create a set of 6 basis functions to represent possible variation in the effect of time on our outcome.
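
A minimal sketch, using mgcv’s smoothCon() constructor directly (this is purely illustrative and separate from any model we fit; the choice of 70 timepoints matches the training data):

library(mgcv)
# construct (but do not fit) a B-spline smooth of time with six basis functions
sm <- smoothCon(
  s(time, bs = "bs", k = 6),
  data = data.frame(time = 1:70)
)[[1]]

# each column of the design matrix is one basis function evaluated over time
matplot(sm$X, type = "l", lwd = 2,
        xlab = "Time", ylab = "Basis function value")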

In addition to constructing the basis functions, mgcv also creates a penalty matrix \(S\), which contains known coefficients that work to constrain the wiggliness of the resulting smooth function. When fitting a GAM to data, we must estimate the smoothing parameters (\(\lambda\)) that will penalize these matrices, resulting in constrained basis coefficients and smoother functions that are less likely to overfit the data. This is the key to fitting GAMs in a Bayesian framework, as we can jointly estimate the \(\lambda\)’s using informative priors to prevent overfitting and expand the complexity of models we can tackle.

To see this in practice, we can now fit a model that replaces the yearly random effects with a smooth function of time. We will need a reasonably complex function (large k) to try and accommodate the temporal variation in our observations. Following some useful advice by Gavin Simpson, we will use a b-spline basis for the temporal smooth. Because we no longer have intercepts for each year, we also retain the primary intercept term in this model (there is no -1 in the formula now):

model3 <- mvgam(
  count ~ s(time, bs = "bs", k = 15) +
    ndvi_ma12,
  family = poisson(),
  data = data_train,
  newdata = data_test
)

The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = f(\boldsymbol{time})_t + \beta_{ndvi} * \boldsymbol{ndvi}_t \\ f(\boldsymbol{time}) & = \sum_{k=1}^{K}b * \beta_{smooth} \\ \beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \\ \beta_{ndvi} & \sim \text{Normal}(0, 1) \end{align*}\]

Where the smooth function \(f(\boldsymbol{time})\) is built by summing across a set of weighted basis functions. The basis functions \((b)\) are constructed using a B-spline basis in mgcv (the bs = "bs" in the formula above). The weights \((\beta_{smooth})\) are drawn from a penalized multivariate normal distribution in which the precision matrix \((\Omega)\) is multiplied by a smoothing penalty \((\lambda)\). If \(\lambda\) becomes large, this acts to squeeze the covariances among the weights \((\beta_{smooth})\), leading to a less wiggly spline. Note that sometimes there are multiple smoothing penalties that contribute to the covariance matrix, but I am only showing one here for simplicity. View the summary as before:

summary(model3)
#> GAM formula:
#> count ~ s(time, bs = "bs", k = 15) + ndvi_ma12
#> <environment: 0x0000018e48f5c728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 80 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM coefficient (beta) estimates:
#>               2.5%   50% 97.5% Rhat n_eff
#> (Intercept)   0.82  1.10   1.3 1.00   853
#> ndvi_ma12     0.37  1.90   3.5 1.00  1085
#> s(time).1    -9.80 -5.80  -2.6 1.00   348
#> s(time).2     1.20  3.40   6.0 1.00   387
#> s(time).3   -10.00 -6.20  -3.1 1.00   318
#> s(time).4    -1.60  0.78   3.3 1.00   265
#> s(time).5    -3.00 -0.42   2.0 1.00   246
#> s(time).6    -6.40 -3.80  -1.1 1.00   380
#> s(time).7    -1.90  0.56   3.0 1.00   225
#> s(time).8    -2.40 -0.12   2.1 1.01   222
#> s(time).9    -0.55  2.10   4.7 1.00   233
#> s(time).10   -5.70 -3.30  -1.0 1.00   294
#> s(time).11   -2.50  0.57   4.0 1.00   374
#> s(time).12   -6.80 -5.00  -3.2 1.00   491
#> s(time).13    1.90  5.00   8.5 1.00   235
#> s(time).14  -11.00 -3.10   4.0 1.00   241
#> 
#> Approximate significance of GAM smooths:
#>           edf Ref.df Chi.sq  p-value    
#> s(time) 11.76     14  102.7 7.98e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

The summary above now contains posterior estimates for the smoothing parameters as well as the basis coefficients for the nonlinear effect of time. We can visualize conditional_effects as before:

conditional_effects(model3, type = "link")

Inspect the underlying Stan code to gain some idea of how the spline is being penalized:

stancode(model3)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[14, 28] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 1.4, 2.5);
#>   
#>   // prior for ndvi_ma12...
#>   b_raw[2] ~ student_t(3, 0, 2);
#>   
#>   // prior for s(time)...
#>   b_raw[3 : 16] ~ multi_normal_prec(zero[3 : 16],
#>                                     S1[1 : 14, 1 : 14] * lambda[1]
#>                                     + S1[1 : 14, 15 : 28] * lambda[2]);
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   {
#>     // likelihood functions
#>     flat_ys ~ poisson_log_glm(flat_xs, 0.0, b);
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }

The lines following // prior for s(time)... show how the spline basis coefficients are drawn from a zero-centred multivariate normal distribution, whose precision matrix is built by multiplying the penalty matrix \(S1\) by two different smoothing parameters (the \(\lambda\)’s) to enforce smoothness and reduce overfitting.
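
Posterior draws of the (log) smoothing parameters can be inspected like any other parameter. A sketch, assuming the rho variables from the generated quantities block above are retained in the draws and that mcmc_plot() accepts brms-style variable and regex arguments:

mcmc_plot(model3, variable = "rho", regex = TRUE, type = "areas")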

Latent dynamics in mvgam

Forecasts from the above model are not ideal:

plot(model3, type = "forecast", newdata = data_test)

Why is this happening? The forecasts are driven almost entirely by variation in the temporal spline, which is extrapolating linearly forever beyond the edge of the training data. Any slight wiggles near the end of the training set can result in wildly different forecasts. To visualize this, we can plot the extrapolated temporal function into the out-of-sample test set. Here is the extrapolated function for model 3, which used 15 basis functions:

plot_mvgam_smooth(
  model3,
  smooth = "s(time)",
  # pass newdata to the plot function to generate
  # predictions of the temporal smooth to the end of the
  # testing period
  newdata = data.frame(
    time = 1:max(data_test$time),
    ndvi_ma12 = 0
  )
)
abline(v = max(data_train$time), lty = "dashed", lwd = 2)

This model is not doing well. Clearly we need to somehow account for the strong temporal autocorrelation when modelling these data without using a smooth function of time.

Now onto another prominent feature of mvgam: the ability to include (possibly latent) autocorrelated residuals in regression models. To do so, we use the trend_model argument (see ?mvgam_trends for details of the different dynamic trend models that are supported). This model will use a separate sub-model for latent residuals that evolve as an AR1 process (i.e. the error in the current time point is a function of the error in the previous time point, plus some stochastic noise). We also include a smooth function of ndvi_ma12 in this model, rather than the parametric term that was used above, to showcase that mvgam can include combinations of smooths and dynamic components:

model4 <- mvgam(
  count ~ s(ndvi_ma12, k = 6),
  family = poisson(),
  data = data_train,
  newdata = data_test,
  trend_model = AR()
)
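
As an aside, AR() is only one of several supported dynamic processes. The alternatives sketched below are based on the constructors documented in ?mvgam_trends (the arguments shown are assumptions from those help pages):

# trend_model = RW()        # random walk
# trend_model = AR(p = 2)   # autoregressive process of order 2
# trend_model = VAR()       # vector autoregression for multivariate series
# trend_model = GP()        # Gaussian process trend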

The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = f(\boldsymbol{ndvi})_t + z_t \\ z_t & \sim \text{Normal}(ar1 * z_{t-1}, \sigma_{error}) \\ ar1 & \sim \text{Normal}(0, 1)[-1, 1] \\ \sigma_{error} & \sim \text{Exponential}(2) \\ f(\boldsymbol{ndvi}) & = \sum_{k=1}^{K}b * \beta_{smooth} \\ \beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \end{align*}\]

Here the term \(z_t\) captures autocorrelated latent residuals, which are modelled using an AR1 process. Notice that this model estimates the autocorrelated errors for the full time period, even though some of these time points have missing observations. This is useful for getting more realistic estimates of the residual autocorrelation parameters. Summarise the model to see how it now returns posterior summaries for the latent AR1 process:

summary(model4)
#> GAM formula:
#> count ~ s(ndvi_ma12, k = 6)
#> <environment: 0x0000018e48f5c728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR()
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 80 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM coefficient (beta) estimates:
#>                  2.5%     50% 97.5% Rhat n_eff
#> (Intercept)    -0.910  0.7700 1.700 1.04   121
#> s(ndvi_ma12).1 -0.140  0.0047 0.190 1.00   830
#> s(ndvi_ma12).2 -0.210 -0.0049 0.200 1.00   685
#> s(ndvi_ma12).3 -0.085 -0.0014 0.069 1.00   810
#> s(ndvi_ma12).4 -0.430  0.0250 0.600 1.00   750
#> s(ndvi_ma12).5 -0.260  0.0640 0.490 1.00   921
#> 
#> Approximate significance of GAM smooths:
#>                edf Ref.df Chi.sq p-value
#> s(ndvi_ma12) 1.285      5  0.357   0.999
#> 
#> standard deviation:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.58 0.81   1.1 1.01   345
#> 
#> precision parameter:
#>        2.5% 50% 97.5% Rhat n_eff
#> tau[1] 0.78 1.5     3    1   358
#> 
#> autoregressive coef 1:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.61 0.82  0.97 1.01   436
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

View posterior hindcasts / forecasts and compare them against the out-of-sample test data:

plot(model4, type = "forecast", newdata = data_test)

The trend is evolving as an AR1 process, which we can also view:

plot(model4, type = "trend", newdata = data_test)

In-sample model performance can be interrogated using leave-one-out cross-validation utilities from the loo package (a higher value is preferred for this metric):

loo_compare(model3, model4)
#>        elpd_diff se_diff
#> model3    0.0       0.0 
#> model4 -891.7     153.7

Note that the smooth-of-time model (model 3) actually has the higher expected log predictive density (ELPD) here, reflecting how flexibly the spline can fit the training observations. But as the forecast plots above make clear, good in-sample performance is no guarantee of good forecasts.

Though it should be obvious that the dynamic model provides better forecasts, we can quantify forecast performance for models 3 and 4 using the forecast() and score() functions. Here we will compare models based on their Discrete Rank Probability Scores (a lower value is preferred for this metric):

fc_mod3 <- forecast(model3)
fc_mod4 <- forecast(model4)
score_mod3 <- score(fc_mod3, score = "drps")
score_mod4 <- score(fc_mod4, score = "drps")
sum(score_mod4$PP$score, na.rm = TRUE) - 
  sum(score_mod3$PP$score, na.rm = TRUE)
#> [1] -619.4987

A strongly negative value here confirms that the score for the dynamic model (model 4) is much smaller (i.e. better) than the score for the model with a smooth function of time (model 3).
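
Rather than only comparing totals, the per-horizon scores can also be plotted to see where the two forecasts diverge; a small illustrative sketch using base graphics (the score column is the same one summed above):

plot(score_mod3$PP$score, type = "b",
  xlab = "Forecast horizon", ylab = "DRPS")
lines(score_mod4$PP$score, type = "b", col = "darkred")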

Further reading

The following papers and resources offer useful material about Dynamic GAMs and how they can be applied in practice:

Clark, Nicholas J. and Wells, K. Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution (2023): 14, 771-784.

Clark, Nicholas J., et al. Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ. (2025): 13:e18929

de Sousa, Heitor C., et al. Severe fire regimes decrease resilience of ectothermic populations. Journal of Animal Ecology (2024): 93(11), 1656-1669.

Hannaford, Naomi E., et al. A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant. Computational Statistics & Data Analysis (2023): 179, 107659.

Karunarathna, K.A.N.K., et al. Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models. Ecological Modelling (2024): 490, 110648.

Zhu, L., et al. Responses of a widespread pest insect to extreme high temperatures are stage-dependent and divergent among seasonal cohorts. Functional Ecology (2025): 39, 165–180. https://doi.org/10.1111/1365-2435.14711

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au).

================================================ FILE: doc/nmixtures.R ================================================ params <- list(EVAL = TRUE) ## ----echo = FALSE---------------------------------------------------------------- knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, eval = if (isTRUE(exists("params"))) params$EVAL else FALSE ) ## ----setup, include=FALSE-------------------------------------------------------- knitr::opts_chunk$set( echo = TRUE, dpi = 100, fig.asp = 0.8, fig.width = 6, out.width = "60%", fig.align = "center" ) library(mvgam) library(ggplot2) library(dplyr) # A custom ggplot2 theme theme_set( theme_classic(base_size = 12, base_family = "serif") + theme( axis.line.x.bottom = element_line( colour = "black", size = 1 ), axis.line.y.left = element_line( colour = "black", size = 1 ) ) ) options( ggplot2.discrete.colour = c( "#A25050", "#00008b", "darkred", "#010048" ), ggplot2.discrete.fill = c( "#A25050", "#00008b", "darkred", "#010048" ) ) ## -------------------------------------------------------------------------------- set.seed(999) # Simulate observations for species 1, which shows a declining trend and 0.7 detection probability data.frame( site = 1, # five replicates per year; six years replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = "sp_1", # true abundance declines nonlinearly truth = c( rep(28, 5), rep(26, 5), rep(23, 5), rep(16, 5), rep(14, 5), rep(14, 5) ), # observations are taken with detection prob = 0.7 obs = c( rbinom(5, 28, 0.7), rbinom(5, 26, 0.7), rbinom(5, 23, 0.7), rbinom(5, 15, 0.7), rbinom(5, 14, 0.7), rbinom(5, 14, 0.7) ) ) %>% # add 'series' information, which is an identifier of site, replicate and species dplyr::mutate( series = paste0( "site_", site, "_", species, "_rep_", replicate ), time = as.numeric(time), # add a 'cap' variable that defines the maximum latent N to # marginalize over when estimating latent abundance; in other words # how large do we realistically think the true abundance could be? 
cap = 100 ) %>% dplyr::select(-replicate) -> testdat # Now add another species that has a different temporal trend and a smaller # detection probability (0.45 for this species) testdat <- testdat %>% dplyr::bind_rows( data.frame( site = 1, replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = "sp_2", truth = c( rep(4, 5), rep(7, 5), rep(15, 5), rep(16, 5), rep(19, 5), rep(18, 5) ), obs = c( rbinom(5, 4, 0.45), rbinom(5, 7, 0.45), rbinom(5, 15, 0.45), rbinom(5, 16, 0.45), rbinom(5, 19, 0.45), rbinom(5, 18, 0.45) ) ) %>% dplyr::mutate( series = paste0( "site_", site, "_", species, "_rep_", replicate ), time = as.numeric(time), cap = 50 ) %>% dplyr::select(-replicate) ) ## -------------------------------------------------------------------------------- testdat$species <- factor(testdat$species, levels = unique(testdat$species)) testdat$series <- factor(testdat$series, levels = unique(testdat$series)) ## -------------------------------------------------------------------------------- dplyr::glimpse(testdat) head(testdat, 12) ## -------------------------------------------------------------------------------- testdat %>% # each unique combination of site*species is a separate process dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>% dplyr::select(trend, series) %>% dplyr::distinct() -> trend_map trend_map ## ----include = FALSE, results='hide'--------------------------------------------- mod <- mvgam( # the observation formula sets up linear predictors for # detection probability on the logit scale formula = obs ~ species - 1, # the trend_formula sets up the linear predictors for # the latent abundance processes on the log scale trend_formula = ~ s(time, by = trend, k = 4) + species, # the trend_map takes care of the mapping trend_map = trend_map, # nmix() family and data family = nmix(), data = testdat, # priors can be set in the usual way priors = c( prior(std_normal(), class = b), prior(normal(1, 1.5), class = Intercept_trend) ), samples = 1000 ) ## ----eval = FALSE---------------------------------------------------------------- # mod <- mvgam( # # the observation formula sets up linear predictors for # # detection probability on the logit scale # formula = obs ~ species - 1, # # # the trend_formula sets up the linear predictors for # # the latent abundance processes on the log scale # trend_formula = ~ s(time, by = trend, k = 4) + species, # # # the trend_map takes care of the mapping # trend_map = trend_map, # # # nmix() family and data # family = nmix(), # data = testdat, # # # priors can be set in the usual way # priors = c( # prior(std_normal(), class = b), # prior(normal(1, 1.5), class = Intercept_trend) # ), # samples = 1000 # ) ## -------------------------------------------------------------------------------- code(mod) ## -------------------------------------------------------------------------------- summary(mod) ## -------------------------------------------------------------------------------- loo(mod) ## -------------------------------------------------------------------------------- plot(mod, type = "smooths", trend_effects = TRUE) ## -------------------------------------------------------------------------------- marginaleffects::plot_predictions( mod, condition = "species", type = "detection" ) + ylab("Pr(detection)") + ylim(c(0, 1)) + theme_classic() + theme(legend.position = "none") ## -------------------------------------------------------------------------------- hc <- hindcast(mod, type = "latent_N") # Function to plot latent abundance estimates 
vs truth plot_latentN <- function(hindcasts, data, species = "sp_1") { all_series <- unique( data %>% dplyr::filter(species == !!species) %>% dplyr::pull(series) ) # Grab the first replicate that represents this series # so we can get the true simulated values series <- as.numeric(all_series[1]) truths <- data %>% dplyr::arrange(time, series) %>% dplyr::filter(series == !!levels(data$series)[series]) %>% dplyr::pull(truth) # In case some replicates have missing observations, # pull out predictions for ALL replicates and average over them hcs <- do.call( rbind, lapply(all_series, function(x) { ind <- which(names(hindcasts$hindcasts) %in% as.character(x)) hindcasts$hindcasts[[ind]] }) ) # Calculate posterior empirical quantiles of predictions pred_quantiles <- data.frame(t(apply(hcs, 2, function(x) { quantile( x, probs = c( 0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95 ) ) }))) pred_quantiles$time <- 1:NROW(pred_quantiles) pred_quantiles$truth <- truths # Grab observations data %>% dplyr::filter(series %in% all_series) %>% dplyr::select(time, obs) -> observations # Plot ggplot(pred_quantiles, aes(x = time, group = 1)) + geom_ribbon(aes(ymin = X5., ymax = X95.), fill = "#DCBCBC") + geom_ribbon(aes(ymin = X30., ymax = X70.), fill = "#B97C7C") + geom_line(aes(x = time, y = truth), colour = "black", linewidth = 1) + geom_point( aes(x = time, y = truth), shape = 21, colour = "white", fill = "black", size = 2.5 ) + geom_jitter( data = observations, aes(x = time, y = obs), width = 0.06, shape = 21, fill = "darkred", colour = "white", size = 2.5 ) + labs( y = "Latent abundance (N)", x = "Time", title = species ) } ## -------------------------------------------------------------------------------- plot_latentN(hc, testdat, species = "sp_1") plot_latentN(hc, testdat, species = "sp_2") ## -------------------------------------------------------------------------------- # Date link load(url( "https://github.com/doserjef/spAbundance/raw/main/data/dataNMixSim.rda" )) data.one.sp <- dataNMixSim # Pull out observations for one species data.one.sp$y <- data.one.sp$y[1, , ] # Abundance covariates that don't change across repeat sampling observations abund.cov <- dataNMixSim$abund.covs[, 1] abund.factor <- as.factor(dataNMixSim$abund.covs[, 2]) # Detection covariates that can change across repeat sampling observations # Note that `NA`s are not allowed for covariates in mvgam, so we randomly # impute them here det.cov <- dataNMixSim$det.covs$det.cov.1[,] det.cov[is.na(det.cov)] <- rnorm(length(which(is.na(det.cov)))) det.cov2 <- dataNMixSim$det.covs$det.cov.2 det.cov2[is.na(det.cov2)] <- rnorm(length(which(is.na(det.cov2)))) ## -------------------------------------------------------------------------------- mod_data <- do.call( rbind, lapply(1:NROW(data.one.sp$y), function(x) { data.frame( y = data.one.sp$y[x, ], abund_cov = abund.cov[x], abund_fac = abund.factor[x], det_cov = det.cov[x, ], det_cov2 = det.cov2[x, ], replicate = 1:NCOL(data.one.sp$y), site = paste0("site", x) ) }) ) %>% dplyr::mutate( species = "sp_1", series = as.factor(paste0(site, "_", species, "_", replicate)) ) %>% dplyr::mutate( site = factor(site, levels = unique(site)), species = factor(species, levels = unique(species)), time = 1, cap = max(data.one.sp$y, na.rm = TRUE) + 20 ) ## -------------------------------------------------------------------------------- NROW(mod_data) dplyr::glimpse(mod_data) head(mod_data) ## -------------------------------------------------------------------------------- mod_data %>% # each unique combination of 
site*species is a separate process dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>% dplyr::select(trend, series) %>% dplyr::distinct() -> trend_map trend_map %>% dplyr::arrange(trend) %>% head(12) ## ----include = FALSE, results='hide'--------------------------------------------- mod <- mvgam( # effects of covariates on detection probability; # here we use penalized splines for both continuous covariates formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3), # effects of the covariates on latent abundance; # here we use a penalized spline for the continuous covariate and # hierarchical intercepts for the factor covariate trend_formula = ~ s(abund_cov, k = 3) + s(abund_fac, bs = "re"), # link multiple observations to each site trend_map = trend_map, # nmix() family and supplied data family = nmix(), data = mod_data, # standard normal priors on key regression parameters priors = c( prior(std_normal(), class = "b"), prior(std_normal(), class = "Intercept"), prior(std_normal(), class = "Intercept_trend"), prior(std_normal(), class = "sigma_raw_trend") ), # use Stan's variational inference for quicker results algorithm = "meanfield", # no need to compute "series-level" residuals residuals = FALSE, samples = 1000 ) ## ----eval=FALSE------------------------------------------------------------------ # mod <- mvgam( # # effects of covariates on detection probability; # # here we use penalized splines for both continuous covariates # formula = y ~ s(det_cov, k = 4) + s(det_cov2, k = 4), # # # effects of the covariates on latent abundance; # # here we use a penalized spline for the continuous covariate and # # hierarchical intercepts for the factor covariate # trend_formula = ~ s(abund_cov, k = 4) + # s(abund_fac, bs = "re"), # # # link multiple observations to each site # trend_map = trend_map, # # # nmix() family and supplied data # family = nmix(), # data = mod_data, # # # standard normal priors on key regression parameters # priors = c( # prior(std_normal(), class = "b"), # prior(std_normal(), class = "Intercept"), # prior(std_normal(), class = "Intercept_trend"), # prior(std_normal(), class = "sigma_raw_trend") # ), # # # use Stan's variational inference for quicker results # algorithm = "meanfield", # # # no need to compute "series-level" residuals # residuals = FALSE, # samples = 1000 # ) ## -------------------------------------------------------------------------------- summary(mod, include_betas = FALSE) ## -------------------------------------------------------------------------------- marginaleffects::avg_predictions(mod, type = "detection") ## -------------------------------------------------------------------------------- abund_plots <- plot( conditional_effects( mod, type = "link", effects = c( "abund_cov", "abund_fac" ) ), plot = FALSE ) ## -------------------------------------------------------------------------------- abund_plots[[1]] + ylab("Expected latent abundance") ## -------------------------------------------------------------------------------- abund_plots[[2]] + ylab("Expected latent abundance") ## -------------------------------------------------------------------------------- det_plots <- plot( conditional_effects( mod, type = "detection", effects = c( "det_cov", "det_cov2" ) ), plot = FALSE ) ## -------------------------------------------------------------------------------- det_plots[[1]] + ylab("Pr(detection)") det_plots[[2]] + ylab("Pr(detection)") ## -------------------------------------------------------------------------------- fivenum_round <- 
function(x) round(fivenum(x, na.rm = TRUE), 2) marginaleffects::plot_predictions( mod, newdata = marginaleffects::datagrid( det_cov = unique, det_cov2 = fivenum_round ), by = c("det_cov", "det_cov2"), type = "detection" ) + theme_classic() + ylab("Pr(detection)") ================================================ FILE: doc/nmixtures.Rmd ================================================ --- title: "N-mixtures in mvgam" author: "Nicholas J Clark" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: yes vignette: > %\VignetteIndexEntry{N-mixtures in mvgam} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} params: EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true") --- ```{r, echo = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, eval = if (isTRUE(exists("params"))) params$EVAL else FALSE ) ``` ```{r setup, include=FALSE} knitr::opts_chunk$set( echo = TRUE, dpi = 100, fig.asp = 0.8, fig.width = 6, out.width = "60%", fig.align = "center" ) library(mvgam) library(ggplot2) library(dplyr) # A custom ggplot2 theme theme_set(theme_classic(base_size = 12, base_family = "serif") + theme( axis.line.x.bottom = element_line( colour = "black", size = 1 ), axis.line.y.left = element_line( colour = "black", size = 1 ) )) options( ggplot2.discrete.colour = c( "#A25050", "#00008b", "darkred", "#010048" ), ggplot2.discrete.fill = c( "#A25050", "#00008b", "darkred", "#010048" ) ) ``` The purpose of this vignette is to show how the `mvgam` package can be used to fit and interrogate N-mixture models for population abundance counts made with imperfect detection. ## N-mixture models An N-mixture model is a fairly recent addition to the ecological modeller's toolkit that is designed to make inferences about variation in the abundance of species when observations are imperfect ([Royle 2004](https://onlinelibrary.wiley.com/doi/10.1111/j.0006-341X.2004.00142.x){target="_blank"}). Briefly, assume $\boldsymbol{Y_{i,r}}$ is the number of individuals recorded at site $i$ during replicate sampling observation $r$ (recorded as a non-negative integer). If multiple replicate surveys are done within a short enough period to satisfy the assumption that the population remained closed (i.e. there was no substantial change in true population size between replicate surveys), we can account for the fact that observations aren't perfect. This is done by assuming that these replicate observations are Binomial random variables that are parameterized by the true "latent" abundance $N$ and a detection probability $p$: \begin{align*} \boldsymbol{Y_{i,r}} & \sim \text{Binomial}(N_i, p_r) \\ N_{i} & \sim \text{Poisson}(\lambda_i) \end{align*} Using a set of linear predictors, we can estimate effects of covariates $\boldsymbol{X}$ on the expected latent abundance (with a log link for $\lambda$) and, jointly, effects of possibly different covariates (call them $\boldsymbol{Q}$) on detection probability (with a logit link for $p$): \begin{align*} log(\lambda) & = \beta \boldsymbol{X} \\ logit(p) & = \gamma \boldsymbol{Q}\end{align*} `mvgam` can handle this type of model because it is designed to propagate unobserved temporal processes that evolve independently of the observation process in a State-space format. This setup adapts well to N-mixture models because they can be thought of as State-space models in which the latent state is a discrete variable representing the "true" but unknown population size. 
This is very convenient because we can incorporate any of the package's diverse effect types (i.e. multidimensional splines, time-varying effects, monotonic effects, random effects etc...) into the linear predictors. All that is required for this to work is a marginalization trick that allows `Stan`'s sampling algorithms to handle discrete parameters (see more about how this method of "integrating out" discrete parameters works in [this nice blog post by Maxwell Joseph](https://mbjoseph.github.io/posts/2020-04-28-a-step-by-step-guide-to-marginalizing-over-discrete-parameters-for-ecologists-using-stan/){target="_blank"}). The family `nmix()` is used to set up N-mixture models in `mvgam`, but we still need to do a little bit of data wrangling to ensure the data are set up in the correct format (this is especially true when we have more than one replicate survey per time period). The most important aspects are: (1) how we set up the observation `series` and `trend_map` arguments to ensure replicate surveys are mapped to the correct latent abundance model and (2) the inclusion of a `cap` variable that defines the maximum possible integer value to use for each observation when estimating latent abundance. The two examples below give a reasonable overview of how this can be done. ## Example 1: a two-species system with nonlinear trends First we will use a simple simulation in which multiple replicate observations are taken at each timepoint for two different species. The simulation produces observations at a single site over six years, with five replicate surveys per year. Each species is simulated to have different nonlinear temporal trends and different detection probabilities. For now, detection probability is fixed (i.e. it does not change over time or in association with any covariates). Notice that we add the `cap` variable, which does not need to be static, to define the maximum possible value that we think the latent abundance could be for each timepoint. This simply needs to be large enough that we get a reasonable idea of which latent N values are most likely, without adding too much computational cost: ```{r} set.seed(999) # Simulate observations for species 1, which shows a declining trend and 0.7 detection probability data.frame( site = 1, # five replicates per year; six years replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = "sp_1", # true abundance declines nonlinearly truth = c( rep(28, 5), rep(26, 5), rep(23, 5), rep(16, 5), rep(14, 5), rep(14, 5) ), # observations are taken with detection prob = 0.7 obs = c( rbinom(5, 28, 0.7), rbinom(5, 26, 0.7), rbinom(5, 23, 0.7), rbinom(5, 15, 0.7), rbinom(5, 14, 0.7), rbinom(5, 14, 0.7) ) ) %>% # add 'series' information, which is an identifier of site, replicate and species dplyr::mutate( series = paste0( "site_", site, "_", species, "_rep_", replicate ), time = as.numeric(time), # add a 'cap' variable that defines the maximum latent N to # marginalize over when estimating latent abundance; in other words # how large do we realistically think the true abundance could be? 
cap = 100 ) %>% dplyr::select(-replicate) -> testdat # Now add another species that has a different temporal trend and a smaller # detection probability (0.45 for this species) testdat <- testdat %>% dplyr::bind_rows(data.frame( site = 1, replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = "sp_2", truth = c( rep(4, 5), rep(7, 5), rep(15, 5), rep(16, 5), rep(19, 5), rep(18, 5) ), obs = c( rbinom(5, 4, 0.45), rbinom(5, 7, 0.45), rbinom(5, 15, 0.45), rbinom(5, 16, 0.45), rbinom(5, 19, 0.45), rbinom(5, 18, 0.45) ) ) %>% dplyr::mutate( series = paste0( "site_", site, "_", species, "_rep_", replicate ), time = as.numeric(time), cap = 50 ) %>% dplyr::select(-replicate)) ``` This data format isn't too difficult to set up, but it does differ from the traditional multidimensional array setup that is commonly used for fitting N-mixture models in other software packages. Next we ensure that species and series IDs are included as factor variables, in case we'd like to allow certain effects to vary by species ```{r} testdat$species <- factor(testdat$species, levels = unique(testdat$species) ) testdat$series <- factor(testdat$series, levels = unique(testdat$series) ) ``` Preview the dataset to get an idea of how it is structured: ```{r} dplyr::glimpse(testdat) head(testdat, 12) ``` ### Setting up the `trend_map` Finally, we need to set up the `trend_map` object. This is crucial for allowing multiple observations to be linked to the same latent process model (see more information about this argument in the [Shared latent states vignette](https://nicholasjclark.github.io/mvgam/articles/shared_states.html){target="_blank"}). In this case, the mapping operates by species and site to state that each set of replicate observations from the same time point should all share the exact same latent abundance model: ```{r} testdat %>% # each unique combination of site*species is a separate process dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>% dplyr::select(trend, series) %>% dplyr::distinct() -> trend_map trend_map ``` Notice how all of the replicates for species 1 in site 1 share the same process (i.e. the same `trend`). This will ensure that all replicates are Binomial draws of the same latent N. ### Modelling with the `nmix()` family Now we are ready to fit a model using `mvgam()`. This model will allow each species to have different detection probabilities and different temporal trends. 
We will use `Cmdstan` as the backend, which by default will use Hamiltonian Monte Carlo for full Bayesian inference ```{r include = FALSE, results='hide'} mod <- mvgam( # the observation formula sets up linear predictors for # detection probability on the logit scale formula = obs ~ species - 1, # the trend_formula sets up the linear predictors for # the latent abundance processes on the log scale trend_formula = ~ s(time, by = trend, k = 4) + species, # the trend_map takes care of the mapping trend_map = trend_map, # nmix() family and data family = nmix(), data = testdat, # priors can be set in the usual way priors = c( prior(std_normal(), class = b), prior(normal(1, 1.5), class = Intercept_trend) ), samples = 1000 ) ``` ```{r eval = FALSE} mod <- mvgam( # the observation formula sets up linear predictors for # detection probability on the logit scale formula = obs ~ species - 1, # the trend_formula sets up the linear predictors for # the latent abundance processes on the log scale trend_formula = ~ s(time, by = trend, k = 4) + species, # the trend_map takes care of the mapping trend_map = trend_map, # nmix() family and data family = nmix(), data = testdat, # priors can be set in the usual way priors = c( prior(std_normal(), class = b), prior(normal(1, 1.5), class = Intercept_trend) ), samples = 1000 ) ``` View the automatically-generated `Stan` code to get a sense of how the marginalization over latent N works ```{r} code(mod) ``` The posterior summary of this model shows that it has converged nicely ```{r} summary(mod) ``` `loo()` functionality works just as it does for all `mvgam` models to aid in model comparison / selection (though note that Pareto K values often give warnings for mixture models so these may not be too helpful) ```{r} loo(mod) ``` Plot the estimated smooths of time from each species' latent abundance process (on the log scale) ```{r} plot(mod, type = "smooths", trend_effects = TRUE) ``` `marginaleffects` support allows for more useful prediction-based interrogations on different scales (though note that at the time of writing this Vignette, you must have the development version of `marginaleffects` installed for `nmix()` models to be supported; use `remotes::install_github('vincentarelbundock/marginaleffects')` to install). Objects that use family `nmix()` have a few additional prediction scales that can be used (i.e. `link`, `response`, `detection` or `latent_N`). For example, here are the estimated detection probabilities per species, which show that the model has done a nice job of estimating these parameters: ```{r} marginaleffects::plot_predictions(mod, condition = "species", type = "detection" ) + ylab("Pr(detection)") + ylim(c(0, 1)) + theme_classic() + theme(legend.position = "none") ``` A common goal in N-mixture modelling is to estimate the true latent abundance. The model has automatically generated predictions for the unknown latent abundance that are conditional on the observations. 
We can extract these and produce decent plots using a small function ```{r} hc <- hindcast(mod, type = "latent_N") # Function to plot latent abundance estimates vs truth plot_latentN <- function(hindcasts, data, species = "sp_1") { all_series <- unique(data %>% dplyr::filter(species == !!species) %>% dplyr::pull(series)) # Grab the first replicate that represents this series # so we can get the true simulated values series <- as.numeric(all_series[1]) truths <- data %>% dplyr::arrange(time, series) %>% dplyr::filter(series == !!levels(data$series)[series]) %>% dplyr::pull(truth) # In case some replicates have missing observations, # pull out predictions for ALL replicates and average over them hcs <- do.call(rbind, lapply(all_series, function(x) { ind <- which(names(hindcasts$hindcasts) %in% as.character(x)) hindcasts$hindcasts[[ind]] })) # Calculate posterior empirical quantiles of predictions pred_quantiles <- data.frame(t(apply(hcs, 2, function(x) { quantile(x, probs = c( 0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95 )) }))) pred_quantiles$time <- 1:NROW(pred_quantiles) pred_quantiles$truth <- truths # Grab observations data %>% dplyr::filter(series %in% all_series) %>% dplyr::select(time, obs) -> observations # Plot ggplot(pred_quantiles, aes(x = time, group = 1)) + geom_ribbon(aes(ymin = X5., ymax = X95.), fill = "#DCBCBC") + geom_ribbon(aes(ymin = X30., ymax = X70.), fill = "#B97C7C") + geom_line(aes(x = time, y = truth), colour = "black", linewidth = 1 ) + geom_point(aes(x = time, y = truth), shape = 21, colour = "white", fill = "black", size = 2.5 ) + geom_jitter( data = observations, aes(x = time, y = obs), width = 0.06, shape = 21, fill = "darkred", colour = "white", size = 2.5 ) + labs( y = "Latent abundance (N)", x = "Time", title = species ) } ``` Latent abundance plots vs the simulated truths for each species are shown below. Here, the red points show the imperfect observations, the black line shows the true latent abundance, and the ribbons show credible intervals of our estimates: ```{r} plot_latentN(hc, testdat, species = "sp_1") plot_latentN(hc, testdat, species = "sp_2") ``` We can see that estimates for both species have correctly captured the true temporal variation and magnitudes in abundance ## Example 2: a larger survey with possible nonlinear effects Now for another example with a larger dataset. We will use data from [Jeff Doser's simulation example from the wonderful `spAbundance` package](https://doserlab.com/files/spabundance-web/articles/nmixturemodels){target="_blank"}. The simulated data include one continuous site-level covariate, one factor site-level covariate and two continuous sample-level covariates. This example will allow us to examine how we can include possibly nonlinear effects in the latent process and detection probability models. 
Download the data and grab observations / covariate measurements for one species

```{r}
# Data link
load(url("https://github.com/doserjef/spAbundance/raw/main/data/dataNMixSim.rda"))
data.one.sp <- dataNMixSim

# Pull out observations for one species
data.one.sp$y <- data.one.sp$y[1, , ]

# Abundance covariates that don't change across repeat sampling observations
abund.cov <- dataNMixSim$abund.covs[, 1]
abund.factor <- as.factor(dataNMixSim$abund.covs[, 2])

# Detection covariates that can change across repeat sampling observations
# Note that `NA`s are not allowed for covariates in mvgam, so we randomly
# impute them here
det.cov <- dataNMixSim$det.covs$det.cov.1[, ]
det.cov[is.na(det.cov)] <- rnorm(length(which(is.na(det.cov))))
det.cov2 <- dataNMixSim$det.covs$det.cov.2
det.cov2[is.na(det.cov2)] <- rnorm(length(which(is.na(det.cov2))))
```

Next we wrangle into the appropriate 'long' data format, adding indicators of `time` and `series` for working in `mvgam`. We also add the `cap` variable to represent the maximum latent N to marginalize over for each observation

```{r}
mod_data <- do.call(
  rbind,
  lapply(1:NROW(data.one.sp$y), function(x) {
    data.frame(
      y = data.one.sp$y[x, ],
      abund_cov = abund.cov[x],
      abund_fac = abund.factor[x],
      det_cov = det.cov[x, ],
      det_cov2 = det.cov2[x, ],
      replicate = 1:NCOL(data.one.sp$y),
      site = paste0("site", x)
    )
  })
) %>%
  dplyr::mutate(
    species = "sp_1",
    series = as.factor(paste0(site, "_", species, "_", replicate))
  ) %>%
  dplyr::mutate(
    site = factor(site, levels = unique(site)),
    species = factor(species, levels = unique(species)),
    time = 1,
    cap = max(data.one.sp$y, na.rm = TRUE) + 20
  )
```

The data include observations for 225 sites with three replicates per site, though some observations are missing

```{r}
NROW(mod_data)
dplyr::glimpse(mod_data)
head(mod_data)
```

The final step for data preparation is of course the `trend_map`, which sets up the mapping between observation replicates and the latent abundance models. This is done in the same way as in the example above

```{r}
mod_data %>%
  # each unique combination of site*species is a separate process
  dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>%
  dplyr::select(trend, series) %>%
  dplyr::distinct() -> trend_map

trend_map %>%
  dplyr::arrange(trend) %>%
  head(12)
```

Now we are ready to fit a model using `mvgam()`. Here we will use penalized splines for each of the continuous covariate effects to detect possible nonlinear associations.
We also showcase how `mvgam` can make use of the different approximation algorithms available in `Stan` by using the meanfield variational Bayes approximator (this reduces computation time from around 90 seconds to around 12 seconds for this example)

```{r include = FALSE, results='hide'}
mod <- mvgam(
  # effects of covariates on detection probability;
  # here we use penalized splines for both continuous covariates
  formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3),

  # effects of the covariates on latent abundance;
  # here we use a penalized spline for the continuous covariate and
  # hierarchical intercepts for the factor covariate
  trend_formula = ~ s(abund_cov, k = 3) + s(abund_fac, bs = "re"),

  # link multiple observations to each site
  trend_map = trend_map,

  # nmix() family and supplied data
  family = nmix(),
  data = mod_data,

  # standard normal priors on key regression parameters
  priors = c(
    prior(std_normal(), class = "b"),
    prior(std_normal(), class = "Intercept"),
    prior(std_normal(), class = "Intercept_trend"),
    prior(std_normal(), class = "sigma_raw_trend")
  ),

  # use Stan's variational inference for quicker results
  algorithm = "meanfield",

  # no need to compute "series-level" residuals
  residuals = FALSE,
  samples = 1000
)
```

```{r eval=FALSE}
mod <- mvgam(
  # effects of covariates on detection probability;
  # here we use penalized splines for both continuous covariates
  formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3),

  # effects of the covariates on latent abundance;
  # here we use a penalized spline for the continuous covariate and
  # hierarchical intercepts for the factor covariate
  trend_formula = ~ s(abund_cov, k = 3) + s(abund_fac, bs = "re"),

  # link multiple observations to each site
  trend_map = trend_map,

  # nmix() family and supplied data
  family = nmix(),
  data = mod_data,

  # standard normal priors on key regression parameters
  priors = c(
    prior(std_normal(), class = "b"),
    prior(std_normal(), class = "Intercept"),
    prior(std_normal(), class = "Intercept_trend"),
    prior(std_normal(), class = "sigma_raw_trend")
  ),

  # use Stan's variational inference for quicker results
  algorithm = "meanfield",

  # no need to compute "series-level" residuals
  residuals = FALSE,
  samples = 1000
)
```

Inspect the model summary but don't bother looking at estimates for all individual spline coefficients. Notice how we no longer receive information on convergence because we did not use MCMC sampling for this model

```{r}
summary(mod, include_betas = FALSE)
```

Again we can make use of `marginaleffects` support for interrogating the model through targeted predictions.
First, we can inspect the estimated average detection probability

```{r}
marginaleffects::avg_predictions(mod, type = "detection")
```

Next investigate estimated effects of covariates on latent abundance using the `conditional_effects()` function and specifying `type = 'link'`; this will return plots on the expectation scale

```{r}
abund_plots <- plot(
  conditional_effects(mod,
    type = "link",
    effects = c(
      "abund_cov",
      "abund_fac"
    )
  ),
  plot = FALSE
)
```

The effect of the continuous covariate on expected latent abundance

```{r}
abund_plots[[1]] +
  ylab("Expected latent abundance")
```

The effect of the factor covariate on expected latent abundance, estimated as a hierarchical random effect

```{r}
abund_plots[[2]] +
  ylab("Expected latent abundance")
```

Now we can investigate estimated effects of covariates on detection probability using `type = 'detection'`

```{r}
det_plots <- plot(
  conditional_effects(mod,
    type = "detection",
    effects = c(
      "det_cov",
      "det_cov2"
    )
  ),
  plot = FALSE
)
```

The covariate smooths were estimated to be somewhat nonlinear on the logit scale according to the model summary (based on their approximate significances). But inspecting conditional effects of each covariate on the probability scale is more intuitive and useful

```{r}
det_plots[[1]] +
  ylab("Pr(detection)")
det_plots[[2]] +
  ylab("Pr(detection)")
```

More targeted predictions are also easy with `marginaleffects` support. For example, we can ask: How does detection probability change as we change *both* detection covariates?

```{r}
fivenum_round <- function(x) round(fivenum(x, na.rm = TRUE), 2)

marginaleffects::plot_predictions(mod,
  newdata = marginaleffects::datagrid(
    det_cov = unique,
    det_cov2 = fivenum_round
  ),
  by = c("det_cov", "det_cov2"),
  type = "detection"
) +
  theme_classic() +
  ylab("Pr(detection)")
```

The model has found support for some important covariate effects, but of course we'd want to interrogate how well the model predicts and think about possible spatial effects to capture unmodelled variation in latent abundance (which can easily be incorporated into both linear predictors using spatial smooths).

## Further reading

The following papers and resources offer useful material about N-mixture models for ecological population dynamics investigations:

Guélat, Jérôme, and Kéry, Marc. “[Effects of Spatial Autocorrelation and Imperfect Detection on Species Distribution Models.](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12983)” *Methods in Ecology and Evolution* 9 (2018): 1614–25.

Kéry, Marc, and Royle, Andrew J. "[Applied hierarchical modeling in ecology: Analysis of distribution, abundance and species richness in R and BUGS: Volume 2: Dynamic and advanced models](https://shop.elsevier.com/books/applied-hierarchical-modeling-in-ecology-analysis-of-distribution-abundance-and-species-richness-in-r-and-bugs/kery/978-0-12-809585-0)". London, UK: Academic Press (2020).

Royle, Andrew J. "[N‐mixture models for estimating population size from spatially replicated counts.](https://onlinelibrary.wiley.com/doi/full/10.1111/j.0006-341X.2004.00142.x)" *Biometrics* 60.1 (2004): 108-115.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`.
Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au)

================================================
FILE: doc/nmixtures.html
================================================

N-mixtures in mvgam

N-mixtures in mvgam

Nicholas J Clark

2026-01-19

The purpose of this vignette is to show how the mvgam package can be used to fit and interrogate N-mixture models for population abundance counts made with imperfect detection.

N-mixture models

An N-mixture model is a fairly recent addition to the ecological modeller’s toolkit that is designed to make inferences about variation in the abundance of species when observations are imperfect (Royle 2004). Briefly, assume \(\boldsymbol{Y_{i,r}}\) is the number of individuals recorded at site \(i\) during replicate sampling observation \(r\) (recorded as a non-negative integer). If multiple replicate surveys are done within a short enough period to satisfy the assumption that the population remained closed (i.e. there was no substantial change in true population size between replicate surveys), we can account for the fact that observations aren’t perfect. This is done by assuming that these replicate observations are Binomial random variables that are parameterized by the true “latent” abundance \(N\) and a detection probability \(p\):

\[\begin{align*} \boldsymbol{Y_{i,r}} & \sim \text{Binomial}(N_i, p_r) \\ N_{i} & \sim \text{Poisson}(\lambda_i) \end{align*}\]

Using a set of linear predictors, we can estimate effects of covariates \(\boldsymbol{X}\) on the expected latent abundance (with a log link for \(\lambda\)) and, jointly, effects of possibly different covariates (call them \(\boldsymbol{Q}\)) on detection probability (with a logit link for \(p\)):

\[\begin{align*} log(\lambda) & = \beta \boldsymbol{X} \\ logit(p) & = \gamma \boldsymbol{Q}\end{align*}\]
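To make this generative model concrete, here is a small base-R simulation sketch (purely illustrative; the object names are hypothetical). Each row of Y holds the replicate counts for one site, generated from a shared latent N:

set.seed(1)
n_sites <- 50
n_reps <- 3
lambda <- 5
p <- 0.4
N <- rpois(n_sites, lambda) # true latent abundance per site
Y <- matrix(
  rbinom(n_sites * n_reps, size = rep(N, n_reps), prob = p),
  nrow = n_sites
) # imperfect replicate counts, never exceeding N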

mvgam can handle this type of model because it is designed to propagate unobserved temporal processes that evolve independently of the observation process in a State-space format. This setup adapts well to N-mixture models because they can be thought of as State-space models in which the latent state is a discrete variable representing the “true” but unknown population size. This is very convenient because we can incorporate any of the package’s diverse effect types (i.e. multidimensional splines, time-varying effects, monotonic effects, random effects etc…) into the linear predictors. All that is required for this to work is a marginalization trick that allows Stan’s sampling algorithms to handle discrete parameters (see more about how this method of “integrating out” discrete parameters works in this nice blog post by Maxwell Joseph).
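To see what this marginalization amounts to, here is a minimal base-R sketch of the N-mixture log-likelihood for a single site (not mvgam's internal code; nmix_loglik is a hypothetical helper). It sums the joint Poisson-Binomial density over every latent abundance consistent with the data, up to a finite cap:

nmix_loglik <- function(y, lambda, p, cap) {
  # latent N can be no smaller than the largest count observed
  possible_N <- max(y, na.rm = TRUE):cap
  log_terms <- sapply(possible_N, function(N) {
    dpois(N, lambda, log = TRUE) +
      sum(dbinom(y, size = N, prob = p, log = TRUE), na.rm = TRUE)
  })
  # log-sum-exp for numerical stability
  m <- max(log_terms)
  m + log(sum(exp(log_terms - m)))
}
nmix_loglik(y = c(2, 3, 2), lambda = 4, p = 0.6, cap = 50)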

The family nmix() is used to set up N-mixture models in mvgam, but we still need to do a little bit of data wrangling to ensure the data are set up in the correct format (this is especially true when we have more than one replicate survey per time period). The most important aspects are: (1) how we set up the observation series and trend_map arguments to ensure replicate surveys are mapped to the correct latent abundance model and (2) the inclusion of a cap variable that defines the maximum possible integer value to use for each observation when estimating latent abundance. The two examples below give a reasonable overview of how this can be done.
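As a minimal illustration of the expected trend_map layout (hypothetical names; the construction on real data is shown in the examples below), two sites surveyed three times each would have all replicates of a site mapped to the same latent process:

example_map <- data.frame(
  series = factor(paste0("site", rep(1:2, each = 3), "_rep", 1:3)),
  trend = rep(1:2, each = 3)
)
example_map

In practice the series factor levels must exactly match those in the modelling data.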

Example 2: a larger survey with possible nonlinear effects

Now for another example with a larger dataset. We will use data from Jeff Doser’s simulation example from the wonderful spAbundance package. The simulated data include one continuous site-level covariate, one factor site-level covariate and two continuous sample-level covariates. This example will allow us to examine how we can include possibly nonlinear effects in the latent process and detection probability models.

Download the data and grab observations / covariate measurements for one species

# Data link
load(url("https://github.com/doserjef/spAbundance/raw/main/data/dataNMixSim.rda"))
data.one.sp <- dataNMixSim

# Pull out observations for one species
data.one.sp$y <- data.one.sp$y[1, , ]

# Abundance covariates that don't change across repeat sampling observations
abund.cov <- dataNMixSim$abund.covs[, 1]
abund.factor <- as.factor(dataNMixSim$abund.covs[, 2])

# Detection covariates that can change across repeat sampling observations
# Note that `NA`s are not allowed for covariates in mvgam, so we randomly
# impute them here
det.cov <- dataNMixSim$det.covs$det.cov.1[, ]
det.cov[is.na(det.cov)] <- rnorm(length(which(is.na(det.cov))))
det.cov2 <- dataNMixSim$det.covs$det.cov.2
det.cov2[is.na(det.cov2)] <- rnorm(length(which(is.na(det.cov2))))

Next we wrangle into the appropriate ‘long’ data format, adding indicators of time and series for working in mvgam. We also add the cap variable to represent the maximum latent N to marginalize over for each observation

mod_data <- do.call(
  rbind,
  lapply(1:NROW(data.one.sp$y), function(x) {
    data.frame(
      y = data.one.sp$y[x, ],
      abund_cov = abund.cov[x],
      abund_fac = abund.factor[x],
      det_cov = det.cov[x, ],
      det_cov2 = det.cov2[x, ],
      replicate = 1:NCOL(data.one.sp$y),
      site = paste0("site", x)
    )
  })
) %>%
  dplyr::mutate(
    species = "sp_1",
    series = as.factor(paste0(site, "_", species, "_", replicate))
  ) %>%
  dplyr::mutate(
    site = factor(site, levels = unique(site)),
    species = factor(species, levels = unique(species)),
    time = 1,
    cap = max(data.one.sp$y, na.rm = TRUE) + 20
  )

The data include observations for 225 sites with three replicates per site, though some observations are missing

NROW(mod_data)
#> [1] 675
dplyr::glimpse(mod_data)
#> Rows: 675
#> Columns: 11
#> $ y         <int> 1, NA, NA, NA, 2, 2, NA, 1, NA, NA, 0, 1, 0, 0, 0, 0, NA, NA…
#> $ abund_cov <dbl> -0.3734384, -0.3734384, -0.3734384, 0.7064305, 0.7064305, 0.…
#> $ abund_fac <fct> 3, 3, 3, 4, 4, 4, 9, 9, 9, 2, 2, 2, 3, 3, 3, 2, 2, 2, 1, 1, …
#> $ det_cov   <dbl> -1.28279990, 1.11996398, -1.26741746, -1.29426683, 0.1954808…
#> $ det_cov2  <dbl> 2.03047314, 1.61128100, 0.06661865, -0.94290689, 1.04555361,…
#> $ replicate <int> 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, …
#> $ site      <fct> site1, site1, site1, site2, site2, site2, site3, site3, site…
#> $ species   <fct> sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, …
#> $ series    <fct> site1_sp_1_1, site1_sp_1_2, site1_sp_1_3, site2_sp_1_1, site…
#> $ time      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ cap       <dbl> 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, …
head(mod_data)
#>    y  abund_cov abund_fac    det_cov    det_cov2 replicate  site species
#> 1  1 -0.3734384         3 -1.2827999  2.03047314         1 site1    sp_1
#> 2 NA -0.3734384         3  1.1199640  1.61128100         2 site1    sp_1
#> 3 NA -0.3734384         3 -1.2674175  0.06661865         3 site1    sp_1
#> 4 NA  0.7064305         4 -1.2942668 -0.94290689         1 site2    sp_1
#> 5  2  0.7064305         4  0.1954809  1.04555361         2 site2    sp_1
#> 6  2  0.7064305         4  0.9673034  1.91971178         3 site2    sp_1
#>         series time cap
#> 1 site1_sp_1_1    1  33
#> 2 site1_sp_1_2    1  33
#> 3 site1_sp_1_3    1  33
#> 4 site2_sp_1_1    1  33
#> 5 site2_sp_1_2    1  33
#> 6 site2_sp_1_3    1  33

The final step for data preparation is of course the trend_map, which sets up the mapping between observation replicates and the latent abundance models. This is done in the same way as in the example above

mod_data %>%
  # each unique combination of site*species is a separate process
  dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>%
  dplyr::select(trend, series) %>%
  dplyr::distinct() -> trend_map

trend_map %>%
  dplyr::arrange(trend) %>%
  head(12)
#>    trend         series
#> 1      1 site100_sp_1_1
#> 2      1 site100_sp_1_2
#> 3      1 site100_sp_1_3
#> 4      2 site101_sp_1_1
#> 5      2 site101_sp_1_2
#> 6      2 site101_sp_1_3
#> 7      3 site102_sp_1_1
#> 8      3 site102_sp_1_2
#> 9      3 site102_sp_1_3
#> 10     4 site103_sp_1_1
#> 11     4 site103_sp_1_2
#> 12     4 site103_sp_1_3

Now we are ready to fit a model using mvgam(). Here we will use penalized splines for each of the continuous covariate effects to detect possible nonlinear associations. We also showcase how mvgam can make use of the different approximation algorithms available in Stan by using the meanfield variational Bayes approximator (this reduces computation time from around 90 seconds to around 12 seconds for this example)

mod <- mvgam(
  # effects of covariates on detection probability;
  # here we use penalized splines for both continuous covariates
  formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3),

  # effects of the covariates on latent abundance;
  # here we use a penalized spline for the continuous covariate and
  # hierarchical intercepts for the factor covariate
  trend_formula = ~ s(abund_cov, k = 3) +
    s(abund_fac, bs = "re"),

  # link multiple observations to each site
  trend_map = trend_map,

  # nmix() family and supplied data
  family = nmix(),
  data = mod_data,

  # standard normal priors on key regression parameters
  priors = c(
    prior(std_normal(), class = "b"),
    prior(std_normal(), class = "Intercept"),
    prior(std_normal(), class = "Intercept_trend"),
    prior(std_normal(), class = "sigma_raw_trend")
  ),

  # use Stan's variational inference for quicker results
  algorithm = "meanfield",

  # no need to compute "series-level" residuals
  residuals = FALSE,
  samples = 1000
)

Inspect the model summary but don’t bother looking at estimates for all individual spline coefficients. Notice how we no longer receive information on convergence because we did not use MCMC sampling for this model

summary(mod, include_betas = FALSE)
#> GAM observation formula:
#> y ~ s(det_cov, k = 3) + s(det_cov2, k = 3)
#> <environment: 0x000001a57291b728>
#> 
#> GAM process formula:
#> ~s(abund_cov, k = 3) + s(abund_fac, bs = "re")
#> <environment: 0x000001a57291b728>
#> 
#> Family:
#> nmix
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N process models:
#> 225 
#> 
#> N series:
#> 675 
#> 
#> N timepoints:
#> 1 
#> 
#> Status:
#> Fitted using Stan 
#> 1 chains, each with iter = 1000; warmup = ; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> GAM observation model coefficient (beta) estimates:
#>              2.5%  50% 97.5% Rhat n.eff
#> (Intercept) 0.076 0.37  0.67  NaN   NaN
#> 
#> Approximate significance of GAM observation smooths:
#>               edf Ref.df Chi.sq  p-value    
#> s(det_cov)  1.041      2  176.9 0.000177 ***
#> s(det_cov2) 1.011      2  548.1  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> GAM process model coefficient (beta) estimates:
#>                      2.5%  50% 97.5% Rhat n.eff
#> (Intercept)_trend -0.0083 0.14  0.29  NaN   NaN
#> 
#> GAM process model group-level estimates:
#>                           2.5%   50%  97.5% Rhat n.eff
#> mean(s(abund_fac))_trend -0.44 -0.26 -0.076  NaN   NaN
#> sd(s(abund_fac))_trend    0.29  0.42  0.620  NaN   NaN
#> 
#> Approximate significance of GAM process smooths:
#>               edf Ref.df Chi.sq p-value  
#> s(abund_cov) 1.42      2  1.622  0.2656  
#> s(abund_fac) 8.85     10 14.771  0.0918 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Posterior approximation used: no diagnostics to compute
#> 
#> Use how_to_cite() to get started describing this model

Again we can make use of marginaleffects support for interrogating the model through targeted predictions. First, we can inspect the estimated average detection probability

marginaleffects::avg_predictions(mod, type = "detection")
#> 
#>  Estimate 2.5 % 97.5 %
#>     0.575 0.515  0.636
#> 
#> Type: detection

Next investigate estimated effects of covariates on latent abundance using the conditional_effects() function and specifying type = 'link'; this will return plots on the expectation scale

abund_plots <- plot(
  conditional_effects(mod,
    type = "link",
    effects = c(
      "abund_cov",
      "abund_fac"
    )
  ),
  plot = FALSE
)

The effect of the continuous covariate on expected latent abundance

abund_plots[[1]] +
  ylab("Expected latent abundance")

The effect of the factor covariate on expected latent abundance, estimated as a hierarchical random effect

abund_plots[[2]] +
  ylab("Expected latent abundance")

Now we can investigate estimated effects of covariates on detection probability using type = 'detection'

det_plots <- plot(
  conditional_effects(mod,
    type = "detection",
    effects = c(
      "det_cov",
      "det_cov2"
    )
  ),
  plot = FALSE
)

The covariate smooths were estimated to be somewhat nonlinear on the logit scale according to the model summary (based on their approximate significances). But inspecting conditional effects of each covariate on the probability scale is more intuitive and useful

det_plots[[1]] +
  ylab("Pr(detection)")

det_plots[[2]] +
  ylab("Pr(detection)")

More targeted predictions are also easy with marginaleffects support. For example, we can ask: How does detection probability change as we change both detection covariates?

fivenum_round <- function(x) round(fivenum(x, na.rm = TRUE), 2)

marginaleffects::plot_predictions(mod,
  newdata = marginaleffects::datagrid(
    det_cov = unique,
    det_cov2 = fivenum_round
  ),
  by = c("det_cov", "det_cov2"),
  type = "detection"
) +
  theme_classic() +
  ylab("Pr(detection)")

The model has found support for some important covariate effects, but of course we’d want to interrogate how well the model predicts and think about possible spatial effects to capture unmodelled variation in latent abundance (which can easily be incorporated into both linear predictors using spatial smooths).
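As a sketch of that last point, and assuming hypothetical lon and lat coordinate columns were available in mod_data, a two-dimensional smooth of space could be added directly to the process formula (illustrative only; this model was not fitted here):

mod_spatial <- mvgam(
  formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3),
  # hypothetical spatial smooth added to the latent abundance model
  trend_formula = ~ s(abund_cov, k = 3) + s(abund_fac, bs = "re") +
    s(lon, lat, k = 20),
  trend_map = trend_map,
  family = nmix(),
  data = mod_data,
  algorithm = "meanfield",
  residuals = FALSE
)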

Further reading

The following papers and resources offer useful material about N-mixture models for ecological population dynamics investigations:

Guélat, Jérôme, and Kéry, Marc. “Effects of Spatial Autocorrelation and Imperfect Detection on Species Distribution Models.” Methods in Ecology and Evolution 9 (2018): 1614–25.

Kéry, Marc, and Royle, Andrew J. “Applied hierarchical modeling in ecology: Analysis of distribution, abundance and species richness in R and BUGS: Volume 2: Dynamic and advanced models”. London, UK: Academic Press (2020).

Royle, Andrew J. “N‐mixture models for estimating population size from spatially replicated counts.” Biometrics 60.1 (2004): 108-115.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au)

================================================
FILE: doc/shared_states.R
================================================
params <- list(EVAL = TRUE)

## ----echo = FALSE----------------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)

## ----setup, include=FALSE--------------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))

## --------------------------------------------------------------------------------
set.seed(122)
simdat <- sim_mvgam(
  trend_model = AR(),
  prop_trend = 0.6,
  mu = c(0, 1, 2),
  family = poisson()
)
trend_map <- data.frame(
  series = unique(simdat$data_train$series),
  trend = c(1, 1, 2)
)
trend_map

## --------------------------------------------------------------------------------
all.equal(levels(trend_map$series), levels(simdat$data_train$series))

## --------------------------------------------------------------------------------
fake_mod <- mvgam(
  y ~
    # observation model formula, which has a
    # different intercept per series
    series - 1,

  # process model formula, which has a shared seasonal smooth
  # (each latent process model shares the SAME smooth)
  trend_formula = ~ s(season, bs = "cc", k = 6),

  # AR1 dynamics (each latent process model has DIFFERENT
  # dynamics); processes are estimated using the noncentred
  # parameterisation for improved efficiency
  trend_model = AR(),
  noncentred = TRUE,

  # supplied trend_map
  trend_map = trend_map,

  # data and observation family
  family = poisson(),
  data = simdat$data_train,
  run_model = FALSE
)

## --------------------------------------------------------------------------------
stancode(fake_mod)

## --------------------------------------------------------------------------------
fake_mod$model_data$Z

## ----full_mod, include = FALSE, results='hide'-----------------------------------
full_mod <- mvgam(
  y ~ series - 1,
  trend_formula = ~ s(season, bs = "cc", k = 6),
  trend_model = AR(),
  noncentred = TRUE,
  trend_map = trend_map,
  family = poisson(),
  data = simdat$data_train,
  silent = 2
)

## ----eval=FALSE------------------------------------------------------------------
# full_mod <- mvgam(
#   y ~ series - 1,
#   trend_formula = ~ s(season, bs = "cc", k = 6),
#   trend_model = AR(),
#   noncentred = TRUE,
#   trend_map = trend_map,
#   family = poisson(),
#   data = simdat$data_train,
#   silent = 2
# )

## --------------------------------------------------------------------------------
summary(full_mod)

## --------------------------------------------------------------------------------
plot(full_mod, type = "trend", series = 1)
plot(full_mod, type = "trend", series = 2)
plot(full_mod, type = "trend", series = 3)

## --------------------------------------------------------------------------------
set.seed(123)
# simulate a nonlinear relationship using the mgcv function gamSim
signal_dat <- mgcv::gamSim(n = 100, eg = 1, scale = 1)

# productivity is one of the variables in the simulated data
productivity <- signal_dat$x2

# simulate the true signal, which already has a nonlinear relationship
# with productivity; we will add in a fairly strong AR1 process to
# contribute to the signal
true_signal <- as.vector(
  scale(signal_dat$y) +
    arima.sim(100, model = list(ar = 0.8, sd = 0.1))
)

## --------------------------------------------------------------------------------
plot(
  true_signal,
  type = "l",
  bty = "l",
  lwd = 2,
  ylab = "True signal",
  xlab = "Time"
)

## --------------------------------------------------------------------------------
# Function to simulate a monotonic response to a covariate
sim_monotonic <- function(x, a = 2.2, b = 2) {
  out <- exp(a * x) / (6 + exp(b * x)) * -1
  return(2.5 * as.vector(scale(out)))
}

# Simulated temperature covariate
temperature <- runif(100, -2, 2)

# Simulate the three series
sim_series <- function(n_series = 3, true_signal) {
  temp_effects <- mgcv::gamSim(n = 100, eg = 7, scale = 0.05)
  alphas <- rnorm(n_series, sd = 2)

  do.call(
    rbind,
    lapply(seq_len(n_series), function(series) {
      data.frame(
        observed = rnorm(
          length(true_signal),
          mean = alphas[series] +
            sim_monotonic(
              temperature,
              runif(1, 2.2, 3),
              runif(1, 2.2, 3)
            ) +
            true_signal,
          sd = runif(1, 1, 2)
        ),
        series = paste0("sensor_", series),
        time = 1:length(true_signal),
        temperature = temperature,
        productivity = productivity,
        true_signal = true_signal
      )
    })
  )
}
model_dat <- sim_series(true_signal = true_signal) %>%
  dplyr::mutate(series = factor(series))

## --------------------------------------------------------------------------------
plot_mvgam_series(
  data = model_dat,
  y = "observed",
  series = "all"
)

## --------------------------------------------------------------------------------
plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_1"),
  pch = 16, bty = "l",
  ylab = "Sensor 1",
  xlab = "Temperature"
)
plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_2"),
  pch = 16, bty = "l",
  ylab = "Sensor 2",
  xlab = "Temperature"
)
plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_3"),
  pch = 16, bty = "l",
  ylab = "Sensor 3",
  xlab = "Temperature"
)

## ----sensor_mod, include = FALSE, results='hide'---------------------------------
mod <- mvgam(
  # formula for observations, allowing for different
  # intercepts and smooth effects of temperature
  formula = observed ~ series +
    s(temperature, k = 10) +
    s(series, temperature, bs = "sz", k = 8),

  # formula for the latent signal, which can depend
  # nonlinearly on productivity
  trend_formula = ~ s(productivity, k = 8) - 1,

  # in addition to productivity effects, the signal is
  # assumed to exhibit temporal autocorrelation
  trend_model = AR(),
  noncentred = TRUE,

  # trend_map forces all sensors to track the same
  # latent signal
  trend_map = data.frame(
    series = unique(model_dat$series),
    trend = c(1, 1, 1)
  ),

  # informative priors on process error
  # and observation error will help with convergence
  priors = c(
    prior(normal(2, 0.5), class = sigma),
    prior(normal(1, 0.5), class = sigma_obs)
  ),

  # Gaussian observations
  family = gaussian(),
  burnin = 600,
  control = list(adapt_delta = 0.95),
  data = model_dat,
  silent = 2
)

## ----eval=FALSE------------------------------------------------------------------
# mod <- mvgam(
#   formula =
#     # formula for observations, allowing for different
#     # intercepts and hierarchical smooth effects of temperature
#     observed ~ series +
#       s(temperature, k = 10) +
#       s(series, temperature, bs = "sz", k = 8),
#   trend_formula =
#     # formula for the latent signal, which can depend
#     # nonlinearly on productivity
#     ~ s(productivity, k = 8) - 1,
#   trend_model =
#     # in addition to productivity effects, the signal is
#     # assumed to exhibit temporal autocorrelation
#     AR(),
#   noncentred = TRUE,
#   trend_map =
#     # trend_map forces all sensors to track the same
#     # latent signal
#     data.frame(
#       series = unique(model_dat$series),
#       trend = c(1, 1, 1)
#     ),
#
#   # informative priors on process error
#   # and observation error will help with convergence
#   priors = c(
#     prior(normal(2, 0.5), class = sigma),
#     prior(normal(1, 0.5), class = sigma_obs)
#   ),
#
#   # Gaussian observations
#   family = gaussian(),
#   data = model_dat,
#   silent = 2
# )

## --------------------------------------------------------------------------------
summary(mod, include_betas = FALSE)

## --------------------------------------------------------------------------------
conditional_effects(mod, type = "link")

## --------------------------------------------------------------------------------
plot_predictions(
  mod,
  condition = c("temperature", "series", "series"),
  points = 0.5
) +
  theme(legend.position = "none")

## --------------------------------------------------------------------------------
plot(mod, type = "trend") +
  ggplot2::geom_point(
    data = data.frame(time = 1:100, y = true_signal),
    mapping = ggplot2::aes(x = time, y = y)
  )

================================================
FILE: doc/shared_states.Rmd
================================================
---
title: "Shared latent states in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Shared latent states in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

This vignette gives an example of how `mvgam` can be used to estimate models where multiple observed time series share the same latent process model. For full details on the basic `mvgam` functionality, please see [the introductory vignette](https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html).

## The `trend_map` argument

The `trend_map` argument in the `mvgam()` function is an optional `data.frame` that can be used to specify which series should depend on which latent process models (called "trends" in `mvgam`). This can be particularly useful if we wish to force multiple observed time series to depend on the same latent trend process, but with different observation processes. If this argument is supplied, a latent factor model is set up by setting `use_lv = TRUE` and using the supplied `trend_map` to set up the shared trends. Users familiar with the `MARSS` family of packages will recognize this as a way of specifying the $Z$ matrix. This `data.frame` needs to have column names `series` and `trend`, with integer values in the `trend` column to state which trend each series should depend on. The `series` column should have a single unique entry for each time series in the data, with names that perfectly match the factor levels of the `series` variable in `data`.
For example, if we were to simulate a collection of three integer-valued time series (using `sim_mvgam`), the following `trend_map` would force the first two series to share the same latent trend process:

```{r}
set.seed(122)
simdat <- sim_mvgam(
  trend_model = AR(),
  prop_trend = 0.6,
  mu = c(0, 1, 2),
  family = poisson()
)
trend_map <- data.frame(
  series = unique(simdat$data_train$series),
  trend = c(1, 1, 2)
)
trend_map
```

We can see that the factor levels in `trend_map` match those in the data:

```{r}
all.equal(levels(trend_map$series), levels(simdat$data_train$series))
```

### Checking `trend_map` with `run_model = FALSE`

Supplying this `trend_map` to the `mvgam` function for a simple model, but setting `run_model = FALSE`, allows us to inspect the constructed `Stan` code and the data objects that would be used to condition the model. Here we will set up a model in which each series has a different observation process (with only a different intercept per series in this case), and the two latent dynamic process models evolve as independent AR1 processes that also contain a shared nonlinear smooth function to capture repeated seasonality. This model is not too complicated, but it does show how we can learn shared and independent effects for collections of time series in the `mvgam` framework:

```{r}
fake_mod <- mvgam(
  y ~
    # observation model formula, which has a
    # different intercept per series
    series - 1,

  # process model formula, which has a shared seasonal smooth
  # (each latent process model shares the SAME smooth)
  trend_formula = ~ s(season, bs = "cc", k = 6),

  # AR1 dynamics (each latent process model has DIFFERENT
  # dynamics); processes are estimated using the noncentred
  # parameterisation for improved efficiency
  trend_model = AR(),
  noncentred = TRUE,

  # supplied trend_map
  trend_map = trend_map,

  # data and observation family
  family = poisson(),
  data = simdat$data_train,
  run_model = FALSE
)
```

Inspecting the `Stan` code shows how this model is a dynamic factor model in which the loadings are constructed to reflect the supplied `trend_map`:

```{r}
stancode(fake_mod)
```

Notice the line that states "lv_coefs = Z;". This uses the supplied $Z$ matrix to construct the loading coefficients. The supplied matrix now looks exactly like what you'd use if you were to create a similar model in the `MARSS` package:

```{r}
fake_mod$model_data$Z
```

### Fitting and inspecting the model

Though this model doesn't perfectly match the data-generating process (which allowed each series to have different underlying dynamics), we can still fit it to show what the resulting inferences look like:

```{r full_mod, include = FALSE, results='hide'}
full_mod <- mvgam(
  y ~ series - 1,
  trend_formula = ~ s(season, bs = "cc", k = 6),
  trend_model = AR(),
  noncentred = TRUE,
  trend_map = trend_map,
  family = poisson(),
  data = simdat$data_train,
  silent = 2
)
```

```{r eval=FALSE}
full_mod <- mvgam(
  y ~ series - 1,
  trend_formula = ~ s(season, bs = "cc", k = 6),
  trend_model = AR(),
  noncentred = TRUE,
  trend_map = trend_map,
  family = poisson(),
  data = simdat$data_train,
  silent = 2
)
```

The summary of this model is informative as it shows that only two latent process models have been estimated, even though we have three observed time series.
The model converges well

```{r}
summary(full_mod)
```

Both series 1 and 2 share the exact same latent process estimates, while the estimates for series 3 are different:

```{r}
plot(full_mod, type = "trend", series = 1)
plot(full_mod, type = "trend", series = 2)
plot(full_mod, type = "trend", series = 3)
```

However, forecasts for series 1 and 2 will differ because they have different intercepts in the observation model

## Example: signal detection

Now we will explore a more complicated example. Here we simulate a true hidden signal that we are trying to track. This signal depends nonlinearly on some covariate (called `productivity`, which represents a measure of how productive the landscape is). The signal also demonstrates a fairly large amount of temporal autocorrelation:

```{r}
set.seed(123)
# simulate a nonlinear relationship using the mgcv function gamSim
signal_dat <- mgcv::gamSim(n = 100, eg = 1, scale = 1)

# productivity is one of the variables in the simulated data
productivity <- signal_dat$x2

# simulate the true signal, which already has a nonlinear relationship
# with productivity; we will add in a fairly strong AR1 process to
# contribute to the signal
true_signal <- as.vector(scale(signal_dat$y) +
  arima.sim(100, model = list(ar = 0.8, sd = 0.1)))
```

Plot the signal to inspect its evolution over time

```{r}
plot(
  true_signal,
  type = "l",
  bty = "l", lwd = 2,
  ylab = "True signal",
  xlab = "Time"
)
```

Next we simulate three sensors that are trying to track the same hidden signal. All of these sensors have different observation errors that can depend nonlinearly on a second external covariate, called `temperature` in this example. Again this makes use of `gamSim`

```{r}
# Function to simulate a monotonic response to a covariate
sim_monotonic <- function(x, a = 2.2, b = 2) {
  out <- exp(a * x) / (6 + exp(b * x)) * -1
  return(2.5 * as.vector(scale(out)))
}

# Simulated temperature covariate
temperature <- runif(100, -2, 2)

# Simulate the three series
sim_series <- function(n_series = 3, true_signal) {
  temp_effects <- mgcv::gamSim(n = 100, eg = 7, scale = 0.05)
  alphas <- rnorm(n_series, sd = 2)

  do.call(rbind, lapply(seq_len(n_series), function(series) {
    data.frame(
      observed = rnorm(length(true_signal),
        mean = alphas[series] +
          sim_monotonic(
            temperature,
            runif(1, 2.2, 3),
            runif(1, 2.2, 3)
          ) +
          true_signal,
        sd = runif(1, 1, 2)
      ),
      series = paste0("sensor_", series),
      time = 1:length(true_signal),
      temperature = temperature,
      productivity = productivity,
      true_signal = true_signal
    )
  }))
}
model_dat <- sim_series(true_signal = true_signal) %>%
  dplyr::mutate(series = factor(series))
```

Plot the sensor observations

```{r}
plot_mvgam_series(
  data = model_dat, y = "observed",
  series = "all"
)
```

And now plot the observed relationships between the three sensors and the `temperature` covariate

```{r}
plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_1"),
  pch = 16, bty = "l",
  ylab = "Sensor 1",
  xlab = "Temperature"
)

plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_2"),
  pch = 16, bty = "l",
  ylab = "Sensor 2",
  xlab = "Temperature"
)

plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_3"),
  pch = 16, bty = "l",
  ylab = "Sensor 3",
  xlab = "Temperature"
)
```

### The shared signal model

Now we can formulate and fit a model that allows each sensor's observation error to depend nonlinearly on `temperature` while allowing the true signal to depend nonlinearly on `productivity`.
By fixing all of the values in the `trend` column to `1` in the `trend_map`, we are assuming that all observation sensors are tracking the same latent signal. We use informative priors on the two variance components (process error and observation error), which reflect our prior belief that the observation error is smaller overall than the true process error

```{r sensor_mod, include = FALSE, results='hide'}
mod <- mvgam(
  # formula for observations, allowing for different
  # intercepts and smooth effects of temperature
  formula = observed ~ series +
    s(temperature, k = 10) +
    s(series, temperature, bs = "sz", k = 8),

  # formula for the latent signal, which can depend
  # nonlinearly on productivity
  trend_formula = ~ s(productivity, k = 8) - 1,

  # in addition to productivity effects, the signal is
  # assumed to exhibit temporal autocorrelation
  trend_model = AR(),
  noncentred = TRUE,

  # trend_map forces all sensors to track the same
  # latent signal
  trend_map = data.frame(
    series = unique(model_dat$series),
    trend = c(1, 1, 1)
  ),

  # informative priors on process error
  # and observation error will help with convergence
  priors = c(
    prior(normal(2, 0.5), class = sigma),
    prior(normal(1, 0.5), class = sigma_obs)
  ),

  # Gaussian observations
  family = gaussian(),
  burnin = 600,
  control = list(adapt_delta = 0.95),
  data = model_dat,
  silent = 2
)
```

```{r eval=FALSE}
mod <- mvgam(
  formula =
    # formula for observations, allowing for different
    # intercepts and hierarchical smooth effects of temperature
    observed ~ series +
      s(temperature, k = 10) +
      s(series, temperature, bs = "sz", k = 8),
  trend_formula =
    # formula for the latent signal, which can depend
    # nonlinearly on productivity
    ~ s(productivity, k = 8) - 1,
  trend_model =
    # in addition to productivity effects, the signal is
    # assumed to exhibit temporal autocorrelation
    AR(),
  noncentred = TRUE,
  trend_map =
    # trend_map forces all sensors to track the same
    # latent signal
    data.frame(
      series = unique(model_dat$series),
      trend = c(1, 1, 1)
    ),

  # informative priors on process error
  # and observation error will help with convergence
  priors = c(
    prior(normal(2, 0.5), class = sigma),
    prior(normal(1, 0.5), class = sigma_obs)
  ),

  # Gaussian observations
  family = gaussian(),
  data = model_dat,
  silent = 2
)
```

View a reduced version of the model summary because there will be many spline coefficients in this model

```{r}
summary(mod, include_betas = FALSE)
```

### Inspecting effects on both process and observation models

Don't pay much attention to the approximate *p*-values of the smooth terms. The calculation for these values is incredibly sensitive to the estimates for the smoothing parameters so I don't tend to find them to be very meaningful. What are meaningful, however, are prediction-based plots of the smooth functions. All main effects can be quickly plotted with `conditional_effects`:

```{r}
conditional_effects(mod, type = "link")
```

`conditional_effects` is simply a wrapper to the more flexible `plot_predictions` function from the `marginaleffects` package. We can get more useful plots of these effects using this function for further customisation:

```{r}
plot_predictions(
  mod,
  condition = c("temperature", "series", "series"),
  points = 0.5
) +
  theme(legend.position = "none")
```

We have successfully estimated effects, some of them nonlinear, that impact the hidden process AND the observations. All in a single joint model. But there can always be challenges with these models, particularly when estimating both process and observation error at the same time.
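One quick check of that trade-off is to inspect the joint posterior of the two variance components, since a pronounced ridge between them would signal weak identifiability. A sketch, assuming `pairs()` accepts the `brms`-style `variable` and `regex` arguments:

```{r eval=FALSE}
# joint posterior of the process SD vs the observation SDs;
# a strong ridge suggests the two sources of error are poorly separated
pairs(mod, variable = c("sigma", "sigma_obs"), regex = TRUE)
```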
### Recovering the hidden signal

A final but very key question is whether we can successfully recover the true hidden signal. The `trend` slot in the returned model parameters has the estimates for this signal, which we can easily plot using the `mvgam` S3 method for `plot`. We can also overlay the true values for the hidden signal, which shows that our model has done a good job of recovering it:

```{r}
plot(mod, type = "trend") +
  ggplot2::geom_point(
    data = data.frame(time = 1:100, y = true_signal),
    mapping = ggplot2::aes(x = time, y = y)
  )
```

## Further reading

The following papers and resources offer a lot of useful material about other types of State-Space models and how they can be applied in practice:

Auger‐Méthé, Marie, et al. "[A guide to state–space modeling of ecological time series.](https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecm.1470)" *Ecological Monographs* 91.4 (2021): e01470.

Clark, Nicholas J., et al. "[Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability.](https://peerj.com/articles/18929/)" *PeerJ* 13 (2025): e18929.

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. "[MARSS: multivariate autoregressive state-space models for analyzing time-series data.](https://journal.r-project.org/articles/RJ-2012-002/)" *R Journal* 4.1 (2012): 11.

Ward, Eric J., et al. "[Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico.](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2664.2009.01745.x)" *Journal of Applied Ecology* 47.1 (2010): 47-56.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please reach out if you are interested (n.clark'at'uq.edu.au)

================================================
FILE: doc/shared_states.html
================================================

Shared latent states in mvgam

Shared latent states in mvgam

Nicholas J Clark

2026-01-19

This vignette gives an example of how mvgam can be used to estimate models where multiple observed time series share the same latent process model. For full details on the basic mvgam functionality, please see the introductory vignette.

The trend_map argument

The trend_map argument in the mvgam() function is an optional data.frame that can be used to specify which series should depend on which latent process models (called “trends” in mvgam). This can be particularly useful if we wish to force multiple observed time series to depend on the same latent trend process, but with different observation processes. If this argument is supplied, a latent factor model is set up by setting use_lv = TRUE and using the supplied trend_map to set up the shared trends. Users familiar with the MARSS family of packages will recognize this as a way of specifying the \(Z\) matrix. This data.frame needs to have column names series and trend, with integer values in the trend column to state which trend each series should depend on. The series column should have a single unique entry for each time series in the data, with names that perfectly match the factor levels of the series variable in data. For example, if we were to simulate a collection of three integer-valued time series (using sim_mvgam), the following trend_map would force the first two series to share the same latent trend process:

set.seed(122)
simdat <- sim_mvgam(
  trend_model = AR(),
  prop_trend = 0.6,
  mu = c(0, 1, 2),
  family = poisson()
)
trend_map <- data.frame(
  series = unique(simdat$data_train$series),
  trend = c(1, 1, 2)
)
trend_map
#>     series trend
#> 1 series_1     1
#> 2 series_2     1
#> 3 series_3     2

We can see that the factor levels in trend_map match those in the data:

all.equal(levels(trend_map$series), 
          levels(simdat$data_train$series))
#> [1] TRUE

Checking trend_map with run_model = FALSE

Supplying this trend_map to the mvgam function for a simple model, but setting run_model = FALSE, allows us to inspect the constructed Stan code and the data objects that would be used to condition the model. Here we will set up a model in which each series has a different observation process (with only a different intercept per series in this case), and the two latent dynamic process models evolve as independent AR1 processes that also contain a shared nonlinear smooth function to capture repeated seasonality. This model is not too complicated but it does show how we can learn shared and independent effects for collections of time series in the mvgam framework:

fake_mod <- mvgam(
  y ~
    # observation model formula, which has a
    # different intercept per series
    series - 1,

  # process model formula, which has a shared seasonal smooth
  # (each latent process model shares the SAME smooth)
  trend_formula = ~ s(season, bs = "cc", k = 6),

  # AR1 dynamics (each latent process model has DIFFERENT
  # dynamics); processes are estimated using the noncentred
  # parameterisation for improved efficiency
  trend_model = AR(),
  noncentred = TRUE,

  # supplied trend_map
  trend_map = trend_map,

  # data and observation family
  family = poisson(),
  data = simdat$data_train,
  run_model = FALSE
)

Inspecting the Stan code shows how this model is a dynamic factor model in which the loadings are constructed to reflect the supplied trend_map:

stancode(fake_mod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp_trend; // number of trend smoothing parameters
#>   int<lower=0> n_lv; // number of dynamic factors
#>   int<lower=0> n_series; // number of series
#>   matrix[n_series, n_lv] Z; // matrix mapping series to latent states
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   int<lower=0> num_basis_trend; // number of trend basis coefficients
#>   vector[num_basis_trend] zero_trend; // prior locations for trend basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   matrix[n * n_lv, num_basis_trend] X_trend; // trend model design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   array[n, n_lv] int ytimes_trend;
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   matrix[4, 4] S_trend1; // mgcv smooth penalty matrix S_trend1
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> transformed data {
#>   
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   vector[num_basis_trend] b_raw_trend;
#>   
#>   // latent state SD terms
#>   vector<lower=0>[n_lv] sigma;
#>   
#>   // latent state AR1 terms
#>   vector<lower=-1, upper=1>[n_lv] ar1;
#>   
#>   // raw latent states
#>   matrix[n, n_lv] LV_raw;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp_trend] lambda_trend;
#> }
#> transformed parameters {
#>   // raw latent states
#>   vector[n * n_lv] trend_mus;
#>   matrix[n, n_series] trend;
#>   
#>   // basis coefficients
#>   vector[num_basis] b;
#>   
#>   // latent states
#>   matrix[n, n_lv] LV;
#>   vector[num_basis_trend] b_trend;
#>   
#>   // observation model basis coefficients
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#>   
#>   // process model basis coefficients
#>   b_trend[1 : num_basis_trend] = b_raw_trend[1 : num_basis_trend];
#>   
#>   // latent process linear predictors
#>   trend_mus = X_trend * b_trend;
#>   LV = LV_raw .* rep_matrix(sigma', rows(LV_raw));
#>   for (j in 1 : n_lv) {
#>     LV[1, j] += trend_mus[ytimes_trend[1, j]];
#>     for (i in 2 : n) {
#>       LV[i, j] += trend_mus[ytimes_trend[i, j]]
#>                   + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]);
#>     }
#>   }
#>   
#>   // derived latent states
#>   for (i in 1 : n) {
#>     for (s in 1 : n_series) {
#>       trend[i, s] = dot_product(Z[s,  : ], LV[i,  : ]);
#>     }
#>   }
#> }
#> model {
#>   // prior for seriesseries_1...
#>   b_raw[1] ~ student_t(3, 0, 2);
#>   
#>   // prior for seriesseries_2...
#>   b_raw[2] ~ student_t(3, 0, 2);
#>   
#>   // prior for seriesseries_3...
#>   b_raw[3] ~ student_t(3, 0, 2);
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for latent state SD parameters
#>   sigma ~ inv_gamma(1.418, 0.452);
#>   to_vector(LV_raw) ~ std_normal();
#>   
#>   // dynamic process models
#>   
#>   // prior for (Intercept)_trend...
#>   b_raw_trend[1] ~ student_t(3, 0, 2);
#>   
#>   // prior for s(season)_trend...
#>   b_raw_trend[2 : 5] ~ multi_normal_prec(zero_trend[2 : 5],
#>                                          S_trend1[1 : 4, 1 : 4]
#>                                          * lambda_trend[1]);
#>   lambda_trend ~ normal(5, 30);
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp_trend] rho_trend;
#>   vector[n_lv] penalty;
#>   array[n, n_series] int ypred;
#>   penalty = 1.0 / (sigma .* sigma);
#>   rho_trend = log(lambda_trend);
#>   
#>   matrix[n_series, n_lv] lv_coefs = Z;
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }

Notice the line that states “lv_coefs = Z;”. This uses the supplied \(Z\) matrix to construct the loading coefficients. The supplied matrix now looks exactly like what you’d use if you were to create a similar model in the MARSS package:

fake_mod$model_data$Z
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    1    0
#> [3,]    0    1

Fitting and inspecting the model

Though this model doesn’t perfectly match the data-generating process (which allowed each series to have different underlying dynamics), we can still fit it to show what the resulting inferences look like:

full_mod <- mvgam(
  y ~ series - 1,
  trend_formula = ~ s(season, bs = "cc", k = 6),
  trend_model = AR(),
  noncentred = TRUE,
  trend_map = trend_map,
  family = poisson(),
  data = simdat$data_train,
  silent = 2
)

The summary of this model is informative as it shows that only two latent process models have been estimated, even though we have three observed time series. The model converges well

summary(full_mod)
#> GAM observation formula:
#> y ~ series - 1
#> <environment: 0x000002063451d728>
#> 
#> GAM process formula:
#> ~s(season, bs = "cc", k = 6)
#> <environment: 0x000002063451d728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR()
#> 
#> N process models:
#> 2 
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM observation model coefficient (beta) estimates:
#>                 2.5%   50% 97.5% Rhat n_eff
#> seriesseries_1 -2.70 -0.63   1.4 1.01   700
#> seriesseries_2 -1.70  0.31   2.3 1.01   702
#> seriesseries_3 -0.76  1.30   3.3 1.01   703
#> 
#> standard deviation:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.37 0.53  0.71    1   701
#> sigma[2] 0.49 0.61  0.78    1   693
#> 
#> autoregressive coef 1:
#>         2.5%    50% 97.5% Rhat n_eff
#> ar1[1] -0.59 -0.240  0.17 1.00   462
#> ar1[2] -0.25  0.048  0.35 1.01   349
#> 
#> GAM process model coefficient (beta) estimates:
#>                     2.5%    50% 97.5% Rhat n_eff
#> (Intercept)_trend -1.300  0.730 2.800 1.01   694
#> s(season).1_trend -0.310 -0.061 0.190 1.00   947
#> s(season).2_trend -0.062  0.230 0.510 1.00   753
#> s(season).3_trend -0.480 -0.200 0.097 1.00   620
#> s(season).4_trend  0.350  0.690 0.960 1.00   584
#> 
#> Approximate significance of GAM process smooths:
#>             edf Ref.df Chi.sq  p-value    
#> s(season) 2.324      4   24.2 8.27e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

Both series 1 and 2 share the exact same latent process estimates, while the estimates for series 3 are different:

plot(full_mod, type = "trend", series = 1)

plot(full_mod, type = "trend", series = 2)

plot(full_mod, type = "trend", series = 3)

However, forecasts for series 1 and 2 will still differ because they have different intercepts in the observation model.
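
If you want to verify the shared process numerically rather than by eye, a quick sketch (assuming the hindcast() method accepts type = "trend" and stores one hindcast matrix per series) is:

# trends for series 1 and 2 should be numerically identical because
# both series map to the same latent process
hc <- hindcast(full_mod, type = "trend")
max(abs(hc$hindcasts$series_1 - hc$hindcasts$series_2)) # expect 0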

Example: signal detection

Now we will explore a more complicated example. Here we simulate a true hidden signal that we are trying to track. This signal depends nonlinearly on some covariate (called productivity, a measure of how productive the landscape is) and also exhibits a fairly large amount of temporal autocorrelation:

set.seed(123)
# simulate a nonlinear relationship using the mgcv function gamSim
signal_dat <- mgcv::gamSim(n = 100, eg = 1, scale = 1)
#> Gu & Wahba 4 term additive model

# productivity is one of the variables in the simulated data
productivity <- signal_dat$x2

# simulate the true signal, which already has a nonlinear relationship
# with productivity; we will add in a fairly strong AR1 process to
# contribute to the signal
true_signal <- as.vector(scale(signal_dat$y) +
  arima.sim(100, model = list(ar = 0.8, sd = 0.1)))

Plot the signal to inspect its evolution over time

plot(
  true_signal,
  type = "l",
  bty = "l", lwd = 2,
  ylab = "True signal",
  xlab = "Time"
)

Next we simulate three sensors that are trying to track the same hidden signal. All of these sensors have different observation errors that can depend nonlinearly on a second external covariate, called temperature in this example. Again this makes use of gamSim

# Function to simulate a monotonic response to a covariate
sim_monotonic <- function(x, a = 2.2, b = 2) {
  out <- exp(a * x) / (6 + exp(b * x)) * -1
  return(2.5 * as.vector(scale(out)))
}

# Simulated temperature covariate
temperature <- runif(100, -2, 2)

# Simulate the three series
sim_series <- function(n_series = 3, true_signal) {
  temp_effects <- mgcv::gamSim(n = 100, eg = 7, scale = 0.05)
  alphas <- rnorm(n_series, sd = 2)

  do.call(rbind, lapply(seq_len(n_series), function(series) {
    data.frame(
      observed = rnorm(length(true_signal),
        mean = alphas[series] +
          sim_monotonic(temperature, 
                            runif(1, 2.2, 3),
                            runif(1, 2.2, 3)) +
          true_signal,
        sd = runif(1, 1, 2)
      ),
      series = paste0("sensor_", series),
      time = 1:length(true_signal),
      temperature = temperature,
      productivity = productivity,
      true_signal = true_signal
    )
  }))
}
model_dat <- sim_series(true_signal = true_signal) %>%
  dplyr::mutate(series = factor(series))
#> Gu & Wahba 4 term additive model, correlated predictors
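
Before moving on, it can help to see the shape of the monotonic function we just assumed; a minimal sketch using the sim_monotonic() helper defined above:

# visualise the simulated monotonic temperature effect over its range
temp_grid <- seq(-2, 2, length.out = 200)
plot(temp_grid, sim_monotonic(temp_grid),
  type = "l", lwd = 2, bty = "l",
  xlab = "Temperature", ylab = "Simulated effect"
)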

Plot the sensor observations

plot_mvgam_series(
  data = model_dat, y = "observed",
  series = "all"
)

And now plot the observed relationships between the three sensors and the temperature covariate

plot(
  observed ~ temperature,
  data = model_dat %>%
    dplyr::filter(series == "sensor_1"),
  pch = 16, bty = "l",
  ylab = "Sensor 1",
  xlab = "Temperature"
)

plot(
  observed ~ temperature,
  data = model_dat %>%
    dplyr::filter(series == "sensor_2"),
  pch = 16, bty = "l",
  ylab = "Sensor 2",
  xlab = "Temperature"
)

plot(
  observed ~ temperature,
  data = model_dat %>%
    dplyr::filter(series == "sensor_3"),
  pch = 16, bty = "l",
  ylab = "Sensor 3",
  xlab = "Temperature"
)
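
The same three relationships can also be viewed in a single faceted plot using ggplot2 (which is already loaded for this vignette):

ggplot(model_dat, aes(x = temperature, y = observed)) +
  geom_point() +
  facet_wrap(~series) +
  labs(x = "Temperature", y = "Observed")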

The shared signal model

Now we can formulate and fit a model that allows each sensor’s observation error to depend nonlinearly on temperature, while allowing the true signal to depend nonlinearly on productivity. By fixing all of the values in the trend column of the trend_map to 1, we assume that all observation sensors are tracking the same latent signal. We use informative priors on the two variance components (process error and observation error), reflecting our prior belief that the observation error is smaller overall than the true process error.

mod <- mvgam(
  formula =
  # formula for observations, allowing for different
  # intercepts and hierarchical smooth effects of temperature
    observed ~ series +
      s(temperature, k = 10) +
      s(series, temperature, bs = "sz", k = 8),
  trend_formula =
  # formula for the latent signal, which can depend
  # nonlinearly on productivity
    ~ s(productivity, k = 8) - 1,
  trend_model =
  # in addition to productivity effects, the signal is
  # assumed to exhibit temporal autocorrelation
    AR(),
  noncentred = TRUE,
  trend_map =
  # trend_map forces all sensors to track the same
  # latent signal
    data.frame(
      series = unique(model_dat$series),
      trend = c(1, 1, 1)
    ),

  # informative priors on process error
  # and observation error will help with convergence
  priors = c(
    prior(normal(2, 0.5), class = sigma),
    prior(normal(1, 0.5), class = sigma_obs)
  ),

  # Gaussian observations
  family = gaussian(),
  data = model_dat,
  silent = 2
)
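
If you are unsure which components of a model can accept prior updates, get_mvgam_priors() will list them; a sketch, assuming it accepts the same formula, trend_map and family arguments as mvgam():

get_mvgam_priors(
  formula = observed ~ series +
    s(temperature, k = 10) +
    s(series, temperature, bs = "sz", k = 8),
  trend_formula = ~ s(productivity, k = 8) - 1,
  trend_model = AR(),
  trend_map = data.frame(
    series = unique(model_dat$series),
    trend = c(1, 1, 1)
  ),
  family = gaussian(),
  data = model_dat
)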

View a reduced version of the model summary because there will be many spline coefficients in this model

summary(mod, include_betas = FALSE)
#> GAM observation formula:
#> observed ~ series + s(temperature, k = 10) + s(series, temperature, 
#>     bs = "sz", k = 8)
#> <environment: 0x000002063451d728>
#> 
#> GAM process formula:
#> ~s(productivity, k = 8) - 1
#> <environment: 0x000002063451d728>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> AR()
#> 
#> N process models:
#> 1 
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 100 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1100; warmup = 600; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation error parameter estimates:
#>              2.5% 50% 97.5% Rhat n_eff
#> sigma_obs[1]  1.3 1.5   1.8    1  1791
#> sigma_obs[2]  1.2 1.4   1.6    1  2040
#> sigma_obs[3]  1.3 1.5   1.8    1  1607
#> 
#> GAM observation model coefficient (beta) estimates:
#>                 2.5%   50% 97.5% Rhat n_eff
#> (Intercept)     0.11  1.20  4.10 1.01   464
#> seriessensor_2 -2.40 -1.60 -0.65 1.00   831
#> seriessensor_3 -0.59  0.49  1.60 1.00  1316
#> 
#> Approximate significance of GAM observation smooths:
#>                         edf Ref.df   Chi.sq p-value    
#> s(temperature)        5.335      9 1446.722  <2e-16 ***
#> s(series,temperature) 2.545     16    1.042       1    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> standard deviation:
#>          2.5% 50% 97.5% Rhat n_eff
#> sigma[1] 0.89 1.2   1.5    1   576
#> 
#> autoregressive coef 1:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.56 0.77  0.97 1.01   378
#> 
#> Approximate significance of GAM process smooths:
#>                   edf Ref.df Chi.sq p-value
#> s(productivity) 1.865      7  41.97   0.228
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

Inspecting effects on both process and observation models

Don’t pay much attention to the approximate p-values of the smooth terms. The calculation of these values is incredibly sensitive to the estimates of the smoothing parameters, so I don’t tend to find them very meaningful. What are meaningful, however, are prediction-based plots of the smooth functions. All main effects can be quickly plotted with conditional_effects:

conditional_effects(mod, type = "link")

conditional_effects is simply a wrapper around the more flexible plot_predictions function from the marginaleffects package. We can call plot_predictions directly to further customise these plots:

plot_predictions(
  mod,
  condition = c("temperature", "series", "series"),
  points = 0.5
) +
  theme(legend.position = "none")

We have successfully estimated effects, some of them nonlinear, that impact both the hidden process and the observations, all in a single joint model. But there can always be challenges with these models, particularly when estimating both process and observation error at the same time.

Recovering the hidden signal

A final key question is whether we can successfully recover the true hidden signal. The trend slot in the returned model parameters holds the estimates for this signal, which we can easily plot using the mvgam S3 method for plot. We can also overlay the true values of the hidden signal, which shows that our model has done a good job of recovering it:

plot(mod, 
     type = "trend") +
  ggplot2::geom_point(data = data.frame(time = 1:100,
                                        y = true_signal),
                      mapping = ggplot2::aes(x = time,
                                             y = y))
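
The posterior draws of the signal can also be summarised numerically; a sketch, assuming hindcast() accepts type = "trend" and returns one draws x time matrix per sensor (all three are identical here because they track the same signal):

hc <- hindcast(mod, type = "trend")
signal_median <- apply(hc$hindcasts$sensor_1, 2, median)
cor(signal_median, true_signal) # how well was the signal recovered?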

Further reading

The following papers and resources offer a lot of useful material about other types of State-Space models and how they can be applied in practice:

Auger‐Méthé, Marie, et al. “A guide to state–space modeling of ecological time series.” Ecological Monographs 91.4 (2021): e01470.

Clark, Nicholas J., et al. “Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability.” PeerJ 13 (2025): e18929.

Holmes, Elizabeth E., Eric J. Ward, and Wills Kellie. “MARSS: multivariate autoregressive state-space models for analyzing time-series data.” R Journal 4.1 (2012): 11.

Ward, Eric J., et al. “Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico.” Journal of Applied Ecology 47.1 (2010): 47-56.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please reach out if you are interested (n.clark’at’uq.edu.au)

================================================ FILE: doc/time_varying_effects.R ================================================

params <- list(EVAL = TRUE)

## ----echo = FALSE----------------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)

## ----setup, include=FALSE--------------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))

## --------------------------------------------------------------------------------
set.seed(1111)
N <- 200
beta_temp <- mvgam:::sim_gp(rnorm(1), alpha_gp = 0.75, rho_gp = 10, h = N) + 0.5

## ----fig.alt = "Simulating time-varying effects in mvgam and R"------------------
plot(
  beta_temp,
  type = "l", lwd = 3, bty = "l",
  xlab = "Time", ylab = "Coefficient",
  col = "darkred"
)
box(bty = "l", lwd = 2)

## --------------------------------------------------------------------------------
temp <- rnorm(N, sd = 1)

## ----fig.alt = "Simulating time-varying effects in mvgam and R"------------------
out <- rnorm(N, mean = 4 + beta_temp * temp, sd = 0.25)
time <- seq_along(temp)
plot(
  out,
  type = "l", lwd = 3, bty = "l",
  xlab = "Time", ylab = "Outcome",
  col = "darkred"
)
box(bty = "l", lwd = 2)

## --------------------------------------------------------------------------------
data <- data.frame(out, temp, time)
data_train <- data[1:190, ]
data_test <- data[191:200, ]

## ----include=FALSE---------------------------------------------------------------
mod <- mvgam(
  out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)

## ----eval=FALSE------------------------------------------------------------------
# mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40),
#   family = gaussian(),
#   data = data_train,
#   silent = 2
# )

## --------------------------------------------------------------------------------
summary(mod, include_betas = FALSE)

## --------------------------------------------------------------------------------
plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = "dashed", lwd = 2)
lines(beta_temp, lwd = 2.5, col = "white")
lines(beta_temp, lwd = 2)

## --------------------------------------------------------------------------------
require(marginaleffects)
range_round <- function(x) {
  round(range(x, na.rm = TRUE), 2)
}
plot_predictions(
  mod,
  newdata = datagrid(
    time = unique,
    temp = range_round
  ),
  by = c("time", "temp", "temp"),
  type = "link"
)

## --------------------------------------------------------------------------------
fc <- forecast(mod, newdata = data_test)
plot(fc)

## ----include=FALSE---------------------------------------------------------------
mod <- mvgam(
  out ~ dynamic(temp, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)

## ----eval=FALSE------------------------------------------------------------------
# mod <- mvgam(out ~ dynamic(temp, k = 40),
#   family = gaussian(),
#   data = data_train,
#   silent = 2
# )

## --------------------------------------------------------------------------------
summary(mod, include_betas = FALSE)

## --------------------------------------------------------------------------------
plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = "dashed", lwd = 2)
lines(beta_temp, lwd = 2.5, col = "white")
lines(beta_temp, lwd = 2)

## --------------------------------------------------------------------------------
load(url("https://github.com/atsa-es/MARSS/raw/master/data/SalmonSurvCUI.rda"))
dplyr::glimpse(SalmonSurvCUI)

## --------------------------------------------------------------------------------
SalmonSurvCUI %>%
  # create a time variable
  dplyr::mutate(time = dplyr::row_number()) %>%
  # create a series variable
  dplyr::mutate(series = as.factor("salmon")) %>%
  # z-score the covariate CUI.apr
  dplyr::mutate(CUI.apr = as.vector(scale(CUI.apr))) %>%
  # convert logit-transformed survival back to proportional
  dplyr::mutate(survival = plogis(logit.s)) -> model_data

## --------------------------------------------------------------------------------
dplyr::glimpse(model_data)

## --------------------------------------------------------------------------------
plot_mvgam_series(data = model_data, y = "survival")

## ----include = FALSE-------------------------------------------------------------
mod0 <- mvgam(
  formula = survival ~ 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)

## ----eval = FALSE----------------------------------------------------------------
# mod0 <- mvgam(
#   formula = survival ~ 1,
#   trend_model = AR(),
#   noncentred = TRUE,
#   priors = prior(normal(-3.5, 0.5), class = Intercept),
#   family = betar(),
#   data = model_data,
#   silent = 2
# )

## --------------------------------------------------------------------------------
summary(mod0)

## --------------------------------------------------------------------------------
plot(mod0, type = "trend")

## ----include=FALSE---------------------------------------------------------------
mod1 <- mvgam(
  formula = survival ~ 1,
  trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  control = list(adapt_delta = 0.99),
  silent = 2
)

## ----eval=FALSE------------------------------------------------------------------
# mod1 <- mvgam(
#   formula = survival ~ 1,
#   trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
#   trend_model = AR(),
#   noncentred = TRUE,
#   priors = prior(normal(-3.5, 0.5), class = Intercept),
#   family = betar(),
#   data = model_data,
#   silent = 2
# )

## --------------------------------------------------------------------------------
summary(mod1, include_betas = FALSE)

## --------------------------------------------------------------------------------
plot(mod1, type = "trend")

## --------------------------------------------------------------------------------
plot(mod1, type = "forecast")

## --------------------------------------------------------------------------------
# Extract estimates of the process error 'sigma' for each model
mod0_sigma <- as.data.frame(mod0, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod0")
mod1_sigma <- as.data.frame(mod1, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod1")
sigmas <- rbind(mod0_sigma, mod1_sigma)

# Plot using ggplot2
require(ggplot2)
ggplot(sigmas, aes(y = `sigma[1]`, fill = model)) +
  geom_density(alpha = 0.3, colour = NA) +
  coord_flip()

## --------------------------------------------------------------------------------
plot(mod1, type = "smooths", trend_effects = TRUE)

## --------------------------------------------------------------------------------
loo_compare(mod0, mod1)

## ----include=FALSE---------------------------------------------------------------
lfo_mod0 <- lfo_cv(mod0, min_t = 30)
lfo_mod1 <- lfo_cv(mod1, min_t = 30)

## ----eval=FALSE------------------------------------------------------------------
# lfo_mod0 <- lfo_cv(mod0, min_t = 30)
# lfo_mod1 <- lfo_cv(mod1, min_t = 30)

## --------------------------------------------------------------------------------
sum(lfo_mod0$elpds)
sum(lfo_mod1$elpds)

## ----fig.alt = "Comparing forecast skill for dynamic beta regression models in mvgam and R"----
plot(
  x = 1:length(lfo_mod0$elpds) + 30,
  y = lfo_mod0$elpds - lfo_mod1$elpds,
  ylab = "ELPDmod0 - ELPDmod1",
  xlab = "Evaluation time point",
  pch = 16,
  col = "darkred",
  bty = "l"
)
abline(h = 0, lty = "dashed")

================================================ FILE: doc/time_varying_effects.Rmd ================================================

---
title: "Time-varying effects in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Time-varying effects in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

The purpose of this vignette is to show how the `mvgam` package can be used to estimate and forecast regression coefficients that vary through time.

## Time-varying effects

Dynamic fixed-effect coefficients (often referred to as dynamic linear models) can be readily incorporated into GAMs / DGAMs. In `mvgam`, the `dynamic()` formula wrapper offers a convenient interface to set these up. The plan is to incorporate a range of dynamic options (such as random walk, AR1 etc...) but for the moment only low-rank Gaussian Process (GP) smooths are allowed (making use either of the `gp` basis in `mgcv` or of Hilbert space approximate GPs). These are advantageous over splines or random walk effects for several reasons. First, GPs will force the time-varying effect to be smooth. This often makes sense in reality, where we would not expect a regression coefficient to change rapidly from one time point to the next. Second, GPs provide information on the 'global' dynamics of a time-varying effect through their length-scale parameters. This means we can use them to provide accurate forecasts of how an effect is expected to change in the future, something that we couldn't do well if we used splines to estimate the effect. An example below illustrates.

### Simulating time-varying effects

Simulate a time-varying coefficient using a squared exponential Gaussian Process function with length scale $\rho$=10. We will do this using an internal function from `mvgam` (the `sim_gp` function):

```{r}
set.seed(1111)
N <- 200
beta_temp <- mvgam:::sim_gp(rnorm(1),
  alpha_gp = 0.75,
  rho_gp = 10,
  h = N
) + 0.5
```

A plot of the time-varying coefficient shows that it changes smoothly through time:

```{r, fig.alt = "Simulating time-varying effects in mvgam and R"}
plot(beta_temp,
  type = "l", lwd = 3,
  bty = "l", xlab = "Time", ylab = "Coefficient",
  col = "darkred"
)
box(bty = "l", lwd = 2)
```

Next we need to simulate the values of the covariate, which we will call `temp` (to represent $temperature$). In this case we just use a standard normal distribution to simulate this covariate:

```{r}
temp <- rnorm(N, sd = 1)
```

Finally, simulate the outcome variable, which is a Gaussian observation process (with observation error) over the time-varying effect of $temperature$

```{r, fig.alt = "Simulating time-varying effects in mvgam and R"}
out <- rnorm(N,
  mean = 4 + beta_temp * temp,
  sd = 0.25
)
time <- seq_along(temp)
plot(out,
  type = "l", lwd = 3,
  bty = "l", xlab = "Time", ylab = "Outcome",
  col = "darkred"
)
box(bty = "l", lwd = 2)
```

Gather the data into a `data.frame` for fitting models, and split the data into training and testing folds.

```{r}
data <- data.frame(out, temp, time)
data_train <- data[1:190, ]
data_test <- data[191:200, ]
```

### The `dynamic()` function

Time-varying coefficients can be fairly easily set up using the `s()` or `gp()` wrapper functions in `mvgam` formulae by fitting a nonlinear effect of `time` and using the covariate of interest as the numeric `by` variable (see `?mgcv::s` or `?brms::gp` for more details). The `dynamic()` formula wrapper offers a way to automate this process, and will eventually allow for a broader variety of time-varying effects (such as random walk or AR processes). Depending on the arguments that are specified to `dynamic`, it will either set up a low-rank GP smooth function using `s()` with `bs = 'gp'` and a fixed value of the length scale parameter $\rho$, or it will set up a Hilbert space approximate GP using the `gp()` function with `c=5/4` so that $\rho$ is estimated (see `?dynamic` for more details). In this first example we will use the `s()` option, and will mis-specify the $\rho$ parameter here as, in practice, it is never known. This call to `dynamic()` will set up the following smooth: `s(time, by = temp, bs = "gp", m = c(-2, 8, 2), k = 40)`

```{r, include=FALSE}
mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)
```

```{r, eval=FALSE}
mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)
```

Inspect the model summary, which shows how the `dynamic()` wrapper was used to construct a low-rank Gaussian Process smooth function:

```{r}
summary(mod, include_betas = FALSE)
```

Because this model used a spline with a `gp` basis, its smooths can be visualised just like any other `gam`. We can plot the estimates for the in-sample and out-of-sample periods to see how the Gaussian Process function produces sensible smooth forecasts. Here we supply the full dataset to the `newdata` argument in `plot_mvgam_smooth()` to inspect posterior forecasts of the time-varying smooth function. Overlay the true simulated function to see that the model has adequately estimated its dynamics in both the training and testing data partitions

```{r}
plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = "dashed", lwd = 2)
lines(beta_temp, lwd = 2.5, col = "white")
lines(beta_temp, lwd = 2)
```

We can also use `plot_predictions()` from the `marginaleffects` package to visualise the time-varying coefficient for what the effect would be estimated to be at different values of $temperature$:

```{r}
require(marginaleffects)
range_round <- function(x) {
  round(range(x, na.rm = TRUE), 2)
}
plot_predictions(mod,
  newdata = datagrid(
    time = unique,
    temp = range_round
  ),
  by = c("time", "temp", "temp"),
  type = "link"
)
```

This results in sensible forecasts of the observations as well

```{r}
fc <- forecast(mod, newdata = data_test)
plot(fc)
```

The syntax is very similar if we wish to estimate the parameters of the underlying Gaussian Process, this time using a Hilbert space approximation. We simply omit the `rho` argument in `dynamic()` to make this happen. This will set up a call similar to `gp(time, by = 'temp', c = 5/4, k = 40)`.

```{r include=FALSE}
mod <- mvgam(out ~ dynamic(temp, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)
```

```{r eval=FALSE}
mod <- mvgam(out ~ dynamic(temp, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)
```

This model summary now contains estimates for the marginal deviation and length scale parameters of the underlying Gaussian Process function:

```{r}
summary(mod, include_betas = FALSE)
```

Effects for `gp()` terms can also be plotted as smooths:

```{r}
plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = "dashed", lwd = 2)
lines(beta_temp, lwd = 2.5, col = "white")
lines(beta_temp, lwd = 2)
```

## Salmon survival example

Here we will use openly available data on marine survival of Chinook salmon to illustrate how time-varying effects can be used to improve ecological time series models. [Scheuerell and Williams (2005)](https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-2419.2005.00346.x) used a dynamic linear model to examine the relationship between marine survival of Chinook salmon and an index of ocean upwelling strength along the west coast of the USA. The authors hypothesized that stronger upwelling in April should create better growing conditions for phytoplankton, which would then translate into more zooplankton and provide better foraging opportunities for juvenile salmon entering the ocean. The data on survival is measured as a proportional variable over 42 years (1964–2005) and is available in the `MARSS` package:

```{r}
load(url("https://github.com/atsa-es/MARSS/raw/master/data/SalmonSurvCUI.rda"))
dplyr::glimpse(SalmonSurvCUI)
```

First we need to prepare the data for modelling. The variable `CUI.apr` will be standardized to make it easier for the sampler to estimate underlying GP parameters for the time-varying effect. We also need to convert the survival back to a proportion, as in its current form it has been logit-transformed (this is because most time series packages cannot handle proportional data). As usual, we also need to create a `time` indicator and a `series` indicator for working in `mvgam`:

```{r}
SalmonSurvCUI %>%
  # create a time variable
  dplyr::mutate(time = dplyr::row_number()) %>%
  # create a series variable
  dplyr::mutate(series = as.factor("salmon")) %>%
  # z-score the covariate CUI.apr
  dplyr::mutate(CUI.apr = as.vector(scale(CUI.apr))) %>%
  # convert logit-transformed survival back to proportional
  dplyr::mutate(survival = plogis(logit.s)) -> model_data
```

Inspect the data

```{r}
dplyr::glimpse(model_data)
```

Plot features of the outcome variable, which shows that it is a proportional variable with particular restrictions that we want to model:

```{r}
plot_mvgam_series(data = model_data, y = "survival")
```

### A State-Space Beta regression

`mvgam` can easily handle data that are bounded at 0 and 1 with a Beta observation model (using the `mgcv` function `betar()`, see `?mgcv::betar` for details). First we will fit a simple State-Space model that uses an AR1 dynamic process model with no predictors and a Beta observation model:

```{r include = FALSE}
mod0 <- mvgam(
  formula = survival ~ 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)
```

```{r eval = FALSE}
mod0 <- mvgam(
  formula = survival ~ 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)
```

The summary of this model shows good behaviour of the Hamiltonian Monte Carlo sampler and provides useful summaries on the Beta observation model parameters:

```{r}
summary(mod0)
```

A plot of the underlying dynamic component shows how it has easily handled the temporal evolution of the time series:

```{r}
plot(mod0, type = "trend")
```

### Including time-varying upwelling effects

Now we can increase the complexity of our model by constructing and fitting a State-Space model with a time-varying effect of the coastal upwelling index in addition to the autoregressive dynamics. We again use a Beta observation model to capture the restrictions of our proportional observations, but this time will include a `dynamic()` effect of `CUI.apr` in the latent process model. We do not specify the $\rho$ parameter, instead opting to estimate it using a Hilbert space approximate GP:

```{r include=FALSE}
mod1 <- mvgam(
  formula = survival ~ 1,
  trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  control = list(adapt_delta = 0.99),
  silent = 2
)
```

```{r eval=FALSE}
mod1 <- mvgam(
  formula = survival ~ 1,
  trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)
```

The summary for this model now includes estimates for the time-varying GP parameters:

```{r}
summary(mod1, include_betas = FALSE)
```

The estimates for the underlying dynamic process, and for the hindcasts, haven't changed much:

```{r}
plot(mod1, type = "trend")
```

```{r}
plot(mod1, type = "forecast")
```

But the process error parameter $\sigma$ is slightly smaller for this model than for the first model:

```{r}
# Extract estimates of the process error 'sigma' for each model
mod0_sigma <- as.data.frame(mod0, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod0")
mod1_sigma <- as.data.frame(mod1, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod1")
sigmas <- rbind(mod0_sigma, mod1_sigma)

# Plot using ggplot2
require(ggplot2)
ggplot(sigmas, aes(y = `sigma[1]`, fill = model)) +
  geom_density(alpha = 0.3, colour = NA) +
  coord_flip()
```

Why does the process error not need to be as flexible in the second model? Because the estimates of this dynamic process are now informed partly by the time-varying effect of upwelling, which we can visualise on the link scale using `plot()`:

```{r}
plot(mod1, type = "smooths", trend_effects = TRUE)
```

### Comparing model predictive performances

A key question when fitting multiple time series models is whether one of them provides better predictions than the other. There are several options in `mvgam` for exploring this quantitatively. First, we can compare models based on in-sample approximate leave-one-out cross-validation as implemented in the popular `loo` package:

```{r}
loo_compare(mod0, mod1)
```

In this comparison the simpler model actually has the larger in-sample Expected Log Predictive Density (ELPD), so approximate leave-one-out does not favour the model with the time-varying upwelling effect. But this metric only compares in-sample performance, and we are hoping to use our models to produce reasonable forecasts. Luckily, `mvgam` also has routines for comparing models using approximate leave-future-out cross-validation. Here we refit both models to a reduced training set (starting at time point 30) and produce approximate 1-step ahead forecasts. These forecasts are used to estimate forecast ELPD before expanding the training set one time point at a time. We use Pareto-smoothed importance sampling to reweight posterior predictions, acting as a kind of particle filter so that we don't need to refit the model too often (you can read more about how this process works in Bürkner et al. 2020).

```{r include=FALSE}
lfo_mod0 <- lfo_cv(mod0, min_t = 30)
lfo_mod1 <- lfo_cv(mod1, min_t = 30)
```

```{r eval=FALSE}
lfo_mod0 <- lfo_cv(mod0, min_t = 30)
lfo_mod1 <- lfo_cv(mod1, min_t = 30)
```

The model with the time-varying upwelling effect tends to provide better 1-step ahead forecasts, with a higher total forecast ELPD

```{r}
sum(lfo_mod0$elpds)
sum(lfo_mod1$elpds)
```

We can also plot the ELPDs for each model as a contrast. Here, values less than zero suggest the time-varying predictor model (Mod1) gives better 1-step ahead forecasts:

```{r, fig.alt = "Comparing forecast skill for dynamic beta regression models in mvgam and R"}
plot(
  x = 1:length(lfo_mod0$elpds) + 30,
  y = lfo_mod0$elpds - lfo_mod1$elpds,
  ylab = "ELPDmod0 - ELPDmod1",
  xlab = "Evaluation time point",
  pch = 16,
  col = "darkred",
  bty = "l"
)
abline(h = 0, lty = "dashed")
```

A useful exercise to further expand this model would be to think about what kinds of predictors might impact measurement error, which could easily be implemented into the observation formula in `mvgam()`. But for now, we will leave the model as-is.

## Further reading

The following papers and resources offer a lot of useful material about dynamic linear models and how they can be applied / evaluated in practice:

Bürkner, PC, Gabry, J and Vehtari, A. [Approximate leave-future-out cross-validation for Bayesian time series models](https://www.tandfonline.com/doi/full/10.1080/00949655.2020.1783262). *Journal of Statistical Computation and Simulation* 90.14 (2020): 2499-2523.

Herrero, Asier, et al. [From the individual to the landscape and back: time‐varying effects of climate and herbivory on tree sapling growth at distribution limits](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2745.12527). *Journal of Ecology* 104.2 (2016): 430-442.

Holmes, Elizabeth E., Eric J. Ward, and Wills Kellie. "[MARSS: multivariate autoregressive state-space models for analyzing time-series data.](https://journal.r-project.org/articles/RJ-2012-002/)" *R Journal* 4.1 (2012): 11.

Scheuerell, Mark D., and John G. Williams. [Forecasting climate induced changes in the survival of Snake River Spring/Summer Chinook Salmon (*Oncorhynchus Tshawytscha*)](https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2419.2005.00346.x). *Fisheries Oceanography* 14 (2005): 448–57.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au)

================================================ FILE: doc/time_varying_effects.html ================================================

Time-varying effects in mvgam

Nicholas J Clark

2026-01-19

The purpose of this vignette is to show how the mvgam package can be used to estimate and forecast regression coefficients that vary through time.

Time-varying effects

Dynamic fixed-effect coefficients (often referred to as dynamic linear models) can be readily incorporated into GAMs / DGAMs. In mvgam, the dynamic() formula wrapper offers a convenient interface to set these up. The plan is to incorporate a range of dynamic options (such as random walk, AR1 etc…) but for the moment only low-rank Gaussian Process (GP) smooths are allowed (making use either of the gp basis in mgcv or of Hilbert space approximate GPs). These are advantageous over splines or random walk effects for several reasons. First, GPs will force the time-varying effect to be smooth. This often makes sense in reality, where we would not expect a regression coefficient to change rapidly from one time point to the next. Second, GPs provide information on the ‘global’ dynamics of a time-varying effect through their length-scale parameters. This means we can use them to provide accurate forecasts of how an effect is expected to change in the future, something that we couldn’t do well if we used splines to estimate the effect. An example below illustrates.

Simulating time-varying effects

Simulate a time-varying coefficient using a squared exponential Gaussian Process function with length scale \(\rho\)=10. We will do this using an internal function from mvgam (the sim_gp function):

set.seed(1111)
N <- 200
beta_temp <- mvgam:::sim_gp(rnorm(1),
  alpha_gp = 0.75,
  rho_gp = 10,
  h = N
) + 0.5
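
For intuition, the covariance that sim_gp draws from is (assuming the standard squared exponential parameterisation) \(k(t, t') = \alpha^2 \exp(-(t - t')^2 / (2\rho^2))\); a minimal sketch with the alpha_gp and rho_gp values supplied above:

# squared exponential kernel; alpha and rho match the simulation above
sq_exp_kernel <- function(t1, t2, alpha = 0.75, rho = 10) {
  alpha^2 * exp(-outer(t1, t2, "-")^2 / (2 * rho^2))
}
round(sq_exp_kernel(1:5, 1:5), 3)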

A plot of the time-varying coefficient shows that it changes smoothly through time:

plot(beta_temp,
  type = "l", lwd = 3,
  bty = "l", xlab = "Time", ylab = "Coefficient",
  col = "darkred"
)
box(bty = "l", lwd = 2)

Simulating time-varying effects in mvgam and R

Next we need to simulate the values of the covariate, which we will call temp (to represent \(temperature\)). In this case we just use a standard normal distribution to simulate this covariate:

temp <- rnorm(N, sd = 1)

Finally, simulate the outcome variable, which is a Gaussian observation process (with observation error) over the time-varying effect of \(temperature\)

out <- rnorm(N,
  mean = 4 + beta_temp * temp,
  sd = 0.25
)
time <- seq_along(temp)
plot(out,
  type = "l", lwd = 3,
  bty = "l", xlab = "Time", ylab = "Outcome",
  col = "darkred"
)
box(bty = "l", lwd = 2)

Simulating time-varying effects in mvgam and R

Gather the data into a data.frame for fitting models, and split the data into training and testing folds.

data <- data.frame(out, temp, time)
data_train <- data[1:190, ]
data_test <- data[191:200, ]

The dynamic() function

Time-varying coefficients can be fairly easily set up using the s() or gp() wrapper functions in mvgam formulae by fitting a nonlinear effect of time and using the covariate of interest as the numeric by variable (see ?mgcv::s or ?brms::gp for more details). The dynamic() formula wrapper offers a way to automate this process, and will eventually allow for a broader variety of time-varying effects (such as random walk or AR processes). Depending on the arguments that are specified to dynamic, it will either set up a low-rank GP smooth function using s() with bs = 'gp' and a fixed value of the length scale parameter \(\rho\), or it will set up a Hilbert space approximate GP using the gp() function with c=5/4 so that \(\rho\) is estimated (see ?dynamic for more details). In this first example we will use the s() option, and will mis-specify the \(\rho\) parameter here as, in practice, it is never known. This call to dynamic() will set up the following smooth: s(time, by = temp, bs = "gp", m = c(-2, 8, 2), k = 40)
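
For comparison, a sketch of the equivalent manual specification (writing out the smooth that dynamic() expands to, per the expansion shown above):

mod_manual <- mvgam(
  out ~ s(time, by = temp, bs = "gp", m = c(-2, 8, 2), k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)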

mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)

Inspect the model summary, which shows how the dynamic() wrapper was used to construct a low-rank Gaussian Process smooth function:

summary(mod, include_betas = FALSE)
#> GAM formula:
#> out ~ s(time, by = temp, bs = "gp", m = c(-2, 8, 2), k = 40)
#> <environment: 0x0000017caa517728>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 190 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.23 0.25  0.28    1  2113
#> 
#> GAM coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    4   4   4.1    1  3264
#> 
#> Approximate significance of GAM smooths:
#>                edf Ref.df Chi.sq p-value    
#> s(time):temp 16.35     40  168.2  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

Because this model used a spline with a gp basis, its smooths can be visualised just like any other gam. We can plot the estimates for the in-sample and out-of-sample periods to see how the Gaussian Process function produces sensible smooth forecasts. Here we supply the full dataset to the newdata argument in plot_mvgam_smooth() to inspect posterior forecasts of the time-varying smooth function. Overlay the true simulated function to see that the model has adequately estimated its dynamics in both the training and testing data partitions

plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = "dashed", lwd = 2)
lines(beta_temp, lwd = 2.5, col = "white")
lines(beta_temp, lwd = 2)

We can also use plot_predictions() from the marginaleffects package to visualise the time-varying coefficient for what the effect would be estimated to be at different values of \(temperature\):

require(marginaleffects)
range_round <- function(x) {
  round(range(x, na.rm = TRUE), 2)
}
plot_predictions(mod,
  newdata = datagrid(
    time = unique,
    temp = range_round
  ),
  by = c("time", "temp", "temp"),
  type = "link"
)

This results in sensible forecasts of the observations as well

fc <- forecast(mod, newdata = data_test)
plot(fc)

The syntax is very similar if we wish to estimate the parameters of the underlying Gaussian Process, this time using a Hilbert space approximation. We simply omit the rho argument in dynamic() to make this happen. This will set up a call similar to gp(time, by = 'temp', c = 5/4, k = 40).
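
Again for comparison, a sketch of the manual version of this call (following the gp() expansion reported in the model summary below):

mod_manual <- mvgam(
  out ~ gp(time, by = temp, c = 5 / 4, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)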

mod <- mvgam(out ~ dynamic(temp, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)

This model summary now contains estimates for the marginal deviation and length scale parameters of the underlying Gaussian Process function:

summary(mod, include_betas = FALSE)
#> GAM formula:
#> out ~ gp(time, by = temp, c = 5/4, k = 40, scale = TRUE)
#> <environment: 0x0000017caa517728>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 190 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.24 0.26  0.29    1  2547
#> 
#> GAM coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    4   4   4.1    1  2704
#> 
#> GAM gp term marginal deviation (alpha) and length scale (rho) estimates:
#>                      2.5%   50% 97.5% Rhat n_eff
#> alpha_gp(time):temp 0.630 0.880 1.400    1   734
#> rho_gp(time):temp   0.024 0.051 0.068    1   611
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

Effects for gp() terms can also be plotted as smooths:

plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = "dashed", lwd = 2)
lines(beta_temp, lwd = 2.5, col = "white")
lines(beta_temp, lwd = 2)

Salmon survival example

Here we will use openly available data on marine survival of Chinook salmon to illustrate how time-varying effects can be used to improve ecological time series models. Scheuerell and Williams (2005) used a dynamic linear model to examine the relationship between marine survival of Chinook salmon and an index of ocean upwelling strength along the west coast of the USA. The authors hypothesized that stronger upwelling in April should create better growing conditions for phytoplankton, which would then translate into more zooplankton and provide better foraging opportunities for juvenile salmon entering the ocean. The data on survival is measured as a proportional variable over 42 years (1964–2005) and is available in the MARSS package:

load(url("https://github.com/atsa-es/MARSS/raw/master/data/SalmonSurvCUI.rda"))
dplyr::glimpse(SalmonSurvCUI)
#> Rows: 42
#> Columns: 3
#> $ year    <int> 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 19…
#> $ logit.s <dbl> -3.46, -3.32, -3.58, -3.03, -3.61, -3.35, -3.93, -4.19, -4.82,…
#> $ CUI.apr <int> 57, 5, 43, 11, 47, -21, 25, -2, -1, 43, 2, 35, 0, 1, -1, 6, -7…

First we need to prepare the data for modelling. The variable CUI.apr will be standardized to make it easier for the sampler to estimate underlying GP parameters for the time-varying effect. We also need to convert the survival back to a proportion, as in its current form it has been logit-transformed (this is because most time series packages cannot handle proportional data). As usual, we also need to create a time indicator and a series indicator for working in mvgam:

SalmonSurvCUI %>%
  # create a time variable
  dplyr::mutate(time = dplyr::row_number()) %>%
  # create a series variable
  dplyr::mutate(series = as.factor("salmon")) %>%
  # z-score the covariate CUI.apr
  dplyr::mutate(CUI.apr = as.vector(scale(CUI.apr))) %>%
  # convert logit-transformed survival back to proportional
  dplyr::mutate(survival = plogis(logit.s)) -> model_data
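
Since the Beta observation model used below requires values strictly inside (0, 1), a quick sanity check on the back-transformed outcome:

range(model_data$survival)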

Inspect the data

dplyr::glimpse(model_data)
#> Rows: 42
#> Columns: 6
#> $ year     <int> 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1…
#> $ logit.s  <dbl> -3.46, -3.32, -3.58, -3.03, -3.61, -3.35, -3.93, -4.19, -4.82…
#> $ CUI.apr  <dbl> 2.37949804, 0.03330223, 1.74782994, 0.30401713, 1.92830654, -…
#> $ time     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
#> $ series   <fct> salmon, salmon, salmon, salmon, salmon, salmon, salmon, salmo…
#> $ survival <dbl> 0.030472033, 0.034891409, 0.027119717, 0.046088827, 0.0263393…

Plot features of the outcome variable, which shows that it is a proportional variable with particular restrictions that we want to model:

plot_mvgam_series(data = model_data, y = "survival")

A State-Space Beta regression

mvgam can easily handle data that are bounded at 0 and 1 with a Beta observation model (using the mgcv function betar(), see ?mgcv::betar for details). First we will fit a simple State-Space model that uses an AR1 dynamic process model with no predictors and a Beta observation model:

mod0 <- mvgam(
  formula = survival ~ 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)

The summary of this model shows good behaviour of the Hamiltonian Monte Carlo sampler and provides useful summaries on the Beta observation model parameters:

summary(mod0)
#> GAM formula:
#> survival ~ 1
#> <environment: 0x0000017caa517728>
#> 
#> Family:
#> beta
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> AR()
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 42 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation precision parameter estimates:
#>        2.5% 50% 97.5% Rhat n_eff
#> phi[1]   82 220   560 1.01   173
#> 
#> GAM coefficient (beta) estimates:
#>             2.5%  50% 97.5% Rhat n_eff
#> (Intercept) -4.6 -4.3    -4    1   524
#> 
#> standard deviation:
#>          2.5% 50% 97.5% Rhat n_eff
#> sigma[1] 0.11 0.4  0.65 1.01   160
#> 
#> precision parameter:
#>        2.5% 50% 97.5% Rhat n_eff
#> tau[1]  2.4 6.4    77 1.01   273
#> 
#> autoregressive coef 1:
#>         2.5%  50% 97.5% Rhat n_eff
#> ar1[1] -0.32 0.68  0.98 1.01   310
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

A plot of the underlying dynamic component shows how it has easily handled the temporal evolution of the time series:

plot(mod0, type = "trend")

Including time-varying upwelling effects

Now we can increase the complexity of our model by constructing and fitting a State-Space model with a time-varying effect of the coastal upwelling index in addition to the autoregressive dynamics. We again use a Beta observation model to capture the restrictions of our proportional observations, but this time will include a dynamic() effect of CUI.apr in the latent process model. We do not specify the \(\rho\) parameter, instead opting to estimate it using a Hilbert space approximate GP:

mod1 <- mvgam(
  formula = survival ~ 1,
  trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)

The summary for this model now includes estimates for the time-varying GP parameters:

summary(mod1, include_betas = FALSE)
#> GAM observation formula:
#> survival ~ 1
#> <environment: 0x0000017caa517728>
#> 
#> GAM process formula:
#> ~dynamic(CUI.apr, k = 25, scale = FALSE) - 1
#> <environment: 0x0000017caa517728>
#> 
#> Family:
#> beta
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> AR()
#> 
#> N process models:
#> 1 
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 42 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation precision parameter estimates:
#>        2.5% 50% 97.5% Rhat n_eff
#> phi[1]  170 340   650 1.01   794
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5%  50% 97.5% Rhat n_eff
#> (Intercept) -4.4 -3.8  -2.9    1   682
#> 
#> standard deviation:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.17 0.32  0.51    1   661
#> 
#> autoregressive coef 1:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.57 0.93     1    1   498
#> 
#> GAM process model gp term marginal deviation (alpha) and length scale (rho) estimates:
#>                         2.5%  50% 97.5% Rhat n_eff
#> alpha_gp(time):CUI.apr 0.036 0.32   1.5 1.00   999
#> rho_gp(time):CUI.apr   1.300 5.80  43.0 1.01   607
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

The estimates for the underlying dynamic process, and for the hindcasts, haven’t changed much:

plot(mod1, type = "trend")

plot(mod1, type = "forecast")

But the process error parameter \(\sigma\) is slightly smaller for this model than for the first model:

# Extract estimates of the process error 'sigma' for each model
mod0_sigma <- as.data.frame(mod0, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod0")
mod1_sigma <- as.data.frame(mod1, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod1")
sigmas <- rbind(mod0_sigma, mod1_sigma)

# Plot using ggplot2
require(ggplot2)
ggplot(sigmas, aes(y = `sigma[1]`, fill = model)) +
  geom_density(alpha = 0.3, colour = NA) +
  coord_flip()

Why does the process error not need to be as flexible in the second model? Because the estimates of this dynamic process are now informed partly by the time-varying effect of upwelling, which we can visualise on the link scale using plot():

plot(mod1, type = "smooths", trend_effects = TRUE)

Comparing model predictive performances

A key question when fitting multiple time series models is whether one of them provides better predictions than the other. There are several options in mvgam for exploring this quantitatively. First, we can compare models based on in-sample approximate leave-one-out cross-validation as implemented in the popular loo package:

loo_compare(mod0, mod1)
#>      elpd_diff se_diff
#> mod0     0.0       0.0
#> mod1 -1308.3     135.3

In this comparison the simpler model actually has the larger in-sample Expected Log Predictive Density (ELPD), so approximate leave-one-out does not favour the model with the time-varying upwelling effect. But this metric only compares in-sample performance, and we are hoping to use our models to produce reasonable forecasts. Luckily, mvgam also has routines for comparing models using approximate leave-future-out cross-validation. Here we refit both models to a reduced training set (starting at time point 30) and produce approximate 1-step ahead forecasts. These forecasts are used to estimate forecast ELPD before expanding the training set one time point at a time. We use Pareto-smoothed importance sampling to reweight posterior predictions, acting as a kind of particle filter so that we don’t need to refit the model too often (you can read more about how this process works in Bürkner et al. 2020).

lfo_mod0 <- lfo_cv(mod0, min_t = 30)
lfo_mod1 <- lfo_cv(mod1, min_t = 30)

The model with the time-varying upwelling effect tends to provide better 1-step ahead forecasts, with a higher total forecast ELPD

sum(lfo_mod0$elpds)
#> [1] 35.92439
sum(lfo_mod1$elpds)
#> [1] 37.0323

We can also plot the ELPDs for each model as a contrast. Here, values less than zero suggest the time-varying predictor model (Mod1) gives better 1-step ahead forecasts:

plot(
  x = 1:length(lfo_mod0$elpds) + 30,
  y = lfo_mod0$elpds - lfo_mod1$elpds,
  ylab = "ELPDmod0 - ELPDmod1",
  xlab = "Evaluation time point",
  pch = 16,
  col = "darkred",
  bty = "l"
)
abline(h = 0, lty = "dashed")

Comparing forecast skill for dynamic beta regression models in mvgam and R

A useful exercise to further expand this model would be to think about what kinds of predictors might impact measurement error, which could easily be implemented into the observation formula in mvgam(). But for now, we will leave the model as-is.

Further reading

The following papers and resources offer a lot of useful material about dynamic linear models and how they can be applied / evaluated in practice:

Bürkner, PC, Gabry, J and Vehtari, A. “Approximate leave-future-out cross-validation for Bayesian time series models.” Journal of Statistical Computation and Simulation 90.14 (2020): 2499-2523.

Herrero, Asier, et al. “From the individual to the landscape and back: time‐varying effects of climate and herbivory on tree sapling growth at distribution limits.” Journal of Ecology 104.2 (2016): 430-442.

Holmes, Elizabeth E., Eric J. Ward, and Wills Kellie. “MARSS: multivariate autoregressive state-space models for analyzing time-series data.” R Journal 4.1 (2012): 11.

Scheuerell, Mark D., and John G. Williams. “Forecasting climate induced changes in the survival of Snake River Spring/Summer Chinook Salmon (Oncorhynchus Tshawytscha).” Fisheries Oceanography 14 (2005): 448–57.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au)

================================================
FILE: doc/trend_formulas.R
================================================
params <- list(EVAL = TRUE)

## ----echo = FALSE-------------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)

## ----setup, include=FALSE------------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))

## -----------------------------------------------------------------------------
load(url("https://github.com/atsa-es/MARSS/raw/master/data/lakeWAplankton.rda"))

## -----------------------------------------------------------------------------
outcomes <- c("Greens", "Bluegreens", "Diatoms", "Unicells", "Other.algae")

## -----------------------------------------------------------------------------
# loop across each plankton group to create the long dataframe
plankton_data <- do.call(
  rbind,
  lapply(outcomes, function(x) {
    # create a group-specific dataframe with counts labelled 'y'
    # and the group name in the 'series' variable
    data.frame(
      year = lakeWAplanktonTrans[, "Year"],
      month = lakeWAplanktonTrans[, "Month"],
      y = lakeWAplanktonTrans[, x],
      series = x,
      temp = lakeWAplanktonTrans[, "Temp"]
    )
  })
) %>%
  # change the 'series' label to a factor
  dplyr::mutate(series = factor(series)) %>%
  # filter to only include some years in the data
  dplyr::filter(year >= 1965 & year < 1975) %>%
  dplyr::arrange(year, month) %>%
  dplyr::group_by(series) %>%
  # z-score the counts so they are approximately standard normal
  dplyr::mutate(y = as.vector(scale(y))) %>%
  # add the time indicator
  dplyr::mutate(time = dplyr::row_number()) %>%
  dplyr::ungroup()

## -----------------------------------------------------------------------------
head(plankton_data)

## -----------------------------------------------------------------------------
dplyr::glimpse(plankton_data)

## -----------------------------------------------------------------------------
plot_mvgam_series(data = plankton_data, series = "all")

## -----------------------------------------------------------------------------
plankton_data %>%
  dplyr::filter(series == "Other.algae") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = "white", size = 1.3) +
  geom_line(aes(y = y), col = "darkred", size = 1.1) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Other algae (red)")

## -----------------------------------------------------------------------------
plankton_data %>%
  dplyr::filter(series == "Diatoms") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = "white", size = 1.3) +
  geom_line(aes(y = y), col = "darkred", size = 1.1) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Diatoms (red)")

## -----------------------------------------------------------------------------
plankton_train <- plankton_data %>%
  dplyr::filter(time <= 112)
plankton_test <- plankton_data %>%
  dplyr::filter(time > 112)

## ----notrend_mod, include = FALSE, results='hide'------------------------------
notrend_mod <- mvgam(
  y ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = series) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = "None"
)

## ----eval=FALSE-----------------------------------------------------------------
# notrend_mod <- mvgam(
#   y ~
#     # tensor of temp and month to capture
#     # "global" seasonality
#     te(temp, month, k = c(4, 4)) +
#
#     # series-specific deviation tensor products
#     te(temp, month, k = c(4, 4), by = series) - 1,
#   family = gaussian(),
#   data = plankton_train,
#   newdata = plankton_test,
#   trend_model = "None"
# )

## -----------------------------------------------------------------------------
plot_mvgam_smooth(notrend_mod, smooth = 1)

## -----------------------------------------------------------------------------
plot_mvgam_smooth(notrend_mod, smooth = 2)

## -----------------------------------------------------------------------------
plot_mvgam_smooth(notrend_mod, smooth = 3)

## -----------------------------------------------------------------------------
plot(notrend_mod, type = "forecast", series = 1)

## -----------------------------------------------------------------------------
plot(notrend_mod, type = "forecast", series = 2)

## -----------------------------------------------------------------------------
plot(notrend_mod, type = "forecast", series = 3)

## -----------------------------------------------------------------------------
plot(notrend_mod, type = "residuals", series = 1)

## -----------------------------------------------------------------------------
plot(notrend_mod, type = "residuals", series = 3)

## -----------------------------------------------------------------------------
priors <- get_mvgam_priors(
  # observation formula, which has no terms in it
  y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train
)

## -----------------------------------------------------------------------------
priors[, 3]

## -----------------------------------------------------------------------------
priors[, 4]

## -----------------------------------------------------------------------------
priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)

## ----var_mod, include = FALSE, results='hide'----------------------------------
var_mod <- mvgam(
  y ~ -1,
  trend_formula = ~
    # tensor of temp and month should capture
    # seasonality
    te(temp, month, k = c(4, 4)) +
    # need to use 'trend' rather than series
    # here
    te(temp, month, k = c(4, 4), by = trend) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = VAR(),
  priors = priors,
  adapt_delta = 0.99,
  burnin = 1000
)

## ----eval=FALSE-----------------------------------------------------------------
# var_mod <- mvgam(
#   # observation formula, which is empty
#   formula = y ~ -1,
#
#   # process model formula, which includes the smooth functions
#   trend_formula = ~ te(temp, month, k = c(4, 4)) +
#     te(temp, month, k = c(4, 4), by = trend) - 1,
#
#   # VAR1 model with uncorrelated process errors
#   trend_model = VAR(),
#   family = gaussian(),
#   data = plankton_train,
#   newdata = plankton_test,
#
#   # include the updated priors
#   priors = priors,
#   silent = 2
# )

## -----------------------------------------------------------------------------
summary(var_mod, include_betas = FALSE)

## -----------------------------------------------------------------------------
plot(var_mod, "smooths", trend_effects = TRUE)
## ----warning=FALSE, message=FALSE----------------------------------------------
mcmc_plot(
  var_mod,
  variable = 'A',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

## ----warning=FALSE, message=FALSE----------------------------------------------
mcmc_plot(
  var_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

## ----warning=FALSE, message=FALSE----------------------------------------------
mcmc_plot(var_mod, variable = "sigma_obs", regex = TRUE, type = "hist")

## -----------------------------------------------------------------------------
priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)

## ----varcor_mod, include = FALSE, results='hide'--------------------------------
varcor_mod <- mvgam(
  y ~ -1,
  trend_formula = ~
    # tensor of temp and month should capture
    # seasonality
    te(temp, month, k = c(4, 4)) +
    # need to use 'trend' rather than series
    # here
    te(temp, month, k = c(4, 4), by = trend) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = VAR(cor = TRUE),
  burnin = 1000,
  adapt_delta = 0.99,
  priors = priors
)

## ----eval=FALSE-----------------------------------------------------------------
# varcor_mod <- mvgam(
#   # observation formula, which remains empty
#   formula = y ~ -1,
#
#   # process model formula, which includes the smooth functions
#   trend_formula = ~ te(temp, month, k = c(4, 4)) +
#     te(temp, month, k = c(4, 4), by = trend) - 1,
#
#   # VAR1 model with correlated process errors
#   trend_model = VAR(cor = TRUE),
#   family = gaussian(),
#   data = plankton_train,
#   newdata = plankton_test,
#
#   # include the updated priors
#   priors = priors,
#   silent = 2
# )

## ----warning=FALSE, message=FALSE----------------------------------------------
mcmc_plot(
  varcor_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

## -----------------------------------------------------------------------------
Sigma_post <- as.matrix(
  varcor_mod,
  variable = "Sigma",
  regex = TRUE
)
median_correlations <- cov2cor(
  matrix(apply(Sigma_post, 2, median), nrow = 5, ncol = 5)
)
rownames(median_correlations) <- colnames(median_correlations) <-
  levels(plankton_train$series)
round(median_correlations, 2)

## -----------------------------------------------------------------------------
irfs <- irf(varcor_mod, h = 12)

## -----------------------------------------------------------------------------
summary(irfs)

## -----------------------------------------------------------------------------
plot(irfs, series = 3)

## -----------------------------------------------------------------------------
fevds <- fevd(varcor_mod, h = 12)
plot(fevds)

## -----------------------------------------------------------------------------
# create forecast objects for each model
fcvar <- forecast(var_mod)
fcvarcor <- forecast(varcor_mod)

# plot the difference in variogram scores; a negative value means the
# VAR1cor model is better, while a positive value means the VAR1 model
# is better
diff_scores <- score(fcvarcor, score = "variogram")$all_series$score -
  score(fcvar, score = "variogram")$all_series$score
plot(
  diff_scores,
  pch = 16,
  cex = 1.25,
  col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(variogram[VAR1cor] ~ - ~ variogram[VAR1])
)
abline(h = 0, lty = "dashed")

## -----------------------------------------------------------------------------
# plot the difference in energy scores; a negative value means the
# VAR1cor model is better, while a positive value means the VAR1 model
# is better
diff_scores <- score(fcvarcor, score = "energy")$all_series$score -
  score(fcvar, score = "energy")$all_series$score
plot(
  diff_scores,
  pch = 16,
  cex = 1.25,
  col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(energy[VAR1cor] ~ - ~ energy[VAR1])
)
abline(h = 0, lty = "dashed")

## -----------------------------------------------------------------------------
description <- how_to_cite(varcor_mod)

## ----eval = FALSE-----------------------------------------------------------------
# description

## ----echo=FALSE-----------------------------------------------------------------
cat("Methods text skeleton\n")
cat(insight::format_message(description$methods_text))

## ----echo=FALSE-----------------------------------------------------------------
cat("\nPrimary references\n")
for (i in seq_along(description$citations)) {
  cat(insight::format_message(description$citations[[i]]))
  cat('\n')
}
cat("\nOther useful references\n")
for (i in seq_along(description$other_citations)) {
  cat(insight::format_message(description$other_citations[[i]]))
  cat('\n')
}


================================================
FILE: doc/trend_formulas.Rmd
================================================
---
title: "State-Space models in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{State-Space models in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

The purpose of this vignette is to show how the `mvgam` package can be used to fit and interrogate State-Space models with nonlinear effects.

## State-Space Models

![Illustration of a basic State-Space model, which assumes that a latent dynamic *process* (X) can evolve independently from the way we take *observations* (Y) of that process](SS_model.svg){width=85%}
State-Space models allow us to separately make inferences about the underlying dynamic *process model* that we are interested in (i.e. the evolution of a time series or a collection of time series) and the *observation model* (i.e. the way that we survey / measure this underlying process). This is extremely useful in ecology because our observations are always imperfect / noisy measurements of the thing we are interested in measuring. It is also helpful because we often know that some covariates will impact our ability to measure accurately (i.e. we cannot take accurate counts of rodents if there is a thunderstorm happening) while other covariates might impact the underlying process (it is highly unlikely that rodent abundance responds to one storm, but instead probably responds to longer-term weather and climate variation). A State-Space model allows us to model both components in a single unified modelling framework. A major advantage of `mvgam` is that it can include nonlinear effects and random effects in BOTH model components while also capturing dynamic processes.

### Lake Washington plankton data

The data we will use to illustrate how we can fit State-Space models in `mvgam` are from a long-term monitoring study of plankton counts (cells per mL) taken from Lake Washington in Washington, USA. The data are available as part of the `MARSS` package and can be downloaded using the following:

```{r}
load(url("https://github.com/atsa-es/MARSS/raw/master/data/lakeWAplankton.rda"))
```

We will work with five different groups of plankton:

```{r}
outcomes <- c("Greens", "Bluegreens", "Diatoms", "Unicells", "Other.algae")
```

As usual, preparing the data into the correct format for `mvgam` modelling takes a little bit of wrangling in `dplyr`:

```{r}
# loop across each plankton group to create the long dataframe
plankton_data <- do.call(rbind, lapply(outcomes, function(x) {
  # create a group-specific dataframe with counts labelled 'y'
  # and the group name in the 'series' variable
  data.frame(
    year = lakeWAplanktonTrans[, "Year"],
    month = lakeWAplanktonTrans[, "Month"],
    y = lakeWAplanktonTrans[, x],
    series = x,
    temp = lakeWAplanktonTrans[, "Temp"]
  )
})) %>%
  # change the 'series' label to a factor
  dplyr::mutate(series = factor(series)) %>%
  # filter to only include some years in the data
  dplyr::filter(year >= 1965 & year < 1975) %>%
  dplyr::arrange(year, month) %>%
  dplyr::group_by(series) %>%
  # z-score the counts so they are approximately standard normal
  dplyr::mutate(y = as.vector(scale(y))) %>%
  # add the time indicator
  dplyr::mutate(time = dplyr::row_number()) %>%
  dplyr::ungroup()
```

Inspect the data structure:

```{r}
head(plankton_data)
```

```{r}
dplyr::glimpse(plankton_data)
```

Note that we have z-scored the counts in this example as that will make it easier to specify priors (though this is not completely necessary; it is often better to build a model that respects the properties of the actual outcome variables).

```{r}
plot_mvgam_series(data = plankton_data, series = "all")
```

We have some missing observations, but this isn't an issue for modelling in `mvgam`. A useful property to understand about these counts is that they tend to be highly seasonal.
Below are some plots of z-scored counts against the z-scored temperature measurements in the lake for each month:

```{r}
plankton_data %>%
  dplyr::filter(series == "Other.algae") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = "white", size = 1.3) +
  geom_line(aes(y = y), col = "darkred", size = 1.1) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Other algae (red)")
```

```{r}
plankton_data %>%
  dplyr::filter(series == "Diatoms") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = "white", size = 1.3) +
  geom_line(aes(y = y), col = "darkred", size = 1.1) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Diatoms (red)")
```

We will have to try and capture this seasonality in our process model, which should be easy to do given the flexibility of GAMs. Next we will split the data into training and testing splits:

```{r}
plankton_train <- plankton_data %>%
  dplyr::filter(time <= 112)
plankton_test <- plankton_data %>%
  dplyr::filter(time > 112)
```

Now it is time to fit some models. This requires a bit of thinking about how we can best tackle the seasonal variation and the likely dependence structure in the data. These algae are interacting as part of a complex system within the same lake, so we certainly expect there to be some lagged cross-dependencies underlying their dynamics. But if we do not capture the seasonal variation, our multivariate dynamic model will be forced to try and capture it, which could lead to poor convergence and unstable results (we could feasibly capture cyclic dynamics with a more complex multi-species Lotka-Volterra model, but ordinary differential equation approaches are beyond the scope of `mvgam`).

### Capturing seasonality

First we will fit a model that does not include a dynamic component, just to see if it can reproduce the seasonal variation in the observations. This model introduces hierarchical multidimensional smooths, where all time series share a "global" tensor product of the `month` and `temp` variables, capturing our expectation that algal seasonality responds to temperature variation. But this response should depend on when in the year these temperatures are recorded (i.e. a response to warm temperatures in Spring should be different to a response to warm temperatures in Autumn). The model also fits series-specific deviation smooths (i.e. one tensor product per series) to capture how each algal group's seasonality differs from the overall "global" seasonality. Note that we do not include series-specific intercepts in this model because each series was z-scored to have a mean of 0.

```{r notrend_mod, include = FALSE, results='hide'}
notrend_mod <- mvgam(
  y ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = series) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = "None"
)
```

```{r eval=FALSE}
notrend_mod <- mvgam(
  y ~
    # tensor of temp and month to capture
    # "global" seasonality
    te(temp, month, k = c(4, 4)) +

    # series-specific deviation tensor products
    te(temp, month, k = c(4, 4), by = series) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = "None"
)
```

The "global" tensor product smooth function can be quickly visualized:

```{r}
plot_mvgam_smooth(notrend_mod, smooth = 1)
```

On this plot, red indicates below-average linear predictors and white indicates above-average.
We can then plot the deviation smooths for a few algal groups to see how they vary from the "global" pattern:

```{r}
plot_mvgam_smooth(notrend_mod, smooth = 2)
```

```{r}
plot_mvgam_smooth(notrend_mod, smooth = 3)
```

These multidimensional smooths have done a good job of capturing the seasonal variation in our observations:

```{r}
plot(notrend_mod, type = "forecast", series = 1)
```

```{r}
plot(notrend_mod, type = "forecast", series = 2)
```

```{r}
plot(notrend_mod, type = "forecast", series = 3)
```

This basic model gives us confidence that we can capture the seasonal variation in the observations. But the model has not captured the remaining temporal dynamics, which is obvious when we inspect Dunn-Smyth residuals for a few series:

```{r}
plot(notrend_mod, type = "residuals", series = 1)
```

```{r}
plot(notrend_mod, type = "residuals", series = 3)
```

### Multiseries dynamics

Now it is time to get into multivariate State-Space models. We will fit two models that can both incorporate lagged cross-dependencies in the latent process models. The first model assumes that the process errors operate independently from one another, while the second assumes that there may be contemporaneous correlations in the process errors. Both models include a Vector Autoregressive component for the process means, and so both can model complex community dynamics. The models can be described mathematically as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Normal}(\mu_{obs[t]}, \sigma_{obs}) \\
\mu_{obs[t]} & = process_t \\
process_t & \sim \text{MVNormal}(\mu_{process[t]}, \Sigma_{process}) \\
\mu_{process[t]} & = A * process_{t-1} +
  f_{global}(\boldsymbol{month},\boldsymbol{temp})_t +
  f_{series}(\boldsymbol{month},\boldsymbol{temp})_t \\
f_{global}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{global,k} * \beta_{global,k} \\
f_{series}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{series,k} * \beta_{series,k}
\end{align*}

Here you can see that there are no terms in the observation model apart from the underlying process model. But we could easily add covariates into the observation model if we felt that they could explain some of the systematic observation errors. We also assume independent observation processes (there is no covariance structure in the observation errors $\sigma_{obs}$). At present, `mvgam` does not support multivariate observation models, but this feature will be added in future versions.

The underlying process model, however, is multivariate, and there is a lot going on here. This component has a Vector Autoregressive part, where the process mean at time $t$ $(\mu_{process[t]})$ is a vector that evolves as a function of where the vector-valued process model was at time $t-1$. The $A$ matrix captures these dynamics with self-dependencies on the diagonal and possibly asymmetric cross-dependencies on the off-diagonals, while also incorporating the nonlinear smooth functions that capture seasonality for each series. The contemporaneous process errors are modeled by $\Sigma_{process}$, which can be constrained so that process errors are independent (i.e. setting the off-diagonals to 0) or can be fully parameterized using a Cholesky decomposition (using `Stan`'s $LKJcorr$ distribution to place a prior on the strength of inter-species correlations).
For those who are interested in the inner workings, `mvgam` makes use of a recent breakthrough by [Sarah Heaps to enforce stationarity of Bayesian VAR processes](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648). This is advantageous because we often don't expect forecast variance to increase without bound forever into the future, yet many estimated VARs tend to behave this way. The short simulation below illustrates the idea.
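To build some intuition for what stationarity buys us, it can help to simulate a toy VAR(1) by hand. The following is a minimal base-R sketch (with made-up coefficients; none of these objects are used elsewhere in this vignette) of the process $x_t = A x_{t-1} + e_t$; when all eigenvalues of $A$ lie inside the unit circle, simulated paths mean-revert and forecast variance stays bounded:

```{r}
set.seed(1)
# a made-up, stationary 2-series VAR(1) coefficient matrix
A_toy <- matrix(c(0.7, 0.1, -0.2, 0.5), nrow = 2, byrow = TRUE)

# both eigenvalues have modulus < 1, so the process is stationary
max(Mod(eigen(A_toy)$values))

# simulate 100 timesteps of x_t = A_toy %*% x_{t-1} + e_t
x <- matrix(0, nrow = 100, ncol = 2)
for (t in 2:100) {
  x[t, ] <- A_toy %*% x[t - 1, ] + rnorm(2, 0, 0.25)
}
matplot(x, type = "l", lty = 1, bty = "l",
        xlab = "Time", ylab = "Simulated process")
```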
Ok that was a lot to take in. Let's fit some models to try and inspect what is going on and what they assume. But first, we need to update `mvgam`'s default priors for the observation and process errors. By default, `mvgam` uses a fairly wide Student-T prior on these parameters to avoid being overly informative. But our observations are z-scored and so we do not expect very large process or observation errors. However, we also do not expect very small observation errors either, as we know these measurements are not perfect. So let's update the priors for these parameters. In doing so, you will get to see how the formula for the latent process (i.e. trend) model is used in `mvgam`:

```{r}
priors <- get_mvgam_priors(
  # observation formula, which has no terms in it
  y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train
)
```

Get names of all parameters whose priors can be modified:

```{r}
priors[, 3]
```

And their default prior distributions:

```{r}
priors[, 4]
```

Setting priors is easy in `mvgam` as you can use `brms` routines. Here we use more informative Normal priors for both error components, but we impose a lower bound of 0.2 for the observation errors:

```{r}
priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)
```

You may have noticed something else unique about this model: there is no intercept term in the observation formula. This is because a shared intercept parameter can sometimes be unidentifiable with respect to the latent VAR process, particularly if our series have similar long-run averages (which they do in this case because they were z-scored). We will often get better convergence in these State-Space models if we drop this parameter. `mvgam` accomplishes this by fixing the coefficient for the intercept to zero. Now we can fit the first model, which assumes that process errors are contemporaneously uncorrelated:

```{r var_mod, include = FALSE, results='hide'}
var_mod <- mvgam(
  y ~ -1,
  trend_formula = ~
    # tensor of temp and month should capture
    # seasonality
    te(temp, month, k = c(4, 4)) +
    # need to use 'trend' rather than series
    # here
    te(temp, month, k = c(4, 4), by = trend) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = VAR(),
  priors = priors,
  adapt_delta = 0.99,
  burnin = 1000
)
```

```{r eval=FALSE}
var_mod <- mvgam(
  # observation formula, which is empty
  formula = y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,

  # include the updated priors
  priors = priors,
  silent = 2
)
```

### Inspecting SS models

This model's summary is a bit different to other `mvgam` summaries. It separates parameters based on whether they belong to the observation model or to the latent process model. This is because we may often have covariates that impact the observations but not the latent process, so we can have fairly complex models for each component. You will notice that some parameters have not fully converged, particularly for the VAR coefficients (called `A` in the output) and for the process errors (`Sigma`).
Note that we set `include_betas = FALSE` to stop the summary from printing output for all of the spline coefficients, which can be dense and hard to interpret:

```{r}
summary(var_mod, include_betas = FALSE)
```

The convergence of this model isn't fabulous (more on this in a moment). But we can again plot the smooth functions, which this time operate on the process model, by passing `trend_effects = TRUE` to the plotting functions:

```{r}
plot(var_mod, "smooths", trend_effects = TRUE)
```

The autoregressive coefficient matrix is of particular interest here, as it captures lagged dependencies and cross-dependencies in the latent process model. Unfortunately `bayesplot` doesn't know this is a matrix of parameters, so what it shows by default is actually the transpose of the VAR matrix. Supplying `dir = 'v'` in the `facet_args` argument fills the facets vertically so that the plot matches the orientation of the matrix:

```{r warning=FALSE, message=FALSE}
mcmc_plot(
  var_mod,
  variable = 'A',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)
```

There is a lot happening in this matrix. Each cell captures the lagged effect of the process in the column on the process in the row in the next timestep. So for example, the effect in cell [1,3] shows how an *increase* in the process for series 3 (Greens) at time $t$ is expected to impact the process for series 1 (Bluegreens) at time $t+1$. The latent process model is now capturing these effects and the smooth seasonal effects.

The process error $(\Sigma)$ captures unmodelled variation in the process models. Again, we fixed the off-diagonals to 0, so the histograms for these will look like flat boxes:

```{r warning=FALSE, message=FALSE}
mcmc_plot(
  var_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)
```

The observation error estimates $(\sigma_{obs})$ represent how much the model thinks we might miss the true count when we take our imperfect measurements:

```{r warning=FALSE, message=FALSE}
mcmc_plot(var_mod, variable = "sigma_obs", regex = TRUE, type = "hist")
```

These are still a bit hard to identify overall, especially when trying to estimate both process and observation error. Often we need to make some strong assumptions about which of these is more important for determining unexplained variation in our observations.
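If you prefer a numeric summary of $A$ in the correct orientation, the posterior draws can be reshaped by hand. Below is a minimal sketch that mirrors the `Sigma` summary used later in this vignette; it assumes the draws returned by `as.matrix()` are ordered column-major, as that later example also assumes:

```{r}
# posterior median of the VAR coefficient matrix, labelled by series;
# assumes column-major ordering of the 'A' draws
A_post <- as.matrix(var_mod, variable = "A", regex = TRUE)
A_med <- matrix(apply(A_post, 2, median), nrow = 5, ncol = 5)
rownames(A_med) <- colnames(A_med) <- levels(plankton_train$series)
round(A_med, 2)
```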
### Correlated process errors

Let's see if these estimates improve when we allow the process errors to be correlated. Once again, we need to first update the priors for the observation errors:

```{r}
priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)
```

And now we can fit the correlated process error model:

```{r varcor_mod, include = FALSE, results='hide'}
varcor_mod <- mvgam(
  y ~ -1,
  trend_formula = ~
    # tensor of temp and month should capture
    # seasonality
    te(temp, month, k = c(4, 4)) +
    # need to use 'trend' rather than series
    # here
    te(temp, month, k = c(4, 4), by = trend) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = VAR(cor = TRUE),
  burnin = 1000,
  adapt_delta = 0.99,
  priors = priors
)
```

```{r eval=FALSE}
varcor_mod <- mvgam(
  # observation formula, which remains empty
  formula = y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with correlated process errors
  trend_model = VAR(cor = TRUE),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,

  # include the updated priors
  priors = priors,
  silent = 2
)
```

The $(\Sigma)$ matrix now captures any evidence of contemporaneously correlated process errors:

```{r warning=FALSE, message=FALSE}
mcmc_plot(
  varcor_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)
```

This symmetric matrix tells us there is support for correlated process errors, as several of the off-diagonal entries are strongly non-zero. But it is easier to interpret these estimates if we convert the covariance matrix to a correlation matrix. Here we compute the posterior median process error correlations:

```{r}
Sigma_post <- as.matrix(
  varcor_mod,
  variable = "Sigma",
  regex = TRUE
)
median_correlations <- cov2cor(
  matrix(apply(Sigma_post, 2, median), nrow = 5, ncol = 5)
)
rownames(median_correlations) <- colnames(median_correlations) <-
  levels(plankton_train$series)
round(median_correlations, 2)
```
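Beyond posterior medians, the full set of `Sigma_post` draws can quantify how much posterior support each pairwise correlation has. A minimal sketch (reusing `Sigma_post` from the chunk above) computes the proportion of draws in which each correlation is positive:

```{r}
# convert each posterior draw of Sigma to a correlation matrix
cor_draws <- apply(Sigma_post, 1, function(s) {
  cov2cor(matrix(s, nrow = 5, ncol = 5))
})

# each column of cor_draws is one flattened 5 x 5 correlation matrix;
# values near 1 (or 0) indicate strong support for a positive
# (or negative) correlation
prop_positive <- matrix(rowMeans(cor_draws > 0), nrow = 5, ncol = 5)
rownames(prop_positive) <- colnames(prop_positive) <-
  levels(plankton_train$series)
round(prop_positive, 2)
```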
### Impulse response functions

Because Vector Autoregressions can capture complex lagged dependencies, it is often difficult to understand how the member time series are thought to interact with one another. A method that is commonly used to directly test for possible interactions is to compute an [Impulse Response Function](https://en.wikipedia.org/wiki/Impulse_response) (IRF). If $h$ represents the simulated forecast horizon, an IRF asks how each of the remaining series might respond over times $(t+1):h$ if a focal series is given an innovation "shock" at time $t = 0$. `mvgam` can compute Generalized and Orthogonalized IRFs from models that included latent VAR dynamics. We simply feed the fitted model to the `irf()` function and then use the S3 `plot()` function to view the estimated responses. By default, `irf()` will compute IRFs by separately imposing positive shocks of one standard deviation to each series in the VAR process. Here we compute Generalized IRFs over a horizon of 12 timesteps:

```{r}
irfs <- irf(varcor_mod, h = 12)
```

A summary of the IRFs can be computed using the `summary()` function:

```{r}
summary(irfs)
```

But it is easier to understand these responses using plots. For example, we can plot the expected responses of the remaining series to a positive shock for series 3 (Greens) using the `plot()` function:

```{r}
plot(irfs, series = 3)
```

This series of plots makes it clear that some of the other series would be expected to show both instantaneous responses to a shock for the Greens (due to their correlated process errors) as well as delayed and nonlinear responses over time (due to the complex lagged dependence structure captured by the $A$ matrix). This hopefully makes it clear why IRFs are an important tool in the analysis of multivariate autoregressive models.

You can also use these IRFs to calculate a relative contribution from each shock to the forecast error variance for a focal series. This method, known as a [Forecast Error Variance Decomposition](https://en.wikipedia.org/wiki/Variance_decomposition_of_forecast_errors) (FEVD), is useful to get an idea about the amount of information that each series contributes to the evolution of all other series in a Vector Autoregression:

```{r}
fevds <- fevd(varcor_mod, h = 12)
plot(fevds)
```

The plot above shows the median contribution to forecast error variance for each series.

### Comparing forecast scores

But which model is better? We can compute the variogram score for out-of-sample forecasts to get a sense of which model does a better job of capturing the dependence structure in the true evaluation set:

```{r}
# create forecast objects for each model
fcvar <- forecast(var_mod)
fcvarcor <- forecast(varcor_mod)

# plot the difference in variogram scores; a negative value means the
# VAR1cor model is better, while a positive value means the VAR1 model
# is better
diff_scores <- score(fcvarcor, score = "variogram")$all_series$score -
  score(fcvar, score = "variogram")$all_series$score
plot(
  diff_scores,
  pch = 16,
  cex = 1.25,
  col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(variogram[VAR1cor] ~ - ~ variogram[VAR1])
)
abline(h = 0, lty = "dashed")
```

And we can also compute the energy score for out-of-sample forecasts to get a sense of which model provides forecasts that are better calibrated:

```{r}
# plot the difference in energy scores; a negative value means the
# VAR1cor model is better, while a positive value means the VAR1 model
# is better
diff_scores <- score(fcvarcor, score = "energy")$all_series$score -
  score(fcvar, score = "energy")$all_series$score
plot(
  diff_scores,
  pch = 16,
  cex = 1.25,
  col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(energy[VAR1cor] ~ - ~ energy[VAR1])
)
abline(h = 0, lty = "dashed")
```

The models tend to provide similar forecasts, though the correlated error model does slightly better overall. We would probably need to use a more extensive rolling forecast evaluation exercise if we felt like we needed to only choose one for production. `mvgam` offers some utilities for doing this (i.e. see `?lfo_cv` for guidance). Alternatively, we could use forecasts from *both* models by creating an evenly-weighted ensemble forecast distribution. This capability is available using the `ensemble()` function in `mvgam` (see `?ensemble` for guidance), as sketched below.
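As a sketch of that last idea, an evenly-weighted ensemble can be built from the two forecast objects and scored in the same way as the individual models (assuming the default `ensemble()` settings; see `?ensemble` for the authoritative interface):

```{r eval=FALSE}
# combine draws from both models into one forecast distribution
fc_ens <- ensemble(fcvar, fcvarcor)

# score the ensemble against the same test data
score(fc_ens, score = "energy")$all_series$score
```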
Using `how_to_cite()` for models with VAR dynamics will give you information on how they are restricted to remain stationary:

```{r}
description <- how_to_cite(varcor_mod)
```

```{r, eval = FALSE}
description
```

```{r, echo=FALSE}
cat("Methods text skeleton\n")
cat(insight::format_message(description$methods_text))
```

```{r echo=FALSE}
cat("\nPrimary references\n")
for (i in seq_along(description$citations)) {
  cat(insight::format_message(description$citations[[i]]))
  cat('\n')
}
cat("\nOther useful references\n")
for (i in seq_along(description$other_citations)) {
  cat(insight::format_message(description$other_citations[[i]]))
  cat('\n')
}
```

More advanced hierarchical panel VAR models can also be handled by using the `gr` and `subgr` arguments in `VAR()`. These models are useful if you have data for the same set of series (`subgr`) that are measured in different regions (`gr`), such as species measured in different sampling regions or financial series measured in different countries.

## Further reading

The following papers and resources offer a lot of useful material about multivariate State-Space models and how they can be applied in practice:

Auger‐Méthé, Marie, et al. [A guide to state–space modeling of ecological time series](https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecm.1470). *Ecological Monographs* 91.4 (2021): e01470.

Clark, Nicholas J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ* 13 (2025): e18929.

Heaps, Sarah E. [Enforcing stationarity through the prior in vector autoregressions](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648). *Journal of Computational and Graphical Statistics* 32.1 (2023): 74-83.

Hannaford, Naomi E., et al. [A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant](https://doi.org/10.1016/j.csda.2022.107659). *Computational Statistics & Data Analysis* 179 (2023): 107659.

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. [MARSS: multivariate autoregressive state-space models for analyzing time-series data](https://journal.r-project.org/articles/RJ-2012-002/). *R Journal* 4.1 (2012): 11.

Karunarathna, K.A.N.K., et al. [Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models](https://doi.org/10.1016/j.ecolmodel.2024.110648). *Ecological Modelling* 490 (2024): 110648.

Ward, Eric J., et al. [Inferring spatial structure from time-series data: using multivariate state-space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2664.2009.01745.x). *Journal of Applied Ecology* 47.1 (2010): 47-56.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au)


================================================
FILE: doc/trend_formulas.html
================================================

State-Space models in mvgam

Nicholas J Clark

2026-01-19

The purpose of this vignette is to show how the mvgam package can be used to fit and interrogate State-Space models with nonlinear effects.

State-Space Models

Illustration of a basic State-Space model, which assumes that a latent dynamic process (X) can evolve independently from the way we take observations (Y) of that process


State-Space models allow us to separately make inferences about the underlying dynamic process model that we are interested in (i.e. the evolution of a time series or a collection of time series) and the observation model (i.e. the way that we survey / measure this underlying process). This is extremely useful in ecology because our observations are always imperfect / noisy measurements of the thing we are interested in measuring. It is also helpful because we often know that some covariates will impact our ability to measure accurately (i.e. we cannot take accurate counts of rodents if there is a thunderstorm happening) while other covariates might impact the underlying process (it is highly unlikely that rodent abundance responds to one storm, but instead probably responds to longer-term weather and climate variation). A State-Space model allows us to model both components in a single unified modelling framework. A major advantage of mvgam is that it can include nonlinear effects and random effects in BOTH model components while also capturing dynamic processes.

Lake Washington plankton data

The data we will use to illustrate how we can fit State-Space models in mvgam are from a long-term monitoring study of plankton counts (cells per mL) taken from Lake Washington in Washington, USA. The data are available as part of the MARSS package and can be downloaded using the following:

load(url("https://github.com/atsa-es/MARSS/raw/master/data/lakeWAplankton.rda"))

We will work with five different groups of plankton:

outcomes <- c("Greens", "Bluegreens", "Diatoms", "Unicells", "Other.algae")

As usual, preparing the data into the correct format for mvgam modelling takes a little bit of wrangling in dplyr:

# loop across each plankton group to create the long dataframe
plankton_data <- do.call(rbind, lapply(outcomes, function(x) {
  # create a group-specific dataframe with counts labelled 'y'
  # and the group name in the 'series' variable
  data.frame(
    year = lakeWAplanktonTrans[, "Year"],
    month = lakeWAplanktonTrans[, "Month"],
    y = lakeWAplanktonTrans[, x],
    series = x,
    temp = lakeWAplanktonTrans[, "Temp"]
  )
})) %>%
  # change the 'series' label to a factor
  dplyr::mutate(series = factor(series)) %>%
  # filter to only include some years in the data
  dplyr::filter(year >= 1965 & year < 1975) %>%
  dplyr::arrange(year, month) %>%
  dplyr::group_by(series) %>%
  # z-score the counts so they are approximately standard normal
  dplyr::mutate(y = as.vector(scale(y))) %>%
  # add the time indicator
  dplyr::mutate(time = dplyr::row_number()) %>%
  dplyr::ungroup()

Inspect the data structure

head(plankton_data)
#> # A tibble: 6 × 6
#>    year month       y series       temp  time
#>   <dbl> <dbl>   <dbl> <fct>       <dbl> <int>
#> 1  1965     1 -0.542  Greens      -1.23     1
#> 2  1965     1 -0.344  Bluegreens  -1.23     1
#> 3  1965     1 -0.0768 Diatoms     -1.23     1
#> 4  1965     1 -1.52   Unicells    -1.23     1
#> 5  1965     1 -0.491  Other.algae -1.23     1
#> 6  1965     2 NA      Greens      -1.32     2
dplyr::glimpse(plankton_data)
#> Rows: 600
#> Columns: 6
#> $ year   <dbl> 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 196…
#> $ month  <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, …
#> $ y      <dbl> -0.54241769, -0.34410776, -0.07684901, -1.52243490, -0.49055442…
#> $ series <fct> Greens, Bluegreens, Diatoms, Unicells, Other.algae, Greens, Blu…
#> $ temp   <dbl> -1.2306562, -1.2306562, -1.2306562, -1.2306562, -1.2306562, -1.…
#> $ time   <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, …

Note that we have z-scored the counts in this example as that will make it easier to specify priors (though this is not completely necessary; it is often better to build a model that respects the properties of the actual outcome variables)

plot_mvgam_series(data = plankton_data, series = "all")

We have some missing observations, but this isn’t an issue for modelling in mvgam. A useful property to understand about these counts is that they tend to be highly seasonal. Below are some plots of z-scored counts against the z-scored temperature measurements in the lake for each month:

plankton_data %>%
  dplyr::filter(series == "Other.algae") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y),
    col = "white",
    size = 1.3
  ) +
  geom_line(aes(y = y),
    col = "darkred",
    size = 1.1
  ) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Other algae (red)")

plankton_data %>%
  dplyr::filter(series == "Diatoms") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y),
    col = "white",
    size = 1.3
  ) +
  geom_line(aes(y = y),
    col = "darkred",
    size = 1.1
  ) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Diatoms (red)")

We will have to try and capture this seasonality in our process model, which should be easy to do given the flexibility of GAMs. Next we will split the data into training and testing splits:

plankton_train <- plankton_data %>%
  dplyr::filter(time <= 112)
plankton_test <- plankton_data %>%
  dplyr::filter(time > 112)

Now time to fit some models. This requires a bit of thinking about how we can best tackle the seasonal variation and the likely dependence structure in the data. These algae are interacting as part of a complex system within the same lake, so we certainly expect there to be some lagged cross-dependencies underlying their dynamics. But if we do not capture the seasonal variation, our multivariate dynamic model will be forced to try and capture it, which could lead to poor convergence and unstable results (we could feasibly capture cyclic dynamics with a more complex multi-species Lotka-Volterra model, but ordinary differential equation approaches are beyond the scope of mvgam).

Capturing seasonality

First we will fit a model that does not include a dynamic component, just to see if it can reproduce the seasonal variation in the observations. This model introduces hierarchical multidimensional smooths, where all time series share a “global” tensor product of the month and temp variables, capturing our expectation that algal seasonality responds to temperature variation. But this response should depend on when in the year these temperatures are recorded (i.e. a response to warm temperatures in Spring should be different to a response to warm temperatures in Autumn). The model also fits series-specific deviation smooths (i.e. one tensor product per series) to capture how each algal group’s seasonality differs from the overall “global” seasonality. Note that we do not include series-specific intercepts in this model because each series was z-scored to have a mean of 0.

notrend_mod <- mvgam(
  y ~
    # tensor of temp and month to capture
    # "global" seasonality
    te(temp, month, k = c(4, 4)) +

    # series-specific deviation tensor products
    te(temp, month, k = c(4, 4), by = series) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = "None"
)

The “global” tensor product smooth function can be quickly visualized:

plot_mvgam_smooth(notrend_mod, smooth = 1)

On this plot, red indicates below-average linear predictors and white indicates above-average. We can then plot the deviation smooths for a few algal groups to see how they vary from the “global” pattern:

plot_mvgam_smooth(notrend_mod, smooth = 2)

plot_mvgam_smooth(notrend_mod, smooth = 3)

These multidimensional smooths have done a good job of capturing the seasonal variation in our observations:

plot(notrend_mod, type = "forecast", series = 1)

plot(notrend_mod, type = "forecast", series = 2)

plot(notrend_mod, type = "forecast", series = 3)

This basic model gives us confidence that we can capture the seasonal variation in the observations. But the model has not captured the remaining temporal dynamics, which is obvious when we inspect Dunn-Smyth residuals for a few series:

plot(notrend_mod, type = "residuals", series = 1)

plot(notrend_mod, type = "residuals", series = 3)

Multiseries dynamics

Now it is time to get into multivariate State-Space models. We will fit two models that can both incorporate lagged cross-dependencies in the latent process models. The first model assumes that the process errors operate independently from one another, while the second assumes that there may be contemporaneous correlations in the process errors. Both models include a Vector Autoregressive component for the process means, and so both can model complex community dynamics. The models can be described mathematically as follows:

\[\begin{align*} \boldsymbol{count}_t & \sim \text{Normal}(\mu_{obs[t]}, \sigma_{obs}) \\ \mu_{obs[t]} & = process_t \\ process_t & \sim \text{MVNormal}(\mu_{process[t]}, \Sigma_{process}) \\ \mu_{process[t]} & = A * process_{t-1} + f_{global}(\boldsymbol{month},\boldsymbol{temp})_t + f_{series}(\boldsymbol{month},\boldsymbol{temp})_t \\ f_{global}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{global,k} * \beta_{global,k} \\ f_{series}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{series,k} * \beta_{series,k} \end{align*}\]

Here you can see that there are no terms in the observation model apart from the underlying process model. But we could easily add covariates into the observation model if we felt that they could explain some of the systematic observation errors. We also assume independent observation processes (there is no covariance structure in the observation errors \(\sigma_{obs}\)). At present, mvgam does not support multivariate observation models, but this feature will be added in future versions. The underlying process model, however, is multivariate, and there is a lot going on here. This component has a Vector Autoregressive part, where the process mean at time \(t\) \((\mu_{process[t]})\) is a vector that evolves as a function of where the vector-valued process model was at time \(t-1\). The \(A\) matrix captures these dynamics with self-dependencies on the diagonal and possibly asymmetric cross-dependencies on the off-diagonals, while also incorporating the nonlinear smooth functions that capture seasonality for each series. The contemporaneous process errors are modeled by \(\Sigma_{process}\), which can be constrained so that process errors are independent (i.e. setting the off-diagonals to 0) or can be fully parameterized using a Cholesky decomposition (using Stan’s \(LKJcorr\) distribution to place a prior on the strength of inter-species correlations). For those who are interested in the inner workings, mvgam makes use of a recent breakthrough by Sarah Heaps to enforce stationarity of Bayesian VAR processes. This is advantageous because we often don’t expect forecast variance to increase without bound forever into the future, yet many estimated VARs tend to behave this way.


Ok that was a lot to take in. Let’s fit some models to try and inspect what is going on and what they assume. But first, we need to update mvgam’s default priors for the observation and process errors. By default, mvgam uses a fairly wide Student-T prior on these parameters to avoid being overly informative. But our observations are z-scored and so we do not expect very large process or observation errors. However, we also do not expect very small observation errors either as we know these measurements are not perfect. So let’s update the priors for these parameters. In doing so, you will get to see how the formula for the latent process (i.e. trend) model is used in mvgam:

priors <- get_mvgam_priors(
  # observation formula, which has no terms in it
  y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train
)

Get names of all parameters whose priors can be modified:

priors[, 3]
#>  [1] "(Intercept)"                                                                                                                                                                                                                                                           
#>  [2] "process error sd"                                                                                                                                                                                                                                                      
#>  [3] "diagonal autocorrelation population mean"                                                                                                                                                                                                                              
#>  [4] "off-diagonal autocorrelation population mean"                                                                                                                                                                                                                          
#>  [5] "diagonal autocorrelation population variance"                                                                                                                                                                                                                          
#>  [6] "off-diagonal autocorrelation population variance"                                                                                                                                                                                                                      
#>  [7] "shape1 for diagonal autocorrelation precision"                                                                                                                                                                                                                         
#>  [8] "shape1 for off-diagonal autocorrelation precision"                                                                                                                                                                                                                     
#>  [9] "shape2 for diagonal autocorrelation precision"                                                                                                                                                                                                                         
#> [10] "shape2 for off-diagonal autocorrelation precision"                                                                                                                                                                                                                     
#> [11] "observation error sd"                                                                                                                                                                                                                                                  
#> [12] "te(temp,month) smooth parameters, te(temp,month):trendtrend1 smooth parameters, te(temp,month):trendtrend2 smooth parameters, te(temp,month):trendtrend3 smooth parameters, te(temp,month):trendtrend4 smooth parameters, te(temp,month):trendtrend5 smooth parameters"

And their default prior distributions:

priors[, 4]
#>  [1] "(Intercept) ~ student_t(3, -0.1, 2.5);"
#>  [2] "sigma ~ inv_gamma(1.418, 0.452);"      
#>  [3] "es[1] = 0;"                            
#>  [4] "es[2] = 0;"                            
#>  [5] "fs[1] = sqrt(0.455);"                  
#>  [6] "fs[2] = sqrt(0.455);"                  
#>  [7] "gs[1] = 1.365;"                        
#>  [8] "gs[2] = 1.365;"                        
#>  [9] "hs[1] = 0.071175;"                     
#> [10] "hs[2] = 0.071175;"                     
#> [11] "sigma_obs ~ inv_gamma(1.418, 0.452);"  
#> [12] "lambda_trend ~ normal(5, 30);"

Setting priors is easy in mvgam as you can use brms routines. Here we use more informative Normal priors for both error components, but we impose a lower bound of 0.2 for the observation errors:

priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)

You may have noticed something else unique about this model: there is no intercept term in the observation formula. This is because a shared intercept parameter can sometimes be unidentifiable with respect to the latent VAR process, particularly if our series have similar long-run averages (which they do in this case because they were z-scored). We will often get better convergence in these State-Space models if we drop this parameter. mvgam accomplishes this by fixing the coefficient for the intercept to zero. Now we can fit the first model, which assumes that process errors are contemporaneously uncorrelated

var_mod <- mvgam(
  # observation formula, which is empty
  formula = y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,

  # include the updated priors
  priors = priors,
  silent = 2
)

Inspecting SS models

This model’s summary is a bit different to other mvgam summaries. It separates parameters based on whether they belong to the observation model or to the latent process model. This is because we may often have covariates that impact the observations but not the latent process, so we can have fairly complex models for each component. You will notice that some parameters have not fully converged, particularly for the VAR coefficients (called A in the output) and for the process errors (Sigma). Note that we set include_betas = FALSE to stop the summary from printing output for all of the spline coefficients, which can be dense and hard to interpret:

summary(var_mod, include_betas = FALSE)
#> GAM observation formula:
#> y ~ 1
#> <environment: 0x0000017ff1154728>
#> 
#> GAM process formula:
#> ~te(temp, month, k = c(4, 4)) + te(temp, month, k = c(4, 4), 
#>     by = trend) - 1
#> <environment: 0x0000017ff1154728>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> VAR()
#> 
#> N process models:
#> 5 
#> 
#> N series:
#> 5 
#> 
#> N timepoints:
#> 120 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1500; warmup = 1000; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.20 0.26  0.34 1.01   412
#> sigma_obs[2] 0.24 0.40  0.54 1.02   193
#> sigma_obs[3] 0.43 0.65  0.83 1.15    29
#> sigma_obs[4] 0.25 0.37  0.49 1.01   242
#> sigma_obs[5] 0.31 0.43  0.56 1.03   226
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    0   0     0  NaN   NaN
#> 
#> standard deviation:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.26 0.34  0.42 1.01   463
#> sigma[2] 0.23 0.39  0.55 1.09    35
#> sigma[3] 0.10 0.49  0.80 1.33    14
#> sigma[4] 0.33 0.45  0.59 1.02   201
#> sigma[5] 0.22 0.35  0.51 1.04   142
#> 
#> var coefficient matrix:
#>          2.5%    50% 97.5% Rhat n_eff
#> A[1,1]  0.600  0.790 0.910 1.03   163
#> A[1,2] -0.420 -0.140 0.041 1.04   111
#> A[1,3] -0.210  0.016 0.310 1.00   345
#> A[1,4] -0.055  0.056 0.220 1.01   543
#> A[1,5] -0.037  0.120 0.380 1.03   181
#> A[2,1] -0.540 -0.190 0.025 1.03   128
#> A[2,2]  0.062  0.430 0.740 1.02   242
#> A[2,3] -0.290  0.026 1.300 1.20    26
#> A[2,4] -0.110  0.110 0.370 1.03   117
#> A[2,5] -0.040  0.230 0.640 1.03   133
#> A[3,1] -0.330 -0.030 0.190 1.02   333
#> A[3,2] -0.500 -0.051 0.320 1.03   193
#> A[3,3] -0.034  0.520 0.900 1.08    64
#> A[3,4] -0.090  0.120 0.500 1.04   142
#> A[3,5] -0.290  0.027 0.380 1.01   465
#> A[4,1] -0.450 -0.120 0.086 1.06    89
#> A[4,2] -0.660 -0.180 0.130 1.06   107
#> A[4,3] -0.270  0.086 1.300 1.16    30
#> A[4,4]  0.520  0.730 0.960 1.02   212
#> A[4,5] -0.051  0.190 0.650 1.03   150
#> A[5,1] -0.110  0.053 0.270 1.00   449
#> A[5,2] -0.430 -0.110 0.130 1.01   238
#> A[5,3] -0.150  0.060 0.780 1.12    41
#> A[5,4] -0.210 -0.040 0.110 1.01   373
#> A[5,5]  0.460  0.740 0.950 1.00   370
#> 
#> Approximate significance of GAM process smooths:
#>                               edf Ref.df Chi.sq p-value
#> te(temp,month)              3.374     15 37.656   0.427
#> te(temp,month):seriestrend1 2.798     15  3.441   0.996
#> te(temp,month):seriestrend2 4.454     15 48.402   0.245
#> te(temp,month):seriestrend3 1.748     15  3.363   1.000
#> te(temp,month):seriestrend4 1.352     15  6.409   0.999
#> te(temp,month):seriestrend5 3.085     15  6.703   0.979
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✖ Rhats above 1.05 found for some parameters
#>     Use pairs() and mcmc_plot() to investigate
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

The convergence of this model isn’t fabulous (more on this in a moment). But we can again plot the smooth functions, which this time operate on the process model. We can view these by supplying trend_effects = TRUE to the plotting functions:

plot(var_mod, "smooths", trend_effects = TRUE)

The autoregressive coefficient matrix is of particular interest here, as it captures lagged dependencies and cross-dependencies in the latent process model. Unfortunately bayesplot doesn’t know this is a matrix of parameters, so what we see by default is actually the transpose of the VAR matrix. Supplying dir = 'v' to the facet_args argument arranges the panels in the correct order:

mcmc_plot(
  var_mod,
  variable = 'A',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

There is a lot happening in this matrix. Each cell captures the lagged effect of the process in the column on the process in the row in the next timestep. So for example, the effect in cell [1,3] shows how an increase in the process for series 3 (Greens) at time \(t\) is expected to impact the process for series 1 (Bluegreens) at time \(t+1\). The latent process model is now capturing these effects and the smooth seasonal effects.
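
If a particular cell is of interest, its marginal posterior can also be inspected directly. As a quick sketch (using the same A[row,column] naming convention shown in the summary above), the effect of the Greens on the Bluegreens could be plotted with:

# marginal posterior for the effect of series 3 on series 1
mcmc_plot(var_mod, variable = 'A[1,3]', type = 'hist')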

The process error \((\Sigma)\) captures unmodelled variation in the process models. Again, we fixed the off-diagonals to 0, so the histograms for these will look like flat boxes:

mcmc_plot(
  var_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

The observation error estimates \((\sigma_{obs})\) represent how much the model thinks we might miss the true value when we take our imperfect measurements:

mcmc_plot(var_mod, variable = "sigma_obs", regex = TRUE, type = "hist")

These are still a bit hard to identify overall, especially when trying to estimate both process and observation error. Often we need to make some strong assumptions about which of these is more important for determining unexplained variation in our observations.

Correlated process errors

Let’s see if these estimates improve when we allow the process errors to be correlated. Once again, we need to first update the priors for the observation errors:

priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)

And now we can fit the correlated process error model:

varcor_mod <- mvgam(
  # observation formula, which remains empty
  formula = y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with correlated process errors
  trend_model = VAR(cor = TRUE),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,

  # include the updated priors
  priors = priors,
  silent = 2
)

The \((\Sigma)\) matrix now captures any evidence of contemporaneously correlated process error:

mcmc_plot(
  varcor_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

This symmetric matrix tells us there is support for correlated process errors, as several of the off-diagonal entries are strongly non-zero. But it is easier to interpret these estimates if we convert the covariance matrix to a correlation matrix. Here we compute the posterior median process error correlations:

Sigma_post <- as.matrix(
  varcor_mod, 
  variable = "Sigma", 
  regex = TRUE
)
median_correlations <- cov2cor(
  matrix(apply(Sigma_post, 2, median),
         nrow = 5, 
         ncol = 5
  )
)
rownames(median_correlations) <- 
  colnames(median_correlations) <- 
  levels(plankton_train$series)

round(median_correlations, 2)
#>             Bluegreens Diatoms Greens Other.algae Unicells
#> Bluegreens        1.00   -0.20  -0.04        0.17     0.48
#> Diatoms          -0.20    1.00   0.13        0.45     0.17
#> Greens           -0.04    0.13   1.00        0.30    -0.05
#> Other.algae       0.17    0.45   0.30        1.00     0.28
#> Unicells          0.48    0.17  -0.05        0.28     1.00

Impulse response functions

Because Vector Autoregressions can capture complex lagged dependencies, it is often difficult to understand how the member time series are thought to interact with one another. A method that is commonly used to directly test for possible interactions is to compute an Impulse Response Function (IRF). If \(h\) represents the simulated forecast horizon, an IRF asks how each of the remaining series might respond over times \(1:h\) if a focal series is given an innovation “shock” at time \(t = 0\). mvgam can compute Generalized and Orthogonalized IRFs from models that include latent VAR dynamics. We simply feed the fitted model to the irf() function and then use the S3 plot() function to view the estimated responses. By default, irf() will compute IRFs by separately imposing positive shocks of one standard deviation to each series in the VAR process. Here we compute Generalized IRFs over a horizon of 12 timesteps:

irfs <- irf(varcor_mod, h = 12)

A summary of the IRFs can be computed using the summary() function:

summary(irfs)
#> # A tibble: 300 × 5
#>    shock                  horizon irfQ50 irfQ2.5 irfQ97.5
#>    <chr>                    <int>  <dbl>   <dbl>    <dbl>
#>  1 Process_1 -> Process_1       1 0.350   0.264     0.441
#>  2 Process_1 -> Process_1       2 0.297   0.227     0.374
#>  3 Process_1 -> Process_1       3 0.251   0.190     0.323
#>  4 Process_1 -> Process_1       4 0.214   0.155     0.283
#>  5 Process_1 -> Process_1       5 0.182   0.125     0.253
#>  6 Process_1 -> Process_1       6 0.155   0.0966    0.227
#>  7 Process_1 -> Process_1       7 0.132   0.0744    0.205
#>  8 Process_1 -> Process_1       8 0.113   0.0563    0.185
#>  9 Process_1 -> Process_1       9 0.0967  0.0419    0.167
#> 10 Process_1 -> Process_1      10 0.0833  0.0307    0.152
#> # ℹ 290 more rows

But it is easier to understand these responses using plots. For example, we can plot the expected responses of the remaining series to a positive shock for series 3 (Greens) using the plot() function:

plot(irfs, series = 3)

This series of plots makes it clear that some of the other series would be expected to show both instantaneous responses to a shock for the Greens (due to their correlated process errors) as well as delayed and nonlinear responses over time (due to the complex lagged dependence structure captured by the \(A\) matrix). This hopefully makes it clear why IRFs are an important tool in the analysis of multivariate autoregressive models. You can also use these IRFs to calculate a relative contribution from each shock to the forecast error variance for a focal series. This method, known as a Forecast Error Variance Decomposition (FEVD), is useful to get an idea about the amount of information that each series contributes to the evolution of all other series in a Vector Autoregression:

fevds <- fevd(varcor_mod, h = 12)
plot(fevds)

The plot above shows the median contribution to forecast error variance for each series.

Comparing forecast scores

But which model is better? We can compute the variogram score for out of sample forecasts to get a sense of which model does a better job of capturing the dependence structure in the true evaluation set:

# create forecast objects for each model
fcvar <- forecast(var_mod)
fcvarcor <- forecast(varcor_mod)

# plot the difference in variogram scores; a negative value means the VAR1cor model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = "variogram")$all_series$score -
  score(fcvar, score = "variogram")$all_series$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(variogram[VAR1cor] ~ -~ variogram[VAR1])
)
abline(h = 0, lty = "dashed")

And we can also compute the energy score for out of sample forecasts to get a sense of which model provides forecasts that are better calibrated:

# plot the difference in energy scores; a negative value means the VAR1cor model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = "energy")$all_series$score -
  score(fcvar, score = "energy")$all_series$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(energy[VAR1cor] ~ -~ energy[VAR1])
)
abline(h = 0, lty = "dashed")

The models tend to provide similar forecasts, though the correlated error model does slightly better overall. We would probably need to use a more extensive rolling forecast evaluation exercise if we felt we needed to choose just one model for production. mvgam offers some utilities for doing this (see ?lfo_cv for guidance). Alternatively, we could use forecasts from both models by creating an evenly-weighted ensemble forecast distribution. This capability is available using the ensemble() function in mvgam (see ?ensemble for guidance), as sketched below.
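
As a quick sketch of that last option (the number of draws here is an arbitrary choice, not a requirement), an evenly-weighted ensemble could be built from the two forecast objects created above and scored like any other forecast:

# combine draws from both models into one forecast distribution
fc_ens <- ensemble(fcvar, fcvarcor, ndraws = 2000)

# score the ensemble in the same way as the individual models
score(fc_ens, score = "energy")$all_series$score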

Using how_to_cite() for models with VAR dynamics will give you information on how they are restricted to remain stationary:

description <- how_to_cite(varcor_mod)
description
#> Methods text skeleton
#> We used the R package mvgam (version 1.1.52; Clark & Wells, 2023) to
#>   construct, fit and interrogate the model. mvgam fits Bayesian
#>   State-Space models that can include flexible predictor effects in both
#>   the process and observation components by incorporating functionalities
#>   from the brms (Burkner 2017), mgcv (Wood 2017) and splines2 (Wang & Yan,
#>   2023) packages. To encourage stability and prevent forecast variance
#>   from increasing indefinitely, we enforced stationarity of the Vector
#>   Autoregressive process following methods described by Heaps (2023) and
#>   Clark et al. (2025). The mvgam-constructed model and observed data were
#>   passed to the probabilistic programming environment Stan (version
#>   2.36.0; Carpenter et al. 2017, Stan Development Team 2026), specifically
#>   through the cmdstanr interface (Gabry & Cesnovar, 2021). We ran 4
#>   Hamiltonian Monte Carlo chains for 1000 warmup iterations and 500
#>   sampling iterations for joint posterior estimation. Rank normalized
#>   split Rhat (Vehtari et al. 2021) and effective sample sizes were used to
#>   monitor convergence.
#> 
#> Primary references
#> Clark, NJ and Wells K (2023). Dynamic Generalized Additive Models
#>   (DGAMs) for forecasting discrete ecological time series. Methods in
#>   Ecology and Evolution, 14, 771-784. doi.org/10.1111/2041-210X.13974
#> Burkner, PC (2017). brms: An R Package for Bayesian Multilevel Models
#>   Using Stan. Journal of Statistical Software, 80(1), 1-28.
#>   doi:10.18637/jss.v080.i01
#> Wood, SN (2017). Generalized Additive Models: An Introduction with R
#>   (2nd edition). Chapman and Hall/CRC.
#> Wang W and Yan J (2021). Shape-Restricted Regression Splines with R
#>   Package splines2. Journal of Data Science, 19(3), 498-517.
#>   doi:10.6339/21-JDS1020 https://doi.org/10.6339/21-JDS1020.
#> Heaps, SE (2023). Enforcing stationarity through the prior in vector
#>   autoregressions. Journal of Computational and Graphical Statistics 32,
#>   74-83.
#> Clark NJ, Ernest SKM, Senyondo H, Simonis J, White EP, Yenni GM,
#>   Karunarathna KANK (2025). Beyond single-species models: leveraging
#>   multispecies forecasts to navigate the dynamics of ecological
#>   predictability. PeerJ 13:e18929.
#> Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M,
#>   Brubaker M, Guo J, Li P and Riddell A (2017). Stan: A probabilistic
#>   programming language. Journal of Statistical Software 76.
#> Gabry J, Cesnovar R, Johnson A, and Bronder S (2026). cmdstanr: R
#>   Interface to 'CmdStan'. https://mc-stan.org/cmdstanr/,
#>   https://discourse.mc-stan.org.
#> Vehtari A, Gelman A, Simpson D, Carpenter B, and Burkner P (2021).
#>   Rank-normalization, folding, and localization: An improved Rhat for
#>   assessing convergence of MCMC (with discussion). Bayesian Analysis 16(2)
#>   667-718. https://doi.org/10.1214/20-BA1221.
#> 
#> Other useful references
#> Arel-Bundock V, Greifer N, and Heiss A (2024). How to interpret
#>   statistical models using marginaleffects for R and Python. Journal of
#>   Statistical Software, 111(9), 1-32.
#>   https://doi.org/10.18637/jss.v111.i09
#> Gabry J, Simpson D, Vehtari A, Betancourt M, and Gelman A (2019).
#>   Visualization in Bayesian workflow. Journal of the Royal Statatistical
#>   Society A, 182, 389-402. doi:10.1111/rssa.12378.
#> Vehtari A, Gelman A, and Gabry J (2017). Practical Bayesian model
#>   evaluation using leave-one-out cross-validation and WAIC. Statistics and
#>   Computing, 27, 1413-1432. doi:10.1007/s11222-016-9696-4.
#> Burkner PC, Gabry J, and Vehtari A. (2020). Approximate leave-future-out
#>   cross-validation for Bayesian time series models. Journal of Statistical
#>   Computation and Simulation, 90(14), 2499-2523.
#>   https://doi.org/10.1080/00949655.2020.1783262

More advanced hierarchical panel VAR models can also be handled by using the gr and subgr arguments in VAR(). These models are useful if you have data for the same set of series (subgr) that are measured in different regions (gr), such as species measured in different sampling regions or financial series measured in different countries.
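
A hypothetical sketch of such a model follows; here dat, region and species are placeholders for your own data and grouping variables, which must be supplied as factors:

# hierarchical panel VAR; 'region' acts as gr and 'species' as subgr
mod <- mvgam(
  formula = y ~ -1,
  trend_model = VAR(gr = region, subgr = species),
  family = gaussian(),
  data = dat
)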

Further reading

The following papers and resources offer a lot of useful material about multivariate State-Space models and how they can be applied in practice:

Auger‐Méthé, Marie, et al. A guide to state–space modeling of ecological time series. Ecological Monographs 91.4 (2021): e01470.

Clark, Nicholas J., et al. Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ. (2025): 13:e18929

Heaps, Sarah E. Enforcing stationarity through the prior in vector autoregressions. Journal of Computational and Graphical Statistics 32.1 (2023): 74-83.

Hannaford, Naomi E., et al. A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant. Computational Statistics & Data Analysis 179 (2023): 107659.

Holmes, Elizabeth E., Eric J. Ward, and Wills Kellie. MARSS: multivariate autoregressive state-space models for analyzing time-series data. R Journal. 4.1 (2012): 11.

Karunarathna, K.A.N.K., et al. Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models. Ecological Modelling (2024): 490, 110648.

Ward, Eric J., et al. Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico. Journal of Applied Ecology 47.1 (2010): 47-56.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au).

================================================ FILE: docs/404.html ================================================ Page not found (404)
Content not found. Please use links in the navbar.
================================================ FILE: docs/CODE_OF_CONDUCT.html ================================================ Contributor Covenant Code of Conduct

Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.

Our Standards

Examples of behavior that contributes to a positive environment for our community include:

  • Demonstrating empathy and kindness toward other people
  • Being respectful of differing opinions, viewpoints, and experiences
  • Giving and gracefully accepting constructive feedback
  • Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
  • Focusing on what is best not just for us as individuals, but for the overall community

Examples of unacceptable behavior include:

  • The use of sexualized language or imagery, and sexual attention or advances of any kind
  • Trolling, insulting or derogatory comments, and personal or political attacks
  • Public or private harassment
  • Publishing others’ private information, such as a physical or email address, without their explicit permission
  • Other conduct which could reasonably be considered inappropriate in a professional setting

Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.

Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.

Scope

This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.

Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at . All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the reporter of any incident.

Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:

1. Correction

Community Impact: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.

Consequence: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.

2. Warning

Community Impact: A violation through a single incident or series of actions.

Consequence: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.

3. Temporary Ban

Community Impact: A serious violation of community standards, including sustained inappropriate behavior.

Consequence: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.

4. Permanent Ban

Community Impact: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.

Consequence: A permanent ban from any sort of public interaction within the community.

Attribution

This Code of Conduct is adapted from the Contributor Covenant, version 2.1, available at https://www.contributor-covenant.org/version/2/1/code_of_conduct.html.

Community Impact Guidelines were inspired by Mozilla’s code of conduct enforcement ladder (https://github.com/mozilla/inclusion).

For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.

================================================ FILE: docs/CONTRIBUTING.html ================================================ Contributing to mvgam

This document outlines how to propose a change to mvgam. For a detailed discussion on contributing to this and other open source R packages, please see the development contributing guide and our code review principles.

Fixing typos

You can fix typos, spelling mistakes, or grammatical errors in the documentation directly using the GitHub web interface, as long as the changes are made in the source file. This generally means you’ll need to edit roxygen2 comments in an .R, not a .Rd file. You can find the .R file that generates the .Rd by reading the comment in the first line.

Bigger changes

If you want to make a bigger change, it’s a good idea to first file an issue and make sure someone from the team agrees that it’s needed. If you’ve found a bug, please file an issue that illustrates the bug with a minimal reprex (this will also help you write a unit test, if needed). See the tidyverse guide on how to create a great issue for more advice.

Pull request process

  • Fork the package and clone onto your computer. If you haven’t done this before, we recommend using usethis::create_from_github("nicholasjclark/mvgam", fork = TRUE).

  • Install all development dependencies with devtools::install_dev_deps(), and then make sure the package passes R CMD check by running devtools::check(). If R CMD check doesn’t pass cleanly, it’s a good idea to ask for help before continuing.

  • Create a Git branch for your pull request (PR). We recommend using usethis::pr_init("brief-description-of-change").

  • Make your changes, commit to git, and then create a PR by running usethis::pr_push(), and following the prompts in your browser. The title of your PR should briefly describe the change. The body of your PR should contain Fixes #issue-number.

  • For user-facing changes, add a bullet to the top of NEWS.md (i.e. just below the first header). Follow the style described in https://style.tidyverse.org/news.html.

Code style

  • New code should follow the tidyverse style guide where possible. You can use the styler package to apply these styles, but please don’t restyle code that has nothing to do with your PR.

  • We use roxygen2, with Markdown syntax, for documentation.

  • We use testthat for unit tests. Contributions with test cases included are easier to accept.

Code of Conduct

Please note that the mvgam project is released with a Contributor Code of Conduct. By contributing to this project you agree to abide by its terms.

Roadmap

The mvgam package is in a stable state of development, with some degree of active subsequent development as envisioned by the primary authors.

================================================ FILE: docs/LICENSE-text.html ================================================ License
YEAR: 2021
COPYRIGHT HOLDER: Nicholas Clark
================================================ FILE: docs/LICENSE.html ================================================ MIT License

Copyright (c) 2021 Nicholas Clark

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================ FILE: docs/articles/data_in_mvgam.html ================================================ Formatting data for use in mvgam

This vignette gives an example of how to take raw data and format it for use in mvgam. This is not an exhaustive example, as data can be recorded and stored in a variety of ways, which requires different approaches to wrangle the data into the necessary format for mvgam. For full details on the basic mvgam functionality, please see the introductory vignette and the growing set of walk through video tutorials on mvgam applications.

Required long data format

Manipulating the data into a ‘long’ format is necessary for modelling in mvgam. By ‘long’ format, we mean that each series x time observation needs to have its own entry in the dataframe or list object that we wish to pass as data to the two primary modelling functions, mvgam() and jsdgam(). A simple example can be viewed by simulating data using the sim_mvgam() function. See ?sim_mvgam for more details.

simdat <- sim_mvgam(n_series = 4, T = 24, prop_missing = 0.2)
head(simdat$data_train, 16)
#>     y season year   series time
#> 1   1      1    1 series_1    1
#> 2   0      1    1 series_2    1
#> 3  NA      1    1 series_3    1
#> 4   2      1    1 series_4    1
#> 5   1      2    1 series_1    2
#> 6   2      2    1 series_2    2
#> 7   0      2    1 series_3    2
#> 8   0      2    1 series_4    2
#> 9   0      3    1 series_1    3
#> 10 NA      3    1 series_2    3
#> 11  0      3    1 series_3    3
#> 12 NA      3    1 series_4    3
#> 13  1      4    1 series_1    4
#> 14 NA      4    1 series_2    4
#> 15  0      4    1 series_3    4
#> 16  0      4    1 series_4    4

series as a factor variable

Notice how we have four different time series in these simulated data, and we have identified the series-level indicator as a factor variable.

class(simdat$data_train$series)
#> [1] "factor"
levels(simdat$data_train$series)
#> [1] "series_1" "series_2" "series_3" "series_4"

It is important that the number of levels matches the number of unique series in the data to ensure indexing across series works properly in the underlying modelling functions. Several of the main workhorse functions in the package (including mvgam() and get_mvgam_priors()) will give an error if this is not the case, but it may be worth checking anyway:

all(levels(simdat$data_train$series) %in% unique(simdat$data_train$series))
#> [1] TRUE

Note that you can technically supply data that does not have a series indicator, and the package will generally assume that you are only using a single time series. There are exceptions to this, for example if you have grouped data and would like to estimate hierarchical dependencies (see an example of hierarchical process error correlations in the ?AR documentation) or if you would like to set up a Joint Species Distribution Model (JSDM) using a Zero-Mean Multivariate Gaussian distribution for the latent residuals (see examples in the ?ZMVN documentation).
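
For illustration only, a hypothetical ZMVN setup might look like the sketch below, where dat contains factor columns site and species but no time variable (check ?ZMVN to confirm the argument names for your installed version):

# correlated latent residuals among species, measured at each site
mod <- mvgam(y ~ 1,
             trend_model = ZMVN(unit = site, subgr = species),
             family = poisson(),
             data = dat)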

A single outcome variable

You may also have noticed that we do not spread the numeric / integer-classed outcome variable into different columns. Rather, there is only a single column for the outcome variable, labelled y in these simulated data (though the outcome does not have to be labelled y). This is another important requirement in mvgam, but it shouldn’t be too unfamiliar to R users who frequently use modelling packages such as lme4, mgcv, brms or the many other regression modelling packages out there. The advantage of this format is that it is now very easy to specify effects that vary among time series:

summary(glm(y ~ series + time,
            data = simdat$data_train,
            family = poisson()))
#> 
#> Call:
#> glm(formula = y ~ series + time, family = poisson(), data = simdat$data_train)
#> 
#> Coefficients:
#>                Estimate Std. Error z value Pr(>|z|)
#> (Intercept)     0.01533    0.34999   0.044    0.965
#> seriesseries_2  0.05442    0.33825   0.161    0.872
#> seriesseries_3 -0.26496    0.36853  -0.719    0.472
#> seriesseries_4 -0.27086    0.37753  -0.717    0.473
#> time            0.01117    0.02526   0.442    0.658
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 74.256  on 58  degrees of freedom
#> Residual deviance: 72.679  on 54  degrees of freedom
#>   (13 observations deleted due to missingness)
#> AIC: 165.69
#> 
#> Number of Fisher Scoring iterations: 5
summary(mgcv::gam(y ~ series + s(time, by = series),
            data = simdat$data_train,
            family = poisson()))
#> 
#> Family: poisson 
#> Link function: log 
#> 
#> Formula:
#> y ~ series + s(time, by = series)
#> 
#> Parametric coefficients:
#>                Estimate Std. Error z value Pr(>|z|)
#> (Intercept)     0.11283    0.24567   0.459    0.646
#> seriesseries_2  0.07173    0.34072   0.211    0.833
#> seriesseries_3 -0.91273    0.58162  -1.569    0.117
#> seriesseries_4 -1.26425    0.85403  -1.480    0.139
#> 
#> Approximate significance of smooth terms:
#>                          edf Ref.df Chi.sq p-value
#> s(time):seriesseries_1 1.000  1.000  0.274   0.601
#> s(time):seriesseries_2 1.000  1.000  0.017   0.896
#> s(time):seriesseries_3 2.904  3.639  6.329   0.142
#> s(time):seriesseries_4 5.916  6.950  7.541   0.375
#> 
#> R-sq.(adj) =  0.222   Deviance explained = 40.3%
#> UBRE = 0.25423  Scale est. = 1         n = 59

Depending on the observation families you plan to use when building models, there may be some restrictions that need to be satisfied within the outcome variable. For example, a Beta regression can only handle proportional data, so values >= 1 or <= 0 are not allowed. Likewise, a Poisson regression can only handle non-negative integers. Most regression functions in R will assume the user knows all of this and so will not issue any warnings or errors if you choose the wrong distribution, but often this ends up leading to some unhelpful error from an optimizer that is difficult to interpret and diagnose. mvgam will attempt to provide some errors if you do something that is simply not allowed. For example, we can simulate data from a zero-centred Gaussian distribution (ensuring that some of our values will be < 1) and attempt a Beta regression in mvgam using the betar family:

gauss_dat <- data.frame(outcome = rnorm(10),
                        series = factor('series1',
                                        levels = 'series1'),
                        time = 1:10)
gauss_dat
#>       outcome  series time
#> 1   0.8191774 series1    1
#> 2   0.4884789 series1    2
#> 3  -0.5435451 series1    3
#> 4  -0.8796024 series1    4
#> 5  -0.8100262 series1    5
#> 6   2.0134615 series1    6
#> 7  -0.1403682 series1    7
#> 8  -0.6321474 series1    8
#> 9  -1.0406050 series1    9
#> 10  0.2040207 series1   10

A call to gam() using the mgcv package leads to a model that actually fits (though it does give an unhelpful warning message):

mgcv::gam(outcome ~ time,
    family = betar(),
    data = gauss_dat)
#> 
#> Family: Beta regression(0.124) 
#> Link function: logit 
#> 
#> Formula:
#> outcome ~ time
#> Total model degrees of freedom 2 
#> 
#> REML score: -177.8074

But the same call to mvgam() gives us something more useful:

mvgam(outcome ~ time,
      family = betar(),
      data = gauss_dat)
#> Error: Values <= 0 not allowed for beta responses

Please see ?mvgam_families for more information on the types of responses that the package can handle and their restrictions

A time variable

The other requirement for most models that can be fit in mvgam is a numeric / integer-classed variable labelled time. This ensures the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models. If you plan to use any of the autoregressive dynamic trend functions available in mvgam (see ?mvgam_trends for details of available dynamic processes), you will need to ensure your time series are entered with a fixed sampling interval (i.e. the time between timesteps 1 and 2 should be the same as the time between timesteps 2 and 3, etc…). But note that you can have missing observations for some (or all) series. mvgam() will check this for you, but again it is useful to ensure you have no missing timepoint x series combinations in your data. You can generally do this with a simple dplyr call:

# A function to check that all timepoints within a sequence are present
all_times_avail = function(time, min_time, max_time){
    identical(as.numeric(sort(time)),
              as.numeric(seq.int(from = min_time, to = max_time)))
}

# Get min and max times from the data
min_time <- min(simdat$data_train$time)
max_time <- max(simdat$data_train$time)

# Check that all times are recorded for each series
data.frame(series = simdat$data_train$series,
           time = simdat$data_train$time) %>%
    dplyr::group_by(series) %>%
    dplyr::summarise(all_there = all_times_avail(time,
                                                 min_time,
                                                 max_time)) -> checked_times
if(any(checked_times$all_there == FALSE)){
  warning("One or more series in is missing observations for one or more timepoints")
} else {
  cat('All series have observations at all timepoints :)')
}
#> All series have observations at all timepoints :)

Note that models which use dynamic components will assume that smaller values of time are older (i.e. time = 1 came before time = 2, etc…)

Irregular sampling intervals?

Most mvgam dynamic trend models expect time to be measured in discrete, evenly-spaced intervals (i.e. one measurement per week, or one per year, for example; though missing values are allowed). But please note that irregularly sampled time intervals are allowed, in which case the CAR() trend model (continuous time autoregressive) is appropriate. You can see an example of this kind of model in the Examples section in ?CAR. You can also use trend_model = 'None' (the default in mvgam()) and instead use a Gaussian Process to model temporal variation for irregularly-sampled time series. See ?brms::gp for details. But to reiterate the point from above, if you do not have time series data (or don’t want to estimate latent temporal dynamics) but you would like to estimate correlated latent residuals among multivariate outcomes, you can set up models that use trend_model = ZMVN(...) without the need for a time variable (see ?ZMVN for details).
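
For instance, here is a minimal sketch of a continuous-time AR model for a single irregularly-sampled series (the simulated data are purely illustrative; see the Examples in ?CAR for a fuller treatment):

# simulate a short series observed at uneven time intervals
irreg_dat <- data.frame(outcome = rnorm(12),
                        time = cumsum(sample(1:4, 12, replace = TRUE)),
                        series = factor('series_1'))

# CAR() accounts for the uneven spacing between observations
mod <- mvgam(outcome ~ 1,
             trend_model = CAR(),
             family = gaussian(),
             data = irreg_dat)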

Checking data with get_mvgam_priors()

The get_mvgam_priors() function is designed to return information about the parameters in a model whose prior distributions can be modified by the user. But in doing so, it will perform a series of checks to ensure the data are formatted properly. It can therefore be very useful to new users for ensuring there isn’t anything strange going on in the data setup. For example, we can replicate the steps taken above (to check factor levels and timepoint x series combinations) with a single call to get_mvgam_priors(). Here we first simulate some data in which some of the timepoints in the time variable are not included in the data:

bad_times <- data.frame(time = seq(1, 16, by = 2),
                        series = factor('series_1'),
                        outcome = rnorm(8))
bad_times
#>   time   series     outcome
#> 1    1 series_1  0.42056146
#> 2    3 series_1  0.43929390
#> 3    5 series_1 -2.37409657
#> 4    7 series_1  0.07924204
#> 5    9 series_1 -1.10342469
#> 6   11 series_1 -0.27835182
#> 7   13 series_1  0.28225978
#> 8   15 series_1  0.36767722

Next we call get_mvgam_priors() by simply specifying an intercept-only model, which is enough to trigger all the checks:

get_mvgam_priors(outcome ~ 1,
                 data = bad_times,
                 family = gaussian())
#> Error: One or more series in data is missing observations for one or more timepoints

This error is useful as it tells us where the problem is. There are many ways to fill in missing timepoints, so the correct way will have to be left up to the user. But if you don’t have any covariates, it should be pretty easy using expand.grid():

bad_times %>%
  dplyr::right_join(expand.grid(time = seq(min(bad_times$time),
                                           max(bad_times$time)),
                                series = factor(unique(bad_times$series),
                                                levels = levels(bad_times$series)))) %>%
  dplyr::arrange(time) -> good_times
good_times
#>    time   series     outcome
#> 1     1 series_1  0.42056146
#> 2     2 series_1          NA
#> 3     3 series_1  0.43929390
#> 4     4 series_1          NA
#> 5     5 series_1 -2.37409657
#> 6     6 series_1          NA
#> 7     7 series_1  0.07924204
#> 8     8 series_1          NA
#> 9     9 series_1 -1.10342469
#> 10   10 series_1          NA
#> 11   11 series_1 -0.27835182
#> 12   12 series_1          NA
#> 13   13 series_1  0.28225978
#> 14   14 series_1          NA
#> 15   15 series_1  0.36767722

Now the call to get_mvgam_priors(), using our filled in data, should work:

get_mvgam_priors(outcome ~ 1,
                 data = good_times,
                 family = gaussian())
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                   prior                  example_change
#> 1 (Intercept) ~ student_t(3, 0.2, 2.5);     (Intercept) ~ normal(0, 1);
#> 2     sigma_obs ~ student_t(3, 0, 2.5); sigma_obs ~ normal(0.79, 0.68);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA

This function should also pick up on misaligned factor levels for the series variable. We can check this by again simulating, this time adding an additional factor level that is not included in the data:

bad_levels <- data.frame(time = 1:8,
                        series = factor('series_1',
                                        levels = c('series_1',
                                                   'series_2')),
                        outcome = rnorm(8))

levels(bad_levels$series)
#> [1] "series_1" "series_2"

Another call to get_mvgam_priors() brings up a useful error:

get_mvgam_priors(outcome ~ 1,
                 data = bad_levels,
                 family = gaussian())
#> Error: Mismatch between factor levels of "series" and unique values of "series"
#> Use
#>   `setdiff(levels(data$series), unique(data$series))` 
#> and
#>   `intersect(levels(data$series), unique(data$series))`
#> for guidance

Following the message’s advice tells us there is a level for series_2 in the series variable, but there are no observations for this series in the data:

setdiff(levels(bad_levels$series), unique(bad_levels$series))
#> [1] "series_2"

Re-assigning the levels fixes the issue:

bad_levels %>%
  dplyr::mutate(series = droplevels(series)) -> good_levels
levels(good_levels$series)
#> [1] "series_1"
get_mvgam_priors(outcome ~ 1,
                 data = good_levels,
                 family = gaussian())
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                   prior                  example_change
#> 1 (Intercept) ~ student_t(3, 0.5, 2.5);     (Intercept) ~ normal(0, 1);
#> 2     sigma_obs ~ student_t(3, 0, 2.5); sigma_obs ~ normal(0.55, 0.12);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA

Covariates with no NAs

Covariates can be used in models just as you would when using mgcv (see ?formula.gam for details of the formula syntax). But although the outcome variable can have NAs, covariates cannot. Most regression software will silently drop any rows in the model matrix that have NAs, which is not helpful when debugging. Both the mvgam() and get_mvgam_priors() functions will run some simple checks for you, and hopefully will return useful errors if they find missing values:

miss_dat <- data.frame(outcome = rnorm(10),
                       cov = c(NA, rnorm(9)),
                       series = factor('series1',
                                       levels = 'series1'),
                       time = 1:10)
miss_dat
#>        outcome        cov  series time
#> 1   0.99175685         NA series1    1
#> 2   0.34167507 -0.4104885 series1    2
#> 3  -1.10727293  1.4905883 series1    3
#> 4   0.57138854  0.5616821 series1    4
#> 5   1.35392769  0.3999663 series1    5
#> 6   0.14515171 -0.2146278 series1    6
#> 7   1.34566674  0.8597645 series1    7
#> 8   0.60786914  0.5446883 series1    8
#> 9  -1.83739244  0.7429980 series1    9
#> 10 -0.03545219  0.9991543 series1   10
get_mvgam_priors(outcome ~ cov,
                 data = miss_dat,
                 family = gaussian())
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2                                  cov            1     cov fixed effect
#> 3 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                   prior                example_change
#> 1 (Intercept) ~ student_t(3, 0.5, 2.5);   (Intercept) ~ normal(0, 1);
#> 2             cov ~ student_t(3, 0, 2);           cov ~ normal(0, 1);
#> 3     sigma_obs ~ student_t(3, 0, 2.5); sigma_obs ~ normal(-1, 0.27);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA
#> 3             NA             NA

Just like with the mgcv package, mvgam can also accept data as a list object. This is useful if you want to set up linear functional predictors or even distributed lag predictors. The checks run by mvgam should still work on these data. Here we change the cov predictor to be a matrix:

miss_dat <- list(outcome = rnorm(10),
                 series = factor('series1',
                                 levels = 'series1'),
                 time = 1:10)
miss_dat$cov <- matrix(rnorm(50), ncol = 5, nrow = 10)
miss_dat$cov[2,3] <- NA

A call to get_mvgam_priors() runs the same checks on these list data:

get_mvgam_priors(outcome ~ cov,
                 data = miss_dat,
                 family = gaussian())
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2                                 cov1            1    cov1 fixed effect
#> 3                                 cov2            1    cov2 fixed effect
#> 4                                 cov3            1    cov3 fixed effect
#> 5                                 cov4            1    cov4 fixed effect
#> 6                                 cov5            1    cov5 fixed effect
#> 7 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                   prior                  example_change
#> 1 (Intercept) ~ student_t(3, 0.4, 2.5);     (Intercept) ~ normal(0, 1);
#> 2            cov1 ~ student_t(3, 0, 2);            cov1 ~ normal(0, 1);
#> 3            cov2 ~ student_t(3, 0, 2);            cov2 ~ normal(0, 1);
#> 4            cov3 ~ student_t(3, 0, 2);            cov3 ~ normal(0, 1);
#> 5            cov4 ~ student_t(3, 0, 2);            cov4 ~ normal(0, 1);
#> 6            cov5 ~ student_t(3, 0, 2);            cov5 ~ normal(0, 1);
#> 7     sigma_obs ~ student_t(3, 0, 2.5); sigma_obs ~ normal(0.69, 0.21);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA
#> 3             NA             NA
#> 4             NA             NA
#> 5             NA             NA
#> 6             NA             NA
#> 7             NA             NA

Plotting with plot_mvgam_series()

Plotting the data is a useful way to ensure everything looks ok, once you’ve gone through the above checks on factor levels and timepoint x series combinations. The plot_mvgam_series() function will take supplied data and plot either a series of line plots (if you choose series = 'all') or a set of plots to describe the distribution for a single time series. For example, to plot all of the time series in our data, and highlight a single series in each plot, we can use:

plot_mvgam_series(data = simdat$data_train, 
                  y = 'y', 
                  series = 'all')

Plotting time series features for GAM models in mvgam

Or we can look more closely at the distribution for the first time series:

plot_mvgam_series(data = simdat$data_train, 
                  y = 'y', 
                  series = 1)

Plotting time series features for GAM models in mvgam

If you have split your data into training and testing folds (i.e. for forecast evaluation), you can include the test data in your plots:

plot_mvgam_series(data = simdat$data_train,
                  newdata = simdat$data_test,
                  y = 'y', 
                  series = 1)

Plotting time series features for GAM models in mvgam

Example with NEON tick data

To give one example of how data can be reformatted for mvgam modelling, we will use observations from the National Ecological Observatory Network (NEON) tick drag cloth samples. Ixodes scapularis is a widespread tick species capable of transmitting a diversity of parasites to animals and humans, many of which are zoonotic. Due to the medical and ecological importance of this tick species, a common goal is to understand factors that influence their abundances. The NEON field team carries out standardised long-term monitoring of tick abundances as well as other important indicators of ecological change. Nymphal abundance of I. scapularis is routinely recorded across NEON plots using a field sampling method called drag cloth sampling, which is a common method for sampling ticks in the landscape. Field researchers sample ticks by dragging a large cloth behind themselves through terrain that is suspected of harboring ticks, usually working in a grid-like pattern. The sites have been sampled since 2014, resulting in a rich dataset of nymph abundance time series. These tick time series show strong seasonality and incorporate many of the challenging features associated with ecological data including overdispersion, high proportions of missingness and irregular sampling in time, making them useful for exploring the utility of dynamic GAMs.

We begin by loading NEON tick data for the years 2014 - 2021, which were downloaded from NEON and prepared as described in Clark & Wells 2022. You can read a bit about the data using the call ?all_neon_tick_data

data("all_neon_tick_data")
str(dplyr::ungroup(all_neon_tick_data))
#> tibble [3,505 × 24] (S3: tbl_df/tbl/data.frame)
#>  $ Year                : num [1:3505] 2015 2015 2015 2015 2015 ...
#>  $ epiWeek             : chr [1:3505] "37" "38" "39" "40" ...
#>  $ yearWeek            : chr [1:3505] "201537" "201538" "201539" "201540" ...
#>  $ plotID              : chr [1:3505] "BLAN_005" "BLAN_005" "BLAN_005" "BLAN_005" ...
#>  $ siteID              : chr [1:3505] "BLAN" "BLAN" "BLAN" "BLAN" ...
#>  $ nlcdClass           : chr [1:3505] "deciduousForest" "deciduousForest" "deciduousForest" "deciduousForest" ...
#>  $ decimalLatitude     : num [1:3505] 39.1 39.1 39.1 39.1 39.1 ...
#>  $ decimalLongitude    : num [1:3505] -78 -78 -78 -78 -78 ...
#>  $ elevation           : num [1:3505] 168 168 168 168 168 ...
#>  $ totalSampledArea    : num [1:3505] 162 NA NA NA 162 NA NA NA NA 164 ...
#>  $ amblyomma_americanum: num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ ixodes_scapularis   : num [1:3505] 2 NA NA NA 0 NA NA NA NA 0 ...
#>  $ time                : Date[1:3505], format: "2015-09-13" "2015-09-20" ...
#>  $ RHMin_precent       : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ RHMin_variance      : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ RHMax_precent       : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ RHMax_variance      : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMin_degC     : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMin_variance : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMax_degC     : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMax_variance : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ soi                 : num [1:3505] -18.4 -17.9 -23.5 -28.4 -25.9 ...
#>  $ cum_sdd             : num [1:3505] 173 173 173 173 173 ...
#>  $ cum_gdd             : num [1:3505] 1129 1129 1129 1129 1129 ...

For this exercise, we will use the epiWeek variable as an index of seasonality, and we will only work with observations from a few sampling plots (labelled in the plotID column):

plotIDs <- c('SCBI_013','SCBI_002',
             'SERC_001','SERC_005',
             'SERC_006','SERC_012',
             'BLAN_012','BLAN_005')

Now we can select the target species we want (I. scapularis), filter to the correct plot IDs and convert the epiWeek variable from character to numeric:

model_dat <- all_neon_tick_data %>%
  dplyr::ungroup() %>%
  dplyr::mutate(target = ixodes_scapularis) %>%
  dplyr::filter(plotID %in% plotIDs) %>%
  dplyr::select(Year, epiWeek, plotID, target) %>%
  dplyr::mutate(epiWeek = as.numeric(epiWeek))

Now is the tricky part: we need to fill in missing observations with NAs. The tick data are sparse in that field observers do not go out and sample in each possible epiWeek. So there are many particular weeks in which observations are not included in the data. But we can use expand.grid() again to take care of this:

model_dat %>%
  # Create all possible combos of plotID, Year and epiWeek; 
  # missing outcomes will be filled in as NA
  dplyr::full_join(expand.grid(plotID = unique(model_dat$plotID),
                               Year = unique(model_dat$Year),
                               epiWeek = seq(1, 52))) %>%
  
  # left_join back to original data so plotID and siteID will
  # match up, in case you need the siteID for anything else later on
  dplyr::left_join(all_neon_tick_data %>%
                     dplyr::select(siteID, plotID) %>%
                     dplyr::distinct()) -> model_dat

Create the series variable needed for mvgam modelling:

model_dat %>%
  dplyr::mutate(series = plotID,
                y = target) %>%
  dplyr::mutate(siteID = factor(siteID),
                series = factor(series)) %>%
  dplyr::select(-target, -plotID) %>%
  dplyr::arrange(Year, epiWeek, series) -> model_dat 

Now create the time variable, which needs to track Year and epiWeek for each unique series. The n() function from dplyr is often useful when generating a time index for grouped dataframes:

model_dat %>%
  dplyr::ungroup() %>%
  dplyr::group_by(series) %>%
  dplyr::arrange(Year, epiWeek) %>%
  dplyr::mutate(time = seq(1, dplyr::n())) %>%
  dplyr::ungroup() -> model_dat

Check factor levels for the series:

levels(model_dat$series)
#> [1] "BLAN_005" "BLAN_012" "SCBI_002" "SCBI_013" "SERC_001" "SERC_005" "SERC_006"
#> [8] "SERC_012"

This looks good, as does a more rigorous check using get_mvgam_priors():

get_mvgam_priors(y ~ 1,
                 data = model_dat,
                 family = poisson())
#>    param_name param_length  param_info                                  prior
#> 1 (Intercept)            1 (Intercept) (Intercept) ~ student_t(3, -2.3, 2.5);
#>                example_change new_lowerbound new_upperbound
#> 1 (Intercept) ~ normal(0, 1);             NA             NA

We can also set up a model in mvgam() but use run_model = FALSE to further ensure all of the necessary steps for creating the modelling code and objects will run. It is recommended that you use the cmdstanr backend if possible, as the auto-formatting options available in this package are very useful for checking the package-generated Stan code for any inefficiencies that can be fixed to lead to sampling performance improvements:

testmod <- mvgam(y ~ s(epiWeek, by = series, bs = 'cc') +
                   s(series, bs = 're'),
                 trend_model = 'AR1',
                 data = model_dat,
                 backend = 'cmdstanr',
                 run_model = FALSE)

This call runs without issue, and the resulting object now contains the model code and data objects that are needed to initiate sampling:

str(testmod$model_data)
#> List of 25
#>  $ y           : num [1:416, 1:8] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
#>  $ n           : int 416
#>  $ X           : num [1:3328, 1:73] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:3328] "1" "2" "3" "4" ...
#>   .. ..$ : chr [1:73] "X.Intercept." "V2" "V3" "V4" ...
#>  $ S1          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ zero        : num [1:73] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ S2          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S3          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S4          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S5          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S6          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S7          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S8          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ p_coefs     : Named num 0
#>   ..- attr(*, "names")= chr "(Intercept)"
#>  $ p_taus      : num 0.907
#>  $ ytimes      : int [1:416, 1:8] 1 9 17 25 33 41 49 57 65 73 ...
#>  $ n_series    : int 8
#>  $ sp          : Named num [1:9] 0.368 0.368 0.368 0.368 0.368 ...
#>   ..- attr(*, "names")= chr [1:9] "s(epiWeek):seriesBLAN_005" "s(epiWeek):seriesBLAN_012" "s(epiWeek):seriesSCBI_002" "s(epiWeek):seriesSCBI_013" ...
#>  $ y_observed  : num [1:416, 1:8] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ total_obs   : int 3328
#>  $ num_basis   : int 73
#>  $ n_sp        : num 9
#>  $ n_nonmissing: int 400
#>  $ obs_ind     : int [1:400] 89 93 98 101 115 118 121 124 127 130 ...
#>  $ flat_ys     : num [1:400] 2 0 0 0 0 0 0 25 36 14 ...
#>  $ flat_xs     : num [1:400, 1:73] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:400] "705" "737" "777" "801" ...
#>   .. ..$ : chr [1:73] "X.Intercept." "V2" "V3" "V4" ...
#>  - attr(*, "trend_model")= chr "AR1"
code(testmod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[8, 8] S1; // mgcv smooth penalty matrix S1
#>   matrix[8, 8] S2; // mgcv smooth penalty matrix S2
#>   matrix[8, 8] S3; // mgcv smooth penalty matrix S3
#>   matrix[8, 8] S4; // mgcv smooth penalty matrix S4
#>   matrix[8, 8] S5; // mgcv smooth penalty matrix S5
#>   matrix[8, 8] S6; // mgcv smooth penalty matrix S6
#>   matrix[8, 8] S7; // mgcv smooth penalty matrix S7
#>   matrix[8, 8] S8; // mgcv smooth penalty matrix S8
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#>   
#>   // latent trend AR1 terms
#>   vector<lower=-1, upper=1>[n_series] ar1;
#>   
#>   // latent trend variance parameters
#>   vector<lower=0>[n_series] sigma;
#>   
#>   // latent trends
#>   matrix[n, n_series] trend;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 65] = b_raw[1 : 65];
#>   b[66 : 73] = mu_raw[1] + b_raw[66 : 73] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ student_t(3, 0, 2.5);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ std_normal();
#>   
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, -2.3, 2.5);
#>   
#>   // prior for s(epiWeek):seriesBLAN_005...
#>   b_raw[2 : 9] ~ multi_normal_prec(zero[2 : 9], S1[1 : 8, 1 : 8] * lambda[1]);
#>   
#>   // prior for s(epiWeek):seriesBLAN_012...
#>   b_raw[10 : 17] ~ multi_normal_prec(zero[10 : 17],
#>                                      S2[1 : 8, 1 : 8] * lambda[2]);
#>   
#>   // prior for s(epiWeek):seriesSCBI_002...
#>   b_raw[18 : 25] ~ multi_normal_prec(zero[18 : 25],
#>                                      S3[1 : 8, 1 : 8] * lambda[3]);
#>   
#>   // prior for s(epiWeek):seriesSCBI_013...
#>   b_raw[26 : 33] ~ multi_normal_prec(zero[26 : 33],
#>                                      S4[1 : 8, 1 : 8] * lambda[4]);
#>   
#>   // prior for s(epiWeek):seriesSERC_001...
#>   b_raw[34 : 41] ~ multi_normal_prec(zero[34 : 41],
#>                                      S5[1 : 8, 1 : 8] * lambda[5]);
#>   
#>   // prior for s(epiWeek):seriesSERC_005...
#>   b_raw[42 : 49] ~ multi_normal_prec(zero[42 : 49],
#>                                      S6[1 : 8, 1 : 8] * lambda[6]);
#>   
#>   // prior for s(epiWeek):seriesSERC_006...
#>   b_raw[50 : 57] ~ multi_normal_prec(zero[50 : 57],
#>                                      S7[1 : 8, 1 : 8] * lambda[7]);
#>   
#>   // prior for s(epiWeek):seriesSERC_012...
#>   b_raw[58 : 65] ~ multi_normal_prec(zero[58 : 65],
#>                                      S8[1 : 8, 1 : 8] * lambda[8]);
#>   
#>   // prior (non-centred) for s(series)...
#>   b_raw[66 : 73] ~ std_normal();
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for latent trend variance parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   
#>   // trend estimates
#>   trend[1, 1 : n_series] ~ normal(0, sigma);
#>   for (s in 1 : n_series) {
#>     trend[2 : n, s] ~ normal(ar1[s] * trend[1 : (n - 1), s], sigma[s]);
#>   }
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_series] tau;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   for (s in 1 : n_series) {
#>     tau[s] = pow(sigma[s], -2.0);
#>   }
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au)

================================================ FILE: docs/articles/forecast_evaluation.html ================================================ Forecasting and forecast evaluation in mvgam • mvgam

The purpose of this vignette is to show how the mvgam package can be used to produce probabilistic forecasts and to evaluate those forecasts using a variety of proper scoring rules.

Simulating discrete time series

We begin by simulating some data to show how forecasts are computed and evaluated in mvgam. The sim_mvgam() function can be used to simulate series from a variety of response distributions, with seasonal and/or dynamic temporal patterns. Here we simulate a collection of three count-valued time series. These series all share the same seasonal pattern but have different temporal dynamics. By setting trend_model = GP() and prop_trend = 0.75, we are generating time series that have smooth underlying temporal trends (evolving as Gaussian Processes with squared exponential kernel) and moderate seasonal patterns. The observations are Poisson-distributed and we allow 10% of observations to be missing.

set.seed(1)
simdat <- sim_mvgam(
  T = 100,
  n_series = 3,
  mu = 2,
  trend_model = GP(),
  prop_trend = 0.75,
  family = poisson(),
  prop_missing = 0.10
)

The returned object is a list containing training and testing data (sim_mvgam() automatically splits the data into these folds for us) together with some other information about the data-generating process that was used to simulate the data:

str(simdat)
#> List of 6
#>  $ data_train        :'data.frame':  225 obs. of  5 variables:
#>   ..$ y     : int [1:225] 6 NA 11 2 5 20 7 8 NA 11 ...
#>   ..$ season: int [1:225] 1 1 1 2 2 2 3 3 3 4 ...
#>   ..$ year  : int [1:225] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ series: Factor w/ 3 levels "series_1","series_2",..: 1 2 3 1 2 3 1 2 3 1 ...
#>   ..$ time  : int [1:225] 1 1 1 2 2 2 3 3 3 4 ...
#>  $ data_test         :'data.frame':  75 obs. of  5 variables:
#>   ..$ y     : int [1:75] 4 23 8 3 NA 3 1 20 8 3 ...
#>   ..$ season: int [1:75] 4 4 4 5 5 5 6 6 6 7 ...
#>   ..$ year  : int [1:75] 7 7 7 7 7 7 7 7 7 7 ...
#>   ..$ series: Factor w/ 3 levels "series_1","series_2",..: 1 2 3 1 2 3 1 2 3 1 ...
#>   ..$ time  : int [1:75] 76 76 76 77 77 77 78 78 78 79 ...
#>  $ true_corrs        : num [1:3, 1:3] 1 0.0861 0.1161 0.0861 1 ...
#>  $ true_trends       : num [1:100, 1:3] -0.851 -0.758 -0.664 -0.571 -0.48 ...
#>  $ global_seasonality: num [1:100] -0.966 -0.197 0.771 1.083 0.37 ...
#>  $ trend_params      :List of 2
#>   ..$ alpha: num [1:3] 0.883 0.936 1.036
#>   ..$ rho  : num [1:3] 7.54 4.01 7.49

Each series in this case has a shared seasonal pattern. The resulting time series are similar to what we might encounter when dealing with count-valued data that can take small counts:

plot_mvgam_series(
  data = simdat$data_train,
  series = "all"
)

[Figure: time series features for the three simulated series, plotted using plot_mvgam_series()]

For individual series, we can plot the training and testing data, as well as some more specific features of the observed data:

plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  series = 1
)

[Figure: training and testing observations, plus other descriptive features, for series 1]

Modelling dynamics with splines

The first model we will fit uses a shared cyclic spline to capture the repeated seasonality, as well as series-specific splines of time to capture the long-term dynamics. We allow the temporal splines to be fairly complex so they can capture as much of the temporal variation as possible:

mod1 <- mvgam(
  y ~ s(season, bs = "cc", k = 8) +
    s(time, by = series, bs = "cr", k = 20),
  knots = list(season = c(0.5, 12.5)),
  trend_model = "None",
  data = simdat$data_train,
  silent = 2
)

The model fits without issue:

summary(mod1, include_betas = FALSE)
#> GAM formula:
#> y ~ s(season, bs = "cc", k = 8) + s(time, by = series, k = 20)
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 100 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)  1.9 1.9     2    1  1458
#> 
#> Approximate significance of GAM smooths:
#>                         edf Ref.df Chi.sq p-value    
#> s(season)              3.50      6   23.0 < 2e-16 ***
#> s(time):seriesseries_1 6.45     19   55.4 0.00042 ***
#> s(time):seriesseries_2 9.70     19   45.5 < 2e-16 ***
#> s(time):seriesseries_3 5.52     19   56.6   9e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
#> 60 of 2000 iterations saturated the maximum tree depth of 10 (3%)
#>  *Run with max_treedepth set to a larger value to avoid saturation
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 26 9:51:17 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod1) to get started describing this model

And we can plot the conditional effects of the splines (on the link scale) to see that they are estimated to be highly nonlinear:

conditional_effects(mod1, type = "link")

[Figures: GAM smooth functions, plotted using mvgam]

Modelling dynamics with a correlated AR1

Before showing how to produce and evaluate forecasts, we will fit a second model to these data so the two models can be compared. This model is equivalent to the above, except we now use a correlated AR(1) process to model series-specific dynamics. See ?AR for more details.

mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  silent = 1
)

The summary for this model now contains information on the autoregressive and process error parameters for each time series:

summary(mod2, include_betas = FALSE)
#> GAM observation formula:
#> y ~ 1
#> 
#> GAM process formula:
#> ~s(season, bs = "cc", k = 8) - 1
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR(cor = TRUE)
#> 
#> 
#> N process models:
#> 3 
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)  1.7   2   2.4    1   558
#> 
#> Process model AR parameter estimates:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.74 0.89  0.99 1.01   515
#> ar1[2] 0.63 0.82  0.96 1.01   373
#> ar1[3] 0.87 0.96  1.00 1.01   487
#> 
#> Process error parameter estimates:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.25 0.33  0.47 1.01   477
#> sigma[2] 0.32 0.44  0.60 1.01   410
#> sigma[3] 0.19 0.26  0.37 1.01   361
#> 
#> Approximate significance of GAM process smooths:
#>            edf Ref.df Chi.sq p-value    
#> s(season) 2.24      6   19.6 3.2e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 7 of 2000 iterations ended with a divergence (0.35%)
#>  *Try running with larger adapt_delta to remove the divergences
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 26 9:52:08 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod2) to get started describing this model

We can plot the posteriors for these parameters, and for any other parameter for that matter, using bayesplot routines. First the autoregressive parameters:

mcmc_plot(mod2, variable = "ar", regex = TRUE, type = "areas")

[Figure: posterior distributions of the AR parameters]

And now the variance (\(\sigma\)) parameters:

mcmc_plot(mod2, variable = "sigma", regex = TRUE, type = "areas")

[Figure: posterior distributions of the \(\sigma\) parameters]

We can again plot the conditional seasonal effect:

conditional_effects(mod2, type = "link")

[Figure: conditional seasonal effect, plotted using mvgam and marginaleffects]

The estimates for the seasonal component are fairly similar for the two models, but below we will see if they produce similar forecasts.

Forecasting with the forecast() function

Probabilistic forecasts can be computed in two main ways in mvgam. The first is to take a model that was fit only to training data (as we did above in the two example models) and produce temporal predictions from the posterior predictive distribution by feeding newdata to the forecast() function. It is crucial that any newdata fed to the forecast() function follows on sequentially from the data that were used to fit the model (this is not internally checked by the package because doing so would be difficult when data are not supplied in a specific time order). When calling the forecast() function, you have the option to generate different kinds of predictions (i.e. predicting on the link scale, response scale or producing expectations; see ?forecast.mvgam for details). We will use the default and produce forecasts on the response scale, which is the most common way to evaluate forecast distributions.

fc_mod1 <- forecast(mod1, newdata = simdat$data_test)
fc_mod2 <- forecast(mod2, newdata = simdat$data_test)

The objects we have created are of class mvgam_forecast, which contain information on hindcast distributions, forecast distributions and true observations for each series in the data:

str(fc_mod1)
#> List of 16
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc", k = 8) + s(time, by = series, k = 20)
#>   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ family_pars       : NULL
#>  $ trend_model       : chr "None"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : Factor w/ 3 levels "series_1","series_2",..: 1 2 3
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:75] 6 2 7 11 8 6 9 11 7 4 ...
#>   ..$ series_2: int [1:75] NA 5 8 2 1 NA 2 4 0 2 ...
#>   ..$ series_3: int [1:75] 11 20 NA 36 44 34 57 50 26 28 ...
#>  $ train_times       : int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations :List of 3
#>   ..$ series_1: int [1:25] 4 3 1 3 1 NA NA 7 9 8 ...
#>   ..$ series_2: int [1:25] 23 NA 20 20 14 7 6 6 6 1 ...
#>   ..$ series_3: int [1:25] 8 3 8 3 NA 1 1 9 8 NA ...
#>  $ test_times        : int [1:25] 76 77 78 79 80 81 82 83 84 85 ...
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:2000, 1:75] 1 4 6 5 2 5 6 3 6 0 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>   ..$ series_2: num [1:2000, 1:75] 4 6 3 7 4 9 5 5 7 6 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,2]" "ypred[2,2]" "ypred[3,2]" "ypred[4,2]" ...
#>   ..$ series_3: num [1:2000, 1:75] 16 10 11 17 11 15 22 16 8 6 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,3]" "ypred[2,3]" "ypred[3,3]" "ypred[4,3]" ...
#>  $ forecasts         :List of 3
#>   ..$ series_1: num [1:2000, 1:25] 1 2 3 1 3 6 1 3 0 0 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:25] "ypred[76,1]" "ypred[77,1]" "ypred[78,1]" "ypred[79,1]" ...
#>   ..$ series_2: num [1:2000, 1:25] 20 19 30 27 23 40 27 23 22 39 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:25] "ypred[76,2]" "ypred[77,2]" "ypred[78,2]" "ypred[79,2]" ...
#>   ..$ series_3: num [1:2000, 1:25] 1 5 2 3 2 2 2 1 0 1 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:25] "ypred[76,3]" "ypred[77,3]" "ypred[78,3]" "ypred[79,3]" ...
#>  - attr(*, "class")= chr "mvgam_forecast"

We can plot the forecasts for some series from each model using the S3 plot method for objects of this class:

plot(fc_mod1, series = 1)

plot(fc_mod2, series = 1)


plot(fc_mod1, series = 2)

plot(fc_mod2, series = 2)

Clearly the two models do not produce equivalent forecasts. We will come back to scoring these forecasts in a moment.

Forecasting with newdata in mvgam()

The second way we can produce forecasts in mvgam is to feed the testing data directly to the mvgam() function as newdata. This will include the testing data as missing observations so that they are automatically predicted from the posterior predictive distribution using the generated quantities block in Stan. As an example, we can refit mod2 but include the testing data for automatic forecasts:

mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  newdata = simdat$data_test,
  silent = 2
)

Because the model already contains a forecast distribution, we do not need to feed newdata to the forecast() function:

fc_mod2 <- forecast(mod2)

The forecasts will be nearly identical to those calculated previously:

plot(fc_mod2, series = 1)

[Figure: posterior forecast distribution for series 1, plotted using mvgam]

Scoring forecast distributions

A primary purpose of the mvgam_forecast class is to readily allow forecast evaluations for each series in the data, using a variety of possible scoring functions. See ?mvgam::score.mvgam_forecast to view the types of scores that are available. A useful scoring metric is the Continuous Ranked Probability Score (CRPS). A CRPS value is similar to what we might get if we calculated a weighted absolute error using the full forecast distribution.

crps_mod1 <- score(fc_mod1, score = "crps")
str(crps_mod1)
#> List of 4
#>  $ series_1  :'data.frame':  25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 1.014 0.817 0.343 0.96 0.276 ...
#>   ..$ in_interval   : num [1:25] 1 1 1 1 1 NA NA 0 0 0 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "crps" "crps" "crps" "crps" ...
#>  $ series_2  :'data.frame':  25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 1.87 NA 7.22 16.71 20.02 ...
#>   ..$ in_interval   : num [1:25] 1 NA 1 1 1 0 0 0 0 0 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "crps" "crps" "crps" "crps" ...
#>  $ series_3  :'data.frame':  25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 3.644 0.456 4.234 0.49 NA ...
#>   ..$ in_interval   : num [1:25] 0 1 0 1 NA 1 1 0 0 NA ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "crps" "crps" "crps" "crps" ...
#>  $ all_series:'data.frame':  25 obs. of  3 variables:
#>   ..$ score       : num [1:25] 6.53 NA 11.8 18.16 NA ...
#>   ..$ eval_horizon: int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type  : chr [1:25] "sum_crps" "sum_crps" "sum_crps" "sum_crps" ...
crps_mod1$series_1
#>         score in_interval interval_width eval_horizon score_type
#> 1   1.0140790           1            0.9            1       crps
#> 2   0.8172095           1            0.9            2       crps
#> 3   0.3430390           1            0.9            3       crps
#> 4   0.9601160           1            0.9            4       crps
#> 5   0.2761875           1            0.9            5       crps
#> 6          NA          NA            0.9            6       crps
#> 7          NA          NA            0.9            7       crps
#> 8   6.1122768           0            0.9            8       crps
#> 9   8.2056642           0            0.9            9       crps
#> 10  7.3489420           0            0.9           10       crps
#> 11 21.3067160           0            0.9           11       crps
#> 12 35.2279993           0            0.9           12       crps
#> 13 37.2142593           0            0.9           13       crps
#> 14 36.2565240           0            0.9           14       crps
#> 15 39.3413357           0            0.9           15       crps
#> 16 42.2836615           0            0.9           16       crps
#> 17 42.4259613           0            0.9           17       crps
#> 18 12.6684920           0            0.9           18       crps
#> 19 13.7129962           0            0.9           19       crps
#> 20  9.6689547           0            0.9           20       crps
#> 21  4.7457437           0            0.9           21       crps
#> 22  4.7838565           0            0.9           22       crps
#> 23  2.7938578           0            0.9           23       crps
#> 24  0.8515640           1            0.9           24       crps
#> 25  3.7254275           0            0.9           25       crps

The returned list contains a data.frame for each series in the data that shows the CRPS score for each evaluation in the testing data, along with some other useful information about the fit of the forecast distribution. In particular, we are given a logical value (1s and 0s) telling us whether the true value was within a pre-specified credible interval (i.e. the coverage of the forecast distribution). The default interval width is 0.9, so we would hope that the values in the in_interval column take a 1 approximately 90% of the time. This value can be changed if you wish to compute different coverages, say using a 60% interval:

crps_mod1 <- score(fc_mod1, score = "crps", interval_width = 0.6)
crps_mod1$series_1
#>         score in_interval interval_width eval_horizon score_type
#> 1   1.0140790           1            0.6            1       crps
#> 2   0.8172095           1            0.6            2       crps
#> 3   0.3430390           1            0.6            3       crps
#> 4   0.9601160           1            0.6            4       crps
#> 5   0.2761875           1            0.6            5       crps
#> 6          NA          NA            0.6            6       crps
#> 7          NA          NA            0.6            7       crps
#> 8   6.1122768           0            0.6            8       crps
#> 9   8.2056642           0            0.6            9       crps
#> 10  7.3489420           0            0.6           10       crps
#> 11 21.3067160           0            0.6           11       crps
#> 12 35.2279993           0            0.6           12       crps
#> 13 37.2142593           0            0.6           13       crps
#> 14 36.2565240           0            0.6           14       crps
#> 15 39.3413357           0            0.6           15       crps
#> 16 42.2836615           0            0.6           16       crps
#> 17 42.4259613           0            0.6           17       crps
#> 18 12.6684920           0            0.6           18       crps
#> 19 13.7129962           0            0.6           19       crps
#> 20  9.6689547           0            0.6           20       crps
#> 21  4.7457437           0            0.6           21       crps
#> 22  4.7838565           0            0.6           22       crps
#> 23  2.7938578           0            0.6           23       crps
#> 24  0.8515640           0            0.6           24       crps
#> 25  3.7254275           0            0.6           25       crps

We can also compare forecasts against out-of-sample observations using the Expected Log Predictive Density (ELPD; also known as the log score). The ELPD is a strictly proper scoring rule that can be applied to any distributional forecast, but to compute it we need predictions on the link scale rather than on the outcome scale. This is where it is advantageous to change the type of prediction returned by the forecast() function:

link_mod1 <- forecast(mod1, newdata = simdat$data_test, type = "link")
score(link_mod1, score = "elpd")$series_1
#>         score eval_horizon score_type
#> 1   -2.154963            1       elpd
#> 2   -1.951288            2       elpd
#> 3   -1.244906            3       elpd
#> 4   -2.188772            4       elpd
#> 5   -1.213424            5       elpd
#> 6          NA            6       elpd
#> 7          NA            7       elpd
#> 8   -6.953353            8       elpd
#> 9   -8.026647            9       elpd
#> 10  -7.606749           10       elpd
#> 11 -12.495600           11       elpd
#> 12 -16.939454           12       elpd
#> 13 -14.939413           13       elpd
#> 14 -14.029803           14       elpd
#> 15 -13.634550           15       elpd
#> 16 -12.172281           16       elpd
#> 17 -13.522049           17       elpd
#> 18  -8.960513           18       elpd
#> 19  -9.513524           19       elpd
#> 20  -7.717374           20       elpd
#> 21  -6.737752           21       elpd
#> 22  -6.851763           22       elpd
#> 23  -5.503431           23       elpd
#> 24  -3.002872           24       elpd
#> 25  -6.004690           25       elpd

Finally, when we have multiple time series it may also make sense to use a multivariate proper scoring rule. mvgam offers two such options: the Energy score and the Variogram score. The first penalizes forecast distributions that are less well calibrated against the truth, while the second penalizes forecasts that do not capture the observed true correlation structure. Which score to use depends on your goals, but both are very easy to compute:

energy_mod2 <- score(fc_mod2, score = "energy")
str(energy_mod2)
#> List of 4
#>  $ series_1  :'data.frame':  25 obs. of  3 variables:
#>   ..$ in_interval   : num [1:25] 1 1 1 1 1 NA NA 1 1 1 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ series_2  :'data.frame':  25 obs. of  3 variables:
#>   ..$ in_interval   : num [1:25] 1 NA 1 1 1 1 1 1 1 1 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ series_3  :'data.frame':  25 obs. of  3 variables:
#>   ..$ in_interval   : num [1:25] 1 1 1 1 NA 1 1 1 1 NA ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ all_series:'data.frame':  25 obs. of  3 variables:
#>   ..$ score       : num [1:25] 4.77 NA 5.06 5.34 NA ...
#>   ..$ eval_horizon: int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type  : chr [1:25] "energy" "energy" "energy" "energy" ...

The returned object still provides information on interval coverage for each individual series, but there is only a single score per horizon now (which is provided in the all_series slot):

energy_mod2$all_series
#>        score eval_horizon score_type
#> 1   4.768684            1     energy
#> 2         NA            2     energy
#> 3   5.062792            3     energy
#> 4   5.336665            4     energy
#> 5         NA            5     energy
#> 6         NA            6     energy
#> 7         NA            7     energy
#> 8   3.987317            8     energy
#> 9   4.279798            9     energy
#> 10        NA           10     energy
#> 11 13.503764           11     energy
#> 12 22.628631           12     energy
#> 13        NA           13     energy
#> 14 21.107137           14     energy
#> 15 24.172967           15     energy
#> 16 25.344728           16     energy
#> 17 28.418573           17     energy
#> 18  6.473433           18     energy
#> 19 10.862253           19     energy
#> 20  4.235033           20     energy
#> 21  2.968658           21     energy
#> 22  3.526743           22     energy
#> 23        NA           23     energy
#> 24  8.938257           24     energy
#> 25  7.794639           25     energy
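
The Variogram score, which targets the correlation structure, is computed in exactly the same way. A quick sketch using the forecast object from above (see ?score.mvgam_forecast for the full list of score options):

# Compute the multivariate Variogram score; as with the Energy score,
# a single score per horizon is returned in the 'all_series' slot
vario_mod2 <- score(fc_mod2, score = "variogram")
head(vario_mod2$all_series)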

You can use your score(s) of choice to compare different models. For example, we can compute and plot the difference in CRPS scores for each series in the data. Here, a negative value means the AR(1) model (mod2) is better, while a positive value means the spline model (mod1) is better.

crps_mod1 <- score(fc_mod1, score = "crps")
crps_mod2 <- score(fc_mod2, score = "crps")

diff_scores <- crps_mod2$series_1$score -
  crps_mod1$series_1$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ",
  ar1_better,
  " of ",
  length(diff_scores),
  " evaluations",
  "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))



diff_scores <- crps_mod2$series_2$score -
  crps_mod1$series_2$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ",
  ar1_better,
  " of ",
  length(diff_scores),
  " evaluations",
  "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))


diff_scores <- crps_mod2$series_3$score -
  crps_mod1$series_3$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ",
  ar1_better,
  " of ",
  length(diff_scores),
  " evaluations",
  "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))

The correlated AR(1) model consistently gives better forecasts, and the difference between scores tends to grow as the forecast horizon increases. This is not unexpected given the way that splines linearly extrapolate outside the range of training data.

Further reading

The following papers and resources offer useful material about Bayesian forecasting and proper scoring rules:

Clark, N.J., Ernest, S.K.M., Senyondo, H., Simonis, J., White, E.P., Yenni, G.M., and Karunarathna, K.A.N.K. “Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability.” PeerJ 13 (2025): e18929. https://doi.org/10.7717/peerj.18929

Hyndman, Rob J., and George Athanasopoulos. Forecasting: Principles and Practice. OTexts, 2018.

Gneiting, Tilmann, and Adrian E. Raftery. “Strictly proper scoring rules, prediction, and estimation.” Journal of the American Statistical Association 102.477 (2007): 359-378.

Simonis, Juniper L., Ethan P. White, and S.K. Morgan Ernest. “Evaluating probabilistic ecological forecasts.” Ecology 102.8 (2021): e03431.

================================================ FILE: docs/articles/index.html ================================================ Articles • mvgam
================================================ FILE: docs/articles/mvgam_overview.html ================================================ Overview of the mvgam package • mvgam

The purpose of this vignette is to give a general overview of the mvgam package and its primary functions.

Dynamic GAMs

mvgam is designed to propagate unobserved temporal processes to capture latent dynamics in the observed time series. This works in a state-space format, with the temporal trend evolving independently of the observation process. An introduction to the package and some worked examples are also shown in this seminar: Ecological Forecasting with Dynamic Generalized Additive Models. Briefly, assume \(\tilde{\boldsymbol{y}}_{i,t}\) is the conditional expectation of response variable \(\boldsymbol{i}\) at time \(\boldsymbol{t}\). Assuming \(\boldsymbol{y_i}\) is drawn from an exponential family distribution with an invertible link function, the linear predictor for a multivariate Dynamic GAM can be written as:

\[for~i~in~1:N_{series}~...\] \[for~t~in~1:N_{timepoints}~...\]

\[g^{-1}(\tilde{\boldsymbol{y}}_{i,t})=\alpha_{i}+\sum\limits_{j=1}^J\boldsymbol{s}_{i,j,t}\boldsymbol{x}_{j,t}+\boldsymbol{Z}\boldsymbol{z}_{k,t}\,,\] Here \(\alpha\) are the unknown intercepts, the \(\boldsymbol{s}\)’s are unknown smooth functions of covariates (\(\boldsymbol{x}\)’s), which can potentially vary among the response series, and \(\boldsymbol{z}\) are dynamic latent processes. Each smooth function \(\boldsymbol{s_j}\) is composed of basis expansions whose coefficients, which must be estimated, control the functional relationship between \(\boldsymbol{x}_{j}\) and \(g^{-1}(\tilde{\boldsymbol{y}})\). The size of the basis expansion limits the smooth’s potential complexity. A larger set of basis functions allows greater flexibility. For more information on GAMs and how they can smooth through data, see this blogpost on how to interpret nonlinear effects from Generalized Additive Models. Latent processes are captured with \(\boldsymbol{Z}\boldsymbol{z}_{k,t}\), where \(\boldsymbol{Z}\) is an \(i~by~k\) matrix of loading coefficients (which can be fixed or a combination of fixed and freely estimated parameters) and \(\boldsymbol{z}_{k,t}\) are a set of \(K\) latent factors that can also include their own GAM linear predictors (see the State-Space models vignette, the N-mixtures vignette and the example in jsdgam to get an idea of how flexible these processes can be).

Several advantages of GAMs are that they can model a diversity of response families, including discrete distributions (i.e. Poisson, Negative Binomial) that accommodate common ecological features such as zero-inflation or overdispersion, as well as continuous distributions (i.e. Gamma, Beta), and that they can be formulated to include hierarchical smoothing for multivariate responses. mvgam supports a number of different observation families, which are summarized below:

Supported observation families

Distribution Function Support Extra parameter(s)
Gaussian (identity link) gaussian() Real values in \((-\infty, \infty)\) \(\sigma\)
Student’s T (identity link) student_t() Heavy-tailed real values in \((-\infty, \infty)\) \(\sigma\), \(\nu\)
LogNormal (identity link) lognormal() Positive real values in \([0, \infty)\) \(\sigma\)
Gamma (log link) Gamma() Positive real values in \([0, \infty)\) \(\alpha\)
Beta (logit link) betar() Real values (proportional) in \([0,1]\) \(\phi\)
Bernoulli (logit link) bernoulli() Binary data in \(\{0,1\}\) -
Poisson (log link) poisson() Non-negative integers in \((0,1,2,...)\) -
Negative Binomial2 (log link) nb() Non-negative integers in \((0,1,2,...)\) \(\phi\)
Binomial (logit link) binomial() Non-negative integers in \((0,1,2,...)\) -
Beta-Binomial (logit link) beta_binomial() Non-negative integers in \((0,1,2,...)\) \(\phi\)
Poisson Binomial N-mixture (log link) nmix() Non-negative integers in \((0,1,2,...)\) -

For all supported observation families, any extra parameters that need to be estimated (i.e. the \(\sigma\) in a Gaussian model or the \(\phi\) in a Negative Binomial model) are by default estimated independently for each series. However, users can opt to force all series to share extra observation parameters using share_obs_params = TRUE in mvgam(). Note that default link functions cannot currently be changed.
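
For example, a sketch of a Negative Binomial model in which all series share a single overdispersion parameter \(\phi\) (not run; y, season and model_dat are placeholder names for your own outcome, covariate and long-format data):

# Force all series to share the Negative Binomial overdispersion parameter
mod <- mvgam(y ~ s(season, bs = 'cc'),
             family = nb(),
             share_obs_params = TRUE,
             data = model_dat)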

Supported temporal dynamic processes

As stated above, the latent processes can take a wide variety of forms, some of which can be multivariate to allow the different observational variables to interact or be correlated. When using the mvgam() function, the user chooses between different process models with the trend_model argument. Available process models are described in detail below.

Correlated multivariate processes

If more than one observational unit (usually referred to as ‘series’) is included in data \((N_{series} > 1)\), use trend_model = ZMVN() to set up a model where the outcomes for different observational units may be correlated according to:

\[\begin{align*} z_{t} & \sim \text{MVNormal}(0, \Sigma) \end{align*}\]

The covariance matrix \(\Sigma\) will capture potentially correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances \(\sigma\) and on the strength of correlations using Stan’s lkj_corr_cholesky distribution. Note that this trend_model does not assume that measurements occur over time, as users can specify what variable in the data represents the unit of analysis (i.e. outcomes could be counts of different species across different sites or regions, for example; see ?ZMVN for guidelines).
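
A minimal sketch of such a model (not run; y, season and model_dat are placeholders as above):

# Contemporaneously correlated errors among series, with no
# assumption of temporal ordering
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = ZMVN(),
             data = model_dat)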

Independent Random Walks

Use trend_model = 'RW' or trend_model = RW() to set up a model where each series in data has independent latent temporal dynamics of the form:

\[\begin{align*} z_{i,t} & \sim \text{Normal}(z_{i,t-1}, \sigma_i) \end{align*}\]

Process error parameters \(\sigma\) are modeled independently for each series. If a moving average process is required, use trend_model = RW(ma = TRUE) to set up the following:

\[\begin{align*} z_{i,t} & = z_{i,t-1} + \theta_i * error_{i,t-1} + error_{i,t} \\ error_{i,t} & \sim \text{Normal}(0, \sigma_i) \end{align*}\]

Moving average coefficients \(\theta\) are independently estimated for each series and will be forced to be stationary by default \((abs(\theta)<1)\). Only moving averages of order \(q=1\) are currently allowed.
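
A sketch of how these variants are specified (not run; placeholders as above):

# Independent random walk trends, one per series
mod_rw <- mvgam(y ~ 1,
                trend_model = RW(),
                data = model_dat)

# Add series-specific MA(1) terms to the random walks
mod_rw_ma <- mvgam(y ~ 1,
                   trend_model = RW(ma = TRUE),
                   data = model_dat)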

Multivariate Random Walks

If more than one series is included in data \((N_{series} > 1)\), a multivariate Random Walk can be set up using trend_model = RW(cor = TRUE), resulting in the following:

\[\begin{align*} z_{t} & \sim \text{MVNormal}(z_{t-1}, \Sigma) \end{align*}\]

Where the latent process estimate \(z_t\) now takes the form of a vector. The covariance matrix \(\Sigma\) will capture contemporaneously correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances \(\sigma\) and on the strength of correlations using Stan’s lkj_corr_cholesky distribution.

Moving average terms can also be included for multivariate random walks, in which case the moving average coefficients \(\theta\) will be parameterised as an \(N_{series} * N_{series}\) matrix
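
A sketch of a correlated multivariate random walk with moving average terms (not run; placeholders as above):

# Multivariate random walk with correlated process errors and MA(1) terms
mod <- mvgam(y ~ 1,
             trend_model = RW(cor = TRUE, ma = TRUE),
             data = model_dat)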

Autoregressive processes

Autoregressive models up to \(p=3\), in which the autoregressive coefficients are estimated independently for each series, can be used by specifying trend_model = 'AR1', trend_model = 'AR2', trend_model = 'AR3', or trend_model = AR(p = 1, 2, or 3). For example, a univariate AR(1) model takes the form:

\[\begin{align*} z_{i,t} & \sim \text{Normal}(ar1_i * z_{i,t-1}, \sigma_i) \end{align*}\]

All options are the same as for Random Walks, but additional options will be available for placing priors on the autoregressive coefficients. By default, these coefficients will not be forced into stationarity, but users can impose this restriction by changing the upper and lower bounds on their priors. See ?get_mvgam_priors for more details.
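
A sketch of an AR(2) model, along with a check of the default priors that would be assigned to its autoregressive coefficients (not run; placeholders as above):

# Independent AR(2) dynamics for each series
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(p = 2),
             data = model_dat)

# Inspect the default priors (and their bounds) for the AR coefficients
get_mvgam_priors(y ~ s(season, bs = 'cc'),
                 trend_model = AR(p = 2),
                 data = model_dat)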

Vector Autoregressive processes

A Vector Autoregression of order \(p=1\) can be specified if \(N_{series} > 1\) using trend_model = 'VAR1' or trend_model = VAR(). A VAR(1) model takes the form:

\[\begin{align*} z_{t} & \sim \text{MVNormal}(A * z_{t-1}, \Sigma) \end{align*}\]

Where \(A\) is an \(N_{series} * N_{series}\) matrix of autoregressive coefficients in which the diagonals capture lagged self-dependence (i.e. the effect of a process at time \(t\) on its own estimate at time \(t+1\)), while off-diagonals capture lagged cross-dependence (i.e. the effect of a process at time \(t\) on the process for another series at time \(t+1\)). By default, the covariance matrix \(\Sigma\) will assume no process error covariance by fixing the off-diagonals to \(0\). To allow for correlated errors, use trend_model = 'VAR1cor' or trend_model = VAR(cor = TRUE). A moving average of order \(q=1\) can also be included using trend_model = VAR(ma = TRUE, cor = TRUE).

Note that for all VAR models, stationarity of the process is enforced with a structured prior distribution that is described in detail in Heaps (2023).

Heaps, Sarah E. “Enforcing stationarity through the prior in vector autoregressions.” Journal of Computational and Graphical Statistics 32.1 (2023): 74-83.
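
A sketch of a VAR(1) model with correlated process errors (not run; placeholders as above):

# VAR(1) dynamics with a full process error covariance matrix
mod <- mvgam(y ~ 1,
             trend_model = VAR(cor = TRUE),
             data = model_dat)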

Hierarchical processes

Several of the above-mentioned trend_model options can be modified to account for grouping structures in data by setting up hierarchical latent processes. If an optional grouping variable (gr; which must be a factor in the supplied data) exists, users can model hierarchical residual correlation structures, where the residual correlations for a specific level of gr are modelled hierarchically:

\[\begin{align*} \Omega_{group} & = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local} \end{align*}\]

where \(\Omega_{global}\) is a global correlation matrix, \(\Omega_{group, local}\) is a local deviation correlation matrix and \(\alpha_{cor}\) is a weighting parameter controlling how strongly the group-level correlation matrix \(\Omega_{group}\) (i.e. the derived correlation matrix that will be used for each level of the grouping factor gr) is shrunk towards the global correlation matrix \(\Omega_{global}\) (larger values of \(\alpha_{cor}\) indicate a greater degree of shrinkage, i.e. a greater degree of partial pooling). This option is valuable for many types of designs where the same observational units (i.e. financial assets or species, for example) are measured in different strata (i.e. regions, countries or experimental units, for example). Currently, hierarchical correlations can be included for the AR(), VAR() or ZMVN() trend_model options, as sketched below.
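
As a sketch, assuming the data contain hypothetical factor columns region (supplied as the grouping variable gr) and species (the observational units, supplied as subgr; not run):

# Hierarchical residual correlations: species-level correlation matrices
# are partially pooled across regions
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(gr = region, subgr = species),
             data = model_dat)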

Gaussian Processes

Another option for modelling temporal dynamics is to use a Gaussian Process with squared exponential kernel. These are set up independently for each series (there is currently no multivariate GP option), using trend_model = 'GP' or trend_model = GP(). The dynamics for each latent process are modelled as:

\[\begin{align*} z & \sim \text{MVNormal}(0, \Sigma_{error}) \\ \Sigma_{error}[t_i, t_j] & = \alpha^2 * exp(-0.5 * ((|t_i - t_j| / \rho))^2) \end{align*}\]

The latent dynamic process evolves from a complex, high-dimensional Multivariate Normal distribution which depends on \(\rho\) (often called the length scale parameter) to control how quickly the correlations between the model’s errors decay as a function of time. For these models, covariance decays exponentially fast with the squared distance (in time) between the observations. The functions also depend on a parameter \(\alpha\), which controls the marginal variability of the temporal function at all points; in other words it controls how much the GP term contributes to the linear predictor. mvgam capitalizes on some advances that allow GPs to be approximated using Hilbert space basis functions, which considerably speed up computation at little cost to accuracy or prediction performance.
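
A sketch of a model with independent latent GP trends (not run; placeholders as above):

# Squared exponential GP trends, approximated with Hilbert space
# basis functions
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = GP(),
             data = model_dat)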

Piecewise logistic and linear trends

Modeling growth for many types of time series is often similar to modeling population growth in natural ecosystems, where the series exhibits nonlinear growth that saturates at some particular carrying capacity. The logistic trend model available in {mvgam} allows for a time-varying capacity \(C(t)\) as well as a non-constant growth rate. Changes in the base growth rate \(k\) are incorporated by explicitly defining changepoints throughout the training period where the growth rate is allowed to vary. The changepoint vector \(a\) is represented as a vector of 1s and 0s, and the rate of growth at time \(t\) is represented as \(k+a(t)^T\delta\). Potential changepoints are selected uniformly across the training period, and the number of changepoints, as well as the flexibility of the potential rate changes at these changepoints, can be controlled using trend_model = PW(). The full piecewise logistic growth model is then:

\[\begin{align*} z_t & = \frac{C_t}{1 + \exp(-(k+a(t)^T\delta)(t-(m+a(t)^T\gamma)))} \end{align*}\]

For time series that do not appear to exhibit saturating growth, a piecewise constant rate of growth can often provide a useful trend model. The piecewise linear trend is defined as:

\[\begin{align*} z_t & = (k+a(t)^T\delta)t + (m+a(t)^T\gamma) \end{align*}\]

In both trend models, \(m\) is an offset parameter that controls the trend intercept. Because of this parameter, it is not recommended that you include an intercept in your observation formula because this will not be identifiable. You can read about the full description of piecewise linear and logistic trends in this paper by Taylor and Letham.

Sean J. Taylor and Benjamin Letham. “Forecasting at scale.” The American Statistician 72.1 (2018): 37-45.
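
A sketch of a piecewise logistic model (not run; placeholders as above). Note the suppressed intercept, and that logistic growth requires the data to supply the carrying capacity (see ?PW for details):

# Piecewise logistic growth; model_dat is assumed to include a 'cap'
# column giving the carrying capacity at each timepoint
mod <- mvgam(y ~ 0,
             trend_model = PW(growth = 'logistic'),
             family = poisson(),
             data = model_dat)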

Continuous time AR(1) processes

Most trend models in the mvgam() function expect time to be measured in regularly-spaced, discrete intervals (i.e. one measurement per week, or one per year, for example). But some time series are taken at irregular intervals and we’d like to model autoregressive properties of these. Setting trend_model = CAR() can be useful for such models, which currently only support autoregressive processes of order 1. The evolution of the latent dynamic process follows the form:

\[\begin{align*} z_{i,t} & \sim \text{Normal}(ar1_i^{distance} * z_{i,t-1}, \sigma_i) \end{align*}\]

Where \(distance\) is a vector of non-negative measurements of the time differences between successive observations. See the Examples section in ?CAR for an illustration of how to set these models up.
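
A sketch of a continuous-time AR(1) model (not run; placeholders as above, with the time variable allowed to be irregularly spaced):

# CAR(1) dynamics; the spacing of the 'time' variable determines the
# distance-weighted autoregression
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = CAR(p = 1),
             data = model_dat)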

Regression formulae

mvgam supports an observation model regression formula, built off the mgcv package, as well as an optional process model regression formula. The formulae supplied to mvgam() are exactly like those supplied to glm(), except that smooth terms, s(), te(), ti() and t2(), time-varying effects using dynamic(), monotonically increasing (using s(x, bs = 'moi')) or decreasing splines (using s(x, bs = 'mod'); see ?smooth.construct.moi.smooth.spec for details), as well as Gaussian Process functions using gp(), can be added to the right hand side (and . is not supported in mvgam formulae). See ?mvgam_formulae for more guidance.

For setting up State-Space models, the optional process model formula can be used (see the State-Space model vignette and the shared latent states vignette for guidance on using trend formulae).
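
A sketch combining both formula types in a State-Space model (not run; mintemp and season are placeholder covariates, and model_dat is a placeholder data object):

# A time-varying effect of mintemp in the observation model and a cyclic
# seasonal smooth in the AR(1) process model
mod <- mvgam(y ~ dynamic(mintemp, k = 20),
             trend_formula = ~ s(season, bs = 'cc') - 1,
             trend_model = AR(),
             data = model_dat)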

Example time series data

The ‘portal_data’ object contains time series of rodent captures from the Portal Project, a long-term monitoring study based near the town of Portal, Arizona. Researchers have been operating a standardized set of baited traps within 24 experimental plots at this site since the 1970s. Sampling follows the lunar monthly cycle, with observations occurring on average about 28 days apart. However, missing observations do occur due to difficulties accessing the site (weather events, COVID disruptions etc…). You can read about the full sampling protocol in this preprint by Ernest et al. on bioRxiv.

data("portal_data")

As the data come pre-loaded with the mvgam package, you can read a little about them in the help page using ?portal_data. Before working with the data, it is important to inspect how they are structured, first using head():

head(portal_data)
#>   moon DM DO PP OT year month mintemp precipitation     ndvi
#> 1  329 10  6  0  2 2004     1  -9.710          37.8 1.465889
#> 2  330 14  8  1  0 2004     2  -5.924           8.7 1.558507
#> 3  331  9  1  2  1 2004     3  -0.220          43.5 1.337817
#> 4  332 NA NA NA NA 2004     4   1.931          23.9 1.658913
#> 5  333 15  8 10  1 2004     5   6.568           0.9 1.853656
#> 6  334 NA NA NA NA 2004     6  11.590           1.4 1.761330

But the glimpse function in dplyr is also useful for understanding how variables are structured

dplyr::glimpse(portal_data)
#> Rows: 199
#> Columns: 10
#> $ moon          <int> 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 3…
#> $ DM            <int> 10, 14, 9, NA, 15, NA, NA, 9, 5, 8, NA, 14, 7, NA, NA, 9…
#> $ DO            <int> 6, 8, 1, NA, 8, NA, NA, 3, 3, 4, NA, 3, 8, NA, NA, 3, NA
#> $ PP            <int> 0, 1, 2, NA, 10, NA, NA, 16, 18, 12, NA, 3, 2, NA, NA, 1…
#> $ OT            <int> 2, 0, 1, NA, 1, NA, NA, 1, 0, 0, NA, 2, 1, NA, NA, 1, NA
#> $ year          <int> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 20…
#> $ month         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6,…
#> $ mintemp       <dbl> -9.710, -5.924, -0.220, 1.931, 6.568, 11.590, 14.370, 16…
#> $ precipitation <dbl> 37.8, 8.7, 43.5, 23.9, 0.9, 1.4, 20.3, 91.0, 60.5, 25.2,…
#> $ ndvi          <dbl> 1.4658889, 1.5585069, 1.3378172, 1.6589129, 1.8536561, 1…

We will focus analyses on the time series of captures for one specific rodent species, the Desert Pocket Mouse Chaetodipus penicillatus. This species is interesting in that it goes into a kind of “hibernation” during the colder months, leading to very low captures during the winter period.

Manipulating data for modelling

Manipulating the data into a ‘long’ format is necessary for modelling in mvgam. By ‘long’ format, we mean that each series x time observation needs to have its own entry in the dataframe or list object that we wish to use as data for modelling. A simple example can be viewed by simulating data using the sim_mvgam() function. See ?sim_mvgam for more details:

data <- sim_mvgam(n_series = 4, T = 24)
head(data$data_train, 12)
#>    y season year   series time
#> 1  3      1    1 series_1    1
#> 2  3      1    1 series_2    1
#> 3  1      1    1 series_3    1
#> 4  3      1    1 series_4    1
#> 5  1      2    1 series_1    2
#> 6  0      2    1 series_2    2
#> 7  2      2    1 series_3    2
#> 8  3      2    1 series_4    2
#> 9  2      3    1 series_1    3
#> 10 0      3    1 series_2    3
#> 11 1      3    1 series_3    3
#> 12 0      3    1 series_4    3

Notice how we have four different time series in these simulated data, but we do not spread the outcome values into different columns. Rather, there is only a single column for the outcome variable, labelled y in these simulated data. We also must supply a variable labelled time to ensure the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models, as you can see in the State-Space vignette. Below are the steps needed to shape our portal_data object into the correct form. First, we create a time variable, select the column representing counts of our target species (PP), and select appropriate variables that we can use as predictors:

portal_data %>%
  
  # mvgam requires a 'time' variable be present in the data to index
  # the temporal observations. This is especially important when tracking 
  # multiple time series. In the Portal data, the 'moon' variable indexes the
  # lunar monthly timestep of the trapping sessions
  dplyr::mutate(time = moon - (min(moon)) + 1) %>%
  
  # We can also provide a more informative name for the outcome variable, which 
  # is counts of the 'PP' species (Chaetodipus penicillatus) across all control
  # plots
  dplyr::mutate(count = PP) %>%
  
  # The other requirement for mvgam is a 'series' variable, which needs to be a
  # factor variable to index which time series each row in the data belongs to.
  # Again, this is more useful when you have multiple time series in the data
  dplyr::mutate(series = as.factor('PP')) %>%
  
  # Select the variables of interest to keep in the model_data
  dplyr::select(series, year, time, count, mintemp, ndvi) -> model_data

The data now contain six variables:
series, a factor indexing which time series each observation belongs to
year, the year of sampling
time, the indicator of which time step each observation belongs to
count, the response variable representing the number of captures of the species PP in each sampling observation
mintemp, the monthly average minimum temperature at each time step
ndvi, the monthly average Normalized Difference Vegetation Index at each time step

Now check the data structure again

head(model_data)
#>   series year time count mintemp     ndvi
#> 1     PP 2004    1     0  -9.710 1.465889
#> 2     PP 2004    2     1  -5.924 1.558507
#> 3     PP 2004    3     2  -0.220 1.337817
#> 4     PP 2004    4    NA   1.931 1.658913
#> 5     PP 2004    5    10   6.568 1.853656
#> 6     PP 2004    6    NA  11.590 1.761330
dplyr::glimpse(model_data)
#> Rows: 199
#> Columns: 6
#> $ series  <fct> PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP…
#> $ year    <int> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 20…
#> $ time    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
#> $ count   <int> 0, 1, 2, NA, 10, NA, NA, 16, 18, 12, NA, 3, 2, NA, NA, 13, NA,…
#> $ mintemp <dbl> -9.710, -5.924, -0.220, 1.931, 6.568, 11.590, 14.370, 16.520, …
#> $ ndvi    <dbl> 1.4658889, 1.5585069, 1.3378172, 1.6589129, 1.8536561, 1.76132…

You can also summarize multiple variables, which is helpful to search for data ranges and identify missing values

summary(model_data)
#>  series        year           time           count          mintemp       
#>  PP:199   Min.   :2004   Min.   :  1.0   Min.   : 0.00   Min.   :-24.000  
#>           1st Qu.:2008   1st Qu.: 50.5   1st Qu.: 2.50   1st Qu.: -3.884  
#>           Median :2012   Median :100.0   Median :12.00   Median :  2.130  
#>           Mean   :2012   Mean   :100.0   Mean   :15.14   Mean   :  3.504  
#>           3rd Qu.:2016   3rd Qu.:149.5   3rd Qu.:24.00   3rd Qu.: 12.310  
#>           Max.   :2020   Max.   :199.0   Max.   :65.00   Max.   : 18.140  
#>                                          NA's   :36                       
#>       ndvi       
#>  Min.   :0.2817  
#>  1st Qu.:1.0741  
#>  Median :1.3501  
#>  Mean   :1.4709  
#>  3rd Qu.:1.8178  
#>  Max.   :3.9126  
#> 

We have some NAs in our response variable count. These observations will generally be thrown out by most modelling packages in R. But as you will see when we work through the tutorials, mvgam keeps these in the data so that predictions can be automatically returned for the full dataset. The time series and some of its descriptive features can be plotted using plot_mvgam_series():

plot_mvgam_series(data = model_data, series = 1, y = 'count')

GLMs with temporal random effects

Our first task will be to fit a Generalized Linear Model (GLM) that can adequately capture the features of our count observations (integer data, lower bound at zero, missing values) while also attempting to model temporal variation. We are almost ready to fit our first model, which will be a GLM with Poisson observations, a log link function and random (hierarchical) intercepts for year. This will allow us to capture our prior belief that, although each year is unique, all years have been sampled from the same population of effects and are therefore connected, so each year may contain valuable information about the others. This will be done by capitalizing on the partial pooling properties of hierarchical models. Hierarchical (also known as random) effects offer many advantages when modelling data with grouping structures (i.e. multiple species, locations, years etc…). The ability to incorporate these in time series models is a huge advantage over traditional models such as ARIMA or Exponential Smoothing. But before we fit the model, we will need to convert year to a factor so that we can use a random effect basis in mvgam. See ?smooth.terms and ?smooth.construct.re.smooth.spec for details about the re basis construction that is used by both mvgam and mgcv.

model_data %>%
  
  # Create a 'year_fac' factor version of 'year'
  dplyr::mutate(year_fac = factor(year)) -> model_data

Preview the dataset to ensure year is now a factor with a unique factor level for each year in the data

dplyr::glimpse(model_data)
#> Rows: 199
#> Columns: 7
#> $ series   <fct> PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, P…
#> $ year     <int> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2…
#> $ time     <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
#> $ count    <int> 0, 1, 2, NA, 10, NA, NA, 16, 18, 12, NA, 3, 2, NA, NA, 13, NA
#> $ mintemp  <dbl> -9.710, -5.924, -0.220, 1.931, 6.568, 11.590, 14.370, 16.520,…
#> $ ndvi     <dbl> 1.4658889, 1.5585069, 1.3378172, 1.6589129, 1.8536561, 1.7613…
#> $ year_fac <fct> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2…
levels(model_data$year_fac)
#>  [1] "2004" "2005" "2006" "2007" "2008" "2009" "2010" "2011" "2012" "2013"
#> [11] "2014" "2015" "2016" "2017" "2018" "2019" "2020"

We are now ready for our first mvgam model. The syntax will be familiar to users who have previously built models with mgcv. But for a refresher, see ?formula.gam and the examples in ?gam. Random effects can be specified using the s wrapper with the re basis. Note that we can also suppress the primary intercept using the usual R formula syntax - 1. mvgam has a number of possible observation families that can be used, see ?mvgam_families for more information. We will use Stan as the fitting engine, which deploys Hamiltonian Monte Carlo (HMC) for full Bayesian inference. By default, 4 HMC chains will be run using a warmup of 500 iterations and collecting 500 posterior samples from each chain. The package will also aim to use the Cmdstan backend when possible, so it is recommended that users have an up-to-date installation of Cmdstan and the associated cmdstanr interface on their machines (note that you can set the backend yourself using the backend argument: see ?mvgam for details). Interested users should consult the Stan user’s guide for more information about the software and the enormous variety of models that can be tackled with HMC.

model1 <- mvgam(count ~ s(year_fac, bs = 're') - 1,
                family = poisson(),
                data = model_data)
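
If you want to change these defaults, the sampler settings can be supplied directly. Here is a minimal sketch (argument names follow ?mvgam; the values shown simply restate the defaults, and the cmdstanr backend assumes an up-to-date Cmdstan installation):

# restate the default sampler settings explicitly
model1 <- mvgam(count ~ s(year_fac, bs = 're') - 1,
                family = poisson(),
                data = model_data,
                backend = 'cmdstanr',
                chains = 4,
                burnin = 500,
                samples = 500)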

The model can be described mathematically for each timepoint \(t\) as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = \beta_{year[year_t]} \\ \beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}) \end{align*}\]

Where the \(\beta_{year}\) effects are drawn from a population distribution that is parameterized by a common mean \((\mu_{year})\) and variance \((\sigma_{year})\). Priors on most of the model parameters can be interrogated and changed using similar functionality to the options available in brms. For example, the default priors on \((\mu_{year})\) and \((\sigma_{year})\) can be viewed using the following code:

get_mvgam_priors(count ~ s(year_fac, bs = 're') - 1,
                 family = poisson(),
                 data = model_data)
#>                      param_name param_length           param_info
#> 1             vector[1] mu_raw;            1 s(year_fac) pop mean
#> 2 vector<lower=0>[1] sigma_raw;            1   s(year_fac) pop sd
#>                               prior                 example_change
#> 1            mu_raw ~ std_normal();  mu_raw ~ normal(-0.52, 0.32);
#> 2 sigma_raw ~ student_t(3, 0, 2.5); sigma_raw ~ exponential(0.53);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA
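
As a small sketch of one way to alter these priors, you can edit the prior column of the returned data.frame and pass the result to mvgam's priors argument (here we borrow the example_change suggestion from the output above; the refit itself is hypothetical and not run):

p <- get_mvgam_priors(count ~ s(year_fac, bs = 're') - 1,
                      family = poisson(),
                      data = model_data)

# swap the default student_t prior on the population sd for an exponential
p$prior[2] <- 'sigma_raw ~ exponential(0.53);'

# model1_altprior <- mvgam(count ~ s(year_fac, bs = 're') - 1,
#                          family = poisson(),
#                          data = model_data,
#                          priors = p)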

See examples in ?get_mvgam_priors to find out different ways that priors can be altered. Once the model has finished, the first step is to inspect the summary to ensure no major diagnostic warnings have been produced and to quickly summarise posterior distributions for key parameters

summary(model1)
#> GAM formula:
#> count ~ s(year_fac, bs = "re") - 1
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 199 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>                 2.5% 50% 97.5% Rhat n_eff
#> s(year_fac).1   1.80 2.1   2.3 1.00  2952
#> s(year_fac).2   2.50 2.7   2.8 1.00  2499
#> s(year_fac).3   3.00 3.1   3.2 1.00  2450
#> s(year_fac).4   3.10 3.3   3.4 1.00  2896
#> s(year_fac).5   1.90 2.1   2.3 1.00  3131
#> s(year_fac).6   1.50 1.8   2.0 1.00  2330
#> s(year_fac).7   1.80 2.0   2.3 1.00  2578
#> s(year_fac).8   2.80 3.0   3.1 1.00  2998
#> s(year_fac).9   3.10 3.2   3.4 1.00  2444
#> s(year_fac).10  2.60 2.8   2.9 1.00  2269
#> s(year_fac).11  2.90 3.1   3.2 1.00  3104
#> s(year_fac).12  3.10 3.2   3.3 1.00  2814
#> s(year_fac).13  2.00 2.3   2.5 1.00  2584
#> s(year_fac).14  2.50 2.6   2.8 1.00  2500
#> s(year_fac).15  1.90 2.2   2.4 1.00  2601
#> s(year_fac).16  1.90 2.1   2.3 1.00  2783
#> s(year_fac).17 -0.29 1.0   1.9 1.01   590
#> 
#> GAM group-level estimates:
#>                   2.5%  50% 97.5% Rhat n_eff
#> mean(s(year_fac)) 2.10 2.40   2.7 1.02   160
#> sd(s(year_fac))   0.46 0.68   1.1 1.01   223
#> 
#> Approximate significance of GAM smooths:
#>              edf Ref.df Chi.sq p-value    
#> s(year_fac) 12.8     17   1758  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:05:09 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(model1) to get started describing this model

The diagnostic messages at the bottom of the summary show that the HMC sampler did not encounter any problems or difficult posterior spaces. This is a good sign. Posterior distributions for model parameters can be extracted in any of the ways available for objects of class brmsfit (see ?mvgam::mvgam_draws for details). For example, we can extract the coefficients related to the GAM linear predictor (i.e. the \(\beta\)’s) into a data.frame using:

beta_post <- as.data.frame(model1, variable = 'betas')
dplyr::glimpse(beta_post)
#> Rows: 2,000
#> Columns: 17
#> $ `s(year_fac).1`  <dbl> 2.24750, 1.90580, 2.05282, 2.13643, 2.00244, 2.05844,…
#> $ `s(year_fac).2`  <dbl> 2.70431, 2.65140, 2.64873, 2.69983, 2.59428, 2.72077,…
#> $ `s(year_fac).3`  <dbl> 3.08997, 3.01857, 3.14596, 3.00698, 3.12615, 3.09019,…
#> $ `s(year_fac).4`  <dbl> 3.32873, 3.27549, 3.22856, 3.24602, 3.21468, 3.21990,…
#> $ `s(year_fac).5`  <dbl> 2.11989, 2.18383, 2.10280, 1.97372, 2.13022, 2.12983,…
#> $ `s(year_fac).6`  <dbl> 1.89307, 1.70960, 1.72953, 1.68534, 1.77066, 1.73519,…
#> $ `s(year_fac).7`  <dbl> 2.06631, 2.11429, 1.88207, 2.08131, 2.06614, 1.92626,…
#> $ `s(year_fac).8`  <dbl> 2.90509, 2.99382, 2.93212, 3.04373, 2.82881, 3.04845,…
#> $ `s(year_fac).9`  <dbl> 3.27292, 3.31954, 3.12765, 3.20833, 3.28495, 3.23128,…
#> $ `s(year_fac).10` <dbl> 2.78318, 2.67944, 2.78202, 2.69068, 2.70938, 2.80262,…
#> $ `s(year_fac).11` <dbl> 3.00746, 2.98039, 3.14399, 2.98110, 3.07569, 3.05284,…
#> $ `s(year_fac).12` <dbl> 3.20087, 3.14174, 3.24543, 3.29494, 3.19153, 3.17930,…
#> $ `s(year_fac).13` <dbl> 2.36027, 2.16231, 2.25140, 2.14675, 2.23117, 2.29141,…
#> $ `s(year_fac).14` <dbl> 2.61637, 2.65665, 2.67817, 2.72201, 2.69925, 2.58293,…
#> $ `s(year_fac).15` <dbl> 2.10499, 2.30554, 2.46154, 2.25384, 2.02545, 2.29679,…
#> $ `s(year_fac).16` <dbl> 2.11230, 2.07353, 2.01463, 1.95542, 2.10797, 2.02406,…
#> $ `s(year_fac).17` <dbl> 1.2708600, 0.9398070, 0.9854610, 1.0112900, 1.5607200…

With any model fitted in mvgam, the underlying Stan code can be viewed using the code function:

code(model1)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 17] = mu_raw[1] + b_raw[1 : 17] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ student_t(3, 0, 2.5);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ std_normal();
#>   
#>   // prior (non-centred) for s(year_fac)...
#>   b_raw[1 : 17] ~ std_normal();
#>   {
#>     // likelihood functions
#>     flat_ys ~ poisson_log_glm(flat_xs, 0.0, b);
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   array[n, n_series] int ypred;
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }

Plotting effects and residuals

Now for interrogating the model. We can get some sense of the variation in yearly intercepts from the summary above, but it is easier to understand them using targeted plots. Plot posterior distributions of the temporal random effects using plot.mvgam with type = 're'. See ?plot.mvgam for more details about the types of plots that can be produced from fitted mvgam objects

plot(model1, type = 're')

bayesplot support

We can also capitalize on most of the useful MCMC plotting functions from the bayesplot package to visualize posterior distributions and diagnostics (see ?mvgam::mcmc_plot.mvgam for details):

mcmc_plot(object = model1,
          variable = 'betas',
          type = 'areas')
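
Other bayesplot visualisations follow the same pattern. For instance, a sketch of traceplots for the random effect population parameters (the parameter names here are taken from the Stan code shown above; regex matching is assumed so that indexed parameters like sigma_raw[1] are captured):

mcmc_plot(object = model1,
          variable = c('mu_raw', 'sigma_raw'),
          regex = TRUE,
          type = 'trace')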

We can also use the wide range of posterior checking functions available in bayesplot (see ?mvgam::pp_check.mvgam for details):

pp_check(object = model1)
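
Any of the ppc_* plot types from bayesplot can be requested by name; for example, a sketch of a density overlay using a subset of posterior draws (arguments assumed to follow the brms-style pp_check interface):

pp_check(object = model1,
         type = 'dens_overlay',
         ndraws = 25)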

There is clearly some variation in these yearly intercept estimates. But how do these translate into time-varying predictions? To understand this, we can plot posterior hindcasts from this model for the training period using plot.mvgam with type = 'forecast'

plot(model1, type = 'forecast')

If you wish to extract these hindcasts for other downstream analyses, the hindcast function can be used. This will return a list object of class mvgam_forecast. In the hindcasts slot, a matrix of posterior retrodictions will be returned for each series in the data (only one series in our example):

hc <- hindcast(model1)
str(hc)
#> List of 15
#>  $ call              :Class 'formula'  language count ~ s(year_fac, bs = "re") - 1
#>   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ trend_model       : chr "None"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : chr "PP"
#>  $ train_observations:List of 1
#>   ..$ PP: int [1:199] 0 1 2 NA 10 NA NA 16 18 12 ...
#>  $ train_times       : num [1:199] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations : NULL
#>  $ test_times        : NULL
#>  $ hindcasts         :List of 1
#>   ..$ PP: num [1:2000, 1:199] 6 5 8 15 11 14 10 12 6 3 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:199] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>  $ forecasts         : NULL
#>  - attr(*, "class")= chr "mvgam_forecast"

You can also extract these hindcasts on the linear predictor scale, which in this case is the log scale (our Poisson GLM used a log link function). Sometimes this can be useful for asking more targeted questions about drivers of variation:

hc <- hindcast(model1, type = 'link')
range(hc$hindcasts$PP)
#> [1] -1.42247  3.50620

In any regression analysis, a key question is whether the residuals show any patterns that can be indicative of un-modelled sources of variation. For GLMs, we can use a modified residual called the Dunn-Smyth, or randomized quantile, residual. Inspect Dunn-Smyth residuals from the model using plot.mvgam with type = 'residuals'

plot(model1, type = 'residuals')
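
A useful follow-up is to check these residuals for leftover autocorrelation. Here is a small sketch, assuming residuals() for mvgam objects returns a brms-style posterior summary matrix with an 'Estimate' column:

resids <- residuals(model1)

# inspect the median Dunn-Smyth residuals for autocorrelation;
# na.action = na.pass retains timepoints with missing counts
acf(resids[, 'Estimate'], na.action = na.pass,
    main = 'ACF of median Dunn-Smyth residuals')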

Automatic forecasting for new data

These temporal random effects have no sense of “time”: nothing restricts a given yearly random intercept to be similar to the intercept from the previous year. This drawback becomes evident when we predict for a new year. To see it, we can repeat the exercise above, but this time we will split the data into training and testing sets before re-running the model, then supply the test set as newdata. For splitting, we will make use of the filter function from dplyr

model_data %>% 
  dplyr::filter(time <= 160) -> data_train 
model_data %>% 
  dplyr::filter(time > 160) -> data_test
model1b <- mvgam(count ~ s(year_fac, bs = 're') - 1,
                 family = poisson(),
                 data = data_train,
                 newdata = data_test)

We can view the test data in the forecast plot to see that the predictions do not capture the temporal variation in the test set

plot(model1b, type = 'forecast', newdata = data_test)

As with the hindcast function, we can use the forecast function to automatically extract the posterior distributions for these predictions. This also returns an object of class mvgam_forecast, but now it will contain both the hindcasts and forecasts for each series in the data:

fc <- forecast(model1b)
str(fc)
#> List of 16
#>  $ call              :Class 'formula'  language count ~ s(year_fac, bs = "re") - 1
#>   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ family_pars       : NULL
#>  $ trend_model       : chr "None"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : Factor w/ 1 level "PP": 1
#>  $ train_observations:List of 1
#>   ..$ PP: int [1:160] 0 1 2 NA 10 NA NA 16 18 12 ...
#>  $ train_times       : num [1:160] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations :List of 1
#>   ..$ PP: int [1:39] NA 0 0 10 3 14 18 NA 28 46 ...
#>  $ test_times        : num [1:39] 161 162 163 164 165 166 167 168 169 170 ...
#>  $ hindcasts         :List of 1
#>   ..$ PP: num [1:2000, 1:160] 8 4 11 6 10 7 4 13 2 9 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:160] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>  $ forecasts         :List of 1
#>   ..$ PP: num [1:2000, 1:39] 8 10 11 10 7 9 12 8 10 5 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:39] "ypred[161,1]" "ypred[162,1]" "ypred[163,1]" "ypred[164,1]" ...
#>  - attr(*, "class")= chr "mvgam_forecast"

Adding predictors as “fixed” effects

Any users familiar with GLMs will know that we nearly always wish to include predictor variables that may explain some of the variation in our observations. Predictors are easily incorporated into GLMs / GAMs. Here, we will update the model from above by including a parametric (fixed) effect of ndvi as a linear predictor:

model2 <- mvgam(count ~ s(year_fac, bs = 're') + 
                  ndvi - 1,
                family = poisson(),
                data = data_train,
                newdata = data_test)

The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = \beta_{year[year_t]} + \beta_{ndvi} * \boldsymbol{ndvi}_t \\ \beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}) \\ \beta_{ndvi} & \sim \text{Normal}(0, 1) \end{align*}\]

Where the \(\beta_{year}\) effects are the same as before, but we now have an additional coefficient \((\beta_{ndvi})\) that applies to the ndvi value at each timepoint \(t\). Inspect the summary of this model

summary(model2)
#> GAM formula:
#> count ~ ndvi + s(year_fac, bs = "re") - 1
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 199 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>                2.5%  50% 97.5% Rhat n_eff
#> ndvi           0.33 0.39  0.45    1  1642
#> s(year_fac).1  1.10 1.40  1.60    1  2513
#> s(year_fac).2  1.80 2.00  2.20    1  2354
#> s(year_fac).3  2.20 2.40  2.60    1  1945
#> s(year_fac).4  2.30 2.50  2.70    1  1837
#> s(year_fac).5  1.20 1.40  1.60    1  2175
#> s(year_fac).6  1.00 1.30  1.50    1  2603
#> s(year_fac).7  1.10 1.40  1.70    1  2394
#> s(year_fac).8  2.10 2.30  2.50    1  2341
#> s(year_fac).9  2.70 2.90  3.00    1  1992
#> s(year_fac).10 2.00 2.20  2.40    1  2523
#> s(year_fac).11 2.30 2.40  2.60    1  2090
#> s(year_fac).12 2.50 2.70  2.80    1  1974
#> s(year_fac).13 1.40 1.60  1.80    1  2247
#> s(year_fac).14 0.43 1.90  3.30    1  1820
#> s(year_fac).15 0.57 1.90  3.20    1  1370
#> s(year_fac).16 0.66 2.00  3.30    1  1308
#> s(year_fac).17 0.63 2.00  3.30    1  1658
#> 
#> GAM group-level estimates:
#>                   2.5%  50% 97.5% Rhat n_eff
#> mean(s(year_fac))  1.5 2.00   2.3 1.00   424
#> sd(s(year_fac))    0.4 0.61   1.0 1.01   388
#> 
#> Approximate significance of GAM smooths:
#>              edf Ref.df Chi.sq p-value    
#> s(year_fac) 10.8     17    275  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:05:57 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(model2) to get started describing this model

Rather than printing the summary each time, we can also quickly look at the posterior empirical quantiles for the fixed effect of ndvi (and other linear predictor coefficients) using coef:

coef(model2)
#>                     2.5%       50%     97.5% Rhat n_eff
#> ndvi           0.3256825 0.3899885 0.4531249    1  1642
#> s(year_fac).1  1.1327712 1.4036350 1.6445010    1  2513
#> s(year_fac).2  1.8074200 1.9992950 2.1872280    1  2354
#> s(year_fac).3  2.1875958 2.3805100 2.5645070    1  1945
#> s(year_fac).4  2.3322185 2.5112900 2.6777932    1  1837
#> s(year_fac).5  1.1962677 1.4209350 1.6386312    1  2175
#> s(year_fac).6  1.0228613 1.2804750 1.5080025    1  2603
#> s(year_fac).7  1.1369240 1.4162050 1.6712360    1  2394
#> s(year_fac).8  2.0874303 2.2704450 2.4544053    1  2341
#> s(year_fac).9  2.7122645 2.8535800 2.9892530    1  1992
#> s(year_fac).10 1.9949905 2.1855350 2.3791877    1  2523
#> s(year_fac).11 2.2643445 2.4402900 2.5967460    1  2090
#> s(year_fac).12 2.5413305 2.6920250 2.8385445    1  1974
#> s(year_fac).13 1.3730840 1.6154300 1.8469810    1  2247
#> s(year_fac).14 0.4345990 1.9269100 3.2774943    1  1820
#> s(year_fac).15 0.5703857 1.9499800 3.2463585    1  1370
#> s(year_fac).16 0.6586234 1.9673050 3.2961680    1  1308
#> s(year_fac).17 0.6262948 1.9681500 3.2743295    1  1658

Look at the estimated effect of ndvi using a histogram. This can be done by first extracting the posterior coefficients:

beta_post <- as.data.frame(model2, variable = 'betas')
dplyr::glimpse(beta_post)
#> Rows: 2,000
#> Columns: 18
#> $ ndvi             <dbl> 0.314106, 0.462773, 0.362485, 0.437628, 0.441784, 0.3…
#> $ `s(year_fac).1`  <dbl> 1.58746, 1.48534, 1.59421, 1.13806, 1.10479, 1.54879,…
#> $ `s(year_fac).2`  <dbl> 2.13317, 1.86345, 2.15433, 2.01243, 2.02203, 1.97593,…
#> $ `s(year_fac).3`  <dbl> 2.54934, 2.29645, 2.47957, 2.26229, 2.29450, 2.49129,…
#> $ `s(year_fac).4`  <dbl> 2.58859, 2.48111, 2.57932, 2.41897, 2.42637, 2.57229,…
#> $ `s(year_fac).5`  <dbl> 1.59800, 1.30210, 1.56847, 1.56592, 1.56435, 1.27456,…
#> $ `s(year_fac).6`  <dbl> 1.25149, 1.37874, 1.35929, 1.14402, 1.17067, 1.42695,…
#> $ `s(year_fac).7`  <dbl> 1.53079, 1.22207, 1.45129, 1.51042, 1.49857, 1.41118,…
#> $ `s(year_fac).8`  <dbl> 2.38441, 2.14815, 2.42610, 2.25680, 2.27784, 2.25634,…
#> $ `s(year_fac).9`  <dbl> 3.01334, 2.74536, 2.85456, 2.74256, 2.77100, 2.90889,…
#> $ `s(year_fac).10` <dbl> 2.14233, 2.22449, 2.20035, 2.05167, 2.06097, 2.26014,…
#> $ `s(year_fac).11` <dbl> 2.61197, 2.25615, 2.47429, 2.32546, 2.35079, 2.50332,…
#> $ `s(year_fac).12` <dbl> 2.78978, 2.66717, 2.75505, 2.54814, 2.51411, 2.67491,…
#> $ `s(year_fac).13` <dbl> 1.83302, 1.27570, 1.59194, 1.46359, 1.46365, 1.70662,…
#> $ `s(year_fac).14` <dbl> 2.022070, 2.091280, 2.013660, 2.490420, 2.497030, 2.5…
#> $ `s(year_fac).15` <dbl> 1.941950, 2.604400, 1.844450, 2.683310, 2.642490, 1.6…
#> $ `s(year_fac).16` <dbl> 0.841668, 4.271900, 1.890010, 2.719500, 2.730760, 1.5…
#> $ `s(year_fac).17` <dbl> 2.27782, 1.88689, 1.96082, 2.57778, 2.63135, 2.31181,…

The posterior distribution for the effect of ndvi is stored in the ndvi column. A quick histogram confirms our inference that log(counts) respond positively to increases in ndvi:

hist(beta_post$ndvi,
     xlim = c(-1 * max(abs(beta_post$ndvi)),
              max(abs(beta_post$ndvi))),
     col = 'darkred',
     border = 'white',
     xlab = expression(beta[NDVI]),
     ylab = '',
     yaxt = 'n',
     main = '',
     lwd = 2)
abline(v = 0, lwd = 2.5)

marginaleffects support

Given our model used a nonlinear link function (log link in this example), it can still be difficult to fully understand what relationship our model is estimating between a predictor and the response. Fortunately, the marginaleffects package makes this relatively straightforward. Objects of class mvgam can be used with marginaleffects to inspect contrasts, scenario-based predictions, conditional and marginal effects, all on the outcome scale. Like brms, mvgam has the simple conditional_effects function to make quick and informative plots for main effects, which rely on marginaleffects support. This will likely be your go-to function for quickly understanding patterns from fitted mvgam models
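
As a quick sketch with the ndvi model from above (the second call assumes the marginaleffects package is installed):

# conditional effect of ndvi on the outcome scale
conditional_effects(model2)

# an equivalent marginaleffects-based view
marginaleffects::plot_predictions(model2,
                                  condition = 'ndvi',
                                  type = 'response')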

Adding predictors as smooths

Smooth functions, using penalized splines, are a major feature of mvgam. Nonlinear splines are commonly viewed as variations of random effects in which the coefficients that control the shape of the spline are drawn from a joint, penalized distribution. This strategy is very often used in ecological time series analysis to capture smooth temporal variation in the processes we seek to study. When we construct smoothing splines, the workhorse package mgcv will calculate a set of basis functions that will collectively control the shape and complexity of the resulting spline. It is often helpful to visualize these basis functions to get a better sense of how splines work; a small sketch of how to build and plot a set of 6 basis functions of time appears just after the model code below.

In addition to constructing the basis functions, mgcv also creates a penalty matrix \(S\), which contains known coefficients that work to constrain the wiggliness of the resulting smooth function. When fitting a GAM to data, we must estimate the smoothing parameters (\(\lambda\)) that will penalize these matrices, resulting in constrained basis coefficients and smoother functions that are less likely to overfit the data. This is the key to fitting GAMs in a Bayesian framework, as we can jointly estimate the \(\lambda\)’s using informative priors to prevent overfitting and expand the complexity of models we can tackle.

To see this in practice, we can now fit a model that replaces the yearly random effects with a smooth function of time. We will need a reasonably complex function (large k) to try and accommodate the temporal variation in our observations. Following some useful advice by Gavin Simpson, we will use a b-spline basis for the temporal smooth. Because we no longer have intercepts for each year, we also retain the primary intercept term in this model (there is no -1 in the formula now):

model3 <- mvgam(count ~ s(time, bs = 'bs', k = 15) + 
                  ndvi,
                family = poisson(),
                data = data_train,
                newdata = data_test)
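
Before examining the model, here is the small sketch promised above: mgcv::smoothCon() can construct (without fitting) a set of 6 b-spline basis functions of time, and each column of the returned design matrix is one basis function:

library(mgcv)

# construct, but do not fit, a b-spline basis of time with 6 functions
basis <- smoothCon(s(time, bs = 'bs', k = 6),
                   data = data.frame(time = model_data$time))[[1]]

# each column of X is one basis function evaluated over time
matplot(basis$X, type = 'l', lwd = 2,
        xlab = 'Time', ylab = 'Basis function value')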

The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = f(\boldsymbol{time})_t + \beta_{ndvi} * \boldsymbol{ndvi}_t \\ f(\boldsymbol{time}) & = \sum_{k=1}^{K}b * \beta_{smooth} \\ \beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \\ \beta_{ndvi} & \sim \text{Normal}(0, 1) \end{align*}\]

Where the smooth function \(f(\boldsymbol{time})\) is built by summing across a set of weighted basis functions. The basis functions \((b)\) are constructed using a b-spline basis in mgcv (matching the bs = 'bs' argument in the formula above). The weights \((\beta_{smooth})\) are drawn from a penalized multivariate normal distribution where the precision matrix \((\Omega)\) is multiplied by a smoothing penalty \((\lambda)\). If \(\lambda\) becomes large, it acts to squeeze the covariances among the weights \((\beta_{smooth})\), leading to a less wiggly spline. Note that sometimes there are multiple smoothing penalties that contribute to the covariance matrix, but only one is shown here for simplicity. View the summary as before

summary(model3)
#> GAM formula:
#> count ~ s(time, bs = "bs", k = 15) + ndvi
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 199 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>              2.5%   50%  97.5% Rhat n_eff
#> (Intercept)  2.00  2.10  2.200 1.00   871
#> ndvi         0.26  0.33  0.400 1.00   909
#> s(time).1   -2.20 -1.10 -0.056 1.01   512
#> s(time).2    0.44  1.30  2.300 1.02   337
#> s(time).3   -0.55  0.45  1.500 1.02   320
#> s(time).4    1.60  2.50  3.400 1.02   320
#> s(time).5   -1.20 -0.20  0.770 1.02   320
#> s(time).6   -0.60  0.37  1.400 1.02   363
#> s(time).7   -1.50 -0.52  0.460 1.02   347
#> s(time).8    0.53  1.50  2.500 1.02   331
#> s(time).9    1.20  2.10  3.100 1.02   309
#> s(time).10  -0.41  0.54  1.500 1.02   318
#> s(time).11   0.82  1.80  2.800 1.02   311
#> s(time).12   0.62  1.50  2.400 1.02   299
#> s(time).13  -1.20 -0.31  0.610 1.01   455
#> s(time).14  -7.50 -4.20 -1.100 1.02   447
#> 
#> Approximate significance of GAM smooths:
#>          edf Ref.df Chi.sq p-value    
#> s(time) 9.98     14   64.9  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:06:25 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(model3) to get started describing this model

The summary above now contains posterior estimates for the smoothing parameters as well as the basis coefficients for the nonlinear effect of time. We can visualize conditional_effects as before:

conditional_effects(model3, type = 'link')

Inspect the underlying Stan code to gain some idea of how the spline is being penalized:

code(model3)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[14, 28] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 2.6, 2.5);
#>   
#>   // prior for ndvi...
#>   b_raw[2] ~ student_t(3, 0, 2);
#>   
#>   // prior for s(time)...
#>   b_raw[3 : 16] ~ multi_normal_prec(zero[3 : 16],
#>                                     S1[1 : 14, 1 : 14] * lambda[1]
#>                                     + S1[1 : 14, 15 : 28] * lambda[2]);
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   {
#>     // likelihood functions
#>     flat_ys ~ poisson_log_glm(flat_xs, 0.0, b);
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }

The lines below // prior for s(time)... show how the spline basis coefficients are drawn from a zero-centred multivariate normal distribution. The penalty matrix \(S1\) is multiplied by two different smoothing parameters (the \(\lambda\)’s) to enforce smoothness and reduce overfitting

Latent dynamics in mvgam

Forecasts from the above model are not ideal:

plot(model3, type = 'forecast', newdata = data_test)

Why is this happening? The forecasts are driven almost entirely by variation in the temporal spline, which is extrapolating linearly beyond the edge of the training data, forever. Any slight wiggles near the end of the training set can therefore result in wildly different forecasts. To visualize this, we can plot the extrapolated temporal function into the out-of-sample test set. Here is the extrapolated smooth of time from model 3, which used 15 basis functions:

plot_mvgam_smooth(model3, smooth = 's(time)',
                  # feed newdata to the plot function to generate
                  # predictions of the temporal smooth to the end of the 
                  # testing period
                  newdata = data.frame(time = 1:max(data_test$time),
                                       ndvi = 0))
abline(v = max(data_train$time), lty = 'dashed', lwd = 2)

This model is not doing well. Clearly we need to somehow account for the strong temporal autocorrelation when modelling these data without using a smooth function of time. Now onto another prominent feature of mvgam: the ability to include (possibly latent) autocorrelated residuals in regression models. To do so, we use the trend_model argument (see ?mvgam_trends for details of different dynamic trend models that are supported). This model will use a separate sub-model for latent residuals that evolve as an AR1 process (i.e. the error in the current time point is a function of the error in the previous time point, plus some stochastic noise). We also include a smooth function of ndvi in this model, rather than the parametric term that was used above, to showcase that mvgam can include combinations of smooths and dynamic components:

model4 <- mvgam(count ~ s(ndvi, k = 6),
                family = poisson(),
                data = data_train,
                newdata = data_test,
                trend_model = 'AR1')

The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = f(\boldsymbol{ndvi})_t + z_t \\ z_t & \sim \text{Normal}(ar1 * z_{t-1}, \sigma_{error}) \\ ar1 & \sim \text{Normal}(0, 1)[-1, 1] \\ \sigma_{error} & \sim \text{Exponential}(2) \\ f(\boldsymbol{ndvi}) & = \sum_{k=1}^{K}b * \beta_{smooth} \\ \beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \end{align*}\]

Here the term \(z_t\) captures autocorrelated latent residuals, which are modelled using an AR1 process. Notice also that this model estimates autocorrelated errors for the full time period, even though some of these time points have missing observations. This is useful for getting more realistic estimates of the residual autocorrelation parameters. Summarise the model to see how it now returns posterior summaries for the latent AR1 process:

summary(model4)
#> GAM formula:
#> count ~ s(ndvi, k = 6)
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR1
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 199 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>               2.5%     50% 97.5% Rhat n_eff
#> (Intercept)  1.200  2.1000  2.70 1.11    44
#> s(ndvi).1   -0.087  0.0099  0.18 1.00   539
#> s(ndvi).2   -0.200  0.0160  0.30 1.00   655
#> s(ndvi).3   -0.064 -0.0018  0.05 1.01   541
#> s(ndvi).4   -0.270  0.1200  1.30 1.00   323
#> s(ndvi).5   -0.065  0.1500  0.36 1.01   485
#> 
#> Approximate significance of GAM smooths:
#>          edf Ref.df Chi.sq p-value
#> s(ndvi) 1.35      5   8.16    0.98
#> 
#> Latent trend parameter AR estimates:
#>          2.5%  50% 97.5% Rhat n_eff
#> ar1[1]   0.69 0.81  0.92 1.00   245
#> sigma[1] 0.68 0.80  0.95 1.01   466
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhats above 1.05 found for 107 parameters
#>  *Diagnose further to investigate why the chains have not mixed
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:07:19 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(model4) to get started describing this model

View posterior hindcasts / forecasts and compare against the out of sample test data

plot(model4, type = 'forecast', newdata = data_test)

The trend is evolving as an AR1 process, which we can also view:

plot(model4, type = 'trend', newdata = data_test)

In-sample model performance can be interrogated using leave-one-out cross-validation utilities from the loo package (a higher value is preferred for this metric):

loo_compare(model3, model4)
#>        elpd_diff se_diff
#> model4    0.0       0.0 
#> model3 -559.7      66.8

The higher expected log predictive density (ELPD) for the dynamic model suggests it provides a better fit to the in-sample data.

Though it should be obvious that this model provides better forecasts, we can quantify forecast performance for models 3 and 4 using the forecast and score functions. Here we will compare models based on their Discrete Ranked Probability Scores (a lower value is preferred for this metric)

fc_mod3 <- forecast(model3)
fc_mod4 <- forecast(model4)
score_mod3 <- score(fc_mod3, score = 'drps')
score_mod4 <- score(fc_mod4, score = 'drps')
sum(score_mod4$PP$score, na.rm = TRUE) - sum(score_mod3$PP$score, na.rm = TRUE)
#> [1] -128.6068

A strongly negative value here suggests the score for the dynamic model (model 4) is much smaller than the score for the model with a smooth function of time (model 3)
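
To see where these differences accrue over the forecast horizon, here is a quick sketch that plots the per-timepoint DRPS values from each score object (the score column holds one value per forecast horizon):

# grey: spline-of-time model; red: dynamic AR1 model
plot(score_mod3$PP$score, type = 'b', col = 'grey40',
     ylab = 'DRPS', xlab = 'Forecast horizon')
lines(score_mod4$PP$score, type = 'b', col = 'darkred')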

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au)

================================================ FILE: docs/articles/nmixtures.html ================================================

N-mixtures in mvgam

The purpose of this vignette is to show how the mvgam package can be used to fit and interrogate N-mixture models for population abundance counts made with imperfect detection.

N-mixture models

An N-mixture model is a fairly recent addition to the ecological modeller’s toolkit that is designed to make inferences about variation in the abundance of species when observations are imperfect (Royle 2004). Briefly, assume \(\boldsymbol{Y_{i,r}}\) is the number of individuals recorded at site \(i\) during replicate sampling observation \(r\) (recorded as a non-negative integer). If multiple replicate surveys are done within a short enough period to satisfy the assumption that the population remained closed (i.e. there was no substantial change in true population size between replicate surveys), we can account for the fact that observations aren’t perfect. This is done by assuming that these replicate observations are Binomial random variables that are parameterized by the true “latent” abundance \(N\) and a detection probability \(p\):

\[\begin{align*} \boldsymbol{Y_{i,r}} & \sim \text{Binomial}(N_i, p_r) \\ N_{i} & \sim \text{Poisson}(\lambda_i) \end{align*}\]

Using a set of linear predictors, we can estimate effects of covariates \(\boldsymbol{X}\) on the expected latent abundance (with a log link for \(\lambda\)) and, jointly, effects of possibly different covariates (call them \(\boldsymbol{Q}\)) on detection probability (with a logit link for \(p\)):

\[\begin{align*} log(\lambda) & = \beta \boldsymbol{X} \\ logit(p) & = \gamma \boldsymbol{Q}\end{align*}\]

mvgam can handle this type of model because it is designed to propagate unobserved temporal processes that evolve independently of the observation process in a State-space format. This setup adapts well to N-mixture models because they can be thought of as State-space models in which the latent state is a discrete variable representing the “true” but unknown population size. This is very convenient because we can incorporate any of the package’s diverse effect types (i.e. multidimensional splines, time-varying effects, monotonic effects, random effects etc…) into the linear predictors. All that is required for this to work is a marginalization trick that allows Stan’s sampling algorithms to handle discrete parameters (see more about how this method of “integrating out” discrete parameters works in this nice blog post by Maxwell Joseph).
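
To make this marginalization concrete: for a set of replicate counts at site \(i\), the discrete latent abundance \(N_i\) is summed out of the joint likelihood, from the largest observed count up to a user-supplied maximum (the cap variable described below):

\[Pr(\boldsymbol{Y_{i,1}}, \ldots, \boldsymbol{Y_{i,R}}) = \sum_{N_i = \max_r(\boldsymbol{Y_{i,r}})}^{cap_i} \text{Poisson}(N_i \mid \lambda_i) \prod_{r=1}^{R} \text{Binomial}(\boldsymbol{Y_{i,r}} \mid N_i, p_r)\]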

The family nmix() is used to set up N-mixture models in mvgam, but we still need to do a little bit of data wrangling to ensure the data are set up in the correct format (this is especially true when we have more than one replicate survey per time period). The most important aspects are: (1) how we set up the observation series and trend_map arguments to ensure replicate surveys are mapped to the correct latent abundance model and (2) the inclusion of a cap variable that defines the maximum possible integer value to use for each observation when estimating latent abundance. The two examples below give a reasonable overview of how this can be done.

First we will use a simple simulation in which multiple replicate observations are taken at each timepoint for two different species. The simulation produces observations at a single site over six years, with five replicate surveys per year. Each species is simulated to have different nonlinear temporal trends and different detection probabilities. For now, detection probability is fixed (i.e. it does not change over time or in association with any covariates). Notice that we add the cap variable, which does not need to be static, to define the maximum possible value that we think the latent abundance could be for each timepoint. This simply needs to be large enough that we get a reasonable idea of which latent N values are most likely, without adding too much computational cost:

set.seed(999)
# Simulate observations for species 1, which shows a declining trend and 0.7 detection probability
data.frame(site = 1,
           # five replicates per year; six years
           replicate = rep(1:5, 6),
           time = sort(rep(1:6, 5)),
           species = 'sp_1',
           # true abundance declines nonlinearly
           truth = c(rep(28, 5),
                     rep(26, 5),
                     rep(23, 5),
                     rep(16, 5),
                     rep(14, 5),
                     rep(14, 5)),
           # observations are taken with detection prob = 0.7
           obs = c(rbinom(5, 28, 0.7),
                   rbinom(5, 26, 0.7),
                   rbinom(5, 23, 0.7),
                   rbinom(5, 15, 0.7),
                   rbinom(5, 14, 0.7),
                   rbinom(5, 14, 0.7))) %>%
  # add 'series' information, which is an identifier of site, replicate and species
  dplyr::mutate(series = paste0('site_', site,
                                '_', species,
                                '_rep_', replicate),
                time = as.numeric(time),
                # add a 'cap' variable that defines the maximum latent N to 
                # marginalize over when estimating latent abundance; in other words
                # how large do we realistically think the true abundance could be?
                cap = 100) %>%
  dplyr::select(- replicate) -> testdat

# Now add another species that has a different temporal trend and a smaller 
# detection probability (0.45 for this species)
testdat = testdat %>%
  dplyr::bind_rows(data.frame(site = 1,
                              replicate = rep(1:5, 6),
                              time = sort(rep(1:6, 5)),
                              species = 'sp_2',
                              truth = c(rep(4, 5),
                                        rep(7, 5),
                                        rep(15, 5),
                                        rep(16, 5),
                                        rep(19, 5),
                                        rep(18, 5)),
                              obs = c(rbinom(5, 4, 0.45),
                                      rbinom(5, 7, 0.45),
                                      rbinom(5, 15, 0.45),
                                      rbinom(5, 16, 0.45),
                                      rbinom(5, 19, 0.45),
                                      rbinom(5, 18, 0.45))) %>%
                     dplyr::mutate(series = paste0('site_', site,
                                                   '_', species,
                                                   '_rep_', replicate),
                                   time = as.numeric(time),
                                   cap = 50) %>%
                     dplyr::select(-replicate))

This data format isn’t too difficult to set up, but it does differ from the traditional multidimensional array setup that is commonly used for fitting N-mixture models in other software packages. Next we ensure that species and series IDs are included as factor variables, in case we’d like to allow certain effects to vary by species

testdat$species <- factor(testdat$species,
                          levels = unique(testdat$species))
testdat$series <- factor(testdat$series,
                         levels = unique(testdat$series))

Preview the dataset to get an idea of how it is structured:

dplyr::glimpse(testdat)
#> Rows: 60
#> Columns: 7
#> $ site    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ time    <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5,…
#> $ species <fct> sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp…
#> $ truth   <dbl> 28, 28, 28, 28, 28, 26, 26, 26, 26, 26, 23, 23, 23, 23, 23, 16…
#> $ obs     <int> 20, 19, 23, 17, 18, 21, 18, 21, 19, 18, 17, 16, 20, 11, 19, 9,…
#> $ series  <fct> site_1_sp_1_rep_1, site_1_sp_1_rep_2, site_1_sp_1_rep_3, site_…
#> $ cap     <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 10…
head(testdat, 12)
#>    site time species truth obs            series cap
#> 1     1    1    sp_1    28  20 site_1_sp_1_rep_1 100
#> 2     1    1    sp_1    28  19 site_1_sp_1_rep_2 100
#> 3     1    1    sp_1    28  23 site_1_sp_1_rep_3 100
#> 4     1    1    sp_1    28  17 site_1_sp_1_rep_4 100
#> 5     1    1    sp_1    28  18 site_1_sp_1_rep_5 100
#> 6     1    2    sp_1    26  21 site_1_sp_1_rep_1 100
#> 7     1    2    sp_1    26  18 site_1_sp_1_rep_2 100
#> 8     1    2    sp_1    26  21 site_1_sp_1_rep_3 100
#> 9     1    2    sp_1    26  19 site_1_sp_1_rep_4 100
#> 10    1    2    sp_1    26  18 site_1_sp_1_rep_5 100
#> 11    1    3    sp_1    23  17 site_1_sp_1_rep_1 100
#> 12    1    3    sp_1    23  16 site_1_sp_1_rep_2 100

Setting up the trend_map

Finally, we need to set up the trend_map object. This is crucial for allowing multiple observations to be linked to the same latent process model (see more information about this argument in the Shared latent states vignette). In this case, the mapping operates by species and site to state that each set of replicate observations from the same time point should all share the exact same latent abundance model:

testdat %>%
  # each unique combination of site*species is a separate process
  dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>%
  dplyr::select(trend, series) %>%
  dplyr::distinct() -> trend_map
trend_map
#>    trend            series
#> 1      1 site_1_sp_1_rep_1
#> 2      1 site_1_sp_1_rep_2
#> 3      1 site_1_sp_1_rep_3
#> 4      1 site_1_sp_1_rep_4
#> 5      1 site_1_sp_1_rep_5
#> 6      2 site_1_sp_2_rep_1
#> 7      2 site_1_sp_2_rep_2
#> 8      2 site_1_sp_2_rep_3
#> 9      2 site_1_sp_2_rep_4
#> 10     2 site_1_sp_2_rep_5

Notice how all of the replicates for species 1 in site 1 share the same process (i.e. the same trend). This will ensure that all replicates are Binomial draws of the same latent N.

Modelling with the nmix() family

Now we are ready to fit a model using mvgam(). This model will allow each species to have different detection probabilities and different temporal trends. We will use Cmdstan as the backend, which by default will use Hamiltonian Monte Carlo for full Bayesian inference

mod <- mvgam(
  # the observation formula sets up linear predictors for
  # detection probability on the logit scale
  formula = obs ~ species - 1,
  
  # the trend_formula sets up the linear predictors for 
  # the latent abundance processes on the log scale
  trend_formula = ~ s(time, by = trend, k = 4) + species,
  
  # the trend_map takes care of the mapping
  trend_map = trend_map,
  
  # nmix() family and data
  family = nmix(),
  data = testdat,
  
  # priors can be set in the usual way
  priors = c(prior(std_normal(), class = b),
             prior(normal(1, 1.5), class = Intercept_trend)),
  samples = 1000)

View the automatically-generated Stan code to get a sense of how the marginalization over latent N works

code(mod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp_trend; // number of trend smoothing parameters
#>   int<lower=0> n_lv; // number of dynamic factors
#>   int<lower=0> n_series; // number of series
#>   matrix[n_series, n_lv] Z; // matrix mapping series to latent states
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   int<lower=0> num_basis_trend; // number of trend basis coefficients
#>   vector[num_basis_trend] zero_trend; // prior locations for trend basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   matrix[n * n_lv, num_basis_trend] X_trend; // trend model design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   array[n, n_lv] int ytimes_trend;
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[total_obs] int<lower=0> cap; // upper limits of latent abundances
#>   array[total_obs] int ytimes_array; // sorted ytimes
#>   array[n, n_series] int<lower=0> ytimes_pred; // time-ordered matrix for prediction
#>   int<lower=0> K_groups; // number of unique replicated observations
#>   int<lower=0> K_reps; // maximum number of replicate observations
#>   array[K_groups] int<lower=0> K_starts; // col of K_inds where each group starts
#>   array[K_groups] int<lower=0> K_stops; // col of K_inds where each group ends
#>   array[K_groups, K_reps] int<lower=0> K_inds; // indices of replicated observations
#>   matrix[3, 6] S_trend1; // mgcv smooth penalty matrix S_trend1
#>   matrix[3, 6] S_trend2; // mgcv smooth penalty matrix S_trend2
#>   array[total_obs] int<lower=0> flat_ys; // flattened observations
#> }
#> transformed data {
#>   matrix[total_obs, num_basis] X_ordered = X[ytimes_array,  : ];
#>   array[K_groups] int<lower=0> Y_max;
#>   array[K_groups] int<lower=0> N_max;
#>   for (k in 1 : K_groups) {
#>     Y_max[k] = max(flat_ys[K_inds[k, K_starts[k] : K_stops[k]]]);
#>     N_max[k] = max(cap[K_inds[k, K_starts[k] : K_stops[k]]]);
#>   }
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   vector[num_basis_trend] b_raw_trend;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp_trend] lambda_trend;
#> }
#> transformed parameters {
#>   // detection probability
#>   vector[total_obs] p;
#>   
#>   // latent states
#>   matrix[n, n_lv] LV;
#>   
#>   // latent states and loading matrix
#>   vector[n * n_lv] trend_mus;
#>   matrix[n, n_series] trend;
#>   
#>   // basis coefficients
#>   vector[num_basis] b;
#>   vector[num_basis_trend] b_trend;
#>   
#>   // observation model basis coefficients
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#>   
#>   // process model basis coefficients
#>   b_trend[1 : num_basis_trend] = b_raw_trend[1 : num_basis_trend];
#>   
#>   // detection probability
#>   p = X_ordered * b;
#>   
#>   // latent process linear predictors
#>   trend_mus = X_trend * b_trend;
#>   for (j in 1 : n_lv) {
#>     LV[1 : n, j] = trend_mus[ytimes_trend[1 : n, j]];
#>   }
#>   
#>   // derived latent states
#>   for (i in 1 : n) {
#>     for (s in 1 : n_series) {
#>       trend[i, s] = dot_product(Z[s,  : ], LV[i,  : ]);
#>     }
#>   }
#> }
#> model {
#>   // prior for speciessp_1...
#>   b_raw[1] ~ std_normal();
#>   
#>   // prior for speciessp_2...
#>   b_raw[2] ~ std_normal();
#>   
#>   // dynamic process models
#>   
#>   // prior for (Intercept)_trend...
#>   b_raw_trend[1] ~ normal(1, 1.5);
#>   
#>   // prior for speciessp_2_trend...
#>   b_raw_trend[2] ~ std_normal();
#>   
#>   // prior for s(time):trendtrend1_trend...
#>   b_raw_trend[3 : 5] ~ multi_normal_prec(zero_trend[3 : 5],
#>                                          S_trend1[1 : 3, 1 : 3]
#>                                          * lambda_trend[1]
#>                                          + S_trend1[1 : 3, 4 : 6]
#>                                            * lambda_trend[2]);
#>   
#>   // prior for s(time):trendtrend2_trend...
#>   b_raw_trend[6 : 8] ~ multi_normal_prec(zero_trend[6 : 8],
#>                                          S_trend2[1 : 3, 1 : 3]
#>                                          * lambda_trend[3]
#>                                          + S_trend2[1 : 3, 4 : 6]
#>                                            * lambda_trend[4]);
#>   lambda_trend ~ normal(5, 30);
#>   {
#>     // likelihood functions
#>     array[total_obs] real flat_trends;
#>     array[total_obs] real flat_ps;
#>     flat_trends = to_array_1d(trend);
#>     flat_ps = to_array_1d(p);
#>     
#>     // loop over replicate sampling window (each site*time*species combination)
#>     for (k in 1 : K_groups) {
#>       // all log_lambdas are identical because they represent site*time
#>       // covariates; so just use the first measurement
#>       real log_lambda = flat_trends[K_inds[k, 1]];
#>       
#>       // logit-scale detection probabilities for the replicate observations
#>       vector[size(K_inds[k, K_starts[k] : K_stops[k]])] logit_p = to_vector(flat_ps[K_inds[k, K_starts[k] : K_stops[k]]]);
#>       
#>       // K values and observed counts for these replicates
#>       int K_max = N_max[k];
#>       int K_min = Y_max[k];
#>       array[size(K_inds[k, K_starts[k] : K_stops[k]])] int N_obs = flat_ys[K_inds[k, K_starts[k] : K_stops[k]]];
#>       int possible_N = K_max - K_min;
#>       
#>       // marginalize over possible latent counts analytically
#>       real ff = exp(log_lambda) * prod(1 - inv_logit(logit_p));
#>       real prob_n = 1;
#>       for (i in 1 : possible_N) {
#>         real N = K_max - i + 1;
#>         real k_obs = 1;
#>         for (j in 1 : size(N_obs)) {
#>           k_obs *= N / (N - N_obs[j]);
#>         }
#>         prob_n = 1 + prob_n * ff * k_obs / N;
#>       }
#>       
#>       // add log(pr_n) to prob(K_min)
#>       target += poisson_log_lpmf(K_min | log_lambda)
#>                 + binomial_logit_lpmf(N_obs | K_min, logit_p) + log(prob_n);
#>     }
#>   }
#> }
#> generated quantities {
#>   vector[n_lv] penalty = rep_vector(1e12, n_lv);
#>   vector[n_sp_trend] rho_trend = log(lambda_trend);
#> }

The posterior summary of this model shows that it has converged nicely

summary(mod)
#> GAM observation formula:
#> obs ~ species - 1
#> 
#> GAM process formula:
#> ~s(time, by = trend, k = 4) + species
#> 
#> Family:
#> nmix
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N process models:
#> 2 
#> 
#> N series:
#> 10 
#> 
#> N timepoints:
#> 6 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1500; warmup = 500; thin = 1 
#> Total post-warmup draws = 4000
#> 
#> 
#> GAM observation model coefficient (beta) estimates:
#>              2.5%   50% 97.5% Rhat n_eff
#> speciessp_1 -0.28 0.710  1.40    1  2060
#> speciessp_2 -1.20 0.014  0.89    1  1892
#> 
#> GAM process model coefficient (beta) estimates:
#>                               2.5%     50%  97.5% Rhat n_eff
#> (Intercept)_trend            2.700  3.0000  3.500 1.00  1824
#> speciessp_2_trend           -1.200 -0.6200  0.160 1.00  1483
#> s(time):trendtrend1.1_trend -0.080  0.0160  0.220 1.00  1030
#> s(time):trendtrend1.2_trend -0.240  0.0078  0.280 1.00  2105
#> s(time):trendtrend1.3_trend -0.470 -0.2500 -0.036 1.00  2200
#> s(time):trendtrend2.1_trend -0.270 -0.0140  0.085 1.01   509
#> s(time):trendtrend2.2_trend -0.200  0.0290  0.500 1.00   859
#> s(time):trendtrend2.3_trend  0.028  0.3300  0.630 1.00  2317
#> 
#> Approximate significance of GAM process smooths:
#>                       edf Ref.df Chi.sq p-value
#> s(time):seriestrend1 1.13      3   1.10    0.74
#> s(time):seriestrend2 1.07      3   3.17    0.59
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 4000 iterations ended with a divergence (0%)
#> 0 of 4000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:07:59 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model

loo() functionality works just as it does for all mvgam models to aid in model comparison / selection (though note that Pareto k values often give warnings for mixture models, so these diagnostics may not be very reliable here)

loo(mod)
#> 
#> Computed from 4000 by 60 log-likelihood matrix.
#> 
#>          Estimate   SE
#> elpd_loo   -233.5 14.9
#> p_loo        86.8 14.0
#> looic       467.1 29.8
#> ------
#> MCSE of elpd_loo is NA.
#> MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.1]).
#> 
#> Pareto k diagnostic values:
#>                          Count Pct.    Min. ESS
#> (-Inf, 0.7]   (good)     30    50.0%   679     
#>    (0.7, 1]   (bad)      13    21.7%   <NA>    
#>    (1, Inf)   (very bad) 17    28.3%   <NA>    
#> See help('pareto-k-diagnostic') for details.
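
If you want to see exactly which observations are flagged, the loo package's helper functions can pull out the problematic indices (a quick sketch using loo's exported pareto_k_values() helper):

lmod <- loo(mod)

# indices of observations whose Pareto k exceeds the 0.7 threshold
which(loo::pareto_k_values(lmod) > 0.7)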

Plot the estimated smooths of time from each species’ latent abundance process (on the log scale)

plot(mod, type = 'smooths', trend_effects = TRUE)

marginaleffects support allows for more useful prediction-based interrogations on different scales (though note that, at the time of writing this vignette, you must have the development version of marginaleffects installed for nmix() models to be supported; use remotes::install_github('vincentarelbundock/marginaleffects') to install it). Objects that use family nmix() have a few additional prediction scales that can be used (i.e. link, response, detection or latent_N). For example, here are the estimated detection probabilities per species, which show that the model has done a nice job of estimating these parameters:

marginaleffects::plot_predictions(mod, condition = 'species',
                 type = 'detection') +
  ylab('Pr(detection)') +
  ylim(c(0, 1)) +
  theme_classic() +
  theme(legend.position = 'none')
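
The same interface extends to the other prediction scales listed above. For example (a small sketch using the same model object), averaging predictions on the latent_N scale per species should return posterior summaries of expected latent abundance:

marginaleffects::avg_predictions(mod, type = 'latent_N', by = 'species')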

A common goal in N-mixture modelling is to estimate the true latent abundance. The model has automatically generated predictions for the unknown latent abundance that are conditional on the observations. We can extract these and produce decent plots using a small function

hc <- hindcast(mod, type = 'latent_N')

# Function to plot latent abundance estimates vs truth
plot_latentN = function(hindcasts, data, species = 'sp_1'){
  all_series <- unique(data %>%
                         dplyr::filter(species == !!species) %>%
                         dplyr::pull(series))
  
  # Grab the first replicate that represents this series
  # so we can get the true simulated values
  series <- as.numeric(all_series[1])
  truths <- data %>%
    dplyr::arrange(time, series) %>%
    dplyr::filter(series == !!levels(data$series)[series]) %>%
    dplyr::pull(truth)
  
  # In case some replicates have missing observations,
  # pull out predictions for ALL replicates and average over them
  hcs <- do.call(rbind, lapply(all_series, function(x){
    ind <- which(names(hindcasts$hindcasts) %in% as.character(x))
    hindcasts$hindcasts[[ind]]
  }))
  
  # Calculate posterior empirical quantiles of predictions
  pred_quantiles <- data.frame(t(apply(hcs, 2, function(x) 
    quantile(x, probs = c(0.05, 0.2, 0.3, 0.4, 
                          0.5, 0.6, 0.7, 0.8, 0.95)))))
  pred_quantiles$time <- 1:NROW(pred_quantiles)
  pred_quantiles$truth <- truths
  
  # Grab observations
  data %>%
    dplyr::filter(series %in% all_series) %>%
    dplyr::select(time, obs) -> observations
  
  # Plot
  ggplot(pred_quantiles, aes(x = time, group = 1)) +
    geom_ribbon(aes(ymin = X5., ymax = X95.), fill = "#DCBCBC") + 
    geom_ribbon(aes(ymin = X30., ymax = X70.), fill = "#B97C7C") +
    geom_line(aes(x = time, y = truth),
              colour = 'black', linewidth = 1) +
    geom_point(aes(x = time, y = truth),
               shape = 21, colour = 'white', fill = 'black',
               size = 2.5) +
    geom_jitter(data = observations, aes(x = time, y = obs),
                width = 0.06, 
                shape = 21, fill = 'darkred', colour = 'white', size = 2.5) +
    labs(y = 'Latent abundance (N)',
         x = 'Time',
         title = species)
}

Latent abundance plots vs the simulated truths for each species are shown below. Here, the red points show the imperfect observations, the black line shows the true latent abundance, and the ribbons show credible intervals of our estimates:

plot_latentN(hc, testdat, species = 'sp_1')

plot_latentN(hc, testdat, species = 'sp_2')

We can see that estimates for both species have correctly captured the true temporal variation and magnitudes in abundance
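
We can back this up with a quick numeric check (a sketch, assuming the hc and testdat objects created above), correlating the posterior median latent abundance for the first replicate series of sp_1 against the matching simulated truths:

# first replicate series for sp_1
sp1_series <- unique(testdat %>%
                       dplyr::filter(species == 'sp_1') %>%
                       dplyr::pull(series))[1]

# posterior median latent abundance over time
ind <- which(names(hc$hindcasts) == as.character(sp1_series))
med_N <- apply(hc$hindcasts[[ind]], 2, median)

# matching simulated truths
truths <- testdat %>%
  dplyr::filter(series == sp1_series) %>%
  dplyr::arrange(time) %>%
  dplyr::pull(truth)
cor(med_N, truths)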

Example 2: a larger survey with possible nonlinear effects

Now for another example with a larger dataset. We will use data from Jeff Doser’s simulation example in the wonderful spAbundance package. The simulated data include one continuous site-level covariate, one factor site-level covariate and two continuous sample-level covariates. This example will allow us to examine how we can include possibly nonlinear effects in the latent process and detection probability models.

Download the data and grab observations / covariate measurements for one species

# Data link
load(url('https://github.com/doserjef/spAbundance/raw/main/data/dataNMixSim.rda'))
data.one.sp <- dataNMixSim

# Pull out observations for one species
data.one.sp$y <- data.one.sp$y[1, , ]

# Abundance covariates that don't change across repeat sampling observations
abund.cov <- dataNMixSim$abund.covs[, 1]
abund.factor <- as.factor(dataNMixSim$abund.covs[, 2])

# Detection covariates that can change across repeat sampling observations
# Note that `NA`s are not allowed for covariates in mvgam, so we randomly
# impute them here
det.cov <- dataNMixSim$det.covs$det.cov.1[,]
det.cov[is.na(det.cov)] <- rnorm(length(which(is.na(det.cov))))
det.cov2 <- dataNMixSim$det.covs$det.cov.2
det.cov2[is.na(det.cov2)] <- rnorm(length(which(is.na(det.cov2))))

Next we wrangle the data into the appropriate ‘long’ format, adding indicators of time and series for working in mvgam. We also add the cap variable to represent the maximum latent N to marginalize over for each observation

mod_data <- do.call(rbind,
                    lapply(1:NROW(data.one.sp$y), function(x){
                      data.frame(y = data.one.sp$y[x,],
                                 abund_cov = abund.cov[x],
                                 abund_fac = abund.factor[x],
                                 det_cov = det.cov[x,],
                                 det_cov2 = det.cov2[x,],
                                 replicate = 1:NCOL(data.one.sp$y),
                                 site = paste0('site', x))
                    })) %>%
  dplyr::mutate(species = 'sp_1',
                series = as.factor(paste0(site, '_', species, '_', replicate))) %>%
  dplyr::mutate(site = factor(site, levels = unique(site)),
                species = factor(species, levels = unique(species)),
                time = 1,
                cap = max(data.one.sp$y, na.rm = TRUE) + 20)

The data include observations for 225 sites with three replicates per site, though some observations are missing

NROW(mod_data)
#> [1] 675
dplyr::glimpse(mod_data)
#> Rows: 675
#> Columns: 11
#> $ y         <int> 1, NA, NA, NA, 2, 2, NA, 1, NA, NA, 0, 1, 0, 0, 0, 0, NA, NA
#> $ abund_cov <dbl> -0.3734384, -0.3734384, -0.3734384, 0.7064305, 0.7064305, 0.…
#> $ abund_fac <fct> 3, 3, 3, 4, 4, 4, 9, 9, 9, 2, 2, 2, 3, 3, 3, 2, 2, 2, 1, 1, …
#> $ det_cov   <dbl> -1.2827999, 1.1770575, -1.3636225, -0.2922343, 0.1954809, 0.…
#> $ det_cov2  <dbl> 2.03047314, 0.01556041, 0.05861094, 0.90822039, 1.04555361, …
#> $ replicate <int> 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, …
#> $ site      <fct> site1, site1, site1, site2, site2, site2, site3, site3, site…
#> $ species   <fct> sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, …
#> $ series    <fct> site1_sp_1_1, site1_sp_1_2, site1_sp_1_3, site2_sp_1_1, site…
#> $ time      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ cap       <dbl> 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, …
head(mod_data)
#>    y  abund_cov abund_fac    det_cov   det_cov2 replicate  site species
#> 1  1 -0.3734384         3 -1.2827999 2.03047314         1 site1    sp_1
#> 2 NA -0.3734384         3  1.1770575 0.01556041         2 site1    sp_1
#> 3 NA -0.3734384         3 -1.3636225 0.05861094         3 site1    sp_1
#> 4 NA  0.7064305         4 -0.2922343 0.90822039         1 site2    sp_1
#> 5  2  0.7064305         4  0.1954809 1.04555361         2 site2    sp_1
#> 6  2  0.7064305         4  0.9673034 1.91971178         3 site2    sp_1
#>         series time cap
#> 1 site1_sp_1_1    1  33
#> 2 site1_sp_1_2    1  33
#> 3 site1_sp_1_3    1  33
#> 4 site2_sp_1_1    1  33
#> 5 site2_sp_1_2    1  33
#> 6 site2_sp_1_3    1  33

The final step for data preparation is of course the trend_map, which sets up the mapping between observation replicates and the latent abundance models. This is done in the same way as in the example above

mod_data %>%
  # each unique combination of site*species is a separate process
  dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>%
  dplyr::select(trend, series) %>%
  dplyr::distinct() -> trend_map

trend_map %>%
  dplyr::arrange(trend) %>%
  head(12)
#>    trend         series
#> 1      1 site100_sp_1_1
#> 2      1 site100_sp_1_2
#> 3      1 site100_sp_1_3
#> 4      2 site101_sp_1_1
#> 5      2 site101_sp_1_2
#> 6      2 site101_sp_1_3
#> 7      3 site102_sp_1_1
#> 8      3 site102_sp_1_2
#> 9      3 site102_sp_1_3
#> 10     4 site103_sp_1_1
#> 11     4 site103_sp_1_2
#> 12     4 site103_sp_1_3
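
Before fitting, a quick optional sanity check (a small sketch) confirms that every series level in the data has exactly one entry in the trend_map:

# both checks should return TRUE
all(levels(mod_data$series) %in% trend_map$series)
nrow(trend_map) == nlevels(mod_data$series)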

Now we are ready to fit a model using mvgam(). Here we will use penalized splines for each of the continuous covariate effects to detect possible nonlinear associations. We also showcase how mvgam can make use of the different approximation algorithms available in Stan by using the meanfield variational Bayes approximator (this reduces computation time from around 90 seconds to around 12 seconds for this example)

mod <- mvgam(
  # effects of covariates on detection probability;
  # here we use penalized splines for both continuous covariates
  formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3),
  
  # effects of the covariates on latent abundance;
  # here we use a penalized spline for the continuous covariate and
  # hierarchical intercepts for the factor covariate
  trend_formula = ~ s(abund_cov, k = 3) +
    s(abund_fac, bs = 're'),
  
  # link multiple observations to each site
  trend_map = trend_map,
  
  # nmix() family and supplied data
  family = nmix(),
  data = mod_data,
  
  # standard normal priors on key regression parameters
  priors = c(prior(std_normal(), class = 'b'),
             prior(std_normal(), class = 'Intercept'),
             prior(std_normal(), class = 'Intercept_trend'),
             prior(std_normal(), class = 'sigma_raw_trend')),
  
  # use Stan's variational inference for quicker results
  algorithm = 'meanfield',
  
  # no need to compute "series-level" residuals
  residuals = FALSE,
  samples = 1000)

Inspect the model summary but don’t bother looking at estimates for all individual spline coefficients. Notice how we no longer receive information on convergence because we did not use MCMC sampling for this model

summary(mod, include_betas = FALSE)
#> GAM observation formula:
#> y ~ s(det_cov, k = 3) + s(det_cov2, k = 3)
#> 
#> GAM process formula:
#> ~s(abund_cov, k = 3) + s(abund_fac, bs = "re")
#> 
#> Family:
#> nmix
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N process models:
#> 225 
#> 
#> N series:
#> 675 
#> 
#> N timepoints:
#> 1 
#> 
#> Status:
#> Fitted using Stan 
#> 1 chains, each with iter = 1000; warmup = ; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM observation model coefficient (beta) estimates:
#>              2.5%  50% 97.5% Rhat n.eff
#> (Intercept) 0.022 0.33  0.63  NaN   NaN
#> 
#> Approximate significance of GAM observation smooths:
#>              edf Ref.df Chi.sq p-value    
#> s(det_cov)  1.17      2    113 0.00063 ***
#> s(det_cov2) 1.84      2    572 < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> GAM process model coefficient (beta) estimates:
#>                    2.5%   50%  97.5% Rhat n.eff
#> (Intercept)_trend -0.33 -0.18 -0.028  NaN   NaN
#> 
#> GAM process model group-level estimates:
#>                           2.5%   50% 97.5% Rhat n.eff
#> mean(s(abund_fac))_trend -0.11 0.037  0.19  NaN   NaN
#> sd(s(abund_fac))_trend    0.27 0.400  0.59  NaN   NaN
#> 
#> Approximate significance of GAM process smooths:
#>               edf Ref.df Chi.sq p-value
#> s(abund_cov) 1.21      2   1.35    0.37
#> s(abund_fac) 8.84     10  12.23    0.19
#> 
#> Posterior approximation used: no diagnostics to compute
#> 
#> Use how_to_cite(mod) to get started describing this model

Again we can make use of marginaleffects support for interrogating the model through targeted predictions. First, we can inspect the estimated average detection probability

marginaleffects::avg_predictions(mod, type = 'detection')
#> 
#>  Estimate 2.5 % 97.5 %
#>     0.568 0.504   0.63
#> 
#> Type:  detection 
#> Columns: estimate, conf.low, conf.high

Next, investigate the estimated effects of covariates on latent abundance using the conditional_effects() function, specifying type = 'link'; this will return plots on the expectation scale

abund_plots <- plot(conditional_effects(mod,
                                        type = 'link',
                                        effects = c('abund_cov',
                                                    'abund_fac')),
                    plot = FALSE)

The effect of the continuous covariate on expected latent abundance

abund_plots[[1]] +
  ylab('Expected latent abundance')

The effect of the factor covariate on expected latent abundance, estimated as a hierarchical random effect

abund_plots[[2]] +
  ylab('Expected latent abundance')

Now we can investigate estimated effects of covariates on detection probability using type = 'detection'

det_plots <- plot(conditional_effects(mod,
                                      type = 'detection',
                                      effects = c('det_cov',
                                                  'det_cov2')),
                  plot = FALSE)

The covariate smooths were estimated to be somewhat nonlinear on the logit scale according to the model summary (based on their approximate significances). But inspecting conditional effects of each covariate on the probability scale is more intuitive and useful

det_plots[[1]] +
  ylab('Pr(detection)')

det_plots[[2]] +
  ylab('Pr(detection)')

More targeted predictions are also easy with marginaleffects support. For example, we can ask: How does detection probability change as we change both detection covariates?

fivenum_round = function(x) round(fivenum(x, na.rm = TRUE), 2)

marginaleffects::plot_predictions(mod, 
                 newdata = marginaleffects::datagrid(det_cov = unique,
                                    det_cov2 = fivenum_round),
                 by = c('det_cov', 'det_cov2'),
                 type = 'detection') +
  theme_classic() +
  ylab('Pr(detection)')

The model has found support for some important covariate effects, but of course we’d want to interrogate how well the model predicts and think about possible spatial effects to capture unmodelled variation in latent abundance (which can easily be incorporated into both linear predictors using spatial smooths).
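
As a rough sketch of how that extension might look (assuming hypothetical site coordinates site_x and site_y were available in mod_data; they are not part of this simulation), a Gaussian process smooth of space could be added to the process model formula:

# not run: site_x and site_y are hypothetical coordinates
# mod_spatial <- mvgam(
#   formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3),
#   trend_formula = ~ s(abund_cov, k = 3) +
#     s(abund_fac, bs = 're') +
#     # spatial smooth to soak up unmodelled variation in latent abundance
#     s(site_x, site_y, bs = 'gp'),
#   trend_map = trend_map,
#   family = nmix(),
#   data = mod_data)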

Further reading

The following papers and resources offer useful material about N-mixture models for ecological population dynamics investigations:

Guélat, Jérôme, and Kéry, Marc. “Effects of Spatial Autocorrelation and Imperfect Detection on Species Distribution Models.” Methods in Ecology and Evolution 9 (2018): 1614–25.

Kéry, Marc, and Royle, Andrew J. “Applied hierarchical modeling in ecology: Analysis of distribution, abundance and species richness in R and BUGS: Volume 2: Dynamic and advanced models”. London, UK: Academic Press (2020).

Royle, Andrew J. “N‐mixture models for estimating population size from spatially replicated counts.” Biometrics 60.1 (2004): 108-115.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au)

================================================ FILE: docs/articles/shared_states.html ================================================ Shared latent states in mvgam

This vignette gives an example of how mvgam can be used to estimate models where multiple observed time series share the same latent process model. For full details on the basic mvgam functionality, please see the introductory vignette.

The trend_map argument

The trend_map argument in the mvgam() function is an optional data.frame that can be used to specify which series should depend on which latent process models (called “trends” in mvgam). This can be particularly useful if we wish to force multiple observed time series to depend on the same latent trend process, but with different observation processes. If this argument is supplied, a latent factor model is set up by setting use_lv = TRUE and using the supplied trend_map to set up the shared trends. Users familiar with the MARSS family of packages will recognize this as a way of specifying the \(Z\) matrix. This data.frame needs to have column names series and trend, with integer values in the trend column to state which trend each series should depend on. The series column should have a single unique entry for each time series in the data, with names that perfectly match the factor levels of the series variable in data. For example, if we were to simulate a collection of three integer-valued time series (using sim_mvgam), the following trend_map would force the first two series to share the same latent trend process:

set.seed(122)
simdat <- sim_mvgam(trend_model = AR(),
                    prop_trend = 0.6,
                    mu = c(0, 1, 2),
                    family = poisson())
trend_map <- data.frame(series = unique(simdat$data_train$series),
                        trend = c(1, 1, 2))
trend_map
#>     series trend
#> 1 series_1     1
#> 2 series_2     1
#> 3 series_3     2

We can see that the factor levels in trend_map match those in the data:

all.equal(levels(trend_map$series), levels(simdat$data_train$series))
#> [1] TRUE

Checking trend_map with run_model = FALSE

Supplying this trend_map to the mvgam function for a simple model, but setting run_model = FALSE, allows us to inspect the constructed Stan code and the data objects that would be used to condition the model. Here we will set up a model in which each series has a different observation process (with only a different intercept per series in this case), and the two latent dynamic process models evolve as independent AR1 processes that also contain a shared nonlinear smooth function to capture repeated seasonality. This model is not too complicated but it does show how we can learn shared and independent effects for collections of time series in the mvgam framework:

fake_mod <- mvgam(y ~ 
                    # observation model formula, which has a 
                    # different intercept per series
                    series - 1,
                  
                  # process model formula, which has a shared seasonal smooth
                  # (each latent process model shares the SAME smooth)
                  trend_formula = ~ s(season, bs = 'cc', k = 6),
                  
                  # AR1 dynamics (each latent process model has DIFFERENT
                  # dynamics); processes are estimated using the noncentred
                  # parameterisation for improved efficiency
                  trend_model = AR(),
                  noncentred = TRUE,
                  
                  # supplied trend_map
                  trend_map = trend_map,
                  
                  # data and observation family
                  family = poisson(),
                  data = simdat$data_train,
                  run_model = FALSE)

Inspecting the Stan code shows how this model is a dynamic factor model in which the loadings are constructed to reflect the supplied trend_map:

code(fake_mod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp_trend; // number of trend smoothing parameters
#>   int<lower=0> n_lv; // number of dynamic factors
#>   int<lower=0> n_series; // number of series
#>   matrix[n_series, n_lv] Z; // matrix mapping series to latent states
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   int<lower=0> num_basis_trend; // number of trend basis coefficients
#>   vector[num_basis_trend] zero_trend; // prior locations for trend basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   matrix[n * n_lv, num_basis_trend] X_trend; // trend model design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   array[n, n_lv] int ytimes_trend;
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   matrix[4, 4] S_trend1; // mgcv smooth penalty matrix S_trend1
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> transformed data {
#>   
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   vector[num_basis_trend] b_raw_trend;
#>   
#>   // latent state SD terms
#>   vector<lower=0>[n_lv] sigma;
#>   
#>   // latent state AR1 terms
#>   vector<lower=-1, upper=1>[n_lv] ar1;
#>   
#>   // raw latent states
#>   matrix[n, n_lv] LV_raw;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp_trend] lambda_trend;
#> }
#> transformed parameters {
#>   // raw latent states
#>   vector[n * n_lv] trend_mus;
#>   matrix[n, n_series] trend;
#>   
#>   // basis coefficients
#>   vector[num_basis] b;
#>   
#>   // latent states
#>   matrix[n, n_lv] LV;
#>   vector[num_basis_trend] b_trend;
#>   
#>   // observation model basis coefficients
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#>   
#>   // process model basis coefficients
#>   b_trend[1 : num_basis_trend] = b_raw_trend[1 : num_basis_trend];
#>   
#>   // latent process linear predictors
#>   trend_mus = X_trend * b_trend;
#>   LV = LV_raw .* rep_matrix(sigma', rows(LV_raw));
#>   for (j in 1 : n_lv) {
#>     LV[1, j] += trend_mus[ytimes_trend[1, j]];
#>     for (i in 2 : n) {
#>       LV[i, j] += trend_mus[ytimes_trend[i, j]]
#>                   + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]);
#>     }
#>   }
#>   
#>   // derived latent states
#>   for (i in 1 : n) {
#>     for (s in 1 : n_series) {
#>       trend[i, s] = dot_product(Z[s,  : ], LV[i,  : ]);
#>     }
#>   }
#> }
#> model {
#>   // prior for seriesseries_1...
#>   b_raw[1] ~ student_t(3, 0, 2);
#>   
#>   // prior for seriesseries_2...
#>   b_raw[2] ~ student_t(3, 0, 2);
#>   
#>   // prior for seriesseries_3...
#>   b_raw[3] ~ student_t(3, 0, 2);
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for latent state SD parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   to_vector(LV_raw) ~ std_normal();
#>   
#>   // dynamic process models
#>   
#>   // prior for (Intercept)_trend...
#>   b_raw_trend[1] ~ student_t(3, 0, 2);
#>   
#>   // prior for s(season)_trend...
#>   b_raw_trend[2 : 5] ~ multi_normal_prec(zero_trend[2 : 5],
#>                                          S_trend1[1 : 4, 1 : 4]
#>                                          * lambda_trend[1]);
#>   lambda_trend ~ normal(5, 30);
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp_trend] rho_trend;
#>   vector[n_lv] penalty;
#>   array[n, n_series] int ypred;
#>   penalty = 1.0 / (sigma .* sigma);
#>   rho_trend = log(lambda_trend);
#>   
#>   matrix[n_series, n_lv] lv_coefs = Z;
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }

Notice the line that states “lv_coefs = Z;”. This uses the supplied \(Z\) matrix to construct the loading coefficients. The supplied matrix now looks exactly like what you’d use if you were to create a similar model in the MARSS package:

fake_mod$model_data$Z
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    1    0
#> [3,]    0    1

Fitting and inspecting the model

Though this model doesn’t perfectly match the data-generating process (which allowed each series to have different underlying dynamics), we can still fit it to show what the resulting inferences look like:

full_mod <- mvgam(y ~ series - 1,
                  trend_formula = ~ s(season, bs = 'cc', k = 6),
                  trend_model = AR(),
                  noncentred = TRUE,
                  trend_map = trend_map,
                  family = poisson(),
                  data = simdat$data_train,
                  silent = 2)

The summary of this model is informative as it shows that only two latent process models have been estimated, even though we have three observed time series. The model converges well

summary(full_mod)
#> GAM observation formula:
#> y ~ series - 1
#> 
#> GAM process formula:
#> ~s(season, bs = "cc", k = 6)
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR()
#> 
#> 
#> N process models:
#> 2 
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> GAM observation model coefficient (beta) estimates:
#>                 2.5%   50% 97.5% Rhat n_eff
#> seriesseries_1 -2.50 -0.67   1.2    1   835
#> seriesseries_2 -1.60  0.27   2.1    1   835
#> seriesseries_3 -0.59  1.30   3.2    1   835
#> 
#> Process model AR parameter estimates:
#>         2.5%    50% 97.5% Rhat n_eff
#> ar1[1] -0.57 -0.210  0.16 1.01   669
#> ar1[2] -0.29  0.018  0.32 1.01   399
#> 
#> Process error parameter estimates:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.38 0.56  0.76 1.00   712
#> sigma[2] 0.50 0.63  0.79 1.01   702
#> 
#> GAM process model coefficient (beta) estimates:
#>                     2.5%    50% 97.5% Rhat n_eff
#> (Intercept)_trend -1.100  0.760  2.60    1   831
#> s(season).1_trend -0.300 -0.051  0.22    1   852
#> s(season).2_trend -0.038  0.230  0.53    1   974
#> s(season).3_trend -0.510 -0.190  0.11    1   615
#> s(season).4_trend  0.340  0.680  0.97    1   603
#> 
#> Approximate significance of GAM process smooths:
#>            edf Ref.df Chi.sq p-value    
#> s(season) 2.95      4   16.4 0.00017 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:12:13 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(full_mod) to get started describing this model

Both series 1 and 2 share the exact same latent process estimates, while the estimates for series 3 are different:

plot(full_mod, type = 'trend', series = 1)

plot(full_mod, type = 'trend', series = 2)

plot(full_mod, type = 'trend', series = 3)

However, forecasts for series 1 and 2 will differ because they have different intercepts in the observation model
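
To see this (a quick sketch using the test data that sim_mvgam also returned), we can compute forecasts and plot the first two series; their forecast distributions share the same latent trend but are shifted by the different observation intercepts:

fc <- forecast(full_mod, newdata = simdat$data_test)
plot(fc, series = 1)
plot(fc, series = 2)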

Example: signal detection

Now we will explore a more complicated example. Here we simulate a true hidden signal that we are trying to track. This signal depends nonlinearly on some covariate (called productivity, which represents a measure of how productive the landscape is). The signal also demonstrates a fairly large amount of temporal autocorrelation:

set.seed(0)
# simulate a nonlinear relationship using the mgcv function gamSim
signal_dat <- mgcv::gamSim(n = 100, eg = 1, scale = 1)
#> Gu & Wahba 4 term additive model

# productivity is one of the variables in the simulated data
productivity <- signal_dat$x2

# simulate the true signal, which already has a nonlinear relationship
# with productivity; we will add in a fairly strong AR1 process to 
# contribute to the signal
true_signal <- as.vector(scale(signal_dat$y) +
                         arima.sim(100, model = list(ar = 0.8, sd = 0.1)))

Plot the signal to inspect its evolution over time

plot(true_signal, type = 'l',
     bty = 'l', lwd = 2,
     ylab = 'True signal',
     xlab = 'Time')

Next we simulate three sensors that are trying to track the same hidden signal. All of these sensors have different observation errors that can depend nonlinearly on a second external covariate, called temperature in this example. Again this makes use of gamSim

sim_series = function(n_series = 3, true_signal){
  temp_effects <- mgcv::gamSim(n = 100, eg = 7, scale = 0.1)
  temperature <- temp_effects$y
  alphas <- rnorm(n_series, sd = 2)

  do.call(rbind, lapply(seq_len(n_series), function(series){
    data.frame(observed = rnorm(length(true_signal),
                                mean = alphas[series] +
                                       1.5*as.vector(scale(temp_effects[, series + 1])) +
                                       true_signal,
                                sd = runif(1, 1, 2)),
               series = paste0('sensor_', series),
               time = 1:length(true_signal),
               temperature = temperature,
               productivity = productivity,
               true_signal = true_signal)
  }))
}
model_dat <- sim_series(true_signal = true_signal) %>%
  dplyr::mutate(series = factor(series))
#> Gu & Wahba 4 term additive model, correlated predictors

Plot the sensor observations

plot_mvgam_series(data = model_dat, y = 'observed',
                  series = 'all')

And now plot the observed relationships between the three sensors and the temperature covariate

 plot(observed ~ temperature, data = model_dat %>%
   dplyr::filter(series == 'sensor_1'),
   pch = 16, bty = 'l',
   ylab = 'Sensor 1',
   xlab = 'Temperature')

 plot(observed ~ temperature, data = model_dat %>%
   dplyr::filter(series == 'sensor_2'),
   pch = 16, bty = 'l',
   ylab = 'Sensor 2',
   xlab = 'Temperature')

 plot(observed ~ temperature, data = model_dat %>%
   dplyr::filter(series == 'sensor_3'),
   pch = 16, bty = 'l',
   ylab = 'Sensor 3',
   xlab = 'Temperature')

The shared signal model

Now we can formulate and fit a model that allows each sensor’s observation error to depend nonlinearly on temperature while allowing the true signal to depend nonlinearly on productivity. By fixing all of the values in the trend column to 1 in the trend_map, we are assuming that all observation sensors are tracking the same latent signal. We use informative priors on the two variance components (process error and observation error), which reflect our prior belief that the observation error is smaller overall than the true process error

mod <- mvgam(formula =
               # formula for observations, allowing for different
               # intercepts and hierarchical smooth effects of temperature
               observed ~ series + 
               s(temperature, k = 10) +
               s(series, temperature, bs = 'sz', k = 8),
             
             trend_formula =
               # formula for the latent signal, which can depend
               # nonlinearly on productivity
               ~ s(productivity, k = 8) - 1,
             
             trend_model =
               # in addition to productivity effects, the signal is
               # assumed to exhibit temporal autocorrelation
               AR(),
             noncentred = TRUE,
             
             trend_map =
               # trend_map forces all sensors to track the same
               # latent signal
               data.frame(series = unique(model_dat$series),
                          trend = c(1, 1, 1)),
             
             # informative priors on process error
             # and observation error will help with convergence
             priors = c(prior(normal(2, 0.5), class = sigma),
                        prior(normal(1, 0.5), class = sigma_obs)),
             
             # Gaussian observations
             family = gaussian(),
             data = model_dat,
             silent = 2)

View a reduced version of the model summary because there will be many spline coefficients in this model

summary(mod, include_betas = FALSE)
#> GAM observation formula:
#> observed ~ series + s(temperature, k = 10) + s(series, temperature, 
#>     bs = "sz", k = 8)
#> 
#> GAM process formula:
#> ~s(productivity, k = 8) - 1
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> AR()
#> 
#> 
#> N process models:
#> 1 
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 100 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1100; warmup = 600; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> Observation error parameter estimates:
#>              2.5% 50% 97.5% Rhat n_eff
#> sigma_obs[1]  1.4 1.7   2.1    1  1117
#> sigma_obs[2]  1.7 2.0   2.4    1  1881
#> sigma_obs[3]  2.0 2.3   2.7    1  2132
#> 
#> GAM observation model coefficient (beta) estimates:
#>                 2.5%  50% 97.5% Rhat n_eff
#> (Intercept)    -3.30 -2.1 -0.81    1  1136
#> seriessensor_2 -2.80 -1.4 -0.19    1   945
#> seriessensor_3  0.75  3.1  4.80    1  1150
#> 
#> Approximate significance of GAM observation smooths:
#>                        edf Ref.df Chi.sq p-value    
#> s(temperature)        1.30      9   0.15 1.00000    
#> s(series,temperature) 2.17     16 107.42 0.00084 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Process model AR parameter estimates:
#>        2.5% 50% 97.5% Rhat n_eff
#> ar1[1] 0.38 0.6   0.8 1.02   490
#> 
#> Process error parameter estimates:
#>          2.5% 50% 97.5% Rhat n_eff
#> sigma[1]  1.4 1.8   2.2 1.01   738
#> 
#> Approximate significance of GAM process smooths:
#>                  edf Ref.df Chi.sq p-value
#> s(productivity) 1.02      7   5.17    0.95
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:13:15 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model

Inspecting effects on both process and observation models

Don’t pay much attention to the approximate p-values of the smooth terms. The calculation of these values is incredibly sensitive to the estimates of the smoothing parameters, so I don’t find them very meaningful. What are meaningful, however, are prediction-based plots of the smooth functions. All main effects can be quickly plotted with conditional_effects:

conditional_effects(mod, type = 'link')

conditional_effects is simply a wrapper around the more flexible plot_predictions function from the marginaleffects package, which we can call directly for further customisation:

require(marginaleffects)
plot_predictions(mod, 
                 condition = c('temperature', 'series', 'series'),
                 points = 0.5) +
  theme(legend.position = 'none')

We have successfully estimated effects, some of them nonlinear, that impact the hidden process AND the observations. All in a single joint model. But there can always be challenges with these models, particularly when estimating both process and observation error at the same time.

Recovering the hidden signal

A final, key question is whether we can successfully recover the true hidden signal. The trend slot in the returned model parameters holds the estimates of this signal, which we can easily plot using the mvgam S3 method for plot. We can also overlay the true values of the hidden signal, which shows that our model has done a good job of recovering it:

plot(mod, type = 'trend')

# Overlay the true simulated signal
points(true_signal, pch = 16, cex = 1, col = 'white')
points(true_signal, pch = 16, cex = 0.8)
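
A rough numeric check can back up the visual impression (a sketch, assuming posterior draws are labelled in the usual 'trend[time,series]' format); because all three sensors map to the same latent state, we only need the columns for series 1:

trend_draws <- as.data.frame(mod, variable = 'trend\\[', regex = TRUE)

# keep series 1 and order columns by their time index, since
# column names may not sort numerically
s1_cols <- grep(',1]', colnames(trend_draws), fixed = TRUE, value = TRUE)
times <- as.numeric(sub('trend\\[([0-9]+),1\\]', '\\1', s1_cols))
med_trend <- apply(trend_draws[, s1_cols[order(times)]], 2, median)
cor(med_trend, true_signal)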

Further reading

The following papers and resources offer a lot of useful material about other types of State-Space models and how they can be applied in practice:

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. “MARSS: multivariate autoregressive state-space models for analyzing time-series data.” R Journal 4.1 (2012): 11.

Ward, Eric J., et al. “Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico.” Journal of Applied Ecology 47.1 (2010): 47-56.

Auger‐Méthé, Marie, et al. “A guide to state–space modeling of ecological time series.” Ecological Monographs 91.4 (2021): e01470.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please reach out if you are interested (n.clark’at’uq.edu.au)

================================================ FILE: docs/articles/time_varying_effects.html ================================================ Time-varying effects in mvgam

The purpose of this vignette is to show how the mvgam package can be used to estimate and forecast regression coefficients that vary through time.

Time-varying effects

Dynamic fixed-effect coefficients (often referred to as dynamic linear models) can be readily incorporated into GAMs / DGAMs. In mvgam, the dynamic() formula wrapper offers a convenient interface to set these up. The plan is to incorporate a range of dynamic options (such as random walk, AR1, etc.), but for the moment only low-rank Gaussian Process (GP) smooths are allowed (making use either of the gp basis in mgcv or of Hilbert space approximate GPs). These are advantageous over splines or random walk effects for several reasons. First, GPs will force the time-varying effect to be smooth. This often makes sense in reality, where we would not expect a regression coefficient to change rapidly from one time point to the next. Second, GPs provide information on the ‘global’ dynamics of a time-varying effect through their length-scale parameters. This means we can use them to provide accurate forecasts of how an effect is expected to change in the future, something that we couldn’t do well if we used splines to estimate the effect. An example below illustrates.

Simulating time-varying effects

Simulate a time-varying coefficient using a squared exponential Gaussian Process function with length scale \(\rho\)=10. We will do this using an internal function from mvgam (the sim_gp function):

set.seed(1111)
N <- 200
beta_temp <- mvgam:::sim_gp(rnorm(1),
                            alpha_gp = 0.75,
                            rho_gp = 10,
                            h = N) + 0.5

A plot of the time-varying coefficient shows that it changes smoothly through time:

plot(beta_temp, type = 'l', lwd = 3, 
     bty = 'l', xlab = 'Time', ylab = 'Coefficient',
     col = 'darkred')
box(bty = 'l', lwd = 2)

Simulating time-varying effects in mvgam and R

Next we need to simulate the values of the covariate, which we will call temp (to represent \(temperature\)). In this case we just use a standard normal distribution to simulate this covariate:

temp <- rnorm(N, sd = 1)

Finally, simulate the outcome variable, which is a Gaussian observation process (with observation error) over the time-varying effect of \(temperature\)

out <- rnorm(N, mean = 4 + beta_temp * temp,
             sd = 0.25)
time <- seq_along(temp)
plot(out,  type = 'l', lwd = 3, 
     bty = 'l', xlab = 'Time', ylab = 'Outcome',
     col = 'darkred')
box(bty = 'l', lwd = 2)

Simulating time-varying effects in mvgam and R

Gather the data into a data.frame for fitting models, and split the data into training and testing folds.

data <- data.frame(out, temp, time)
data_train <- data[1:190,]
data_test <- data[191:200,]

The dynamic() function

Time-varying coefficients can be fairly easily set up using the s() or gp() wrapper functions in mvgam formulae by fitting a nonlinear effect of time and using the covariate of interest as the numeric by variable (see ?mgcv::s or ?brms::gp for more details). The dynamic() formula wrapper offers a way to automate this process, and will eventually allow for a broader variety of time-varying effects (such as random walk or AR processes). Depending on the arguments that are specified to dynamic, it will either set up a low-rank GP smooth function using s() with bs = 'gp' and a fixed value of the length scale parameter \(\rho\), or it will set up a Hilbert space approximate GP using the gp() function with c=5/4 so that \(\rho\) is estimated (see ?dynamic for more details). In this first example we will use the s() option, and will mis-specify the \(\rho\) parameter here as, in practice, it is never known. This call to dynamic() will set up the following smooth: s(time, by = temp, bs = "gp", m = c(-2, 8, 2), k = 40)
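
If you’d like to verify this translation before fitting anything, the model can be constructed with run_model = FALSE and its expanded formula inspected (a quick sketch):

check_mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40),
                   family = gaussian(),
                   data = data_train,
                   run_model = FALSE)

# the stored formula shows the constructed gp-basis smooth
formula(check_mod)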

mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40),
             family = gaussian(),
             data = data_train,
             silent = 2)

Inspect the model summary, which shows how the dynamic() wrapper was used to construct a low-rank Gaussian Process smooth function:

summary(mod, include_betas = FALSE)
#> GAM formula:
#> out ~ s(time, by = temp, bs = "gp", m = c(-2, 8, 2), k = 40)
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 190 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.23 0.25  0.28    1  2374
#> 
#> GAM coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    4   4   4.1    1  3037
#> 
#> Approximate significance of GAM smooths:
#>               edf Ref.df Chi.sq p-value    
#> s(time):temp 16.1     40    169  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 1 of 2000 iterations ended with a divergence (0.05%)
#>  *Try running with larger adapt_delta to remove the divergences
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:13:49 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model

Because this model used a spline with a gp basis, its smooths can be visualised just like any other gam. We can plot the estimates for the in-sample and out-of-sample periods to see how the Gaussian Process function produces sensible smooth forecasts. Here we supply the full dataset to the newdata argument in plot_mvgam_smooth() to inspect posterior forecasts of the time-varying smooth function. Overlay the true simulated function to see that the model has adequately estimated its dynamics in both the training and testing data partitions

plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = 'dashed', lwd = 2)
lines(beta_temp, lwd = 2.5, col = 'white')
lines(beta_temp, lwd = 2)

We can also use plot_predictions() from the marginaleffects package to visualise the time-varying coefficient for what the effect would be estimated to be at different values of \(temperature\):

require(marginaleffects)
range_round = function(x){
  round(range(x, na.rm = TRUE), 2)
}
plot_predictions(mod, 
                 newdata = datagrid(time = unique,
                                    temp = range_round),
                 by = c('time', 'temp', 'temp'),
                 type = 'link')

This results in sensible forecasts of the observations as well

fc <- forecast(mod, newdata = data_test)
plot(fc)

The syntax is very similar if we wish to estimate the parameters of the underlying Gaussian Process, this time using a Hilbert space approximation. We simply omit the rho argument in dynamic() to make this happen. This will set up a call similar to gp(time, by = 'temp', c = 5/4, k = 40).

mod <- mvgam(out ~ dynamic(temp, k = 40),
             family = gaussian(),
             data = data_train,
             silent = 2)

This model summary now contains estimates for the marginal deviation and length scale parameters of the underlying Gaussian Process function:

summary(mod, include_betas = FALSE)
#> GAM formula:
#> out ~ gp(time, by = temp, c = 5/4, k = 40, scale = TRUE)
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 190 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.24 0.26   0.3    1  1709
#> 
#> GAM coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    4   4   4.1    1  2286
#> 
#> GAM gp term marginal deviation (alpha) and length scale (rho) estimates:
#>                      2.5%   50% 97.5% Rhat n_eff
#> alpha_gp(time):temp 0.640 0.890 1.400    1   681
#> rho_gp(time):temp   0.026 0.052 0.069    1   765
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:14:42 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model

Effects for gp() terms can also be plotted as smooths:

plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = 'dashed', lwd = 2)
lines(beta_temp, lwd = 2.5, col = 'white')
lines(beta_temp, lwd = 2)

Salmon survival example

Here we will use openly available data on marine survival of Chinook salmon to illustrate how time-varying effects can be used to improve ecological time series models. Scheuerell and Williams (2005) used a dynamic linear model to examine the relationship between marine survival of Chinook salmon and an index of ocean upwelling strength along the west coast of the USA. The authors hypothesized that stronger upwelling in April should create better growing conditions for phytoplankton, which would then translate into more zooplankton and provide better foraging opportunities for juvenile salmon entering the ocean. The data on survival is measured as a proportional variable over 42 years (1964–2005) and is available in the MARSS package:

load(url('https://github.com/atsa-es/MARSS/raw/master/data/SalmonSurvCUI.rda'))
dplyr::glimpse(SalmonSurvCUI)
#> Rows: 42
#> Columns: 3
#> $ year    <int> 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 19…
#> $ logit.s <dbl> -3.46, -3.32, -3.58, -3.03, -3.61, -3.35, -3.93, -4.19, -4.82,…
#> $ CUI.apr <int> 57, 5, 43, 11, 47, -21, 25, -2, -1, 43, 2, 35, 0, 1, -1, 6, -7…

First we need to prepare the data for modelling. The variable CUI.apr will be standardized to make it easier for the sampler to estimate underlying GP parameters for the time-varying effect. We also need to convert the survival back to a proportion, as in its current form it has been logit-transformed (this is because most time series packages cannot handle proportional data). As usual, we also need to create a time indicator and a series indicator for working in mvgam:

SalmonSurvCUI %>%
  # create a time variable
  dplyr::mutate(time = dplyr::row_number()) %>%

  # create a series variable
  dplyr::mutate(series = as.factor('salmon')) %>%

  # z-score the covariate CUI.apr
  dplyr::mutate(CUI.apr = as.vector(scale(CUI.apr))) %>%

  # convert logit-transformed survival back to proportional
  dplyr::mutate(survival = plogis(logit.s)) -> model_data

Inspect the data

dplyr::glimpse(model_data)
#> Rows: 42
#> Columns: 6
#> $ year     <int> 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1…
#> $ logit.s  <dbl> -3.46, -3.32, -3.58, -3.03, -3.61, -3.35, -3.93, -4.19, -4.82…
#> $ CUI.apr  <dbl> 2.37949804, 0.03330223, 1.74782994, 0.30401713, 1.92830654, -…
#> $ time     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
#> $ series   <fct> salmon, salmon, salmon, salmon, salmon, salmon, salmon, salmo…
#> $ survival <dbl> 0.030472033, 0.034891409, 0.027119717, 0.046088827, 0.0263393…

Plot features of the outcome variable, which shows that it is a proportional variable with particular restrictions that we want to model:

plot_mvgam_series(data = model_data, y = 'survival')

A State-Space Beta regression

mvgam can easily handle data that are bounded at 0 and 1 with a Beta observation model (using the mgcv function betar(), see ?mgcv::betar for details). First we will fit a simple State-Space model that uses an AR1 dynamic process model with no predictors and a Beta observation model:

mod0 <- mvgam(formula = survival ~ 1,
              trend_model = AR(),
              noncentred = TRUE,
              priors = prior(normal(-3.5, 0.5), class = Intercept),
              family = betar(),
              data = model_data,
              silent = 2)

The summary of this model shows good behaviour of the Hamiltonian Monte Carlo sampler and provides useful summaries on the Beta observation model parameters:

summary(mod0)
#> GAM formula:
#> survival ~ 1
#> 
#> Family:
#> beta
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> AR()
#> 
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 42 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> Observation precision parameter estimates:
#>        2.5% 50% 97.5% Rhat n_eff
#> phi[1]   80 240   550 1.04   144
#> 
#> GAM coefficient (beta) estimates:
#>             2.5%  50% 97.5% Rhat n_eff
#> (Intercept) -4.6 -4.3    -4    1   224
#> 
#> Latent trend parameter AR estimates:
#>            2.5%  50% 97.5% Rhat n_eff
#> ar1[1]   -0.370 0.68  0.98 1.01   321
#> sigma[1]  0.047 0.44  0.70 1.03   132
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 71 of 2000 iterations ended with a divergence (3.55%)
#>  *Try running with larger adapt_delta to remove the divergences
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:15:33 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod0) to get started describing this model

A plot of the underlying dynamic component shows how it has easily handled the temporal evolution of the time series:

plot(mod0, type = 'trend')

Including time-varying upwelling effects

Now we can increase the complexity of our model by constructing and fitting a State-Space model with a time-varying effect of the coastal upwelling index in addition to the autoregressive dynamics. We again use a Beta observation model to capture the restrictions of our proportional observations, but this time will include a dynamic() effect of CUI.apr in the latent process model. We do not specify the \(\rho\) parameter, instead opting to estimate it using a Hilbert space approximate GP:

mod1 <- mvgam(formula = survival ~ 1,
              trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
              trend_model = AR(),
              noncentred = TRUE,
              priors = prior(normal(-3.5, 0.5), class = Intercept),
              family = betar(),
              data = model_data,
              silent = 2)

The summary for this model now includes estimates for the time-varying GP parameters:

summary(mod1, include_betas = FALSE)
#> GAM observation formula:
#> survival ~ 1
#> 
#> GAM process formula:
#> ~dynamic(CUI.apr, k = 25, scale = FALSE) - 1
#> 
#> Family:
#> beta
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> AR()
#> 
#> 
#> N process models:
#> 1 
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 42 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> Observation precision parameter estimates:
#>        2.5% 50% 97.5% Rhat n_eff
#> phi[1]  170 350   670    1   653
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5%  50% 97.5% Rhat n_eff
#> (Intercept) -4.4 -3.8  -2.9 1.01   677
#> 
#> Process model AR parameter estimates:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.55 0.92  0.99 1.01   468
#> 
#> Process error parameter estimates:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.19 0.34  0.56    1   630
#> 
#> GAM process model gp term marginal deviation (alpha) and length scale (rho) estimates:
#>                         2.5% 50% 97.5% Rhat n_eff
#> alpha_gp(time):CUI.apr 0.018 0.3   1.2    1   520
#> rho_gp(time):CUI.apr   1.300 5.7  37.0    1   351
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 83 of 2000 iterations ended with a divergence (4.15%)
#>  *Try running with larger adapt_delta to remove the divergences
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:17:08 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod1) to get started describing this model

The estimates for the underlying dynamic process, and for the hindcasts, haven’t changed much:

plot(mod1, type = 'trend')

plot(mod1, type = 'forecast')

But the process error parameter \(\sigma\) is slightly smaller for this model than for the first model:

# Extract estimates of the process error 'sigma' for each model
mod0_sigma <- as.data.frame(mod0, variable = 'sigma', regex = TRUE) %>%
  dplyr::mutate(model = 'Mod0')
mod1_sigma <- as.data.frame(mod1, variable = 'sigma', regex = TRUE) %>%
  dplyr::mutate(model = 'Mod1')
sigmas <- rbind(mod0_sigma, mod1_sigma)

# Plot using ggplot2
require(ggplot2)
ggplot(sigmas, aes(y = `sigma[1]`, fill = model)) +
  geom_density(alpha = 0.3, colour = NA) +
  coord_flip()

Why does the process error not need to be as flexible in the second model? Because the estimates of this dynamic process are now informed partly by the time-varying effect of upwelling, which we can visualise on the link scale using plot():

plot(mod1, type = 'smooths', trend_effects = TRUE)

Comparing model predictive performances

A key question when fitting multiple time series models is whether one of them provides better predictions than the other. There are several options in mvgam for exploring this quantitatively. First, we can compare models based on in-sample approximate leave-one-out cross-validation as implemented in the popular loo package:

loo_compare(mod0, mod1)
#>      elpd_diff se_diff
#> mod1  0.0       0.0   
#> mod0 -8.3       2.9

The second model has the larger Expected Log Predictive Density (ELPD), meaning that it is slightly favoured over the simpler model that did not include the time-varying upwelling effect. However, the two models certainly do not differ by much. But this metric only compares in-sample performance, and we are hoping to use our models to produce reasonable forecasts. Luckily, mvgam also has routines for comparing models using approximate leave-future-out cross-validation. Here we refit both models to a reduced training set (starting at time point 30) and produce approximate 1-step ahead forecasts. These forecasts are used to estimate forecast ELPD before expanding the training set one time point at a time. We use Pareto-smoothed importance sampling to reweight posterior predictions, acting as a kind of particle filter so that we don’t need to refit the model too often (you can read more about how this process works in Bürkner et al. 2020).

lfo_mod0 <- lfo_cv(mod0, min_t = 30)
lfo_mod1 <- lfo_cv(mod1, min_t = 30)

The model with the time-varying upwelling effect tends to provide better 1-step ahead forecasts, with a higher total forecast ELPD:

sum(lfo_mod0$elpds)
#> [1] 35.85707
sum(lfo_mod1$elpds)
#> [1] 37.15718
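
The objects returned by lfo_cv() also have a plot() method, which shows the incremental ELPD estimates (and their associated Pareto-k diagnostics) at each evaluation time point:

plot(lfo_mod1)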

We can also plot the ELPDs for each model as a contrast. Here, values less than zero suggest the time-varying predictor model (Mod1) gives better 1-step ahead forecasts:

plot(x = 1:length(lfo_mod0$elpds) + 30,
     y = lfo_mod0$elpds - lfo_mod1$elpds,
     ylab = 'ELPDmod0 - ELPDmod1',
     xlab = 'Evaluation time point',
     pch = 16,
     col = 'darkred',
     bty = 'l')
abline(h = 0, lty = 'dashed')

Comparing forecast skill for dynamic beta regression models in mvgam and R

A useful exercise to further expand this model would be to think about what kinds of predictors might impact measurement error, which could easily be implemented into the observation formula in mvgam(). But for now, we will leave the model as-is.
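
If we did want to pursue this, a covariate thought to influence measurement error would enter the observation formula directly. Below is a minimal sketch (not run); the covariate flow is purely hypothetical and does not exist in model_data:

mod2 <- mvgam(formula = survival ~ 1 + flow,
              trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
              trend_model = AR(),
              noncentred = TRUE,
              family = betar(),
              data = model_data,
              silent = 2)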

Further reading

The following papers and resources offer a lot of useful material about dynamic linear models and how they can be applied / evaluated in practice:

Bürkner, P.C., Gabry, J., and Vehtari, A. “Approximate leave-future-out cross-validation for Bayesian time series models.” Journal of Statistical Computation and Simulation 90.14 (2020): 2499-2523.

Herrero, Asier, et al. “From the individual to the landscape and back: time-varying effects of climate and herbivory on tree sapling growth at distribution limits.” Journal of Ecology 104.2 (2016): 430-442.

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. “MARSS: multivariate autoregressive state-space models for analyzing time-series data.” R Journal 4.1 (2012): 11.

Scheuerell, Mark D., and John G. Williams. “Forecasting climate-induced changes in the survival of Snake River spring/summer Chinook salmon (Oncorhynchus tshawytscha).” Fisheries Oceanography 14 (2005): 448-457.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au)

================================================ FILE: docs/articles/trend_formulas.html ================================================ State-Space models in mvgam • mvgam

The purpose of this vignette is to show how the mvgam package can be used to fit and interrogate State-Space models with nonlinear effects.

State-Space Models

Illustration of a basic State-Space model, which assumes that a latent dynamic process (X) can evolve independently from the way we take observations (Y) of that process


State-Space models allow us to separately make inferences about the underlying dynamic process model that we are interested in (i.e. the evolution of a time series or a collection of time series) and the observation model (i.e. the way that we survey / measure this underlying process). This is extremely useful in ecology because our observations are always imperfect / noisy measurements of the thing we are interested in measuring. It is also helpful because we often know that some covariates will impact our ability to measure accurately (i.e. we cannot take accurate counts of rodents if there is a thunderstorm happening) while other covariates impact the underlying process (it is highly unlikely that rodent abundance responds to one storm, but instead probably responds to longer-term weather and climate variation). A State-Space model allows us to model both components in a single unified modelling framework. A major advantage of mvgam is that it can include nonlinear effects and random effects in BOTH model components while also capturing dynamic processes.

Lake Washington plankton data

The data we will use to illustrate how we can fit State-Space models in mvgam are from a long-term monitoring study of plankton counts (cells per mL) taken from Lake Washington in Washington, USA. The data are available as part of the MARSS package and can be downloaded using the following:

load(url('https://github.com/atsa-es/MARSS/raw/master/data/lakeWAplankton.rda'))

We will work with five different groups of plankton:

outcomes <- c('Greens', 'Bluegreens', 'Diatoms', 'Unicells', 'Other.algae')

As usual, preparing the data into the correct format for mvgam modelling takes a little bit of wrangling in dplyr:

# loop across each plankton group to create the long dataframe
plankton_data <- do.call(rbind, lapply(outcomes, function(x){
  
  # create a group-specific dataframe with counts labelled 'y'
  # and the group name in the 'series' variable
  data.frame(year = lakeWAplanktonTrans[, 'Year'],
             month = lakeWAplanktonTrans[, 'Month'],
             y = lakeWAplanktonTrans[, x],
             series = x,
             temp = lakeWAplanktonTrans[, 'Temp'])})) %>%
  
  # change the 'series' label to a factor
  dplyr::mutate(series = factor(series)) %>%
  
  # filter to only include some years in the data
  dplyr::filter(year >= 1965 & year < 1975) %>%
  dplyr::arrange(year, month) %>%
  dplyr::group_by(series) %>%
  
  # z-score the counts so they are approximately standard normal
  dplyr::mutate(y = as.vector(scale(y))) %>%
  
  # add the time indicator
  dplyr::mutate(time = dplyr::row_number()) %>%
  dplyr::ungroup()

Inspect the data structure

head(plankton_data)
#> # A tibble: 6 × 6
#>    year month       y series       temp  time
#>   <dbl> <dbl>   <dbl> <fct>       <dbl> <int>
#> 1  1965     1 -0.542  Greens      -1.23     1
#> 2  1965     1 -0.344  Bluegreens  -1.23     1
#> 3  1965     1 -0.0768 Diatoms     -1.23     1
#> 4  1965     1 -1.52   Unicells    -1.23     1
#> 5  1965     1 -0.491  Other.algae -1.23     1
#> 6  1965     2 NA      Greens      -1.32     2
dplyr::glimpse(plankton_data)
#> Rows: 600
#> Columns: 6
#> $ year   <dbl> 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 196…
#> $ month  <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, …
#> $ y      <dbl> -0.54241769, -0.34410776, -0.07684901, -1.52243490, -0.49055442…
#> $ series <fct> Greens, Bluegreens, Diatoms, Unicells, Other.algae, Greens, Blu…
#> $ temp   <dbl> -1.2306562, -1.2306562, -1.2306562, -1.2306562, -1.2306562, -1.…
#> $ time   <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, …

Note that we have z-scored the counts in this example as that will make it easier to specify priors (though this is not completely necessary; it is often better to build a model that respects the properties of the actual outcome variables).
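
A quick check confirms the scaling; each series should now have a mean of roughly 0 and a standard deviation of roughly 1:

plankton_data %>%
  dplyr::group_by(series) %>%
  dplyr::summarise(mean_y = mean(y, na.rm = TRUE),
                   sd_y = sd(y, na.rm = TRUE))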

plot_mvgam_series(data = plankton_data, series = 'all')

We have some missing observations, but this isn’t an issue for modelling in mvgam. A useful property to understand about these counts is that they tend to be highly seasonal. Below are some plots of z-scored counts against the z-scored temperature measurements in the lake for each month:

plankton_data %>%
  dplyr::filter(series == 'Other.algae') %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = 'white',
            size = 1.3) +
  geom_line(aes(y = y), col = 'darkred',
            size = 1.1) +
  ylab('z-score') +
  xlab('Time') +
  ggtitle('Temperature (black) vs Other algae (red)')

plankton_data %>%
  dplyr::filter(series == 'Diatoms') %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = 'white',
            size = 1.3) +
  geom_line(aes(y = y), col = 'darkred',
            size = 1.1) +
  ylab('z-score') +
  xlab('Time') +
  ggtitle('Temperature (black) vs Diatoms (red)')

We will have to try and capture this seasonality in our process model, which should be easy to do given the flexibility of GAMs. Next we will split the data into training and testing splits:

plankton_train <- plankton_data %>%
  dplyr::filter(time <= 112)
plankton_test <- plankton_data %>%
  dplyr::filter(time > 112)

Now time to fit some models. This requires a bit of thinking about how we can best tackle the seasonal variation and the likely dependence structure in the data. These algae are interacting as part of a complex system within the same lake, so we certainly expect there to be some lagged cross-dependencies underlying their dynamics. But if we do not capture the seasonal variation, our multivariate dynamic model will be forced to try and capture it, which could lead to poor convergence and unstable results (we could feasibly capture cyclic dynamics with a more complex multi-species Lotka-Volterra model, but ordinary differential equation approaches are beyond the scope of mvgam).

Capturing seasonality

First we will fit a model that does not include a dynamic component, just to see if it can reproduce the seasonal variation in the observations. This model introduces hierarchical multidimensional smooths, where all time series share a “global” tensor product of the month and temp variables, capturing our expectation that algal seasonality responds to temperature variation. But this response should depend on when in the year these temperatures are recorded (i.e. a response to warm temperatures in Spring should be different to a response to warm temperatures in Autumn). The model also fits series-specific deviation smooths (i.e. one tensor product per series) to capture how each algal group’s seasonality differs from the overall “global” seasonality. Note that we do not include series-specific intercepts in this model because each series was z-scored to have a mean of 0.

notrend_mod <- mvgam(y ~ 
                       # tensor of temp and month to capture
                       # "global" seasonality
                       te(temp, month, k = c(4, 4)) +
                       
                       # series-specific deviation tensor products
                       te(temp, month, k = c(4, 4), by = series) - 1,
                     family = gaussian(),
                     data = plankton_train,
                     newdata = plankton_test,
                     trend_model = 'None')

The “global” tensor product smooth function can be quickly visualized:

plot_mvgam_smooth(notrend_mod, smooth = 1)

On this plot, red indicates below-average linear predictors and white indicates above-average. We can then plot the deviation smooths for a few algal groups to see how they vary from the “global” pattern:

plot_mvgam_smooth(notrend_mod, smooth = 2)

plot_mvgam_smooth(notrend_mod, smooth = 3)

These multidimensional smooths have done a good job of capturing the seasonal variation in our observations:

plot(notrend_mod, type = 'forecast', series = 1)

plot(notrend_mod, type = 'forecast', series = 2)

plot(notrend_mod, type = 'forecast', series = 3)

This basic model gives us confidence that we can capture the seasonal variation in the observations. But the model has not captured the remaining temporal dynamics, which is obvious when we inspect Dunn-Smyth residuals for a few series:

plot(notrend_mod, type = 'residuals', series = 1)

plot(notrend_mod, type = 'residuals', series = 3)

Multiseries dynamics

Now it is time to get into multivariate State-Space models. We will fit two models that can both incorporate lagged cross-dependencies in the latent process models. The first model assumes that the process errors operate independently from one another, while the second assumes that there may be contemporaneous correlations in the process errors. Both models include a Vector Autoregressive component for the process means, and so both can model complex community dynamics. The models can be described mathematically as follows:

\[\begin{align*}
\boldsymbol{count}_t & \sim \text{Normal}(\mu_{obs[t]}, \sigma_{obs}) \\
\mu_{obs[t]} & = process_t \\
process_t & \sim \text{MVNormal}(\mu_{process[t]}, \Sigma_{process}) \\
\mu_{process[t]} & = A * process_{t-1} + f_{global}(\boldsymbol{month},\boldsymbol{temp})_t + f_{series}(\boldsymbol{month},\boldsymbol{temp})_t \\
f_{global}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{global,k} * \beta_{global,k} \\
f_{series}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{series,k} * \beta_{series,k}
\end{align*}\]

Here you can see that there are no terms in the observation model apart from the underlying process model. But we could easily add covariates into the observation model if we felt that they could explain some of the systematic observation errors. We also assume independent observation processes (there is no covariance structure in the observation errors \(\sigma_{obs}\)). At present, mvgam does not support multivariate observation models. But this feature will be added in future versions. However, the underlying process model is multivariate, and there is a lot going on here. This component has a Vector Autoregressive part, where the process mean at time \(t\) \((\mu_{process[t]})\) is a vector that evolves as a function of where the vector-valued process model was at time \(t-1\). The \(A\) matrix captures these dynamics with self-dependencies on the diagonal and possibly asymmetric cross-dependencies on the off-diagonals, while also incorporating the nonlinear smooth functions that capture seasonality for each series. The contemporaneous process errors are modeled by \(\Sigma_{process}\), which can be constrained so that process errors are independent (i.e. setting the off-diagonals to 0) or can be fully parameterized using a Cholesky decomposition (using Stan’s \(LKJcorr\) distribution to place a prior on the strength of inter-species correlations). For those who are interested in the inner workings, mvgam makes use of a recent breakthrough by Sarah Heaps to enforce stationarity of Bayesian VAR processes. This is advantageous as we often don’t expect forecast variance to increase without bound forever into the future, but many estimated VARs tend to behave this way.
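
To build intuition for the respective roles of \(A\) and \(\Sigma_{process}\), below is a small standalone simulation of a stationary VAR(1) process (the coefficient values are arbitrary illustrations, not estimates from any model in this vignette):

set.seed(1)

# lagged self- and cross-dependencies; the eigenvalues of this matrix
# lie inside the unit circle, so the process is stationary
A <- matrix(c(0.7, 0.1,
              -0.3, 0.5), nrow = 2, byrow = TRUE)

# independent process errors (off-diagonals fixed at 0)
Sigma <- diag(c(0.1, 0.2))

# iterate the VAR(1) recursion forward in time
x <- matrix(0, nrow = 50, ncol = 2)
for(t in 2:50){
  x[t, ] <- as.vector(A %*% x[t - 1, ]) +
    MASS::mvrnorm(1, mu = c(0, 0), Sigma = Sigma)
}
matplot(x, type = 'l', lty = 1, bty = 'l',
        xlab = 'Time', ylab = 'Simulated process')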


Ok that was a lot to take in. Let’s fit some models to try and inspect what is going on and what they assume. But first, we need to update mvgam’s default priors for the observation and process errors. By default, mvgam uses a fairly wide Student-T prior on these parameters to avoid being overly informative. But our observations are z-scored and so we do not expect very large process or observation errors. However, we also do not expect very small observation errors either as we know these measurements are not perfect. So let’s update the priors for these parameters. In doing so, you will get to see how the formula for the latent process (i.e. trend) model is used in mvgam:

priors <- get_mvgam_priors(
  # observation formula, which has no terms in it
  y ~ -1,
  
  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,
  
  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train)

Get names of all parameters whose priors can be modified:

priors[, 3]
#>  [1] "(Intercept)"                                                                                                                                                                                                                                                           
#>  [2] "process error sd"                                                                                                                                                                                                                                                      
#>  [3] "diagonal autocorrelation population mean"                                                                                                                                                                                                                              
#>  [4] "off-diagonal autocorrelation population mean"                                                                                                                                                                                                                          
#>  [5] "diagonal autocorrelation population variance"                                                                                                                                                                                                                          
#>  [6] "off-diagonal autocorrelation population variance"                                                                                                                                                                                                                      
#>  [7] "shape1 for diagonal autocorrelation precision"                                                                                                                                                                                                                         
#>  [8] "shape1 for off-diagonal autocorrelation precision"                                                                                                                                                                                                                     
#>  [9] "shape2 for diagonal autocorrelation precision"                                                                                                                                                                                                                         
#> [10] "shape2 for off-diagonal autocorrelation precision"                                                                                                                                                                                                                     
#> [11] "observation error sd"                                                                                                                                                                                                                                                  
#> [12] "te(temp,month) smooth parameters, te(temp,month):trendtrend1 smooth parameters, te(temp,month):trendtrend2 smooth parameters, te(temp,month):trendtrend3 smooth parameters, te(temp,month):trendtrend4 smooth parameters, te(temp,month):trendtrend5 smooth parameters"

And their default prior distributions:

priors[, 4]
#>  [1] "(Intercept) ~ student_t(3, -0.1, 2.5);"
#>  [2] "sigma ~ student_t(3, 0, 2.5);"         
#>  [3] "es[1] = 0;"                            
#>  [4] "es[2] = 0;"                            
#>  [5] "fs[1] = sqrt(0.455);"                  
#>  [6] "fs[2] = sqrt(0.455);"                  
#>  [7] "gs[1] = 1.365;"                        
#>  [8] "gs[2] = 1.365;"                        
#>  [9] "hs[1] = 0.071175;"                     
#> [10] "hs[2] = 0.071175;"                     
#> [11] "sigma_obs ~ student_t(3, 0, 2.5);"     
#> [12] "lambda_trend ~ normal(5, 30);"

Setting priors is easy in mvgam as you can use brms routines. Here we use more informative Normal priors for both error components, but we impose a lower bound of 0.2 for the observation errors:

priors <- c(prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
            prior(normal(0.5, 0.25), class = sigma))

You may have noticed something else unique about this model: there is no intercept term in the observation formula. This is because a shared intercept parameter can sometimes be unidentifiable with respect to the latent VAR process, particularly if our series have similar long-run averages (which they do in this case because they were z-scored). We will often get better convergence in these State-Space models if we drop this parameter. mvgam accomplishes this by fixing the coefficient for the intercept to zero. Now we can fit the first model, which assumes that process errors are contemporaneously uncorrelated:

var_mod <- mvgam(  
  # observation formula, which is empty
  y ~ -1,
  
  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,
  
  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  
  # include the updated priors
  priors = priors,
  silent = 2)
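
Before inspecting the fit, you can confirm how mvgam handles the intercept (or any other detail of the model) by viewing the generated Stan code:

code(var_mod)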

Inspecting SS models

This model’s summary is a bit different to other mvgam summaries. It separates parameters based on whether they belong to the observation model or to the latent process model. This is because we may often have covariates that impact the observations but not the latent process, so we can have fairly complex models for each component. You will notice that some parameters have not fully converged, particularly for the VAR coefficients (called A in the output) and for the process errors (Sigma). Note that we set include_betas = FALSE to stop the summary from printing output for all of the spline coefficients, which can be dense and hard to interpret:

summary(var_mod, include_betas = FALSE)
#> GAM observation formula:
#> y ~ 1
#> 
#> GAM process formula:
#> ~te(temp, month, k = c(4, 4)) + te(temp, month, k = c(4, 4), 
#>     by = trend) - 1
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> VAR()
#> 
#> 
#> N process models:
#> 5 
#> 
#> N series:
#> 5 
#> 
#> N timepoints:
#> 120 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1500; warmup = 1000; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.20 0.25  0.34 1.02   277
#> sigma_obs[2] 0.25 0.40  0.53 1.01   214
#> sigma_obs[3] 0.42 0.63  0.80 1.10    45
#> sigma_obs[4] 0.25 0.37  0.50 1.02   205
#> sigma_obs[5] 0.31 0.43  0.55 1.01   207
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    0   0     0  NaN   NaN
#> 
#> Process model VAR parameter estimates:
#>          2.5%     50% 97.5% Rhat n_eff
#> A[1,1]  0.620  0.7900 0.920 1.01   331
#> A[1,2] -0.370 -0.1300 0.037 1.01   428
#> A[1,3] -0.140  0.0150 0.180 1.00   621
#> A[1,4] -0.054  0.0600 0.200 1.00   813
#> A[1,5] -0.042  0.1100 0.350 1.00   311
#> A[2,1] -0.530 -0.1900 0.034 1.03   174
#> A[2,2]  0.070  0.4300 0.730 1.01   249
#> A[2,3] -0.340  0.0064 0.330 1.06    74
#> A[2,4] -0.059  0.1300 0.390 1.01   220
#> A[2,5] -0.043  0.2200 0.630 1.03   175
#> A[3,1] -0.380 -0.0230 0.210 1.02   317
#> A[3,2] -0.500 -0.0340 0.340 1.02   228
#> A[3,3] -0.073  0.4700 0.850 1.07    63
#> A[3,4] -0.087  0.1400 0.570 1.03   134
#> A[3,5] -0.280  0.0310 0.460 1.02   418
#> A[4,1] -0.450 -0.1300 0.069 1.03   175
#> A[4,2] -0.690 -0.1700 0.140 1.03   161
#> A[4,3] -0.210  0.0650 0.410 1.03   166
#> A[4,4]  0.530  0.7400 0.960 1.02   268
#> A[4,5] -0.058  0.1900 0.650 1.02   189
#> A[5,1] -0.099  0.0550 0.250 1.00   424
#> A[5,2] -0.410 -0.1100 0.130 1.01   337
#> A[5,3] -0.150  0.0510 0.300 1.01   468
#> A[5,4] -0.210 -0.0340 0.110 1.01   682
#> A[5,5]  0.490  0.7500 0.940 1.01   283
#> 
#> Process error parameter estimates:
#>             2.5%  50% 97.5% Rhat n_eff
#> Sigma[1,1] 0.064 0.11  0.18 1.01   357
#> Sigma[1,2] 0.000 0.00  0.00  NaN   NaN
#> Sigma[1,3] 0.000 0.00  0.00  NaN   NaN
#> Sigma[1,4] 0.000 0.00  0.00  NaN   NaN
#> Sigma[1,5] 0.000 0.00  0.00  NaN   NaN
#> Sigma[2,1] 0.000 0.00  0.00  NaN   NaN
#> Sigma[2,2] 0.058 0.16  0.30 1.02   194
#> Sigma[2,3] 0.000 0.00  0.00  NaN   NaN
#> Sigma[2,4] 0.000 0.00  0.00  NaN   NaN
#> Sigma[2,5] 0.000 0.00  0.00  NaN   NaN
#> Sigma[3,1] 0.000 0.00  0.00  NaN   NaN
#> Sigma[3,2] 0.000 0.00  0.00  NaN   NaN
#> Sigma[3,3] 0.062 0.29  0.65 1.12    37
#> Sigma[3,4] 0.000 0.00  0.00  NaN   NaN
#> Sigma[3,5] 0.000 0.00  0.00  NaN   NaN
#> Sigma[4,1] 0.000 0.00  0.00  NaN   NaN
#> Sigma[4,2] 0.000 0.00  0.00  NaN   NaN
#> Sigma[4,3] 0.000 0.00  0.00  NaN   NaN
#> Sigma[4,4] 0.096 0.21  0.35 1.02   176
#> Sigma[4,5] 0.000 0.00  0.00  NaN   NaN
#> Sigma[5,1] 0.000 0.00  0.00  NaN   NaN
#> Sigma[5,2] 0.000 0.00  0.00  NaN   NaN
#> Sigma[5,3] 0.000 0.00  0.00  NaN   NaN
#> Sigma[5,4] 0.000 0.00  0.00  NaN   NaN
#> Sigma[5,5] 0.048 0.13  0.27 1.01   127
#> 
#> Approximate significance of GAM process smooths:
#>                              edf Ref.df Chi.sq p-value
#> te(temp,month)              3.90     15  53.86    0.44
#> te(temp,month):seriestrend1 1.08     15   7.28    1.00
#> te(temp,month):seriestrend2 5.18     15  41.73    0.47
#> te(temp,month):seriestrend3 2.20     15   1.66    1.00
#> te(temp,month):seriestrend4 2.77     15   4.18    1.00
#> te(temp,month):seriestrend5 2.40     15   6.66    0.96
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhats above 1.05 found for 13 parameters
#>  *Diagnose further to investigate why the chains have not mixed
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Feb 06 12:21:10 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(var_mod) to get started describing this model

The convergence of this model isn’t fabulous (more on this in a moment). But we can again plot the smooth functions, which this time operate on the process model. These can be viewed by supplying trend_effects = TRUE to the plotting functions:

plot(var_mod, 'smooths', trend_effects = TRUE)

The VAR matrix is of particular interest here, as it captures lagged dependencies and cross-dependencies in the latent process model. Unfortunately bayesplot doesn’t know this is a matrix of parameters so what we see is actually the transpose of the VAR matrix. A little bit of wrangling gives us these histograms in the correct order:

A_pars <- matrix(NA, nrow = 5, ncol = 5)
for(i in 1:5){
  for(j in 1:5){
    A_pars[i, j] <- paste0('A[', i, ',', j, ']')
  }
}
mcmc_plot(var_mod, 
          variable = as.vector(t(A_pars)), 
          type = 'hist')

There is a lot happening in this matrix. Each cell captures the lagged effect of the process in the column on the process in the row in the next timestep. So for example, the effect in cell [1,3] shows how an increase in the process for series 3 (Greens) at time \(t\) is expected to impact the process for series 1 (Bluegreens) at time \(t+1\). The latent process model is now capturing these effects and the smooth seasonal effects.
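
To look more closely at any single cell, the posterior draws can be extracted directly. Here is a small sketch summarising the draws for A[1,3] (the expected effect of Greens at time \(t\) on Bluegreens at time \(t+1\)):

A13 <- as.matrix(var_mod, variable = 'A[1,3]')
quantile(A13, probs = c(0.025, 0.5, 0.975))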

The process error \((\Sigma)\) captures unmodelled variation in the process models. Again, we fixed the off-diagonals to 0, so the histograms for these will look like flat boxes:

Sigma_pars <- matrix(NA, nrow = 5, ncol = 5)
for(i in 1:5){
  for(j in 1:5){
    Sigma_pars[i, j] <- paste0('Sigma[', i, ',', j, ']')
  }
}
mcmc_plot(var_mod, 
          variable = as.vector(t(Sigma_pars)), 
          type = 'hist')

The observation error estimates \((\sigma_{obs})\) represent how much the model thinks we might miss the true count when we take our imperfect measurements:

mcmc_plot(var_mod, variable = 'sigma_obs', regex = TRUE, type = 'hist')

These are still a bit hard to identify overall, especially when trying to estimate both process and observation error. Often we need to make some strong assumptions about which of these is more important for determining unexplained variation in our observations.

Correlated process errors

Let’s see if these estimates improve when we allow the process errors to be correlated. Once again, we need to first update the priors for the observation errors:

priors <- c(prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
            prior(normal(0.5, 0.25), class = sigma))

And now we can fit the correlated process error model:

varcor_mod <- mvgam(  
  # observation formula, which remains empty
  y ~ -1,
  
  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,
  
  # VAR1 model with correlated process errors
  trend_model = VAR(cor = TRUE),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  
  # include the updated priors
  priors = priors,
  silent = 2)

The \((\Sigma)\) matrix now captures any evidence of contemporaneously correlated process error:

Sigma_pars <- matrix(NA, nrow = 5, ncol = 5)
for(i in 1:5){
  for(j in 1:5){
    Sigma_pars[i, j] <- paste0('Sigma[', i, ',', j, ']')
  }
}
mcmc_plot(varcor_mod, 
          variable = as.vector(t(Sigma_pars)), 
          type = 'hist')

This symmetric matrix tells us there is support for correlated process errors, as several of the off-diagonal entries are strongly non-zero. But it is easier to interpret these estimates if we convert the covariance matrix to a correlation matrix. Here we compute the posterior median process error correlations:

Sigma_post <- as.matrix(varcor_mod, variable = 'Sigma', regex = TRUE)
median_correlations <- cov2cor(matrix(apply(Sigma_post, 2, median),
                                      nrow = 5, ncol = 5))
rownames(median_correlations) <- colnames(median_correlations) <- levels(plankton_train$series)

round(median_correlations, 2)
#>             Bluegreens Diatoms Greens Other.algae Unicells
#> Bluegreens        1.00   -0.22  -0.04        0.16     0.47
#> Diatoms          -0.22    1.00   0.16        0.47     0.16
#> Greens           -0.04    0.16   1.00        0.34    -0.03
#> Other.algae       0.16    0.47   0.34        1.00     0.28
#> Unicells          0.47    0.16  -0.03        0.28     1.00

Impulse response functions

Because Vector Autoregressions can capture complex lagged dependencies, it is often difficult to understand how the member time series are thought to interact with one another. A method that is commonly used to directly test for possible interactions is to compute an Impulse Response Function (IRF). If \(h\) represents the simulated forecast horizon, an IRF asks how each of the remaining series might respond over times \((t+1):h\) if a focal series is given an innovation “shock” at time \(t = 0\). mvgam can compute Generalized and Orthogonalized IRFs from models that include latent VAR dynamics. We simply feed the fitted model to the irf() function and then use the S3 plot() function to view the estimated responses. By default, irf() will compute IRFs by separately imposing positive shocks of one standard deviation to each series in the VAR process. Here we compute Generalized IRFs over a horizon of 12 timesteps:

irfs <- irf(varcor_mod, h = 12)

Plot the expected responses of the remaining series to a positive shock for series 3 (Greens):

plot(irfs, series = 3)

This series of plots makes it clear that some of the other series would be expected to show both instantaneous responses to a shock for the Greens (due to their correlated process errors) and delayed, nonlinear responses over time (due to the complex lagged dependence structure captured by the \(A\) matrix). This hopefully makes it clear why IRFs are an important tool in the analysis of multivariate autoregressive models.

Comparing forecast scores

But which model is better? We can compute the variogram score for out-of-sample forecasts to get a sense of which model does a better job of capturing the dependence structure in the true evaluation set:

# create forecast objects for each model
fcvar <- forecast(var_mod)
fcvarcor <- forecast(varcor_mod)

# plot the difference in variogram scores; a negative value means the VAR1cor model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = 'variogram')$all_series$score -
  score(fcvar, score = 'variogram')$all_series$score
plot(diff_scores, pch = 16, cex = 1.25, col = 'darkred', 
     ylim = c(-1*max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = 'l',
     xlab = 'Forecast horizon',
     ylab = expression(variogram[VAR1cor]~-~variogram[VAR1]))
abline(h = 0, lty = 'dashed')

And we can also compute the energy score for out-of-sample forecasts to get a sense of which model provides forecasts that are better calibrated:

# plot the difference in energy scores; a negative value means the VAR1cor model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = 'energy')$all_series$score -
  score(fcvar, score = 'energy')$all_series$score
plot(diff_scores, pch = 16, cex = 1.25, col = 'darkred', 
     ylim = c(-1*max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = 'l',
     xlab = 'Forecast horizon',
     ylab = expression(energy[VAR1cor]~-~energy[VAR1]))
abline(h = 0, lty = 'dashed')

The models tend to provide similar forecasts, though the correlated error model does slightly better overall. We would probably need to use a more extensive rolling forecast evaluation exercise if we felt like we needed to only choose one for production. mvgam offers some utilities for doing this (i.e. see ?lfo_cv for guidance). Alternatively, we could use forecasts from both models by creating an evenly-weighted ensemble forecast distribution. This capability is available using the ensemble() function in mvgam (see ?ensemble for guidance).
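
As a brief sketch of the ensemble option (see ?ensemble for the full set of arguments):

# combine forecast draws from the two models into an
# evenly-weighted ensemble and plot the first series
fc_ens <- ensemble(fcvar, fcvarcor, ndraws = 2000)
plot(fc_ens, series = 1)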

Further reading

The following papers and resources offer a lot of useful material about multivariate State-Space models and how they can be applied in practice:

Auger‐Méthé, Marie, et al. “A guide to state–space modeling of ecological time series.” Ecological Monographs 91.4 (2021): e01470.

Heaps, Sarah E. “Enforcing stationarity through the prior in vector autoregressions.” Journal of Computational and Graphical Statistics 32.1 (2023): 74-83.

Hannaford, Naomi E., et al. “A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant.” Computational Statistics & Data Analysis 179 (2023): 107659.

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. “MARSS: multivariate autoregressive state-space models for analyzing time-series data.” R Journal 4.1 (2012): 11.

Ward, Eric J., et al. “Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico.” Journal of Applied Ecology 47.1 (2010): 47-56.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au)

================================================ FILE: docs/authors.html ================================================ Authors and Citation • mvgam

Authors

  • Nicholas J Clark. Author, maintainer.

  • Sarah Heaps. Contributor.
    VARMA parameterisations

  • Scott Pease. Contributor.
    broom enhancements

  • Matthijs Hollanders. Contributor.
    ggplot visualizations

Citation

Source: inst/CITATION

Clark & Wells (2023). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 14, 771-784. doi.org/10.1111/2041-210X.13974

@Article{,
  title = {Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series},
  author = {Nicholas J Clark and Konstans Wells},
  journal = {Methods in Ecology and Evolution},
  year = {2023},
  volume = {14},
  pages = {771-784},
  doi = {10.1111/2041-210X.13974},
  encoding = {UTF-8},
}
================================================ FILE: docs/deps/bootstrap-5.2.2/font.css ================================================ @font-face { font-family: 'Roboto'; font-style: normal; font-weight: 400; font-display: swap; src: url(fonts/KFOmCnqEu92Fr1Me5g.woff) format('woff'); } @font-face { font-family: 'Roboto'; font-style: normal; font-weight: 500; font-display: swap; src: url(fonts/KFOlCnqEu92Fr1MmEU9vAA.woff) format('woff'); } @font-face { font-family: 'Roboto'; font-style: normal; font-weight: 700; font-display: swap; src: url(fonts/KFOlCnqEu92Fr1MmWUlvAA.woff) format('woff'); } ================================================ FILE: docs/deps/bootstrap-5.3.1/font.css ================================================ @font-face { font-family: 'Roboto'; font-style: normal; font-weight: 400; font-display: swap; src: url(fonts/KFOmCnqEu92Fr1Me5g.woff) format('woff'); } @font-face { font-family: 'Roboto'; font-style: normal; font-weight: 500; font-display: swap; src: url(fonts/KFOlCnqEu92Fr1MmEU9vAA.woff) format('woff'); } @font-face { font-family: 'Roboto'; font-style: normal; font-weight: 700; font-display: swap; src: url(fonts/KFOlCnqEu92Fr1MmWUlvAA.woff) format('woff'); } ================================================ FILE: docs/deps/data-deps.txt ================================================ ================================================ FILE: docs/deps/jquery-3.6.0/jquery-3.6.0.js ================================================ /*! * jQuery JavaScript Library v3.6.0 * https://jquery.com/ * * Includes Sizzle.js * https://sizzlejs.com/ * * Copyright OpenJS Foundation and other contributors * Released under the MIT license * https://jquery.org/license * * Date: 2021-03-02T17:08Z */ ( function( global, factory ) { "use strict"; if ( typeof module === "object" && typeof module.exports === "object" ) { // For CommonJS and CommonJS-like environments where a proper `window` // is present, execute the factory and get jQuery. // For environments that do not have a `window` with a `document` // (such as Node.js), expose a factory as module.exports. // This accentuates the need for the creation of a real `window`. // e.g. var jQuery = require("jquery")(window); // See ticket #14549 for more info. module.exports = global.document ? factory( global, true ) : function( w ) { if ( !w.document ) { throw new Error( "jQuery requires a window with a document" ); } return factory( w ); }; } else { factory( global ); } // Pass this if window is not defined yet } )( typeof window !== "undefined" ? window : this, function( window, noGlobal ) { // Edge <= 12 - 13+, Firefox <=18 - 45+, IE 10 - 11, Safari 5.1 - 9+, iOS 6 - 9.1 // throw exceptions when non-strict code (e.g., ASP.NET 4.5) accesses strict mode // arguments.callee.caller (trac-13335). But as of jQuery 3.0 (2016), strict mode should be common // enough that all such attempts are guarded in a try block. "use strict"; var arr = []; var getProto = Object.getPrototypeOf; var slice = arr.slice; var flat = arr.flat ? 
function( array ) { return arr.flat.call( array ); } : function( array ) { return arr.concat.apply( [], array ); }; var push = arr.push; var indexOf = arr.indexOf; var class2type = {}; var toString = class2type.toString; var hasOwn = class2type.hasOwnProperty; var fnToString = hasOwn.toString; var ObjectFunctionString = fnToString.call( Object ); var support = {}; var isFunction = function isFunction( obj ) { // Support: Chrome <=57, Firefox <=52 // In some browsers, typeof returns "function" for HTML elements // (i.e., `typeof document.createElement( "object" ) === "function"`). // We don't want to classify *any* DOM node as a function. // Support: QtWeb <=3.8.5, WebKit <=534.34, wkhtmltopdf tool <=0.12.5 // Plus for old WebKit, typeof returns "function" for HTML collections // (e.g., `typeof document.getElementsByTagName("div") === "function"`). (gh-4756) return typeof obj === "function" && typeof obj.nodeType !== "number" && typeof obj.item !== "function"; }; var isWindow = function isWindow( obj ) { return obj != null && obj === obj.window; }; var document = window.document; var preservedScriptAttributes = { type: true, src: true, nonce: true, noModule: true }; function DOMEval( code, node, doc ) { doc = doc || document; var i, val, script = doc.createElement( "script" ); script.text = code; if ( node ) { for ( i in preservedScriptAttributes ) { // Support: Firefox 64+, Edge 18+ // Some browsers don't support the "nonce" property on scripts. // On the other hand, just using `getAttribute` is not enough as // the `nonce` attribute is reset to an empty string whenever it // becomes browsing-context connected. // See https://github.com/whatwg/html/issues/2369 // See https://html.spec.whatwg.org/#nonce-attributes // The `node.getAttribute` check was added for the sake of // `jQuery.globalEval` so that it can fake a nonce-containing node // via an object. val = node[ i ] || node.getAttribute && node.getAttribute( i ); if ( val ) { script.setAttribute( i, val ); } } } doc.head.appendChild( script ).parentNode.removeChild( script ); } function toType( obj ) { if ( obj == null ) { return obj + ""; } // Support: Android <=2.3 only (functionish RegExp) return typeof obj === "object" || typeof obj === "function" ? class2type[ toString.call( obj ) ] || "object" : typeof obj; } /* global Symbol */ // Defining this global in .eslintrc.json would create a danger of using the global // unguarded in another place, it seems safer to define global only for this module var version = "3.6.0", // Define a local copy of jQuery jQuery = function( selector, context ) { // The jQuery object is actually just the init constructor 'enhanced' // Need init if jQuery is called (just allow error to be thrown if not included) return new jQuery.fn.init( selector, context ); }; jQuery.fn = jQuery.prototype = { // The current version of jQuery being used jquery: version, constructor: jQuery, // The default length of a jQuery object is 0 length: 0, toArray: function() { return slice.call( this ); }, // Get the Nth element in the matched element set OR // Get the whole matched element set as a clean array get: function( num ) { // Return all the elements in a clean array if ( num == null ) { return slice.call( this ); } // Return just the one element from the set return num < 0 ? 
this[ num + this.length ] : this[ num ]; }, // Take an array of elements and push it onto the stack // (returning the new matched element set) pushStack: function( elems ) { // Build a new jQuery matched element set var ret = jQuery.merge( this.constructor(), elems ); // Add the old object onto the stack (as a reference) ret.prevObject = this; // Return the newly-formed element set return ret; }, // Execute a callback for every element in the matched set. each: function( callback ) { return jQuery.each( this, callback ); }, map: function( callback ) { return this.pushStack( jQuery.map( this, function( elem, i ) { return callback.call( elem, i, elem ); } ) ); }, slice: function() { return this.pushStack( slice.apply( this, arguments ) ); }, first: function() { return this.eq( 0 ); }, last: function() { return this.eq( -1 ); }, even: function() { return this.pushStack( jQuery.grep( this, function( _elem, i ) { return ( i + 1 ) % 2; } ) ); }, odd: function() { return this.pushStack( jQuery.grep( this, function( _elem, i ) { return i % 2; } ) ); }, eq: function( i ) { var len = this.length, j = +i + ( i < 0 ? len : 0 ); return this.pushStack( j >= 0 && j < len ? [ this[ j ] ] : [] ); }, end: function() { return this.prevObject || this.constructor(); }, // For internal use only. // Behaves like an Array's method, not like a jQuery method. push: push, sort: arr.sort, splice: arr.splice }; jQuery.extend = jQuery.fn.extend = function() { var options, name, src, copy, copyIsArray, clone, target = arguments[ 0 ] || {}, i = 1, length = arguments.length, deep = false; // Handle a deep copy situation if ( typeof target === "boolean" ) { deep = target; // Skip the boolean and the target target = arguments[ i ] || {}; i++; } // Handle case when target is a string or something (possible in deep copy) if ( typeof target !== "object" && !isFunction( target ) ) { target = {}; } // Extend jQuery itself if only one argument is passed if ( i === length ) { target = this; i--; } for ( ; i < length; i++ ) { // Only deal with non-null/undefined values if ( ( options = arguments[ i ] ) != null ) { // Extend the base object for ( name in options ) { copy = options[ name ]; // Prevent Object.prototype pollution // Prevent never-ending loop if ( name === "__proto__" || target === copy ) { continue; } // Recurse if we're merging plain objects or arrays if ( deep && copy && ( jQuery.isPlainObject( copy ) || ( copyIsArray = Array.isArray( copy ) ) ) ) { src = target[ name ]; // Ensure proper type for the source value if ( copyIsArray && !Array.isArray( src ) ) { clone = []; } else if ( !copyIsArray && !jQuery.isPlainObject( src ) ) { clone = {}; } else { clone = src; } copyIsArray = false; // Never move original objects, clone them target[ name ] = jQuery.extend( deep, clone, copy ); // Don't bring in undefined values } else if ( copy !== undefined ) { target[ name ] = copy; } } } } // Return the modified object return target; }; jQuery.extend( { // Unique for each copy of jQuery on the page expando: "jQuery" + ( version + Math.random() ).replace( /\D/g, "" ), // Assume jQuery is ready without the ready module isReady: true, error: function( msg ) { throw new Error( msg ); }, noop: function() {}, isPlainObject: function( obj ) { var proto, Ctor; // Detect obvious negatives // Use toString instead of jQuery.type to catch host objects if ( !obj || toString.call( obj ) !== "[object Object]" ) { return false; } proto = getProto( obj ); // Objects with no prototype (e.g., `Object.create( null )`) are plain if ( !proto ) { 
return true; } // Objects with prototype are plain iff they were constructed by a global Object function Ctor = hasOwn.call( proto, "constructor" ) && proto.constructor; return typeof Ctor === "function" && fnToString.call( Ctor ) === ObjectFunctionString; }, isEmptyObject: function( obj ) { var name; for ( name in obj ) { return false; } return true; }, // Evaluates a script in a provided context; falls back to the global one // if not specified. globalEval: function( code, options, doc ) { DOMEval( code, { nonce: options && options.nonce }, doc ); }, each: function( obj, callback ) { var length, i = 0; if ( isArrayLike( obj ) ) { length = obj.length; for ( ; i < length; i++ ) { if ( callback.call( obj[ i ], i, obj[ i ] ) === false ) { break; } } } else { for ( i in obj ) { if ( callback.call( obj[ i ], i, obj[ i ] ) === false ) { break; } } } return obj; }, // results is for internal usage only makeArray: function( arr, results ) { var ret = results || []; if ( arr != null ) { if ( isArrayLike( Object( arr ) ) ) { jQuery.merge( ret, typeof arr === "string" ? [ arr ] : arr ); } else { push.call( ret, arr ); } } return ret; }, inArray: function( elem, arr, i ) { return arr == null ? -1 : indexOf.call( arr, elem, i ); }, // Support: Android <=4.0 only, PhantomJS 1 only // push.apply(_, arraylike) throws on ancient WebKit merge: function( first, second ) { var len = +second.length, j = 0, i = first.length; for ( ; j < len; j++ ) { first[ i++ ] = second[ j ]; } first.length = i; return first; }, grep: function( elems, callback, invert ) { var callbackInverse, matches = [], i = 0, length = elems.length, callbackExpect = !invert; // Go through the array, only saving the items // that pass the validator function for ( ; i < length; i++ ) { callbackInverse = !callback( elems[ i ], i ); if ( callbackInverse !== callbackExpect ) { matches.push( elems[ i ] ); } } return matches; }, // arg is for internal usage only map: function( elems, callback, arg ) { var length, value, i = 0, ret = []; // Go through the array, translating each of the items to their new values if ( isArrayLike( elems ) ) { length = elems.length; for ( ; i < length; i++ ) { value = callback( elems[ i ], i, arg ); if ( value != null ) { ret.push( value ); } } // Go through every key on the object, } else { for ( i in elems ) { value = callback( elems[ i ], i, arg ); if ( value != null ) { ret.push( value ); } } } // Flatten any nested arrays return flat( ret ); }, // A global GUID counter for objects guid: 1, // jQuery.support is not used in Core but other projects attach their // properties to it so it needs to exist. support: support } ); if ( typeof Symbol === "function" ) { jQuery.fn[ Symbol.iterator ] = arr[ Symbol.iterator ]; } // Populate the class2type map jQuery.each( "Boolean Number String Function Array Date RegExp Object Error Symbol".split( " " ), function( _i, name ) { class2type[ "[object " + name + "]" ] = name.toLowerCase(); } ); function isArrayLike( obj ) { // Support: real iOS 8.2 only (not reproducible in simulator) // `in` check used to prevent JIT error (gh-2145) // hasOwn isn't used here due to false negatives // regarding Nodelist length in IE var length = !!obj && "length" in obj && obj.length, type = toType( obj ); if ( isFunction( obj ) || isWindow( obj ) ) { return false; } return type === "array" || length === 0 || typeof length === "number" && length > 0 && ( length - 1 ) in obj; } var Sizzle = /*! 
* Sizzle CSS Selector Engine v2.3.6 * https://sizzlejs.com/ * * Copyright JS Foundation and other contributors * Released under the MIT license * https://js.foundation/ * * Date: 2021-02-16 */ ( function( window ) { var i, support, Expr, getText, isXML, tokenize, compile, select, outermostContext, sortInput, hasDuplicate, // Local document vars setDocument, document, docElem, documentIsHTML, rbuggyQSA, rbuggyMatches, matches, contains, // Instance-specific data expando = "sizzle" + 1 * new Date(), preferredDoc = window.document, dirruns = 0, done = 0, classCache = createCache(), tokenCache = createCache(), compilerCache = createCache(), nonnativeSelectorCache = createCache(), sortOrder = function( a, b ) { if ( a === b ) { hasDuplicate = true; } return 0; }, // Instance methods hasOwn = ( {} ).hasOwnProperty, arr = [], pop = arr.pop, pushNative = arr.push, push = arr.push, slice = arr.slice, // Use a stripped-down indexOf as it's faster than native // https://jsperf.com/thor-indexof-vs-for/5 indexOf = function( list, elem ) { var i = 0, len = list.length; for ( ; i < len; i++ ) { if ( list[ i ] === elem ) { return i; } } return -1; }, booleans = "checked|selected|async|autofocus|autoplay|controls|defer|disabled|hidden|" + "ismap|loop|multiple|open|readonly|required|scoped", // Regular expressions // http://www.w3.org/TR/css3-selectors/#whitespace whitespace = "[\\x20\\t\\r\\n\\f]", // https://www.w3.org/TR/css-syntax-3/#ident-token-diagram identifier = "(?:\\\\[\\da-fA-F]{1,6}" + whitespace + "?|\\\\[^\\r\\n\\f]|[\\w-]|[^\0-\\x7f])+", // Attribute selectors: http://www.w3.org/TR/selectors/#attribute-selectors attributes = "\\[" + whitespace + "*(" + identifier + ")(?:" + whitespace + // Operator (capture 2) "*([*^$|!~]?=)" + whitespace + // "Attribute values must be CSS identifiers [capture 5] // or strings [capture 3 or capture 4]" "*(?:'((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\"|(" + identifier + "))|)" + whitespace + "*\\]", pseudos = ":(" + identifier + ")(?:\\((" + // To reduce the number of selectors needing tokenize in the preFilter, prefer arguments: // 1. quoted (capture 3; capture 4 or capture 5) "('((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\")|" + // 2. simple (capture 6) "((?:\\\\.|[^\\\\()[\\]]|" + attributes + ")*)|" + // 3. 
anything else (capture 2) ".*" + ")\\)|)", // Leading and non-escaped trailing whitespace, capturing some non-whitespace characters preceding the latter rwhitespace = new RegExp( whitespace + "+", "g" ), rtrim = new RegExp( "^" + whitespace + "+|((?:^|[^\\\\])(?:\\\\.)*)" + whitespace + "+$", "g" ), rcomma = new RegExp( "^" + whitespace + "*," + whitespace + "*" ), rcombinators = new RegExp( "^" + whitespace + "*([>+~]|" + whitespace + ")" + whitespace + "*" ), rdescend = new RegExp( whitespace + "|>" ), rpseudo = new RegExp( pseudos ), ridentifier = new RegExp( "^" + identifier + "$" ), matchExpr = { "ID": new RegExp( "^#(" + identifier + ")" ), "CLASS": new RegExp( "^\\.(" + identifier + ")" ), "TAG": new RegExp( "^(" + identifier + "|[*])" ), "ATTR": new RegExp( "^" + attributes ), "PSEUDO": new RegExp( "^" + pseudos ), "CHILD": new RegExp( "^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\(" + whitespace + "*(even|odd|(([+-]|)(\\d*)n|)" + whitespace + "*(?:([+-]|)" + whitespace + "*(\\d+)|))" + whitespace + "*\\)|)", "i" ), "bool": new RegExp( "^(?:" + booleans + ")$", "i" ), // For use in libraries implementing .is() // We use this for POS matching in `select` "needsContext": new RegExp( "^" + whitespace + "*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\(" + whitespace + "*((?:-\\d)?\\d*)" + whitespace + "*\\)|)(?=[^-]|$)", "i" ) }, rhtml = /HTML$/i, rinputs = /^(?:input|select|textarea|button)$/i, rheader = /^h\d$/i, rnative = /^[^{]+\{\s*\[native \w/, // Easily-parseable/retrievable ID or TAG or CLASS selectors rquickExpr = /^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/, rsibling = /[+~]/, // CSS escapes // http://www.w3.org/TR/CSS21/syndata.html#escaped-characters runescape = new RegExp( "\\\\[\\da-fA-F]{1,6}" + whitespace + "?|\\\\([^\\r\\n\\f])", "g" ), funescape = function( escape, nonHex ) { var high = "0x" + escape.slice( 1 ) - 0x10000; return nonHex ? // Strip the backslash prefix from a non-hex escape sequence nonHex : // Replace a hexadecimal escape sequence with the encoded Unicode code point // Support: IE <=11+ // For values outside the Basic Multilingual Plane (BMP), manually construct a // surrogate pair high < 0 ? String.fromCharCode( high + 0x10000 ) : String.fromCharCode( high >> 10 | 0xD800, high & 0x3FF | 0xDC00 ); }, // CSS string/identifier serialization // https://drafts.csswg.org/cssom/#common-serializing-idioms rcssescape = /([\0-\x1f\x7f]|^-?\d)|^-$|[^\0-\x1f\x7f-\uFFFF\w-]/g, fcssescape = function( ch, asCodePoint ) { if ( asCodePoint ) { // U+0000 NULL becomes U+FFFD REPLACEMENT CHARACTER if ( ch === "\0" ) { return "\uFFFD"; } // Control characters and (dependent upon position) numbers get escaped as code points return ch.slice( 0, -1 ) + "\\" + ch.charCodeAt( ch.length - 1 ).toString( 16 ) + " "; } // Other potentially-special ASCII characters get backslash-escaped return "\\" + ch; }, // Used for iframes // See setDocument() // Removing the function wrapper causes a "Permission Denied" // error in IE unloadHandler = function() { setDocument(); }, inDisabledFieldset = addCombinator( function( elem ) { return elem.disabled === true && elem.nodeName.toLowerCase() === "fieldset"; }, { dir: "parentNode", next: "legend" } ); // Optimize for push.apply( _, NodeList ) try { push.apply( ( arr = slice.call( preferredDoc.childNodes ) ), preferredDoc.childNodes ); // Support: Android<4.0 // Detect silently failing push.apply // eslint-disable-next-line no-unused-expressions arr[ preferredDoc.childNodes.length ].nodeType; } catch ( e ) { push = { apply: arr.length ? 
// Leverage slice if possible function( target, els ) { pushNative.apply( target, slice.call( els ) ); } : // Support: IE<9 // Otherwise append directly function( target, els ) { var j = target.length, i = 0; // Can't trust NodeList.length while ( ( target[ j++ ] = els[ i++ ] ) ) {} target.length = j - 1; } }; } function Sizzle( selector, context, results, seed ) { var m, i, elem, nid, match, groups, newSelector, newContext = context && context.ownerDocument, // nodeType defaults to 9, since context defaults to document nodeType = context ? context.nodeType : 9; results = results || []; // Return early from calls with invalid selector or context if ( typeof selector !== "string" || !selector || nodeType !== 1 && nodeType !== 9 && nodeType !== 11 ) { return results; } // Try to shortcut find operations (as opposed to filters) in HTML documents if ( !seed ) { setDocument( context ); context = context || document; if ( documentIsHTML ) { // If the selector is sufficiently simple, try using a "get*By*" DOM method // (excepting DocumentFragment context, where the methods don't exist) if ( nodeType !== 11 && ( match = rquickExpr.exec( selector ) ) ) { // ID selector if ( ( m = match[ 1 ] ) ) { // Document context if ( nodeType === 9 ) { if ( ( elem = context.getElementById( m ) ) ) { // Support: IE, Opera, Webkit // TODO: identify versions // getElementById can match elements by name instead of ID if ( elem.id === m ) { results.push( elem ); return results; } } else { return results; } // Element context } else { // Support: IE, Opera, Webkit // TODO: identify versions // getElementById can match elements by name instead of ID if ( newContext && ( elem = newContext.getElementById( m ) ) && contains( context, elem ) && elem.id === m ) { results.push( elem ); return results; } } // Type selector } else if ( match[ 2 ] ) { push.apply( results, context.getElementsByTagName( selector ) ); return results; // Class selector } else if ( ( m = match[ 3 ] ) && support.getElementsByClassName && context.getElementsByClassName ) { push.apply( results, context.getElementsByClassName( m ) ); return results; } } // Take advantage of querySelectorAll if ( support.qsa && !nonnativeSelectorCache[ selector + " " ] && ( !rbuggyQSA || !rbuggyQSA.test( selector ) ) && // Support: IE 8 only // Exclude object elements ( nodeType !== 1 || context.nodeName.toLowerCase() !== "object" ) ) { newSelector = selector; newContext = context; // qSA considers elements outside a scoping root when evaluating child or // descendant combinators, which is not what we want. // In such cases, we work around the behavior by prefixing every selector in the // list with an ID selector referencing the scope context. // The technique has to be used as well when a leading combinator is used // as such selectors are not recognized by querySelectorAll. // Thanks to Andrew Dupont for this technique. if ( nodeType === 1 && ( rdescend.test( selector ) || rcombinators.test( selector ) ) ) { // Expand context for sibling selectors newContext = rsibling.test( selector ) && testContext( context.parentNode ) || context; // We can use :scope instead of the ID hack if the browser // supports it & if we're not changing the context. 
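// Worked illustration of the scoping workaround described above (editorial
// example, not part of the upstream Sizzle source): given a context element
// such as <div id="box"> and the selector "> p, span", the code below
// prefixes each tokenized group with the context's id (or ":scope" where
// supported), producing "#box > p, #box span", so that querySelectorAll
// evaluates the combinators relative to the intended scoping root.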
if ( newContext !== context || !support.scope ) { // Capture the context ID, setting it first if necessary if ( ( nid = context.getAttribute( "id" ) ) ) { nid = nid.replace( rcssescape, fcssescape ); } else { context.setAttribute( "id", ( nid = expando ) ); } } // Prefix every selector in the list groups = tokenize( selector ); i = groups.length; while ( i-- ) { groups[ i ] = ( nid ? "#" + nid : ":scope" ) + " " + toSelector( groups[ i ] ); } newSelector = groups.join( "," ); } try { push.apply( results, newContext.querySelectorAll( newSelector ) ); return results; } catch ( qsaError ) { nonnativeSelectorCache( selector, true ); } finally { if ( nid === expando ) { context.removeAttribute( "id" ); } } } } } // All others return select( selector.replace( rtrim, "$1" ), context, results, seed ); } /** * Create key-value caches of limited size * @returns {function(string, object)} Returns the Object data after storing it on itself with * property name the (space-suffixed) string and (if the cache is larger than Expr.cacheLength) * deleting the oldest entry */ function createCache() { var keys = []; function cache( key, value ) { // Use (key + " ") to avoid collision with native prototype properties (see Issue #157) if ( keys.push( key + " " ) > Expr.cacheLength ) { // Only keep the most recent entries delete cache[ keys.shift() ]; } return ( cache[ key + " " ] = value ); } return cache; } /** * Mark a function for special use by Sizzle * @param {Function} fn The function to mark */ function markFunction( fn ) { fn[ expando ] = true; return fn; } /** * Support testing using an element * @param {Function} fn Passed the created element and returns a boolean result */ function assert( fn ) { var el = document.createElement( "fieldset" ); try { return !!fn( el ); } catch ( e ) { return false; } finally { // Remove from its parent by default if ( el.parentNode ) { el.parentNode.removeChild( el ); } // release memory in IE el = null; } } /** * Adds the same handler for all of the specified attrs * @param {String} attrs Pipe-separated list of attributes * @param {Function} handler The method that will be applied */ function addHandle( attrs, handler ) { var arr = attrs.split( "|" ), i = arr.length; while ( i-- ) { Expr.attrHandle[ arr[ i ] ] = handler; } } /** * Checks document order of two siblings * @param {Element} a * @param {Element} b * @returns {Number} Returns less than 0 if a precedes b, greater than 0 if a follows b */ function siblingCheck( a, b ) { var cur = b && a, diff = cur && a.nodeType === 1 && b.nodeType === 1 && a.sourceIndex - b.sourceIndex; // Use IE sourceIndex if available on both nodes if ( diff ) { return diff; } // Check if b follows a if ( cur ) { while ( ( cur = cur.nextSibling ) ) { if ( cur === b ) { return -1; } } } return a ? 
1 : -1; } /** * Returns a function to use in pseudos for input types * @param {String} type */ function createInputPseudo( type ) { return function( elem ) { var name = elem.nodeName.toLowerCase(); return name === "input" && elem.type === type; }; } /** * Returns a function to use in pseudos for buttons * @param {String} type */ function createButtonPseudo( type ) { return function( elem ) { var name = elem.nodeName.toLowerCase(); return ( name === "input" || name === "button" ) && elem.type === type; }; } /** * Returns a function to use in pseudos for :enabled/:disabled * @param {Boolean} disabled true for :disabled; false for :enabled */ function createDisabledPseudo( disabled ) { // Known :disabled false positives: fieldset[disabled] > legend:nth-of-type(n+2) :can-disable return function( elem ) { // Only certain elements can match :enabled or :disabled // https://html.spec.whatwg.org/multipage/scripting.html#selector-enabled // https://html.spec.whatwg.org/multipage/scripting.html#selector-disabled if ( "form" in elem ) { // Check for inherited disabledness on relevant non-disabled elements: // * listed form-associated elements in a disabled fieldset // https://html.spec.whatwg.org/multipage/forms.html#category-listed // https://html.spec.whatwg.org/multipage/forms.html#concept-fe-disabled // * option elements in a disabled optgroup // https://html.spec.whatwg.org/multipage/forms.html#concept-option-disabled // All such elements have a "form" property. if ( elem.parentNode && elem.disabled === false ) { // Option elements defer to a parent optgroup if present if ( "label" in elem ) { if ( "label" in elem.parentNode ) { return elem.parentNode.disabled === disabled; } else { return elem.disabled === disabled; } } // Support: IE 6 - 11 // Use the isDisabled shortcut property to check for disabled fieldset ancestors return elem.isDisabled === disabled || // Where there is no isDisabled, check manually /* jshint -W018 */ elem.isDisabled !== !disabled && inDisabledFieldset( elem ) === disabled; } return elem.disabled === disabled; // Try to winnow out elements that can't be disabled before trusting the disabled property. // Some victims get caught in our net (label, legend, menu, track), but it shouldn't // even exist on them, let alone have a boolean value. 
} else if ( "label" in elem ) { return elem.disabled === disabled; } // Remaining elements are neither :enabled nor :disabled return false; }; } /** * Returns a function to use in pseudos for positionals * @param {Function} fn */ function createPositionalPseudo( fn ) { return markFunction( function( argument ) { argument = +argument; return markFunction( function( seed, matches ) { var j, matchIndexes = fn( [], seed.length, argument ), i = matchIndexes.length; // Match elements found at the specified indexes while ( i-- ) { if ( seed[ ( j = matchIndexes[ i ] ) ] ) { seed[ j ] = !( matches[ j ] = seed[ j ] ); } } } ); } ); } /** * Checks a node for validity as a Sizzle context * @param {Element|Object=} context * @returns {Element|Object|Boolean} The input node if acceptable, otherwise a falsy value */ function testContext( context ) { return context && typeof context.getElementsByTagName !== "undefined" && context; } // Expose support vars for convenience support = Sizzle.support = {}; /** * Detects XML nodes * @param {Element|Object} elem An element or a document * @returns {Boolean} True iff elem is a non-HTML XML node */ isXML = Sizzle.isXML = function( elem ) { var namespace = elem && elem.namespaceURI, docElem = elem && ( elem.ownerDocument || elem ).documentElement; // Support: IE <=8 // Assume HTML when documentElement doesn't yet exist, such as inside loading iframes // https://bugs.jquery.com/ticket/4833 return !rhtml.test( namespace || docElem && docElem.nodeName || "HTML" ); }; /** * Sets document-related variables once based on the current document * @param {Element|Object} [doc] An element or document object to use to set the document * @returns {Object} Returns the current document */ setDocument = Sizzle.setDocument = function( node ) { var hasCompare, subWindow, doc = node ? node.ownerDocument || node : preferredDoc; // Return early if doc is invalid or already selected // Support: IE 11+, Edge 17 - 18+ // IE/Edge sometimes throw a "Permission denied" error when strict-comparing // two documents; shallow comparisons work. // eslint-disable-next-line eqeqeq if ( doc == document || doc.nodeType !== 9 || !doc.documentElement ) { return document; } // Update global variables document = doc; docElem = document.documentElement; documentIsHTML = !isXML( document ); // Support: IE 9 - 11+, Edge 12 - 18+ // Accessing iframe documents after unload throws "permission denied" errors (jQuery #13936) // Support: IE 11+, Edge 17 - 18+ // IE/Edge sometimes throw a "Permission denied" error when strict-comparing // two documents; shallow comparisons work. // eslint-disable-next-line eqeqeq if ( preferredDoc != document && ( subWindow = document.defaultView ) && subWindow.top !== subWindow ) { // Support: IE 11, Edge if ( subWindow.addEventListener ) { subWindow.addEventListener( "unload", unloadHandler, false ); // Support: IE 9 - 10 only } else if ( subWindow.attachEvent ) { subWindow.attachEvent( "onunload", unloadHandler ); } } // Support: IE 8 - 11+, Edge 12 - 18+, Chrome <=16 - 25 only, Firefox <=3.6 - 31 only, // Safari 4 - 5 only, Opera <=11.6 - 12.x only // IE/Edge & older browsers don't support the :scope pseudo-class. // Support: Safari 6.0 only // Safari 6.0 supports :scope but it's an alias of :root there. 
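// The assert() helper defined earlier drives all of the feature detection
// below: it creates a throwaway fieldset element, runs the probe against it,
// and returns false if the probe throws, so each support.* flag reflects the
// document selected above. Sketch of the pattern (flag name hypothetical):
// support.example = assert( function( el ) { return !!el.someProbe; } );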
support.scope = assert( function( el ) { docElem.appendChild( el ).appendChild( document.createElement( "div" ) ); return typeof el.querySelectorAll !== "undefined" && !el.querySelectorAll( ":scope fieldset div" ).length; } ); /* Attributes ---------------------------------------------------------------------- */ // Support: IE<8 // Verify that getAttribute really returns attributes and not properties // (excepting IE8 booleans) support.attributes = assert( function( el ) { el.className = "i"; return !el.getAttribute( "className" ); } ); /* getElement(s)By* ---------------------------------------------------------------------- */ // Check if getElementsByTagName("*") returns only elements support.getElementsByTagName = assert( function( el ) { el.appendChild( document.createComment( "" ) ); return !el.getElementsByTagName( "*" ).length; } ); // Support: IE<9 support.getElementsByClassName = rnative.test( document.getElementsByClassName ); // Support: IE<10 // Check if getElementById returns elements by name // The broken getElementById methods don't pick up programmatically-set names, // so use a roundabout getElementsByName test support.getById = assert( function( el ) { docElem.appendChild( el ).id = expando; return !document.getElementsByName || !document.getElementsByName( expando ).length; } ); // ID filter and find if ( support.getById ) { Expr.filter[ "ID" ] = function( id ) { var attrId = id.replace( runescape, funescape ); return function( elem ) { return elem.getAttribute( "id" ) === attrId; }; }; Expr.find[ "ID" ] = function( id, context ) { if ( typeof context.getElementById !== "undefined" && documentIsHTML ) { var elem = context.getElementById( id ); return elem ? [ elem ] : []; } }; } else { Expr.filter[ "ID" ] = function( id ) { var attrId = id.replace( runescape, funescape ); return function( elem ) { var node = typeof elem.getAttributeNode !== "undefined" && elem.getAttributeNode( "id" ); return node && node.value === attrId; }; }; // Support: IE 6 - 7 only // getElementById is not reliable as a find shortcut Expr.find[ "ID" ] = function( id, context ) { if ( typeof context.getElementById !== "undefined" && documentIsHTML ) { var node, i, elems, elem = context.getElementById( id ); if ( elem ) { // Verify the id attribute node = elem.getAttributeNode( "id" ); if ( node && node.value === id ) { return [ elem ]; } // Fall back on getElementsByName elems = context.getElementsByName( id ); i = 0; while ( ( elem = elems[ i++ ] ) ) { node = elem.getAttributeNode( "id" ); if ( node && node.value === id ) { return [ elem ]; } } } return []; } }; } // Tag Expr.find[ "TAG" ] = support.getElementsByTagName ? 
function( tag, context ) { if ( typeof context.getElementsByTagName !== "undefined" ) { return context.getElementsByTagName( tag ); // DocumentFragment nodes don't have gEBTN } else if ( support.qsa ) { return context.querySelectorAll( tag ); } } : function( tag, context ) { var elem, tmp = [], i = 0, // By happy coincidence, a (broken) gEBTN appears on DocumentFragment nodes too results = context.getElementsByTagName( tag ); // Filter out possible comments if ( tag === "*" ) { while ( ( elem = results[ i++ ] ) ) { if ( elem.nodeType === 1 ) { tmp.push( elem ); } } return tmp; } return results; }; // Class Expr.find[ "CLASS" ] = support.getElementsByClassName && function( className, context ) { if ( typeof context.getElementsByClassName !== "undefined" && documentIsHTML ) { return context.getElementsByClassName( className ); } }; /* QSA/matchesSelector ---------------------------------------------------------------------- */ // QSA and matchesSelector support // matchesSelector(:active) reports false when true (IE9/Opera 11.5) rbuggyMatches = []; // qSa(:focus) reports false when true (Chrome 21) // We allow this because of a bug in IE8/9 that throws an error // whenever `document.activeElement` is accessed on an iframe // So, we allow :focus to pass through QSA all the time to avoid the IE error // See https://bugs.jquery.com/ticket/13378 rbuggyQSA = []; if ( ( support.qsa = rnative.test( document.querySelectorAll ) ) ) { // Build QSA regex // Regex strategy adopted from Diego Perini assert( function( el ) { var input; // Select is set to empty string on purpose // This is to test IE's treatment of not explicitly // setting a boolean content attribute, // since its presence should be enough // https://bugs.jquery.com/ticket/12359 docElem.appendChild( el ).innerHTML = "<a id='" + expando + "'></a>" + "<select id='" + expando + "-\r\\' msallowcapture=''>" + "<option selected=''></option></select>"; // Support: IE8, Opera 11-12.16 // Nothing should be selected when empty strings follow ^= or $= or *= // The test attribute must be unknown in Opera but "safe" for WinRT // https://msdn.microsoft.com/en-us/library/ie/hh465388.aspx#attribute_section if ( el.querySelectorAll( "[msallowcapture^='']" ).length ) { rbuggyQSA.push( "[*^$]=" + whitespace + "*(?:''|\"\")" ); } // Support: IE8 // Boolean attributes and "value" are not treated correctly if ( !el.querySelectorAll( "[selected]" ).length ) { rbuggyQSA.push( "\\[" + whitespace + "*(?:value|" + booleans + ")" ); } // Support: Chrome<29, Android<4.4, Safari<7.0+, iOS<7.0+, PhantomJS<1.9.8+ if ( !el.querySelectorAll( "[id~=" + expando + "-]" ).length ) { rbuggyQSA.push( "~=" ); } // Support: IE 11+, Edge 15 - 18+ // IE 11/Edge don't find elements on a `[name='']` query in some cases. // Adding a temporary attribute to the document before the selection works // around the issue. // Interestingly, IE 10 & older don't seem to have the issue.
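// Note on the pattern above and below (editorial gloss, not upstream code):
// each probe that exposes a browser bug pushes a regex fragment onto
// rbuggyQSA; the fragments are later joined into one RegExp, and any
// selector matching it bypasses querySelectorAll in favour of Sizzle's own
// engine, e.g. "input:checked" once ":checked" has been pushed.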
input = document.createElement( "input" ); input.setAttribute( "name", "" ); el.appendChild( input ); if ( !el.querySelectorAll( "[name='']" ).length ) { rbuggyQSA.push( "\\[" + whitespace + "*name" + whitespace + "*=" + whitespace + "*(?:''|\"\")" ); } // Webkit/Opera - :checked should return selected option elements // http://www.w3.org/TR/2011/REC-css3-selectors-20110929/#checked // IE8 throws error here and will not see later tests if ( !el.querySelectorAll( ":checked" ).length ) { rbuggyQSA.push( ":checked" ); } // Support: Safari 8+, iOS 8+ // https://bugs.webkit.org/show_bug.cgi?id=136851 // In-page `selector#id sibling-combinator selector` fails if ( !el.querySelectorAll( "a#" + expando + "+*" ).length ) { rbuggyQSA.push( ".#.+[+~]" ); } // Support: Firefox <=3.6 - 5 only // Old Firefox doesn't throw on a badly-escaped identifier. el.querySelectorAll( "\\\f" ); rbuggyQSA.push( "[\\r\\n\\f]" ); } ); assert( function( el ) { el.innerHTML = "<a href='' disabled='disabled'></a>" + "<select disabled='disabled'><option/></select>"; // Support: Windows 8 Native Apps // The type and name attributes are restricted during .innerHTML assignment var input = document.createElement( "input" ); input.setAttribute( "type", "hidden" ); el.appendChild( input ).setAttribute( "name", "D" ); // Support: IE8 // Enforce case-sensitivity of name attribute if ( el.querySelectorAll( "[name=d]" ).length ) { rbuggyQSA.push( "name" + whitespace + "*[*^$|!~]?=" ); } // FF 3.5 - :enabled/:disabled and hidden elements (hidden elements are still enabled) // IE8 throws error here and will not see later tests if ( el.querySelectorAll( ":enabled" ).length !== 2 ) { rbuggyQSA.push( ":enabled", ":disabled" ); } // Support: IE9-11+ // IE's :disabled selector does not pick up the children of disabled fieldsets docElem.appendChild( el ).disabled = true; if ( el.querySelectorAll( ":disabled" ).length !== 2 ) { rbuggyQSA.push( ":enabled", ":disabled" ); } // Support: Opera 10 - 11 only // Opera 10-11 does not throw on post-comma invalid pseudos el.querySelectorAll( "*,:x" ); rbuggyQSA.push( ",.*:" ); } ); } if ( ( support.matchesSelector = rnative.test( ( matches = docElem.matches || docElem.webkitMatchesSelector || docElem.mozMatchesSelector || docElem.oMatchesSelector || docElem.msMatchesSelector ) ) ) ) { assert( function( el ) { // Check to see if it's possible to do matchesSelector // on a disconnected node (IE 9) support.disconnectedMatch = matches.call( el, "*" ); // This should fail with an exception // Gecko does not error, returns false instead matches.call( el, "[s!='']:x" ); rbuggyMatches.push( "!=", pseudos ); } ); } rbuggyQSA = rbuggyQSA.length && new RegExp( rbuggyQSA.join( "|" ) ); rbuggyMatches = rbuggyMatches.length && new RegExp( rbuggyMatches.join( "|" ) ); /* Contains ---------------------------------------------------------------------- */ hasCompare = rnative.test( docElem.compareDocumentPosition ); // Element contains another // Purposefully self-exclusive // As in, an element does not contain itself contains = hasCompare || rnative.test( docElem.contains ) ? function( a, b ) { var adown = a.nodeType === 9 ? a.documentElement : a, bup = b && b.parentNode; return a === bup || !!( bup && bup.nodeType === 1 && ( adown.contains ? adown.contains( bup ) : a.compareDocumentPosition && a.compareDocumentPosition( bup ) & 16 ) ); } : function( a, b ) { if ( b ) { while ( ( b = b.parentNode ) ) { if ( b === a ) { return true; } } } return false; }; /* Sorting ---------------------------------------------------------------------- */ // Document order sorting sortOrder = hasCompare ?
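// For readers of the comparator below: compareDocumentPosition returns a
// bitmask (1 = disconnected, 2 = preceding, 4 = following), so `compare & 4`
// means a precedes b and sorts first (-1), while bit 1 triggers the
// disconnected-node handling relative to the preferred document.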
function( a, b ) { // Flag for duplicate removal if ( a === b ) { hasDuplicate = true; return 0; } // Sort on method existence if only one input has compareDocumentPosition var compare = !a.compareDocumentPosition - !b.compareDocumentPosition; if ( compare ) { return compare; } // Calculate position if both inputs belong to the same document // Support: IE 11+, Edge 17 - 18+ // IE/Edge sometimes throw a "Permission denied" error when strict-comparing // two documents; shallow comparisons work. // eslint-disable-next-line eqeqeq compare = ( a.ownerDocument || a ) == ( b.ownerDocument || b ) ? a.compareDocumentPosition( b ) : // Otherwise we know they are disconnected 1; // Disconnected nodes if ( compare & 1 || ( !support.sortDetached && b.compareDocumentPosition( a ) === compare ) ) { // Choose the first element that is related to our preferred document // Support: IE 11+, Edge 17 - 18+ // IE/Edge sometimes throw a "Permission denied" error when strict-comparing // two documents; shallow comparisons work. // eslint-disable-next-line eqeqeq if ( a == document || a.ownerDocument == preferredDoc && contains( preferredDoc, a ) ) { return -1; } // Support: IE 11+, Edge 17 - 18+ // IE/Edge sometimes throw a "Permission denied" error when strict-comparing // two documents; shallow comparisons work. // eslint-disable-next-line eqeqeq if ( b == document || b.ownerDocument == preferredDoc && contains( preferredDoc, b ) ) { return 1; } // Maintain original order return sortInput ? ( indexOf( sortInput, a ) - indexOf( sortInput, b ) ) : 0; } return compare & 4 ? -1 : 1; } : function( a, b ) { // Exit early if the nodes are identical if ( a === b ) { hasDuplicate = true; return 0; } var cur, i = 0, aup = a.parentNode, bup = b.parentNode, ap = [ a ], bp = [ b ]; // Parentless nodes are either documents or disconnected if ( !aup || !bup ) { // Support: IE 11+, Edge 17 - 18+ // IE/Edge sometimes throw a "Permission denied" error when strict-comparing // two documents; shallow comparisons work. /* eslint-disable eqeqeq */ return a == document ? -1 : b == document ? 1 : /* eslint-enable eqeqeq */ aup ? -1 : bup ? 1 : sortInput ? ( indexOf( sortInput, a ) - indexOf( sortInput, b ) ) : 0; // If the nodes are siblings, we can do a quick check } else if ( aup === bup ) { return siblingCheck( a, b ); } // Otherwise we need full lists of their ancestors for comparison cur = a; while ( ( cur = cur.parentNode ) ) { ap.unshift( cur ); } cur = b; while ( ( cur = cur.parentNode ) ) { bp.unshift( cur ); } // Walk down the tree looking for a discrepancy while ( ap[ i ] === bp[ i ] ) { i++; } return i ? // Do a sibling check if the nodes have a common ancestor siblingCheck( ap[ i ], bp[ i ] ) : // Otherwise nodes in our document sort first // Support: IE 11+, Edge 17 - 18+ // IE/Edge sometimes throw a "Permission denied" error when strict-comparing // two documents; shallow comparisons work. /* eslint-disable eqeqeq */ ap[ i ] == preferredDoc ? -1 : bp[ i ] == preferredDoc ? 
1 : /* eslint-enable eqeqeq */ 0; }; return document; }; Sizzle.matches = function( expr, elements ) { return Sizzle( expr, null, null, elements ); }; Sizzle.matchesSelector = function( elem, expr ) { setDocument( elem ); if ( support.matchesSelector && documentIsHTML && !nonnativeSelectorCache[ expr + " " ] && ( !rbuggyMatches || !rbuggyMatches.test( expr ) ) && ( !rbuggyQSA || !rbuggyQSA.test( expr ) ) ) { try { var ret = matches.call( elem, expr ); // IE 9's matchesSelector returns false on disconnected nodes if ( ret || support.disconnectedMatch || // As well, disconnected nodes are said to be in a document // fragment in IE 9 elem.document && elem.document.nodeType !== 11 ) { return ret; } } catch ( e ) { nonnativeSelectorCache( expr, true ); } } return Sizzle( expr, document, null, [ elem ] ).length > 0; }; Sizzle.contains = function( context, elem ) { // Set document vars if needed // Support: IE 11+, Edge 17 - 18+ // IE/Edge sometimes throw a "Permission denied" error when strict-comparing // two documents; shallow comparisons work. // eslint-disable-next-line eqeqeq if ( ( context.ownerDocument || context ) != document ) { setDocument( context ); } return contains( context, elem ); }; Sizzle.attr = function( elem, name ) { // Set document vars if needed // Support: IE 11+, Edge 17 - 18+ // IE/Edge sometimes throw a "Permission denied" error when strict-comparing // two documents; shallow comparisons work. // eslint-disable-next-line eqeqeq if ( ( elem.ownerDocument || elem ) != document ) { setDocument( elem ); } var fn = Expr.attrHandle[ name.toLowerCase() ], // Don't get fooled by Object.prototype properties (jQuery #13807) val = fn && hasOwn.call( Expr.attrHandle, name.toLowerCase() ) ? fn( elem, name, !documentIsHTML ) : undefined; return val !== undefined ? val : support.attributes || !documentIsHTML ? elem.getAttribute( name ) : ( val = elem.getAttributeNode( name ) ) && val.specified ? 
val.value : null; }; Sizzle.escape = function( sel ) { return ( sel + "" ).replace( rcssescape, fcssescape ); }; Sizzle.error = function( msg ) { throw new Error( "Syntax error, unrecognized expression: " + msg ); }; /** * Document sorting and removing duplicates * @param {ArrayLike} results */ Sizzle.uniqueSort = function( results ) { var elem, duplicates = [], j = 0, i = 0; // Unless we *know* we can detect duplicates, assume their presence hasDuplicate = !support.detectDuplicates; sortInput = !support.sortStable && results.slice( 0 ); results.sort( sortOrder ); if ( hasDuplicate ) { while ( ( elem = results[ i++ ] ) ) { if ( elem === results[ i ] ) { j = duplicates.push( i ); } } while ( j-- ) { results.splice( duplicates[ j ], 1 ); } } // Clear input after sorting to release objects // See https://github.com/jquery/sizzle/pull/225 sortInput = null; return results; }; /** * Utility function for retrieving the text value of an array of DOM nodes * @param {Array|Element} elem */ getText = Sizzle.getText = function( elem ) { var node, ret = "", i = 0, nodeType = elem.nodeType; if ( !nodeType ) { // If no nodeType, this is expected to be an array while ( ( node = elem[ i++ ] ) ) { // Do not traverse comment nodes ret += getText( node ); } } else if ( nodeType === 1 || nodeType === 9 || nodeType === 11 ) { // Use textContent for elements // innerText usage removed for consistency of new lines (jQuery #11153) if ( typeof elem.textContent === "string" ) { return elem.textContent; } else { // Traverse its children for ( elem = elem.firstChild; elem; elem = elem.nextSibling ) { ret += getText( elem ); } } } else if ( nodeType === 3 || nodeType === 4 ) { return elem.nodeValue; } // Do not include comment or processing instruction nodes return ret; }; Expr = Sizzle.selectors = { // Can be adjusted by the user cacheLength: 50, createPseudo: markFunction, match: matchExpr, attrHandle: {}, find: {}, relative: { ">": { dir: "parentNode", first: true }, " ": { dir: "parentNode" }, "+": { dir: "previousSibling", first: true }, "~": { dir: "previousSibling" } }, preFilter: { "ATTR": function( match ) { match[ 1 ] = match[ 1 ].replace( runescape, funescape ); // Move the given value to match[3] whether quoted or unquoted match[ 3 ] = ( match[ 3 ] || match[ 4 ] || match[ 5 ] || "" ).replace( runescape, funescape ); if ( match[ 2 ] === "~=" ) { match[ 3 ] = " " + match[ 3 ] + " "; } return match.slice( 0, 4 ); }, "CHILD": function( match ) { /* matches from matchExpr["CHILD"] 1 type (only|nth|...) 2 what (child|of-type) 3 argument (even|odd|\d*|\d*n([+-]\d+)?|...) 4 xn-component of xn+y argument ([+-]?\d*n|) 5 sign of xn-component 6 x of xn-component 7 sign of y-component 8 y of y-component */ match[ 1 ] = match[ 1 ].toLowerCase(); if ( match[ 1 ].slice( 0, 3 ) === "nth" ) { // nth-* requires argument if ( !match[ 3 ] ) { Sizzle.error( match[ 0 ] ); } // numeric x and y parameters for Expr.filter.CHILD // remember that false/true cast respectively to 0/1 match[ 4 ] = +( match[ 4 ] ? 
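// Worked example for this conversion (editorial, not upstream):
// ":nth-child(2n+1)" arrives with match[5] = "" (sign), match[6] = "2",
// match[7] = "+", match[8] = "1", yielding numeric match[4] = 2 and
// match[5] = 1; ":nth-child(odd)" has no xn component, so
// match[4] = 2 * true = 2 and match[5] = +true = 1, the same 2n+1 cycle.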
match[ 5 ] + ( match[ 6 ] || 1 ) : 2 * ( match[ 3 ] === "even" || match[ 3 ] === "odd" ) ); match[ 5 ] = +( ( match[ 7 ] + match[ 8 ] ) || match[ 3 ] === "odd" ); // other types prohibit arguments } else if ( match[ 3 ] ) { Sizzle.error( match[ 0 ] ); } return match; }, "PSEUDO": function( match ) { var excess, unquoted = !match[ 6 ] && match[ 2 ]; if ( matchExpr[ "CHILD" ].test( match[ 0 ] ) ) { return null; } // Accept quoted arguments as-is if ( match[ 3 ] ) { match[ 2 ] = match[ 4 ] || match[ 5 ] || ""; // Strip excess characters from unquoted arguments } else if ( unquoted && rpseudo.test( unquoted ) && // Get excess from tokenize (recursively) ( excess = tokenize( unquoted, true ) ) && // advance to the next closing parenthesis ( excess = unquoted.indexOf( ")", unquoted.length - excess ) - unquoted.length ) ) { // excess is a negative index match[ 0 ] = match[ 0 ].slice( 0, excess ); match[ 2 ] = unquoted.slice( 0, excess ); } // Return only captures needed by the pseudo filter method (type and argument) return match.slice( 0, 3 ); } }, filter: { "TAG": function( nodeNameSelector ) { var nodeName = nodeNameSelector.replace( runescape, funescape ).toLowerCase(); return nodeNameSelector === "*" ? function() { return true; } : function( elem ) { return elem.nodeName && elem.nodeName.toLowerCase() === nodeName; }; }, "CLASS": function( className ) { var pattern = classCache[ className + " " ]; return pattern || ( pattern = new RegExp( "(^|" + whitespace + ")" + className + "(" + whitespace + "|$)" ) ) && classCache( className, function( elem ) { return pattern.test( typeof elem.className === "string" && elem.className || typeof elem.getAttribute !== "undefined" && elem.getAttribute( "class" ) || "" ); } ); }, "ATTR": function( name, operator, check ) { return function( elem ) { var result = Sizzle.attr( elem, name ); if ( result == null ) { return operator === "!="; } if ( !operator ) { return true; } result += ""; /* eslint-disable max-len */ return operator === "=" ? result === check : operator === "!=" ? result !== check : operator === "^=" ? check && result.indexOf( check ) === 0 : operator === "*=" ? check && result.indexOf( check ) > -1 : operator === "$=" ? check && result.slice( -check.length ) === check : operator === "~=" ? ( " " + result.replace( rwhitespace, " " ) + " " ).indexOf( check ) > -1 : operator === "|=" ? result === check || result.slice( 0, check.length + 1 ) === check + "-" : false; /* eslint-enable max-len */ }; }, "CHILD": function( type, what, _argument, first, last ) { var simple = type.slice( 0, 3 ) !== "nth", forward = type.slice( -4 ) !== "last", ofType = what === "of-type"; return first === 1 && last === 0 ? // Shortcut for :nth-*(n) function( elem ) { return !!elem.parentNode; } : function( elem, _context, xml ) { var cache, uniqueCache, outerCache, node, nodeIndex, start, dir = simple !== forward ? "nextSibling" : "previousSibling", parent = elem.parentNode, name = ofType && elem.nodeName.toLowerCase(), useCache = !xml && !ofType, diff = false; if ( parent ) { // :(first|last|only)-(child|of-type) if ( simple ) { while ( dir ) { node = elem; while ( ( node = node[ dir ] ) ) { if ( ofType ? node.nodeName.toLowerCase() === name : node.nodeType === 1 ) { return false; } } // Reverse direction for :only-* (if we haven't yet done so) start = dir = type === "only" && !start && "nextSibling"; } return true; } start = [ forward ? parent.firstChild : parent.lastChild ]; // non-xml :nth-child(...) 
stores cache data on `parent` if ( forward && useCache ) { // Seek `elem` from a previously-cached index // ...in a gzip-friendly way node = parent; outerCache = node[ expando ] || ( node[ expando ] = {} ); // Support: IE <9 only // Defend against cloned attroperties (jQuery gh-1709) uniqueCache = outerCache[ node.uniqueID ] || ( outerCache[ node.uniqueID ] = {} ); cache = uniqueCache[ type ] || []; nodeIndex = cache[ 0 ] === dirruns && cache[ 1 ]; diff = nodeIndex && cache[ 2 ]; node = nodeIndex && parent.childNodes[ nodeIndex ]; while ( ( node = ++nodeIndex && node && node[ dir ] || // Fallback to seeking `elem` from the start ( diff = nodeIndex = 0 ) || start.pop() ) ) { // When found, cache indexes on `parent` and break if ( node.nodeType === 1 && ++diff && node === elem ) { uniqueCache[ type ] = [ dirruns, nodeIndex, diff ]; break; } } } else { // Use previously-cached element index if available if ( useCache ) { // ...in a gzip-friendly way node = elem; outerCache = node[ expando ] || ( node[ expando ] = {} ); // Support: IE <9 only // Defend against cloned attroperties (jQuery gh-1709) uniqueCache = outerCache[ node.uniqueID ] || ( outerCache[ node.uniqueID ] = {} ); cache = uniqueCache[ type ] || []; nodeIndex = cache[ 0 ] === dirruns && cache[ 1 ]; diff = nodeIndex; } // xml :nth-child(...) // or :nth-last-child(...) or :nth(-last)?-of-type(...) if ( diff === false ) { // Use the same loop as above to seek `elem` from the start while ( ( node = ++nodeIndex && node && node[ dir ] || ( diff = nodeIndex = 0 ) || start.pop() ) ) { if ( ( ofType ? node.nodeName.toLowerCase() === name : node.nodeType === 1 ) && ++diff ) { // Cache the index of each encountered element if ( useCache ) { outerCache = node[ expando ] || ( node[ expando ] = {} ); // Support: IE <9 only // Defend against cloned attroperties (jQuery gh-1709) uniqueCache = outerCache[ node.uniqueID ] || ( outerCache[ node.uniqueID ] = {} ); uniqueCache[ type ] = [ dirruns, diff ]; } if ( node === elem ) { break; } } } } } // Incorporate the offset, then check against cycle size diff -= last; return diff === first || ( diff % first === 0 && diff / first >= 0 ); } }; }, "PSEUDO": function( pseudo, argument ) { // pseudo-class names are case-insensitive // http://www.w3.org/TR/selectors/#pseudo-classes // Prioritize by case sensitivity in case custom pseudos are added with uppercase letters // Remember that setFilters inherits from pseudos var args, fn = Expr.pseudos[ pseudo ] || Expr.setFilters[ pseudo.toLowerCase() ] || Sizzle.error( "unsupported pseudo: " + pseudo ); // The user may use createPseudo to indicate that // arguments are needed to create the filter function // just as Sizzle does if ( fn[ expando ] ) { return fn( argument ); } // But maintain support for old signatures if ( fn.length > 1 ) { args = [ pseudo, pseudo, "", argument ]; return Expr.setFilters.hasOwnProperty( pseudo.toLowerCase() ) ? markFunction( function( seed, matches ) { var idx, matched = fn( seed, argument ), i = matched.length; while ( i-- ) { idx = indexOf( seed, matched[ i ] ); seed[ idx ] = !( matches[ idx ] = matched[ i ] ); } } ) : function( elem ) { return fn( elem, 0, args ); }; } return fn; } }, pseudos: { // Potentially complex pseudos "not": markFunction( function( selector ) { // Trim the selector passed to compile // to avoid treating leading and trailing // spaces as combinators var input = [], results = [], matcher = compile( selector.replace( rtrim, "$1" ) ); return matcher[ expando ] ? 
markFunction( function( seed, matches, _context, xml ) { var elem, unmatched = matcher( seed, null, xml, [] ), i = seed.length; // Match elements unmatched by `matcher` while ( i-- ) { if ( ( elem = unmatched[ i ] ) ) { seed[ i ] = !( matches[ i ] = elem ); } } } ) : function( elem, _context, xml ) { input[ 0 ] = elem; matcher( input, null, xml, results ); // Don't keep the element (issue #299) input[ 0 ] = null; return !results.pop(); }; } ), "has": markFunction( function( selector ) { return function( elem ) { return Sizzle( selector, elem ).length > 0; }; } ), "contains": markFunction( function( text ) { text = text.replace( runescape, funescape ); return function( elem ) { return ( elem.textContent || getText( elem ) ).indexOf( text ) > -1; }; } ), // "Whether an element is represented by a :lang() selector // is based solely on the element's language value // being equal to the identifier C, // or beginning with the identifier C immediately followed by "-". // The matching of C against the element's language value is performed case-insensitively. // The identifier C does not have to be a valid language name." // http://www.w3.org/TR/selectors/#lang-pseudo "lang": markFunction( function( lang ) { // lang value must be a valid identifier if ( !ridentifier.test( lang || "" ) ) { Sizzle.error( "unsupported lang: " + lang ); } lang = lang.replace( runescape, funescape ).toLowerCase(); return function( elem ) { var elemLang; do { if ( ( elemLang = documentIsHTML ? elem.lang : elem.getAttribute( "xml:lang" ) || elem.getAttribute( "lang" ) ) ) { elemLang = elemLang.toLowerCase(); return elemLang === lang || elemLang.indexOf( lang + "-" ) === 0; } } while ( ( elem = elem.parentNode ) && elem.nodeType === 1 ); return false; }; } ), // Miscellaneous "target": function( elem ) { var hash = window.location && window.location.hash; return hash && hash.slice( 1 ) === elem.id; }, "root": function( elem ) { return elem === docElem; }, "focus": function( elem ) { return elem === document.activeElement && ( !document.hasFocus || document.hasFocus() ) && !!( elem.type || elem.href || ~elem.tabIndex ); }, // Boolean properties "enabled": createDisabledPseudo( false ), "disabled": createDisabledPseudo( true ), "checked": function( elem ) { // In CSS3, :checked should return both checked and selected elements // http://www.w3.org/TR/2011/REC-css3-selectors-20110929/#checked var nodeName = elem.nodeName.toLowerCase(); return ( nodeName === "input" && !!elem.checked ) || ( nodeName === "option" && !!elem.selected ); }, "selected": function( elem ) { // Accessing this property makes selected-by-default // options in Safari work properly if ( elem.parentNode ) { // eslint-disable-next-line no-unused-expressions elem.parentNode.selectedIndex; } return elem.selected === true; }, // Contents "empty": function( elem ) { // http://www.w3.org/TR/selectors/#empty-pseudo // :empty is negated by element (1) or content nodes (text: 3; cdata: 4; entity ref: 5), // but not by others (comment: 8; processing instruction: 7; etc.) 
// nodeType < 6 works because attributes (2) do not appear as children for ( elem = elem.firstChild; elem; elem = elem.nextSibling ) { if ( elem.nodeType < 6 ) { return false; } } return true; }, "parent": function( elem ) { return !Expr.pseudos[ "empty" ]( elem ); }, // Element/input types "header": function( elem ) { return rheader.test( elem.nodeName ); }, "input": function( elem ) { return rinputs.test( elem.nodeName ); }, "button": function( elem ) { var name = elem.nodeName.toLowerCase(); return name === "input" && elem.type === "button" || name === "button"; }, "text": function( elem ) { var attr; return elem.nodeName.toLowerCase() === "input" && elem.type === "text" && // Support: IE<8 // New HTML5 attribute values (e.g., "search") appear with elem.type === "text" ( ( attr = elem.getAttribute( "type" ) ) == null || attr.toLowerCase() === "text" ); }, // Position-in-collection "first": createPositionalPseudo( function() { return [ 0 ]; } ), "last": createPositionalPseudo( function( _matchIndexes, length ) { return [ length - 1 ]; } ), "eq": createPositionalPseudo( function( _matchIndexes, length, argument ) { return [ argument < 0 ? argument + length : argument ]; } ), "even": createPositionalPseudo( function( matchIndexes, length ) { var i = 0; for ( ; i < length; i += 2 ) { matchIndexes.push( i ); } return matchIndexes; } ), "odd": createPositionalPseudo( function( matchIndexes, length ) { var i = 1; for ( ; i < length; i += 2 ) { matchIndexes.push( i ); } return matchIndexes; } ), "lt": createPositionalPseudo( function( matchIndexes, length, argument ) { var i = argument < 0 ? argument + length : argument > length ? length : argument; for ( ; --i >= 0; ) { matchIndexes.push( i ); } return matchIndexes; } ), "gt": createPositionalPseudo( function( matchIndexes, length, argument ) { var i = argument < 0 ? argument + length : argument; for ( ; ++i < length; ) { matchIndexes.push( i ); } return matchIndexes; } ) } }; Expr.pseudos[ "nth" ] = Expr.pseudos[ "eq" ]; // Add button/input type pseudos for ( i in { radio: true, checkbox: true, file: true, password: true, image: true } ) { Expr.pseudos[ i ] = createInputPseudo( i ); } for ( i in { submit: true, reset: true } ) { Expr.pseudos[ i ] = createButtonPseudo( i ); } // Easy API for creating new setFilters function setFilters() {} setFilters.prototype = Expr.filters = Expr.pseudos; Expr.setFilters = new setFilters(); tokenize = Sizzle.tokenize = function( selector, parseOnly ) { var matched, match, tokens, type, soFar, groups, preFilters, cached = tokenCache[ selector + " " ]; if ( cached ) { return parseOnly ? 
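// Shape of the cached/returned token groups (editorial sketch): for
// tokenize( "div > p.note" ) the groups array holds one group such as
// [ { value: "div", type: "TAG", matches: [ "div" ] },
//   { value: " > ", type: ">" },
//   { value: "p", type: "TAG", matches: [ "p" ] },
//   { value: ".note", type: "CLASS", matches: [ "note" ] } ]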
0 : cached.slice( 0 ); } soFar = selector; groups = []; preFilters = Expr.preFilter; while ( soFar ) { // Comma and first run if ( !matched || ( match = rcomma.exec( soFar ) ) ) { if ( match ) { // Don't consume trailing commas as valid soFar = soFar.slice( match[ 0 ].length ) || soFar; } groups.push( ( tokens = [] ) ); } matched = false; // Combinators if ( ( match = rcombinators.exec( soFar ) ) ) { matched = match.shift(); tokens.push( { value: matched, // Cast descendant combinators to space type: match[ 0 ].replace( rtrim, " " ) } ); soFar = soFar.slice( matched.length ); } // Filters for ( type in Expr.filter ) { if ( ( match = matchExpr[ type ].exec( soFar ) ) && ( !preFilters[ type ] || ( match = preFilters[ type ]( match ) ) ) ) { matched = match.shift(); tokens.push( { value: matched, type: type, matches: match } ); soFar = soFar.slice( matched.length ); } } if ( !matched ) { break; } } // Return the length of the invalid excess // if we're just parsing // Otherwise, throw an error or return tokens return parseOnly ? soFar.length : soFar ? Sizzle.error( selector ) : // Cache the tokens tokenCache( selector, groups ).slice( 0 ); }; function toSelector( tokens ) { var i = 0, len = tokens.length, selector = ""; for ( ; i < len; i++ ) { selector += tokens[ i ].value; } return selector; } function addCombinator( matcher, combinator, base ) { var dir = combinator.dir, skip = combinator.next, key = skip || dir, checkNonElements = base && key === "parentNode", doneName = done++; return combinator.first ? // Check against closest ancestor/preceding element function( elem, context, xml ) { while ( ( elem = elem[ dir ] ) ) { if ( elem.nodeType === 1 || checkNonElements ) { return matcher( elem, context, xml ); } } return false; } : // Check against all ancestor/preceding elements function( elem, context, xml ) { var oldCache, uniqueCache, outerCache, newCache = [ dirruns, doneName ]; // We can't set arbitrary data on XML nodes, so they don't benefit from combinator caching if ( xml ) { while ( ( elem = elem[ dir ] ) ) { if ( elem.nodeType === 1 || checkNonElements ) { if ( matcher( elem, context, xml ) ) { return true; } } } } else { while ( ( elem = elem[ dir ] ) ) { if ( elem.nodeType === 1 || checkNonElements ) { outerCache = elem[ expando ] || ( elem[ expando ] = {} ); // Support: IE <9 only // Defend against cloned attroperties (jQuery gh-1709) uniqueCache = outerCache[ elem.uniqueID ] || ( outerCache[ elem.uniqueID ] = {} ); if ( skip && skip === elem.nodeName.toLowerCase() ) { elem = elem[ dir ] || elem; } else if ( ( oldCache = uniqueCache[ key ] ) && oldCache[ 0 ] === dirruns && oldCache[ 1 ] === doneName ) { // Assign to newCache so results back-propagate to previous elements return ( newCache[ 2 ] = oldCache[ 2 ] ); } else { // Reuse newcache so results back-propagate to previous elements uniqueCache[ key ] = newCache; // A match means we're done; a fail means we have to keep checking if ( ( newCache[ 2 ] = matcher( elem, context, xml ) ) ) { return true; } } } } } return false; }; } function elementMatcher( matchers ) { return matchers.length > 1 ? 
function( elem, context, xml ) { var i = matchers.length; while ( i-- ) { if ( !matchers[ i ]( elem, context, xml ) ) { return false; } } return true; } : matchers[ 0 ]; } function multipleContexts( selector, contexts, results ) { var i = 0, len = contexts.length; for ( ; i < len; i++ ) { Sizzle( selector, contexts[ i ], results ); } return results; } function condense( unmatched, map, filter, context, xml ) { var elem, newUnmatched = [], i = 0, len = unmatched.length, mapped = map != null; for ( ; i < len; i++ ) { if ( ( elem = unmatched[ i ] ) ) { if ( !filter || filter( elem, context, xml ) ) { newUnmatched.push( elem ); if ( mapped ) { map.push( i ); } } } } return newUnmatched; } function setMatcher( preFilter, selector, matcher, postFilter, postFinder, postSelector ) { if ( postFilter && !postFilter[ expando ] ) { postFilter = setMatcher( postFilter ); } if ( postFinder && !postFinder[ expando ] ) { postFinder = setMatcher( postFinder, postSelector ); } return markFunction( function( seed, results, context, xml ) { var temp, i, elem, preMap = [], postMap = [], preexisting = results.length, // Get initial elements from seed or context elems = seed || multipleContexts( selector || "*", context.nodeType ? [ context ] : context, [] ), // Prefilter to get matcher input, preserving a map for seed-results synchronization matcherIn = preFilter && ( seed || !selector ) ? condense( elems, preMap, preFilter, context, xml ) : elems, matcherOut = matcher ? // If we have a postFinder, or filtered seed, or non-seed postFilter or preexisting results, postFinder || ( seed ? preFilter : preexisting || postFilter ) ? // ...intermediate processing is necessary [] : // ...otherwise use results directly results : matcherIn; // Find primary matches if ( matcher ) { matcher( matcherIn, matcherOut, context, xml ); } // Apply postFilter if ( postFilter ) { temp = condense( matcherOut, postMap ); postFilter( temp, [], context, xml ); // Un-match failing elements by moving them back to matcherIn i = temp.length; while ( i-- ) { if ( ( elem = temp[ i ] ) ) { matcherOut[ postMap[ i ] ] = !( matcherIn[ postMap[ i ] ] = elem ); } } } if ( seed ) { if ( postFinder || preFilter ) { if ( postFinder ) { // Get the final matcherOut by condensing this intermediate into postFinder contexts temp = []; i = matcherOut.length; while ( i-- ) { if ( ( elem = matcherOut[ i ] ) ) { // Restore matcherIn since elem is not yet a final match temp.push( ( matcherIn[ i ] = elem ) ); } } postFinder( null, ( matcherOut = [] ), temp, xml ); } // Move matched elements from seed to results to keep them synchronized i = matcherOut.length; while ( i-- ) { if ( ( elem = matcherOut[ i ] ) && ( temp = postFinder ? indexOf( seed, elem ) : preMap[ i ] ) > -1 ) { seed[ temp ] = !( results[ temp ] = elem ); } } } // Add elements to results, through postFinder if defined } else { matcherOut = condense( matcherOut === results ? matcherOut.splice( preexisting, matcherOut.length ) : matcherOut ); if ( postFinder ) { postFinder( null, results, matcherOut, xml ); } else { push.apply( results, matcherOut ); } } } ); } function matcherFromTokens( tokens ) { var checkContext, matcher, j, len = tokens.length, leadingRelative = Expr.relative[ tokens[ 0 ].type ], implicitRelative = leadingRelative || Expr.relative[ " " ], i = leadingRelative ? 
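// How matcherFromTokens composes (editorial gloss): for "div > p" it first
// collects the "div" filter, then on the ">" token wraps the accumulated
// matchers via addCombinator, then appends the "p" filter; the compiled
// function therefore tests a candidate against "p" and walks parentNode to
// test "div", i.e. right-to-left evaluation.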
1 : 0, // The foundational matcher ensures that elements are reachable from top-level context(s) matchContext = addCombinator( function( elem ) { return elem === checkContext; }, implicitRelative, true ), matchAnyContext = addCombinator( function( elem ) { return indexOf( checkContext, elem ) > -1; }, implicitRelative, true ), matchers = [ function( elem, context, xml ) { var ret = ( !leadingRelative && ( xml || context !== outermostContext ) ) || ( ( checkContext = context ).nodeType ? matchContext( elem, context, xml ) : matchAnyContext( elem, context, xml ) ); // Avoid hanging onto element (issue #299) checkContext = null; return ret; } ]; for ( ; i < len; i++ ) { if ( ( matcher = Expr.relative[ tokens[ i ].type ] ) ) { matchers = [ addCombinator( elementMatcher( matchers ), matcher ) ]; } else { matcher = Expr.filter[ tokens[ i ].type ].apply( null, tokens[ i ].matches ); // Return special upon seeing a positional matcher if ( matcher[ expando ] ) { // Find the next relative operator (if any) for proper handling j = ++i; for ( ; j < len; j++ ) { if ( Expr.relative[ tokens[ j ].type ] ) { break; } } return setMatcher( i > 1 && elementMatcher( matchers ), i > 1 && toSelector( // If the preceding token was a descendant combinator, insert an implicit any-element `*` tokens .slice( 0, i - 1 ) .concat( { value: tokens[ i - 2 ].type === " " ? "*" : "" } ) ).replace( rtrim, "$1" ), matcher, i < j && matcherFromTokens( tokens.slice( i, j ) ), j < len && matcherFromTokens( ( tokens = tokens.slice( j ) ) ), j < len && toSelector( tokens ) ); } matchers.push( matcher ); } } return elementMatcher( matchers ); } function matcherFromGroupMatchers( elementMatchers, setMatchers ) { var bySet = setMatchers.length > 0, byElement = elementMatchers.length > 0, superMatcher = function( seed, context, xml, results, outermost ) { var elem, j, matcher, matchedCount = 0, i = "0", unmatched = seed && [], setMatched = [], contextBackup = outermostContext, // We must always have either seed elements or outermost context elems = seed || byElement && Expr.find[ "TAG" ]( "*", outermost ), // Use integer dirruns iff this is the outermost matcher dirrunsUnique = ( dirruns += contextBackup == null ? 1 : Math.random() || 0.1 ), len = elems.length; if ( outermost ) { // Support: IE 11+, Edge 17 - 18+ // IE/Edge sometimes throw a "Permission denied" error when strict-comparing // two documents; shallow comparisons work. // eslint-disable-next-line eqeqeq outermostContext = context == document || context || outermost; } // Add elements passing elementMatchers directly to results // Support: IE<9, Safari // Tolerate NodeList properties (IE: "length"; Safari: <number>) matching elements by id for ( ; i !== len && ( elem = elems[ i ] ) != null; i++ ) { if ( byElement && elem ) { j = 0; // Support: IE 11+, Edge 17 - 18+ // IE/Edge sometimes throw a "Permission denied" error when strict-comparing // two documents; shallow comparisons work.
// eslint-disable-next-line eqeqeq if ( !context && elem.ownerDocument != document ) { setDocument( elem ); xml = !documentIsHTML; } while ( ( matcher = elementMatchers[ j++ ] ) ) { if ( matcher( elem, context || document, xml ) ) { results.push( elem ); break; } } if ( outermost ) { dirruns = dirrunsUnique; } } // Track unmatched elements for set filters if ( bySet ) { // They will have gone through all possible matchers if ( ( elem = !matcher && elem ) ) { matchedCount--; } // Lengthen the array for every element, matched or not if ( seed ) { unmatched.push( elem ); } } } // `i` is now the count of elements visited above, and adding it to `matchedCount` // makes the latter nonnegative. matchedCount += i; // Apply set filters to unmatched elements // NOTE: This can be skipped if there are no unmatched elements (i.e., `matchedCount` // equals `i`), unless we didn't visit _any_ elements in the above loop because we have // no element matchers and no seed. // Incrementing an initially-string "0" `i` allows `i` to remain a string only in that // case, which will result in a "00" `matchedCount` that differs from `i` but is also // numerically zero. if ( bySet && i !== matchedCount ) { j = 0; while ( ( matcher = setMatchers[ j++ ] ) ) { matcher( unmatched, setMatched, context, xml ); } if ( seed ) { // Reintegrate element matches to eliminate the need for sorting if ( matchedCount > 0 ) { while ( i-- ) { if ( !( unmatched[ i ] || setMatched[ i ] ) ) { setMatched[ i ] = pop.call( results ); } } } // Discard index placeholder values to get only actual matches setMatched = condense( setMatched ); } // Add matches to results push.apply( results, setMatched ); // Seedless set matches succeeding multiple successful matchers stipulate sorting if ( outermost && !seed && setMatched.length > 0 && ( matchedCount + setMatchers.length ) > 1 ) { Sizzle.uniqueSort( results ); } } // Override manipulation of globals by nested matchers if ( outermost ) { dirruns = dirrunsUnique; outermostContext = contextBackup; } return unmatched; }; return bySet ? 
markFunction( superMatcher ) : superMatcher; } compile = Sizzle.compile = function( selector, match /* Internal Use Only */ ) { var i, setMatchers = [], elementMatchers = [], cached = compilerCache[ selector + " " ]; if ( !cached ) { // Generate a function of recursive functions that can be used to check each element if ( !match ) { match = tokenize( selector ); } i = match.length; while ( i-- ) { cached = matcherFromTokens( match[ i ] ); if ( cached[ expando ] ) { setMatchers.push( cached ); } else { elementMatchers.push( cached ); } } // Cache the compiled function cached = compilerCache( selector, matcherFromGroupMatchers( elementMatchers, setMatchers ) ); // Save selector and tokenization cached.selector = selector; } return cached; }; /** * A low-level selection function that works with Sizzle's compiled * selector functions * @param {String|Function} selector A selector or a pre-compiled * selector function built with Sizzle.compile * @param {Element} context * @param {Array} [results] * @param {Array} [seed] A set of elements to match against */ select = Sizzle.select = function( selector, context, results, seed ) { var i, tokens, token, type, find, compiled = typeof selector === "function" && selector, match = !seed && tokenize( ( selector = compiled.selector || selector ) ); results = results || []; // Try to minimize operations if there is only one selector in the list and no seed // (the latter of which guarantees us context) if ( match.length === 1 ) { // Reduce context if the leading compound selector is an ID tokens = match[ 0 ] = match[ 0 ].slice( 0 ); if ( tokens.length > 2 && ( token = tokens[ 0 ] ).type === "ID" && context.nodeType === 9 && documentIsHTML && Expr.relative[ tokens[ 1 ].type ] ) { context = ( Expr.find[ "ID" ]( token.matches[ 0 ] .replace( runescape, funescape ), context ) || [] )[ 0 ]; if ( !context ) { return results; // Precompiled matchers will still verify ancestry, so step up a level } else if ( compiled ) { context = context.parentNode; } selector = selector.slice( tokens.shift().value.length ); } // Fetch a seed set for right-to-left matching i = matchExpr[ "needsContext" ].test( selector ) ? 
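// Usage sketch for the public entry points defined here (editorial,
// variable names hypothetical):
// var out = [];
// Sizzle.select( "div p", document, out );  // tokenize + compile + run
// var fn = Sizzle.compile( "div p" );       // reusable compiled matcher
// Sizzle.select( fn, document, out );       // skips retokenizing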
0 : tokens.length; while ( i-- ) { token = tokens[ i ]; // Abort if we hit a combinator if ( Expr.relative[ ( type = token.type ) ] ) { break; } if ( ( find = Expr.find[ type ] ) ) { // Search, expanding context for leading sibling combinators if ( ( seed = find( token.matches[ 0 ].replace( runescape, funescape ), rsibling.test( tokens[ 0 ].type ) && testContext( context.parentNode ) || context ) ) ) { // If seed is empty or no tokens remain, we can return early tokens.splice( i, 1 ); selector = seed.length && toSelector( tokens ); if ( !selector ) { push.apply( results, seed ); return results; } break; } } } } // Compile and execute a filtering function if one is not provided // Provide `match` to avoid retokenization if we modified the selector above ( compiled || compile( selector, match ) )( seed, context, !documentIsHTML, results, !context || rsibling.test( selector ) && testContext( context.parentNode ) || context ); return results; }; // One-time assignments // Sort stability support.sortStable = expando.split( "" ).sort( sortOrder ).join( "" ) === expando; // Support: Chrome 14-35+ // Always assume duplicates if they aren't passed to the comparison function support.detectDuplicates = !!hasDuplicate; // Initialize against the default document setDocument(); // Support: Webkit<537.32 - Safari 6.0.3/Chrome 25 (fixed in Chrome 27) // Detached nodes confoundingly follow *each other* support.sortDetached = assert( function( el ) { // Should return 1, but returns 4 (following) return el.compareDocumentPosition( document.createElement( "fieldset" ) ) & 1; } ); // Support: IE<8 // Prevent attribute/property "interpolation" // https://msdn.microsoft.com/en-us/library/ms536429%28VS.85%29.aspx if ( !assert( function( el ) { el.innerHTML = "<a href='#'></a>"; return el.firstChild.getAttribute( "href" ) === "#"; } ) ) { addHandle( "type|href|height|width", function( elem, name, isXML ) { if ( !isXML ) { return elem.getAttribute( name, name.toLowerCase() === "type" ? 1 : 2 ); } } ); } // Support: IE<9 // Use defaultValue in place of getAttribute("value") if ( !support.attributes || !assert( function( el ) { el.innerHTML = "<input/>"; el.firstChild.setAttribute( "value", "" ); return el.firstChild.getAttribute( "value" ) === ""; } ) ) { addHandle( "value", function( elem, _name, isXML ) { if ( !isXML && elem.nodeName.toLowerCase() === "input" ) { return elem.defaultValue; } } ); } // Support: IE<9 // Use getAttributeNode to fetch booleans when getAttribute lies if ( !assert( function( el ) { return el.getAttribute( "disabled" ) == null; } ) ) { addHandle( booleans, function( elem, name, isXML ) { var val; if ( !isXML ) { return elem[ name ] === true ? name.toLowerCase() : ( val = elem.getAttributeNode( name ) ) && val.specified ?
val.value : null; } } ); } return Sizzle; } )( window ); jQuery.find = Sizzle; jQuery.expr = Sizzle.selectors; // Deprecated jQuery.expr[ ":" ] = jQuery.expr.pseudos; jQuery.uniqueSort = jQuery.unique = Sizzle.uniqueSort; jQuery.text = Sizzle.getText; jQuery.isXMLDoc = Sizzle.isXML; jQuery.contains = Sizzle.contains; jQuery.escapeSelector = Sizzle.escape; var dir = function( elem, dir, until ) { var matched = [], truncate = until !== undefined; while ( ( elem = elem[ dir ] ) && elem.nodeType !== 9 ) { if ( elem.nodeType === 1 ) { if ( truncate && jQuery( elem ).is( until ) ) { break; } matched.push( elem ); } } return matched; }; var siblings = function( n, elem ) { var matched = []; for ( ; n; n = n.nextSibling ) { if ( n.nodeType === 1 && n !== elem ) { matched.push( n ); } } return matched; }; var rneedsContext = jQuery.expr.match.needsContext; function nodeName( elem, name ) { return elem.nodeName && elem.nodeName.toLowerCase() === name.toLowerCase(); } var rsingleTag = ( /^<([a-z][^\/\0>:\x20\t\r\n\f]*)[\x20\t\r\n\f]*\/?>(?:<\/\1>|)$/i ); // Implement the identical functionality for filter and not function winnow( elements, qualifier, not ) { if ( isFunction( qualifier ) ) { return jQuery.grep( elements, function( elem, i ) { return !!qualifier.call( elem, i, elem ) !== not; } ); } // Single element if ( qualifier.nodeType ) { return jQuery.grep( elements, function( elem ) { return ( elem === qualifier ) !== not; } ); } // Arraylike of elements (jQuery, arguments, Array) if ( typeof qualifier !== "string" ) { return jQuery.grep( elements, function( elem ) { return ( indexOf.call( qualifier, elem ) > -1 ) !== not; } ); } // Filtered directly for both simple and complex selectors return jQuery.filter( qualifier, elements, not ); } jQuery.filter = function( expr, elems, not ) { var elem = elems[ 0 ]; if ( not ) { expr = ":not(" + expr + ")"; } if ( elems.length === 1 && elem.nodeType === 1 ) { return jQuery.find.matchesSelector( elem, expr ) ? [ elem ] : []; } return jQuery.find.matches( expr, jQuery.grep( elems, function( elem ) { return elem.nodeType === 1; } ) ); }; jQuery.fn.extend( { find: function( selector ) { var i, ret, len = this.length, self = this; if ( typeof selector !== "string" ) { return this.pushStack( jQuery( selector ).filter( function() { for ( i = 0; i < len; i++ ) { if ( jQuery.contains( self[ i ], this ) ) { return true; } } } ) ); } ret = this.pushStack( [] ); for ( i = 0; i < len; i++ ) { jQuery.find( selector, self[ i ], ret ); } return len > 1 ? jQuery.uniqueSort( ret ) : ret; }, filter: function( selector ) { return this.pushStack( winnow( this, selector || [], false ) ); }, not: function( selector ) { return this.pushStack( winnow( this, selector || [], true ) ); }, is: function( selector ) { return !!winnow( this, // If this is a positional/relative selector, check membership in the returned set // so $("p:first").is("p:last") won't return true for a doc with two "p". typeof selector === "string" && rneedsContext.test( selector ) ? 
jQuery( selector ) : selector || [], false ).length; } } ); // Initialize a jQuery object // A central reference to the root jQuery(document) var rootjQuery, // A simple way to check for HTML strings // Prioritize #id over <tag> to avoid XSS via location.hash (#9521) // Strict HTML recognition (#11290: must start with <) // Shortcut simple #id case for speed rquickExpr = /^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$/, init = jQuery.fn.init = function( selector, context, root ) { var match, elem; // HANDLE: $(""), $(null), $(undefined), $(false) if ( !selector ) { return this; } // Method init() accepts an alternate rootjQuery // so migrate can support jQuery.sub (gh-2101) root = root || rootjQuery; // Handle HTML strings if ( typeof selector === "string" ) { if ( selector[ 0 ] === "<" && selector[ selector.length - 1 ] === ">" && selector.length >= 3 ) { // Assume that strings that start and end with <> are HTML and skip the regex check match = [ null, selector, null ]; } else { match = rquickExpr.exec( selector ); } // Match html or make sure no context is specified for #id if ( match && ( match[ 1 ] || !context ) ) { // HANDLE: $(html) -> $(array) if ( match[ 1 ] ) { context = context instanceof jQuery ? context[ 0 ] : context; // Option to run scripts is true for back-compat // Intentionally let the error be thrown if parseHTML is not present jQuery.merge( this, jQuery.parseHTML( match[ 1 ], context && context.nodeType ? context.ownerDocument || context : document, true ) ); // HANDLE: $(html, props) if ( rsingleTag.test( match[ 1 ] ) && jQuery.isPlainObject( context ) ) { for ( match in context ) { // Properties of context are called as methods if possible if ( isFunction( this[ match ] ) ) { this[ match ]( context[ match ] ); // ...and otherwise set as attributes } else { this.attr( match, context[ match ] ); } } } return this; // HANDLE: $(#id) } else { elem = document.getElementById( match[ 2 ] ); if ( elem ) { // Inject the element directly into the jQuery object this[ 0 ] = elem; this.length = 1; } return this; } // HANDLE: $(expr, $(...)) } else if ( !context || context.jquery ) { return ( context || root ).find( selector ); // HANDLE: $(expr, context) // (which is just equivalent to: $(context).find(expr) } else { return this.constructor( context ).find( selector ); } // HANDLE: $(DOMElement) } else if ( selector.nodeType ) { this[ 0 ] = selector; this.length = 1; return this; // HANDLE: $(function) // Shortcut for document ready } else if ( isFunction( selector ) ) { return root.ready !== undefined ?
root.ready( selector ) : // Execute immediately if ready is not present selector( jQuery ); } return jQuery.makeArray( selector, this ); }; // Give the init function the jQuery prototype for later instantiation init.prototype = jQuery.fn; // Initialize central reference rootjQuery = jQuery( document ); var rparentsprev = /^(?:parents|prev(?:Until|All))/, // Methods guaranteed to produce a unique set when starting from a unique set guaranteedUnique = { children: true, contents: true, next: true, prev: true }; jQuery.fn.extend( { has: function( target ) { var targets = jQuery( target, this ), l = targets.length; return this.filter( function() { var i = 0; for ( ; i < l; i++ ) { if ( jQuery.contains( this, targets[ i ] ) ) { return true; } } } ); }, closest: function( selectors, context ) { var cur, i = 0, l = this.length, matched = [], targets = typeof selectors !== "string" && jQuery( selectors ); // Positional selectors never match, since there's no _selection_ context if ( !rneedsContext.test( selectors ) ) { for ( ; i < l; i++ ) { for ( cur = this[ i ]; cur && cur !== context; cur = cur.parentNode ) { // Always skip document fragments if ( cur.nodeType < 11 && ( targets ? targets.index( cur ) > -1 : // Don't pass non-elements to Sizzle cur.nodeType === 1 && jQuery.find.matchesSelector( cur, selectors ) ) ) { matched.push( cur ); break; } } } } return this.pushStack( matched.length > 1 ? jQuery.uniqueSort( matched ) : matched ); }, // Determine the position of an element within the set index: function( elem ) { // No argument, return index in parent if ( !elem ) { return ( this[ 0 ] && this[ 0 ].parentNode ) ? this.first().prevAll().length : -1; } // Index in selector if ( typeof elem === "string" ) { return indexOf.call( jQuery( elem ), this[ 0 ] ); } // Locate the position of the desired element return indexOf.call( this, // If it receives a jQuery object, the first element is used elem.jquery ? elem[ 0 ] : elem ); }, add: function( selector, context ) { return this.pushStack( jQuery.uniqueSort( jQuery.merge( this.get(), jQuery( selector, context ) ) ) ); }, addBack: function( selector ) { return this.add( selector == null ? this.prevObject : this.prevObject.filter( selector ) ); } } ); function sibling( cur, dir ) { while ( ( cur = cur[ dir ] ) && cur.nodeType !== 1 ) {} return cur; } jQuery.each( { parent: function( elem ) { var parent = elem.parentNode; return parent && parent.nodeType !== 11 ? parent : null; }, parents: function( elem ) { return dir( elem, "parentNode" ); }, parentsUntil: function( elem, _i, until ) { return dir( elem, "parentNode", until ); }, next: function( elem ) { return sibling( elem, "nextSibling" ); }, prev: function( elem ) { return sibling( elem, "previousSibling" ); }, nextAll: function( elem ) { return dir( elem, "nextSibling" ); }, prevAll: function( elem ) { return dir( elem, "previousSibling" ); }, nextUntil: function( elem, _i, until ) { return dir( elem, "nextSibling", until ); }, prevUntil: function( elem, _i, until ) { return dir( elem, "previousSibling", until ); }, siblings: function( elem ) { return siblings( ( elem.parentNode || {} ).firstChild, elem ); }, children: function( elem ) { return siblings( elem.firstChild ); }, contents: function( elem ) { if ( elem.contentDocument != null && // Support: IE 11+ // <object> elements with no `data` attribute has an object // `contentDocument` with a `null` prototype.
getProto( elem.contentDocument ) ) { return elem.contentDocument; } // Support: IE 9 - 11 only, iOS 7 only, Android Browser <=4.3 only // Treat the template element as a regular one in browsers that // don't support it. if ( nodeName( elem, "template" ) ) { elem = elem.content || elem; } return jQuery.merge( [], elem.childNodes ); } }, function( name, fn ) { jQuery.fn[ name ] = function( until, selector ) { var matched = jQuery.map( this, fn, until ); if ( name.slice( -5 ) !== "Until" ) { selector = until; } if ( selector && typeof selector === "string" ) { matched = jQuery.filter( selector, matched ); } if ( this.length > 1 ) { // Remove duplicates if ( !guaranteedUnique[ name ] ) { jQuery.uniqueSort( matched ); } // Reverse order for parents* and prev-derivatives if ( rparentsprev.test( name ) ) { matched.reverse(); } } return this.pushStack( matched ); }; } ); var rnothtmlwhite = ( /[^\x20\t\r\n\f]+/g ); // Convert String-formatted options into Object-formatted ones function createOptions( options ) { var object = {}; jQuery.each( options.match( rnothtmlwhite ) || [], function( _, flag ) { object[ flag ] = true; } ); return object; } /* * Create a callback list using the following parameters: * * options: an optional list of space-separated options that will change how * the callback list behaves or a more traditional option object * * By default a callback list will act like an event callback list and can be * "fired" multiple times. * * Possible options: * * once: will ensure the callback list can only be fired once (like a Deferred) * * memory: will keep track of previous values and will call any callback added * after the list has been fired right away with the latest "memorized" * values (like a Deferred) * * unique: will ensure a callback can only be added once (no duplicate in the list) * * stopOnFalse: interrupt callings when a callback returns false * */ jQuery.Callbacks = function( options ) { // Convert options from String-formatted to Object-formatted if needed // (we check in cache first) options = typeof options === "string" ? 
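/* [Editor's illustration — not part of the jQuery source.] A minimal usage sketch of the
 * option flags documented above. The "once memory" combination reproduces Deferred-like
 * behaviour: the list fires at most once, and callbacks added after firing run
 * immediately with the memorized arguments.
 *
 *     var callbacks = jQuery.Callbacks( "once memory" );
 *     callbacks.add( function( v ) { console.log( "first:", v ); } );
 *     callbacks.fire( 42 );                                          // logs "first: 42"
 *     callbacks.add( function( v ) { console.log( "late:", v ); } ); // logs "late: 42" right away
 *     callbacks.fire( 99 );                                          // ignored: "once" lists fire a single time
 */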
createOptions( options ) : jQuery.extend( {}, options ); var // Flag to know if list is currently firing firing, // Last fire value for non-forgettable lists memory, // Flag to know if list was already fired fired, // Flag to prevent firing locked, // Actual callback list list = [], // Queue of execution data for repeatable lists queue = [], // Index of currently firing callback (modified by add/remove as needed) firingIndex = -1, // Fire callbacks fire = function() { // Enforce single-firing locked = locked || options.once; // Execute callbacks for all pending executions, // respecting firingIndex overrides and runtime changes fired = firing = true; for ( ; queue.length; firingIndex = -1 ) { memory = queue.shift(); while ( ++firingIndex < list.length ) { // Run callback and check for early termination if ( list[ firingIndex ].apply( memory[ 0 ], memory[ 1 ] ) === false && options.stopOnFalse ) { // Jump to end and forget the data so .add doesn't re-fire firingIndex = list.length; memory = false; } } } // Forget the data if we're done with it if ( !options.memory ) { memory = false; } firing = false; // Clean up if we're done firing for good if ( locked ) { // Keep an empty list if we have data for future add calls if ( memory ) { list = []; // Otherwise, this object is spent } else { list = ""; } } }, // Actual Callbacks object self = { // Add a callback or a collection of callbacks to the list add: function() { if ( list ) { // If we have memory from a past run, we should fire after adding if ( memory && !firing ) { firingIndex = list.length - 1; queue.push( memory ); } ( function add( args ) { jQuery.each( args, function( _, arg ) { if ( isFunction( arg ) ) { if ( !options.unique || !self.has( arg ) ) { list.push( arg ); } } else if ( arg && arg.length && toType( arg ) !== "string" ) { // Inspect recursively add( arg ); } } ); } )( arguments ); if ( memory && !firing ) { fire(); } } return this; }, // Remove a callback from the list remove: function() { jQuery.each( arguments, function( _, arg ) { var index; while ( ( index = jQuery.inArray( arg, list, index ) ) > -1 ) { list.splice( index, 1 ); // Handle firing indexes if ( index <= firingIndex ) { firingIndex--; } } } ); return this; }, // Check if a given callback is in the list. // If no argument is given, return whether or not list has callbacks attached. has: function( fn ) { return fn ? jQuery.inArray( fn, list ) > -1 : list.length > 0; }, // Remove all callbacks from the list empty: function() { if ( list ) { list = []; } return this; }, // Disable .fire and .add // Abort any current/pending executions // Clear all callbacks and values disable: function() { locked = queue = []; list = memory = ""; return this; }, disabled: function() { return !list; }, // Disable .fire // Also disable .add unless we have memory (since it would have no effect) // Abort any pending executions lock: function() { locked = queue = []; if ( !memory && !firing ) { list = memory = ""; } return this; }, locked: function() { return !!locked; }, // Call all callbacks with the given context and arguments fireWith: function( context, args ) { if ( !locked ) { args = args || []; args = [ context, args.slice ? 
args.slice() : args ]; queue.push( args ); if ( !firing ) { fire(); } } return this; }, // Call all the callbacks with the given arguments fire: function() { self.fireWith( this, arguments ); return this; }, // To know if the callbacks have already been called at least once fired: function() { return !!fired; } }; return self; }; function Identity( v ) { return v; } function Thrower( ex ) { throw ex; } function adoptValue( value, resolve, reject, noValue ) { var method; try { // Check for promise aspect first to privilege synchronous behavior if ( value && isFunction( ( method = value.promise ) ) ) { method.call( value ).done( resolve ).fail( reject ); // Other thenables } else if ( value && isFunction( ( method = value.then ) ) ) { method.call( value, resolve, reject ); // Other non-thenables } else { // Control `resolve` arguments by letting Array#slice cast boolean `noValue` to integer: // * false: [ value ].slice( 0 ) => resolve( value ) // * true: [ value ].slice( 1 ) => resolve() resolve.apply( undefined, [ value ].slice( noValue ) ); } // For Promises/A+, convert exceptions into rejections // Since jQuery.when doesn't unwrap thenables, we can skip the extra checks appearing in // Deferred#then to conditionally suppress rejection. } catch ( value ) { // Support: Android 4.0 only // Strict mode functions invoked without .call/.apply get global-object context reject.apply( undefined, [ value ] ); } } jQuery.extend( { Deferred: function( func ) { var tuples = [ // action, add listener, callbacks, // ... .then handlers, argument index, [final state] [ "notify", "progress", jQuery.Callbacks( "memory" ), jQuery.Callbacks( "memory" ), 2 ], [ "resolve", "done", jQuery.Callbacks( "once memory" ), jQuery.Callbacks( "once memory" ), 0, "resolved" ], [ "reject", "fail", jQuery.Callbacks( "once memory" ), jQuery.Callbacks( "once memory" ), 1, "rejected" ] ], state = "pending", promise = { state: function() { return state; }, always: function() { deferred.done( arguments ).fail( arguments ); return this; }, "catch": function( fn ) { return promise.then( null, fn ); }, // Keep pipe for back-compat pipe: function( /* fnDone, fnFail, fnProgress */ ) { var fns = arguments; return jQuery.Deferred( function( newDefer ) { jQuery.each( tuples, function( _i, tuple ) { // Map tuples (progress, done, fail) to arguments (done, fail, progress) var fn = isFunction( fns[ tuple[ 4 ] ] ) && fns[ tuple[ 4 ] ]; // deferred.progress(function() { bind to newDefer or newDefer.notify }) // deferred.done(function() { bind to newDefer or newDefer.resolve }) // deferred.fail(function() { bind to newDefer or newDefer.reject }) deferred[ tuple[ 1 ] ]( function() { var returned = fn && fn.apply( this, arguments ); if ( returned && isFunction( returned.promise ) ) { returned.promise() .progress( newDefer.notify ) .done( newDefer.resolve ) .fail( newDefer.reject ); } else { newDefer[ tuple[ 0 ] + "With" ]( this, fn ? 
[ returned ] : arguments ); } } ); } ); fns = null; } ).promise(); }, then: function( onFulfilled, onRejected, onProgress ) { var maxDepth = 0; function resolve( depth, deferred, handler, special ) { return function() { var that = this, args = arguments, mightThrow = function() { var returned, then; // Support: Promises/A+ section 2.3.3.3.3 // https://promisesaplus.com/#point-59 // Ignore double-resolution attempts if ( depth < maxDepth ) { return; } returned = handler.apply( that, args ); // Support: Promises/A+ section 2.3.1 // https://promisesaplus.com/#point-48 if ( returned === deferred.promise() ) { throw new TypeError( "Thenable self-resolution" ); } // Support: Promises/A+ sections 2.3.3.1, 3.5 // https://promisesaplus.com/#point-54 // https://promisesaplus.com/#point-75 // Retrieve `then` only once then = returned && // Support: Promises/A+ section 2.3.4 // https://promisesaplus.com/#point-64 // Only check objects and functions for thenability ( typeof returned === "object" || typeof returned === "function" ) && returned.then; // Handle a returned thenable if ( isFunction( then ) ) { // Special processors (notify) just wait for resolution if ( special ) { then.call( returned, resolve( maxDepth, deferred, Identity, special ), resolve( maxDepth, deferred, Thrower, special ) ); // Normal processors (resolve) also hook into progress } else { // ...and disregard older resolution values maxDepth++; then.call( returned, resolve( maxDepth, deferred, Identity, special ), resolve( maxDepth, deferred, Thrower, special ), resolve( maxDepth, deferred, Identity, deferred.notifyWith ) ); } // Handle all other returned values } else { // Only substitute handlers pass on context // and multiple values (non-spec behavior) if ( handler !== Identity ) { that = undefined; args = [ returned ]; } // Process the value(s) // Default process is resolve ( special || deferred.resolveWith )( that, args ); } }, // Only normal processors (resolve) catch and reject exceptions process = special ? mightThrow : function() { try { mightThrow(); } catch ( e ) { if ( jQuery.Deferred.exceptionHook ) { jQuery.Deferred.exceptionHook( e, process.stackTrace ); } // Support: Promises/A+ section 2.3.3.3.4.1 // https://promisesaplus.com/#point-61 // Ignore post-resolution exceptions if ( depth + 1 >= maxDepth ) { // Only substitute handlers pass on context // and multiple values (non-spec behavior) if ( handler !== Thrower ) { that = undefined; args = [ e ]; } deferred.rejectWith( that, args ); } } }; // Support: Promises/A+ section 2.3.3.3.1 // https://promisesaplus.com/#point-57 // Re-resolve promises immediately to dodge false rejection from // subsequent errors if ( depth ) { process(); } else { // Call an optional hook to record the stack, in case of exception // since it's otherwise lost when execution goes async if ( jQuery.Deferred.getStackHook ) { process.stackTrace = jQuery.Deferred.getStackHook(); } window.setTimeout( process ); } }; } return jQuery.Deferred( function( newDefer ) { // progress_handlers.add( ... ) tuples[ 0 ][ 3 ].add( resolve( 0, newDefer, isFunction( onProgress ) ? onProgress : Identity, newDefer.notifyWith ) ); // fulfilled_handlers.add( ... ) tuples[ 1 ][ 3 ].add( resolve( 0, newDefer, isFunction( onFulfilled ) ? onFulfilled : Identity ) ); // rejected_handlers.add( ... ) tuples[ 2 ][ 3 ].add( resolve( 0, newDefer, isFunction( onRejected ) ? 
onRejected : Thrower ) ); } ).promise(); }, // Get a promise for this deferred // If obj is provided, the promise aspect is added to the object promise: function( obj ) { return obj != null ? jQuery.extend( obj, promise ) : promise; } }, deferred = {}; // Add list-specific methods jQuery.each( tuples, function( i, tuple ) { var list = tuple[ 2 ], stateString = tuple[ 5 ]; // promise.progress = list.add // promise.done = list.add // promise.fail = list.add promise[ tuple[ 1 ] ] = list.add; // Handle state if ( stateString ) { list.add( function() { // state = "resolved" (i.e., fulfilled) // state = "rejected" state = stateString; }, // rejected_callbacks.disable // fulfilled_callbacks.disable tuples[ 3 - i ][ 2 ].disable, // rejected_handlers.disable // fulfilled_handlers.disable tuples[ 3 - i ][ 3 ].disable, // progress_callbacks.lock tuples[ 0 ][ 2 ].lock, // progress_handlers.lock tuples[ 0 ][ 3 ].lock ); } // progress_handlers.fire // fulfilled_handlers.fire // rejected_handlers.fire list.add( tuple[ 3 ].fire ); // deferred.notify = function() { deferred.notifyWith(...) } // deferred.resolve = function() { deferred.resolveWith(...) } // deferred.reject = function() { deferred.rejectWith(...) } deferred[ tuple[ 0 ] ] = function() { deferred[ tuple[ 0 ] + "With" ]( this === deferred ? undefined : this, arguments ); return this; }; // deferred.notifyWith = list.fireWith // deferred.resolveWith = list.fireWith // deferred.rejectWith = list.fireWith deferred[ tuple[ 0 ] + "With" ] = list.fireWith; } ); // Make the deferred a promise promise.promise( deferred ); // Call given func if any if ( func ) { func.call( deferred, deferred ); } // All done! return deferred; }, // Deferred helper when: function( singleValue ) { var // count of uncompleted subordinates remaining = arguments.length, // count of unprocessed arguments i = remaining, // subordinate fulfillment data resolveContexts = Array( i ), resolveValues = slice.call( arguments ), // the primary Deferred primary = jQuery.Deferred(), // subordinate callback factory updateFunc = function( i ) { return function( value ) { resolveContexts[ i ] = this; resolveValues[ i ] = arguments.length > 1 ? slice.call( arguments ) : value; if ( !( --remaining ) ) { primary.resolveWith( resolveContexts, resolveValues ); } }; }; // Single- and empty arguments are adopted like Promise.resolve if ( remaining <= 1 ) { adoptValue( singleValue, primary.done( updateFunc( i ) ).resolve, primary.reject, !remaining ); // Use .then() to unwrap secondary thenables (cf. gh-3000) if ( primary.state() === "pending" || isFunction( resolveValues[ i ] && resolveValues[ i ].then ) ) { return primary.then(); } } // Multiple arguments are aggregated like Promise.all array elements while ( i-- ) { adoptValue( resolveValues[ i ], updateFunc( i ), primary.reject ); } return primary.promise(); } } ); // These usually indicate a programmer mistake during development, // warn about them ASAP rather than swallowing them by default. 
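/* [Editor's illustration — not part of the jQuery source; the URLs are hypothetical.]
 * jQuery.when has the two paths sketched above: zero or one argument is adopted like
 * Promise.resolve, while two or more are aggregated like Promise.all, resolving only
 * after every subordinate resolves and rejecting on the first failure:
 *
 *     jQuery.when( jQuery.ajax( "a.json" ), jQuery.ajax( "b.json" ) )
 *         .done( function( a, b ) {
 *             // a and b hold each request's resolution arguments,
 *             // in the order the deferreds were passed in
 *         } )
 *         .fail( function( err ) {
 *             // the first rejection wins, mirroring Promise.all
 *         } );
 */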
var rerrorNames = /^(Eval|Internal|Range|Reference|Syntax|Type|URI)Error$/; jQuery.Deferred.exceptionHook = function( error, stack ) { // Support: IE 8 - 9 only // Console exists when dev tools are open, which can happen at any time if ( window.console && window.console.warn && error && rerrorNames.test( error.name ) ) { window.console.warn( "jQuery.Deferred exception: " + error.message, error.stack, stack ); } }; jQuery.readyException = function( error ) { window.setTimeout( function() { throw error; } ); }; // The deferred used on DOM ready var readyList = jQuery.Deferred(); jQuery.fn.ready = function( fn ) { readyList .then( fn ) // Wrap jQuery.readyException in a function so that the lookup // happens at the time of error handling instead of callback // registration. .catch( function( error ) { jQuery.readyException( error ); } ); return this; }; jQuery.extend( { // Is the DOM ready to be used? Set to true once it occurs. isReady: false, // A counter to track how many items to wait for before // the ready event fires. See #6781 readyWait: 1, // Handle when the DOM is ready ready: function( wait ) { // Abort if there are pending holds or we're already ready if ( wait === true ? --jQuery.readyWait : jQuery.isReady ) { return; } // Remember that the DOM is ready jQuery.isReady = true; // If a normal DOM Ready event fired, decrement, and wait if need be if ( wait !== true && --jQuery.readyWait > 0 ) { return; } // If there are functions bound, to execute readyList.resolveWith( document, [ jQuery ] ); } } ); jQuery.ready.then = readyList.then; // The ready event handler and self cleanup method function completed() { document.removeEventListener( "DOMContentLoaded", completed ); window.removeEventListener( "load", completed ); jQuery.ready(); } // Catch cases where $(document).ready() is called // after the browser event has already occurred. // Support: IE <=9 - 10 only // Older IE sometimes signals "interactive" too soon if ( document.readyState === "complete" || ( document.readyState !== "loading" && !document.documentElement.doScroll ) ) { // Handle it asynchronously to allow scripts the opportunity to delay ready window.setTimeout( jQuery.ready ); } else { // Use the handy event callback document.addEventListener( "DOMContentLoaded", completed ); // A fallback to window.onload, that will always work window.addEventListener( "load", completed ); } // Multifunctional method to get and set values of a collection // The value/s can optionally be executed if it's a function var access = function( elems, fn, key, value, chainable, emptyGet, raw ) { var i = 0, len = elems.length, bulk = key == null; // Sets many values if ( toType( key ) === "object" ) { chainable = true; for ( i in key ) { access( elems, fn, i, key[ i ], true, emptyGet, raw ); } // Sets one value } else if ( value !== undefined ) { chainable = true; if ( !isFunction( value ) ) { raw = true; } if ( bulk ) { // Bulk operations run against the entire set if ( raw ) { fn.call( elems, value ); fn = null; // ...except when executing function values } else { bulk = fn; fn = function( elem, _key, value ) { return bulk.call( jQuery( elem ), value ); }; } } if ( fn ) { for ( ; i < len; i++ ) { fn( elems[ i ], key, raw ? value : value.call( elems[ i ], i, fn( elems[ i ], key ) ) ); } } } if ( chainable ) { return elems; } // Gets if ( bulk ) { return fn.call( elems ); } return len ? 
fn( elems[ 0 ], key ) : emptyGet; }; // Matches dashed string for camelizing var rmsPrefix = /^-ms-/, rdashAlpha = /-([a-z])/g; // Used by camelCase as callback to replace() function fcamelCase( _all, letter ) { return letter.toUpperCase(); } // Convert dashed to camelCase; used by the css and data modules // Support: IE <=9 - 11, Edge 12 - 15 // Microsoft forgot to hump their vendor prefix (#9572) function camelCase( string ) { return string.replace( rmsPrefix, "ms-" ).replace( rdashAlpha, fcamelCase ); } var acceptData = function( owner ) { // Accepts only: // - Node // - Node.ELEMENT_NODE // - Node.DOCUMENT_NODE // - Object // - Any return owner.nodeType === 1 || owner.nodeType === 9 || !( +owner.nodeType ); }; function Data() { this.expando = jQuery.expando + Data.uid++; } Data.uid = 1; Data.prototype = { cache: function( owner ) { // Check if the owner object already has a cache var value = owner[ this.expando ]; // If not, create one if ( !value ) { value = {}; // We can accept data for non-element nodes in modern browsers, // but we should not, see #8335. // Always return an empty object. if ( acceptData( owner ) ) { // If it is a node unlikely to be stringify-ed or looped over // use plain assignment if ( owner.nodeType ) { owner[ this.expando ] = value; // Otherwise secure it in a non-enumerable property // configurable must be true to allow the property to be // deleted when data is removed } else { Object.defineProperty( owner, this.expando, { value: value, configurable: true } ); } } } return value; }, set: function( owner, data, value ) { var prop, cache = this.cache( owner ); // Handle: [ owner, key, value ] args // Always use camelCase key (gh-2257) if ( typeof data === "string" ) { cache[ camelCase( data ) ] = value; // Handle: [ owner, { properties } ] args } else { // Copy the properties one-by-one to the cache object for ( prop in data ) { cache[ camelCase( prop ) ] = data[ prop ]; } } return cache; }, get: function( owner, key ) { return key === undefined ? this.cache( owner ) : // Always use camelCase key (gh-2257) owner[ this.expando ] && owner[ this.expando ][ camelCase( key ) ]; }, access: function( owner, key, value ) { // In cases where either: // // 1. No key was specified // 2. A string key was specified, but no value provided // // Take the "read" path and allow the get method to determine // which value to return, respectively either: // // 1. The entire cache object // 2. The data stored at the key // if ( key === undefined || ( ( key && typeof key === "string" ) && value === undefined ) ) { return this.get( owner, key ); } // When the key is not a string, or both a key and value // are specified, set or extend (existing objects) with either: // // 1. An object of properties // 2. A key and value // this.set( owner, key, value ); // Since the "set" path can have two possible entry points // return the expected data based on which path was taken[*] return value !== undefined ? value : key; }, remove: function( owner, key ) { var i, cache = owner[ this.expando ]; if ( cache === undefined ) { return; } if ( key !== undefined ) { // Support array or space separated string of keys if ( Array.isArray( key ) ) { // If key is an array of keys... // We always set camelCase keys, so remove that. key = key.map( camelCase ); } else { key = camelCase( key ); // If a key with the spaces exists, use it. // Otherwise, create an array by matching non-whitespace key = key in cache ? 
[ key ] : ( key.match( rnothtmlwhite ) || [] ); } i = key.length; while ( i-- ) { delete cache[ key[ i ] ]; } } // Remove the expando if there's no more data if ( key === undefined || jQuery.isEmptyObject( cache ) ) { // Support: Chrome <=35 - 45 // Webkit & Blink performance suffers when deleting properties // from DOM nodes, so set to undefined instead // https://bugs.chromium.org/p/chromium/issues/detail?id=378607 (bug restricted) if ( owner.nodeType ) { owner[ this.expando ] = undefined; } else { delete owner[ this.expando ]; } } }, hasData: function( owner ) { var cache = owner[ this.expando ]; return cache !== undefined && !jQuery.isEmptyObject( cache ); } }; var dataPriv = new Data(); var dataUser = new Data(); // Implementation Summary // // 1. Enforce API surface and semantic compatibility with 1.9.x branch // 2. Improve the module's maintainability by reducing the storage // paths to a single mechanism. // 3. Use the same single mechanism to support "private" and "user" data. // 4. _Never_ expose "private" data to user code (TODO: Drop _data, _removeData) // 5. Avoid exposing implementation details on user objects (eg. expando properties) // 6. Provide a clear path for implementation upgrade to WeakMap in 2014 var rbrace = /^(?:\{[\w\W]*\}|\[[\w\W]*\])$/, rmultiDash = /[A-Z]/g; function getData( data ) { if ( data === "true" ) { return true; } if ( data === "false" ) { return false; } if ( data === "null" ) { return null; } // Only convert to a number if it doesn't change the string if ( data === +data + "" ) { return +data; } if ( rbrace.test( data ) ) { return JSON.parse( data ); } return data; } function dataAttr( elem, key, data ) { var name; // If nothing was found internally, try to fetch any // data from the HTML5 data-* attribute if ( data === undefined && elem.nodeType === 1 ) { name = "data-" + key.replace( rmultiDash, "-$&" ).toLowerCase(); data = elem.getAttribute( name ); if ( typeof data === "string" ) { try { data = getData( data ); } catch ( e ) {} // Make sure we set the data so it isn't changed later dataUser.set( elem, key, data ); } else { data = undefined; } } return data; } jQuery.extend( { hasData: function( elem ) { return dataUser.hasData( elem ) || dataPriv.hasData( elem ); }, data: function( elem, name, data ) { return dataUser.access( elem, name, data ); }, removeData: function( elem, name ) { dataUser.remove( elem, name ); }, // TODO: Now that all calls to _data and _removeData have been replaced // with direct calls to dataPriv methods, these can be deprecated. 
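/* [Editor's illustration — not part of the jQuery source; the markup and keys are
 * hypothetical.] How the pieces above cooperate: a key missing from the dataUser cache
 * falls through dataAttr() to the HTML5 data-* attribute, with camelCase/rmultiDash
 * translating the key and getData() coercing the string value:
 *
 *     // given: <div id="box" data-user-id="42" data-opts='{"a":1}'></div>
 *     var $box = jQuery( "#box" );
 *     $box.data( "userId" );    // 42 — a number, via getData() coercion
 *     $box.data( "opts" );      // { a: 1 } — parsed because rbrace matches
 *     $box.data( "userId", 7 ); // later reads hit dataUser, not the DOM attribute
 */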
_data: function( elem, name, data ) { return dataPriv.access( elem, name, data ); }, _removeData: function( elem, name ) { dataPriv.remove( elem, name ); } } ); jQuery.fn.extend( { data: function( key, value ) { var i, name, data, elem = this[ 0 ], attrs = elem && elem.attributes; // Gets all values if ( key === undefined ) { if ( this.length ) { data = dataUser.get( elem ); if ( elem.nodeType === 1 && !dataPriv.get( elem, "hasDataAttrs" ) ) { i = attrs.length; while ( i-- ) { // Support: IE 11 only // The attrs elements can be null (#14894) if ( attrs[ i ] ) { name = attrs[ i ].name; if ( name.indexOf( "data-" ) === 0 ) { name = camelCase( name.slice( 5 ) ); dataAttr( elem, name, data[ name ] ); } } } dataPriv.set( elem, "hasDataAttrs", true ); } } return data; } // Sets multiple values if ( typeof key === "object" ) { return this.each( function() { dataUser.set( this, key ); } ); } return access( this, function( value ) { var data; // The calling jQuery object (element matches) is not empty // (and therefore has an element appears at this[ 0 ]) and the // `value` parameter was not undefined. An empty jQuery object // will result in `undefined` for elem = this[ 0 ] which will // throw an exception if an attempt to read a data cache is made. if ( elem && value === undefined ) { // Attempt to get data from the cache // The key will always be camelCased in Data data = dataUser.get( elem, key ); if ( data !== undefined ) { return data; } // Attempt to "discover" the data in // HTML5 custom data-* attrs data = dataAttr( elem, key ); if ( data !== undefined ) { return data; } // We tried really hard, but the data doesn't exist. return; } // Set the data... this.each( function() { // We always store the camelCased key dataUser.set( this, key, value ); } ); }, null, value, arguments.length > 1, null, true ); }, removeData: function( key ) { return this.each( function() { dataUser.remove( this, key ); } ); } } ); jQuery.extend( { queue: function( elem, type, data ) { var queue; if ( elem ) { type = ( type || "fx" ) + "queue"; queue = dataPriv.get( elem, type ); // Speed up dequeue by getting out quickly if this is just a lookup if ( data ) { if ( !queue || Array.isArray( data ) ) { queue = dataPriv.access( elem, type, jQuery.makeArray( data ) ); } else { queue.push( data ); } } return queue || []; } }, dequeue: function( elem, type ) { type = type || "fx"; var queue = jQuery.queue( elem, type ), startLength = queue.length, fn = queue.shift(), hooks = jQuery._queueHooks( elem, type ), next = function() { jQuery.dequeue( elem, type ); }; // If the fx queue is dequeued, always remove the progress sentinel if ( fn === "inprogress" ) { fn = queue.shift(); startLength--; } if ( fn ) { // Add a progress sentinel to prevent the fx queue from being // automatically dequeued if ( type === "fx" ) { queue.unshift( "inprogress" ); } // Clear up the last queue stop function delete hooks.stop; fn.call( elem, next, hooks ); } if ( !startLength && hooks ) { hooks.empty.fire(); } }, // Not public - generate a queueHooks object, or return the current one _queueHooks: function( elem, type ) { var key = type + "queueHooks"; return dataPriv.get( elem, key ) || dataPriv.access( elem, key, { empty: jQuery.Callbacks( "once memory" ).add( function() { dataPriv.remove( elem, [ type + "queue", key ] ); } ) } ); } } ); jQuery.fn.extend( { queue: function( type, data ) { var setter = 2; if ( typeof type !== "string" ) { data = type; type = "fx"; setter--; } if ( arguments.length < setter ) { return jQuery.queue( this[ 0 ], 
type ); } return data === undefined ? this : this.each( function() { var queue = jQuery.queue( this, type, data ); // Ensure a hooks for this queue jQuery._queueHooks( this, type ); if ( type === "fx" && queue[ 0 ] !== "inprogress" ) { jQuery.dequeue( this, type ); } } ); }, dequeue: function( type ) { return this.each( function() { jQuery.dequeue( this, type ); } ); }, clearQueue: function( type ) { return this.queue( type || "fx", [] ); }, // Get a promise resolved when queues of a certain type // are emptied (fx is the type by default) promise: function( type, obj ) { var tmp, count = 1, defer = jQuery.Deferred(), elements = this, i = this.length, resolve = function() { if ( !( --count ) ) { defer.resolveWith( elements, [ elements ] ); } }; if ( typeof type !== "string" ) { obj = type; type = undefined; } type = type || "fx"; while ( i-- ) { tmp = dataPriv.get( elements[ i ], type + "queueHooks" ); if ( tmp && tmp.empty ) { count++; tmp.empty.add( resolve ); } } resolve(); return defer.promise( obj ); } } ); var pnum = ( /[+-]?(?:\d*\.|)\d+(?:[eE][+-]?\d+|)/ ).source; var rcssNum = new RegExp( "^(?:([+-])=|)(" + pnum + ")([a-z%]*)$", "i" ); var cssExpand = [ "Top", "Right", "Bottom", "Left" ]; var documentElement = document.documentElement; var isAttached = function( elem ) { return jQuery.contains( elem.ownerDocument, elem ); }, composed = { composed: true }; // Support: IE 9 - 11+, Edge 12 - 18+, iOS 10.0 - 10.2 only // Check attachment across shadow DOM boundaries when possible (gh-3504) // Support: iOS 10.0-10.2 only // Early iOS 10 versions support `attachShadow` but not `getRootNode`, // leading to errors. We need to check for `getRootNode`. if ( documentElement.getRootNode ) { isAttached = function( elem ) { return jQuery.contains( elem.ownerDocument, elem ) || elem.getRootNode( composed ) === elem.ownerDocument; }; } var isHiddenWithinTree = function( elem, el ) { // isHiddenWithinTree might be called from jQuery#filter function; // in that case, element will be second argument elem = el || elem; // Inline style trumps all return elem.style.display === "none" || elem.style.display === "" && // Otherwise, check computed style // Support: Firefox <=43 - 45 // Disconnected elements can have computed display: none, so first confirm that elem is // in the document. isAttached( elem ) && jQuery.css( elem, "display" ) === "none"; }; function adjustCSS( elem, prop, valueParts, tween ) { var adjusted, scale, maxIterations = 20, currentValue = tween ? function() { return tween.cur(); } : function() { return jQuery.css( elem, prop, "" ); }, initial = currentValue(), unit = valueParts && valueParts[ 3 ] || ( jQuery.cssNumber[ prop ] ? "" : "px" ), // Starting value computation is required for potential unit mismatches initialInUnit = elem.nodeType && ( jQuery.cssNumber[ prop ] || unit !== "px" && +initial ) && rcssNum.exec( jQuery.css( elem, prop ) ); if ( initialInUnit && initialInUnit[ 3 ] !== unit ) { // Support: Firefox <=54 // Halve the iteration target value to prevent interference from CSS upper bounds (gh-2144) initial = initial / 2; // Trust units reported by jQuery.css unit = unit || initialInUnit[ 3 ]; // Iteratively approximate from a nonzero starting point initialInUnit = +initial || 1; while ( maxIterations-- ) { // Evaluate and update our best guess (doubling guesses that zero out). // Finish if the scale equals or crosses 1 (making the old*new product non-positive). 
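/* [Editor's note, added for illustration — assuming a 400px computed width and
 * 1em = 16px.] Worked example of the loop below when animating width to an "em" target:
 * initial is halved to 200 and initialInUnit starts at 200, so the first pass sets
 * "200em", which reads back as 3200, giving scale = 3200 / 200 = 16 and a next guess of
 * 200 / 16 = 12.5em. "12.5em" reads back exactly 200, the old*new scale product crosses
 * zero and the loop stops; the final doubling undoes the earlier halving, yielding
 * 25em — the original 400px expressed in em. */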
jQuery.style( elem, prop, initialInUnit + unit ); if ( ( 1 - scale ) * ( 1 - ( scale = currentValue() / initial || 0.5 ) ) <= 0 ) { maxIterations = 0; } initialInUnit = initialInUnit / scale; } initialInUnit = initialInUnit * 2; jQuery.style( elem, prop, initialInUnit + unit ); // Make sure we update the tween properties later on valueParts = valueParts || []; } if ( valueParts ) { initialInUnit = +initialInUnit || +initial || 0; // Apply relative offset (+=/-=) if specified adjusted = valueParts[ 1 ] ? initialInUnit + ( valueParts[ 1 ] + 1 ) * valueParts[ 2 ] : +valueParts[ 2 ]; if ( tween ) { tween.unit = unit; tween.start = initialInUnit; tween.end = adjusted; } } return adjusted; } var defaultDisplayMap = {}; function getDefaultDisplay( elem ) { var temp, doc = elem.ownerDocument, nodeName = elem.nodeName, display = defaultDisplayMap[ nodeName ]; if ( display ) { return display; } temp = doc.body.appendChild( doc.createElement( nodeName ) ); display = jQuery.css( temp, "display" ); temp.parentNode.removeChild( temp ); if ( display === "none" ) { display = "block"; } defaultDisplayMap[ nodeName ] = display; return display; } function showHide( elements, show ) { var display, elem, values = [], index = 0, length = elements.length; // Determine new display value for elements that need to change for ( ; index < length; index++ ) { elem = elements[ index ]; if ( !elem.style ) { continue; } display = elem.style.display; if ( show ) { // Since we force visibility upon cascade-hidden elements, an immediate (and slow) // check is required in this first loop unless we have a nonempty display value (either // inline or about-to-be-restored) if ( display === "none" ) { values[ index ] = dataPriv.get( elem, "display" ) || null; if ( !values[ index ] ) { elem.style.display = ""; } } if ( elem.style.display === "" && isHiddenWithinTree( elem ) ) { values[ index ] = getDefaultDisplay( elem ); } } else { if ( display !== "none" ) { values[ index ] = "none"; // Remember what we're overwriting dataPriv.set( elem, "display", display ); } } } // Set the display of the elements in a second loop to avoid constant reflow for ( index = 0; index < length; index++ ) { if ( values[ index ] != null ) { elements[ index ].style.display = values[ index ]; } } return elements; } jQuery.fn.extend( { show: function() { return showHide( this, true ); }, hide: function() { return showHide( this ); }, toggle: function( state ) { if ( typeof state === "boolean" ) { return state ? 
this.show() : this.hide(); } return this.each( function() { if ( isHiddenWithinTree( this ) ) { jQuery( this ).show(); } else { jQuery( this ).hide(); } } ); } } ); var rcheckableType = ( /^(?:checkbox|radio)$/i ); var rtagName = ( /<([a-z][^\/\0>\x20\t\r\n\f]*)/i ); var rscriptType = ( /^$|^module$|\/(?:java|ecma)script/i ); ( function() { var fragment = document.createDocumentFragment(), div = fragment.appendChild( document.createElement( "div" ) ), input = document.createElement( "input" ); // Support: Android 4.0 - 4.3 only // Check state lost if the name is set (#11217) // Support: Windows Web Apps (WWA) // `name` and `type` must use .setAttribute for WWA (#14901) input.setAttribute( "type", "radio" ); input.setAttribute( "checked", "checked" ); input.setAttribute( "name", "t" ); div.appendChild( input ); // Support: Android <=4.1 only // Older WebKit doesn't clone checked state correctly in fragments support.checkClone = div.cloneNode( true ).cloneNode( true ).lastChild.checked; // Support: IE <=11 only // Make sure textarea (and checkbox) defaultValue is properly cloned div.innerHTML = "<textarea>x</textarea>"; support.noCloneChecked = !!div.cloneNode( true ).lastChild.defaultValue; // Support: IE <=9 only // IE <=9 replaces <option> elements with empty ones when cloned div.innerHTML = "<option></option>"; support.option = !!div.lastChild; } )(); // We have to close these tags to support XHTML (#13200) var wrapMap = { // XHTML parsers do not magically insert elements in the // same way that tag soup parsers do. So we cannot shorten // this by omitting <tbody> or other required elements. thead: [ 1, "<table>", "</table>" ], col: [ 2, "<table><colgroup>", "</colgroup></table>" ], tr: [ 2, "<table><tbody>", "</tbody></table>" ], td: [ 3, "<table><tbody><tr>", "</tr></tbody></table>" ], _default: [ 0, "", "" ] }; wrapMap.tbody = wrapMap.tfoot = wrapMap.colgroup = wrapMap.caption = wrapMap.thead; wrapMap.th = wrapMap.td; // Support: IE <=9 only if ( !support.option ) { wrapMap.optgroup = wrapMap.option = [ 1, "<select multiple='multiple'>", "</select>" ]; } function getAll( context, tag ) { // Support: IE <=9 - 11 only // Use typeof to avoid zero-argument method invocation on host objects (#15151) var ret; if ( typeof context.getElementsByTagName !== "undefined" ) { ret = context.getElementsByTagName( tag || "*" ); } else if ( typeof context.querySelectorAll !== "undefined" ) { ret = context.querySelectorAll( tag || "*" ); } else { ret = []; } if ( tag === undefined || tag && nodeName( context, tag ) ) { return jQuery.merge( [ context ], ret ); } return ret; } // Mark scripts as having already been evaluated function setGlobalEval( elems, refElements ) { var i = 0, l = elems.length; for ( ; i < l; i++ ) { dataPriv.set( elems[ i ], "globalEval", !refElements || dataPriv.get( refElements[ i ], "globalEval" ) ); } } var rhtml = /<|&#?\w+;/; function buildFragment( elems, context, scripts, selection, ignored ) { var elem, tmp, tag, wrap, attached, j, fragment = context.createDocumentFragment(), nodes = [], i = 0, l = elems.length; for ( ; i < l; i++ ) { elem = elems[ i ]; if ( elem || elem === 0 ) { // Add nodes directly if ( toType( elem ) === "object" ) { // Support: Android <=4.0 only, PhantomJS 1 only // push.apply(_, arraylike) throws on ancient WebKit jQuery.merge( nodes, elem.nodeType ? [ elem ] : elem ); // Convert non-html into a text node } else if ( !rhtml.test( elem ) ) { nodes.push( context.createTextNode( elem ) ); // Convert html into DOM nodes } else { tmp = tmp || fragment.appendChild( context.createElement( "div" ) ); // Deserialize a standard representation tag = ( rtagName.exec( elem ) || [ "", "" ] )[ 1 ].toLowerCase(); wrap = wrapMap[ tag ] || wrapMap._default; tmp.innerHTML = wrap[ 1 ] + jQuery.htmlPrefilter( elem ) + wrap[ 2 ]; // Descend through wrappers to the right content j = wrap[ 0 ]; while ( j-- ) { tmp = tmp.lastChild; } // Support: Android <=4.0 only, PhantomJS 1 only // push.apply(_, arraylike) throws on ancient WebKit jQuery.merge( nodes, tmp.childNodes ); // Remember the top-level container tmp = fragment.firstChild; // Ensure the created nodes are orphaned (#12392) tmp.textContent = ""; } } } // Remove wrapper from fragment fragment.textContent = ""; i = 0; while ( ( elem = nodes[ i++ ] ) ) { // Skip elements already in the context collection (trac-4087) if ( selection && jQuery.inArray( elem, selection ) > -1 ) { if ( ignored ) { ignored.push( elem ); } continue; } attached = isAttached( elem ); // Append to fragment tmp = getAll( fragment.appendChild( elem ), "script" ); // Preserve script evaluation history if ( attached ) { setGlobalEval( tmp ); } // Capture executables if ( scripts ) { j = 0; while ( ( elem = tmp[ j++ ] ) ) { if ( rscriptType.test( elem.type || "" ) ) { scripts.push( elem ); } } } } return fragment; } var rtypenamespace = /^([^.]*)(?:\.(.+)|)/; function returnTrue() { return true; } function returnFalse() { return false; } // Support: IE <=9 - 11+ // focus() and blur() are asynchronous, except when they are no-op. // So expect focus to be synchronous when the element is already active, // and blur to be synchronous when the element is not already active. // (focus and blur are always synchronous in other supported browsers, // this just defines when we can count on it).
function expectSync( elem, type ) { return ( elem === safeActiveElement() ) === ( type === "focus" ); } // Support: IE <=9 only // Accessing document.activeElement can throw unexpectedly // https://bugs.jquery.com/ticket/13393 function safeActiveElement() { try { return document.activeElement; } catch ( err ) { } } function on( elem, types, selector, data, fn, one ) { var origFn, type; // Types can be a map of types/handlers if ( typeof types === "object" ) { // ( types-Object, selector, data ) if ( typeof selector !== "string" ) { // ( types-Object, data ) data = data || selector; selector = undefined; } for ( type in types ) { on( elem, type, selector, data, types[ type ], one ); } return elem; } if ( data == null && fn == null ) { // ( types, fn ) fn = selector; data = selector = undefined; } else if ( fn == null ) { if ( typeof selector === "string" ) { // ( types, selector, fn ) fn = data; data = undefined; } else { // ( types, data, fn ) fn = data; data = selector; selector = undefined; } } if ( fn === false ) { fn = returnFalse; } else if ( !fn ) { return elem; } if ( one === 1 ) { origFn = fn; fn = function( event ) { // Can use an empty set, since event contains the info jQuery().off( event ); return origFn.apply( this, arguments ); }; // Use same guid so caller can remove using origFn fn.guid = origFn.guid || ( origFn.guid = jQuery.guid++ ); } return elem.each( function() { jQuery.event.add( this, types, fn, data, selector ); } ); } /* * Helper functions for managing events -- not part of the public interface. * Props to Dean Edwards' addEvent library for many of the ideas. */ jQuery.event = { global: {}, add: function( elem, types, handler, data, selector ) { var handleObjIn, eventHandle, tmp, events, t, handleObj, special, handlers, type, namespaces, origType, elemData = dataPriv.get( elem ); // Only attach events to objects that accept data if ( !acceptData( elem ) ) { return; } // Caller can pass in an object of custom data in lieu of the handler if ( handler.handler ) { handleObjIn = handler; handler = handleObjIn.handler; selector = handleObjIn.selector; } // Ensure that invalid selectors throw exceptions at attach time // Evaluate against documentElement in case elem is a non-element node (e.g., document) if ( selector ) { jQuery.find.matchesSelector( documentElement, selector ); } // Make sure that the handler has a unique ID, used to find/remove it later if ( !handler.guid ) { handler.guid = jQuery.guid++; } // Init the element's event structure and main handler, if this is the first if ( !( events = elemData.events ) ) { events = elemData.events = Object.create( null ); } if ( !( eventHandle = elemData.handle ) ) { eventHandle = elemData.handle = function( e ) { // Discard the second event of a jQuery.event.trigger() and // when an event is called after a page has unloaded return typeof jQuery !== "undefined" && jQuery.event.triggered !== e.type ? jQuery.event.dispatch.apply( elem, arguments ) : undefined; }; } // Handle multiple events separated by a space types = ( types || "" ).match( rnothtmlwhite ) || [ "" ]; t = types.length; while ( t-- ) { tmp = rtypenamespace.exec( types[ t ] ) || []; type = origType = tmp[ 1 ]; namespaces = ( tmp[ 2 ] || "" ).split( "." 
).sort(); // There *must* be a type, no attaching namespace-only handlers if ( !type ) { continue; } // If event changes its type, use the special event handlers for the changed type special = jQuery.event.special[ type ] || {}; // If selector defined, determine special event api type, otherwise given type type = ( selector ? special.delegateType : special.bindType ) || type; // Update special based on newly reset type special = jQuery.event.special[ type ] || {}; // handleObj is passed to all event handlers handleObj = jQuery.extend( { type: type, origType: origType, data: data, handler: handler, guid: handler.guid, selector: selector, needsContext: selector && jQuery.expr.match.needsContext.test( selector ), namespace: namespaces.join( "." ) }, handleObjIn ); // Init the event handler queue if we're the first if ( !( handlers = events[ type ] ) ) { handlers = events[ type ] = []; handlers.delegateCount = 0; // Only use addEventListener if the special events handler returns false if ( !special.setup || special.setup.call( elem, data, namespaces, eventHandle ) === false ) { if ( elem.addEventListener ) { elem.addEventListener( type, eventHandle ); } } } if ( special.add ) { special.add.call( elem, handleObj ); if ( !handleObj.handler.guid ) { handleObj.handler.guid = handler.guid; } } // Add to the element's handler list, delegates in front if ( selector ) { handlers.splice( handlers.delegateCount++, 0, handleObj ); } else { handlers.push( handleObj ); } // Keep track of which events have ever been used, for event optimization jQuery.event.global[ type ] = true; } }, // Detach an event or set of events from an element remove: function( elem, types, handler, selector, mappedTypes ) { var j, origCount, tmp, events, t, handleObj, special, handlers, type, namespaces, origType, elemData = dataPriv.hasData( elem ) && dataPriv.get( elem ); if ( !elemData || !( events = elemData.events ) ) { return; } // Once for each type.namespace in types; type may be omitted types = ( types || "" ).match( rnothtmlwhite ) || [ "" ]; t = types.length; while ( t-- ) { tmp = rtypenamespace.exec( types[ t ] ) || []; type = origType = tmp[ 1 ]; namespaces = ( tmp[ 2 ] || "" ).split( "." ).sort(); // Unbind all events (on this namespace, if provided) for the element if ( !type ) { for ( type in events ) { jQuery.event.remove( elem, type + types[ t ], handler, selector, true ); } continue; } special = jQuery.event.special[ type ] || {}; type = ( selector ? 
special.delegateType : special.bindType ) || type; handlers = events[ type ] || []; tmp = tmp[ 2 ] && new RegExp( "(^|\\.)" + namespaces.join( "\\.(?:.*\\.|)" ) + "(\\.|$)" ); // Remove matching events origCount = j = handlers.length; while ( j-- ) { handleObj = handlers[ j ]; if ( ( mappedTypes || origType === handleObj.origType ) && ( !handler || handler.guid === handleObj.guid ) && ( !tmp || tmp.test( handleObj.namespace ) ) && ( !selector || selector === handleObj.selector || selector === "**" && handleObj.selector ) ) { handlers.splice( j, 1 ); if ( handleObj.selector ) { handlers.delegateCount--; } if ( special.remove ) { special.remove.call( elem, handleObj ); } } } // Remove generic event handler if we removed something and no more handlers exist // (avoids potential for endless recursion during removal of special event handlers) if ( origCount && !handlers.length ) { if ( !special.teardown || special.teardown.call( elem, namespaces, elemData.handle ) === false ) { jQuery.removeEvent( elem, type, elemData.handle ); } delete events[ type ]; } } // Remove data and the expando if it's no longer used if ( jQuery.isEmptyObject( events ) ) { dataPriv.remove( elem, "handle events" ); } }, dispatch: function( nativeEvent ) { var i, j, ret, matched, handleObj, handlerQueue, args = new Array( arguments.length ), // Make a writable jQuery.Event from the native event object event = jQuery.event.fix( nativeEvent ), handlers = ( dataPriv.get( this, "events" ) || Object.create( null ) )[ event.type ] || [], special = jQuery.event.special[ event.type ] || {}; // Use the fix-ed jQuery.Event rather than the (read-only) native event args[ 0 ] = event; for ( i = 1; i < arguments.length; i++ ) { args[ i ] = arguments[ i ]; } event.delegateTarget = this; // Call the preDispatch hook for the mapped type, and let it bail if desired if ( special.preDispatch && special.preDispatch.call( this, event ) === false ) { return; } // Determine handlers handlerQueue = jQuery.event.handlers.call( this, event, handlers ); // Run delegates first; they may want to stop propagation beneath us i = 0; while ( ( matched = handlerQueue[ i++ ] ) && !event.isPropagationStopped() ) { event.currentTarget = matched.elem; j = 0; while ( ( handleObj = matched.handlers[ j++ ] ) && !event.isImmediatePropagationStopped() ) { // If the event is namespaced, then each handler is only invoked if it is // specially universal or its namespaces are a superset of the event's. 
if ( !event.rnamespace || handleObj.namespace === false || event.rnamespace.test( handleObj.namespace ) ) { event.handleObj = handleObj; event.data = handleObj.data; ret = ( ( jQuery.event.special[ handleObj.origType ] || {} ).handle || handleObj.handler ).apply( matched.elem, args ); if ( ret !== undefined ) { if ( ( event.result = ret ) === false ) { event.preventDefault(); event.stopPropagation(); } } } } } // Call the postDispatch hook for the mapped type if ( special.postDispatch ) { special.postDispatch.call( this, event ); } return event.result; }, handlers: function( event, handlers ) { var i, handleObj, sel, matchedHandlers, matchedSelectors, handlerQueue = [], delegateCount = handlers.delegateCount, cur = event.target; // Find delegate handlers if ( delegateCount && // Support: IE <=9 // Black-hole SVG instance trees (trac-13180) cur.nodeType && // Support: Firefox <=42 // Suppress spec-violating clicks indicating a non-primary pointer button (trac-3861) // https://www.w3.org/TR/DOM-Level-3-Events/#event-type-click // Support: IE 11 only // ...but not arrow key "clicks" of radio inputs, which can have `button` -1 (gh-2343) !( event.type === "click" && event.button >= 1 ) ) { for ( ; cur !== this; cur = cur.parentNode || this ) { // Don't check non-elements (#13208) // Don't process clicks on disabled elements (#6911, #8165, #11382, #11764) if ( cur.nodeType === 1 && !( event.type === "click" && cur.disabled === true ) ) { matchedHandlers = []; matchedSelectors = {}; for ( i = 0; i < delegateCount; i++ ) { handleObj = handlers[ i ]; // Don't conflict with Object.prototype properties (#13203) sel = handleObj.selector + " "; if ( matchedSelectors[ sel ] === undefined ) { matchedSelectors[ sel ] = handleObj.needsContext ? jQuery( sel, this ).index( cur ) > -1 : jQuery.find( sel, this, null, [ cur ] ).length; } if ( matchedSelectors[ sel ] ) { matchedHandlers.push( handleObj ); } } if ( matchedHandlers.length ) { handlerQueue.push( { elem: cur, handlers: matchedHandlers } ); } } } } // Add the remaining (directly-bound) handlers cur = this; if ( delegateCount < handlers.length ) { handlerQueue.push( { elem: cur, handlers: handlers.slice( delegateCount ) } ); } return handlerQueue; }, addProp: function( name, hook ) { Object.defineProperty( jQuery.Event.prototype, name, { enumerable: true, configurable: true, get: isFunction( hook ) ? function() { if ( this.originalEvent ) { return hook( this.originalEvent ); } } : function() { if ( this.originalEvent ) { return this.originalEvent[ name ]; } }, set: function( value ) { Object.defineProperty( this, name, { enumerable: true, configurable: true, writable: true, value: value } ); } } ); }, fix: function( originalEvent ) { return originalEvent[ jQuery.expando ] ? originalEvent : new jQuery.Event( originalEvent ); }, special: { load: { // Prevent triggered image.load events from bubbling to window.load noBubble: true }, click: { // Utilize native event to ensure correct state for checkable inputs setup: function( data ) { // For mutual compressibility with _default, replace `this` access with a local var. // `|| data` is dead code meant only to preserve the variable through minification. var el = this || data; // Claim the first handler if ( rcheckableType.test( el.type ) && el.click && nodeName( el, "input" ) ) { // dataPriv.set( el, "click", ... 
) leverageNative( el, "click", returnTrue ); } // Return false to allow normal processing in the caller return false; }, trigger: function( data ) { // For mutual compressibility with _default, replace `this` access with a local var. // `|| data` is dead code meant only to preserve the variable through minification. var el = this || data; // Force setup before triggering a click if ( rcheckableType.test( el.type ) && el.click && nodeName( el, "input" ) ) { leverageNative( el, "click" ); } // Return non-false to allow normal event-path propagation return true; }, // For cross-browser consistency, suppress native .click() on links // Also prevent it if we're currently inside a leveraged native-event stack _default: function( event ) { var target = event.target; return rcheckableType.test( target.type ) && target.click && nodeName( target, "input" ) && dataPriv.get( target, "click" ) || nodeName( target, "a" ); } }, beforeunload: { postDispatch: function( event ) { // Support: Firefox 20+ // Firefox doesn't alert if the returnValue field is not set. if ( event.result !== undefined && event.originalEvent ) { event.originalEvent.returnValue = event.result; } } } } }; // Ensure the presence of an event listener that handles manually-triggered // synthetic events by interrupting progress until reinvoked in response to // *native* events that it fires directly, ensuring that state changes have // already occurred before other listeners are invoked. function leverageNative( el, type, expectSync ) { // Missing expectSync indicates a trigger call, which must force setup through jQuery.event.add if ( !expectSync ) { if ( dataPriv.get( el, type ) === undefined ) { jQuery.event.add( el, type, returnTrue ); } return; } // Register the controller as a special universal handler for all event namespaces dataPriv.set( el, type, false ); jQuery.event.add( el, type, { namespace: false, handler: function( event ) { var notAsync, result, saved = dataPriv.get( this, type ); if ( ( event.isTrigger & 1 ) && this[ type ] ) { // Interrupt processing of the outer synthetic .trigger()ed event // Saved data should be false in such cases, but might be a leftover capture object // from an async native handler (gh-4350) if ( !saved.length ) { // Store arguments for use when handling the inner native event // There will always be at least one argument (an event object), so this array // will not be confused with a leftover capture object. saved = slice.call( arguments ); dataPriv.set( this, type, saved ); // Trigger the native event and capture its result // Support: IE <=9 - 11+ // focus() and blur() are asynchronous notAsync = expectSync( this, type ); this[ type ](); result = dataPriv.get( this, type ); if ( saved !== result || notAsync ) { dataPriv.set( this, type, false ); } else { result = {}; } if ( saved !== result ) { // Cancel the outer synthetic event event.stopImmediatePropagation(); event.preventDefault(); // Support: Chrome 86+ // In Chrome, if an element having a focusout handler is blurred by // clicking outside of it, it invokes the handler synchronously. If // that handler calls `.remove()` on the element, the data is cleared, // leaving `result` undefined. We need to guard against this. return result && result.value; } // If this is an inner synthetic event for an event with a bubbling surrogate // (focus or blur), assume that the surrogate already propagated from triggering the // native event and prevent that from happening again here. // This technically gets the ordering wrong w.r.t. 
to `.trigger()` (in which the // bubbling surrogate propagates *after* the non-bubbling base), but that seems // less bad than duplication. } else if ( ( jQuery.event.special[ type ] || {} ).delegateType ) { event.stopPropagation(); } // If this is a native event triggered above, everything is now in order // Fire an inner synthetic event with the original arguments } else if ( saved.length ) { // ...and capture the result dataPriv.set( this, type, { value: jQuery.event.trigger( // Support: IE <=9 - 11+ // Extend with the prototype to reset the above stopImmediatePropagation() jQuery.extend( saved[ 0 ], jQuery.Event.prototype ), saved.slice( 1 ), this ) } ); // Abort handling of the native event event.stopImmediatePropagation(); } } } ); } jQuery.removeEvent = function( elem, type, handle ) { // This "if" is needed for plain objects if ( elem.removeEventListener ) { elem.removeEventListener( type, handle ); } }; jQuery.Event = function( src, props ) { // Allow instantiation without the 'new' keyword if ( !( this instanceof jQuery.Event ) ) { return new jQuery.Event( src, props ); } // Event object if ( src && src.type ) { this.originalEvent = src; this.type = src.type; // Events bubbling up the document may have been marked as prevented // by a handler lower down the tree; reflect the correct value. this.isDefaultPrevented = src.defaultPrevented || src.defaultPrevented === undefined && // Support: Android <=2.3 only src.returnValue === false ? returnTrue : returnFalse; // Create target properties // Support: Safari <=6 - 7 only // Target should not be a text node (#504, #13143) this.target = ( src.target && src.target.nodeType === 3 ) ? src.target.parentNode : src.target; this.currentTarget = src.currentTarget; this.relatedTarget = src.relatedTarget; // Event type } else { this.type = src; } // Put explicitly provided properties onto the event object if ( props ) { jQuery.extend( this, props ); } // Create a timestamp if incoming event doesn't have one this.timeStamp = src && src.timeStamp || Date.now(); // Mark it as fixed this[ jQuery.expando ] = true; }; // jQuery.Event is based on DOM3 Events as specified by the ECMAScript Language Binding // https://www.w3.org/TR/2003/WD-DOM-Level-3-Events-20030331/ecma-script-binding.html jQuery.Event.prototype = { constructor: jQuery.Event, isDefaultPrevented: returnFalse, isPropagationStopped: returnFalse, isImmediatePropagationStopped: returnFalse, isSimulated: false, preventDefault: function() { var e = this.originalEvent; this.isDefaultPrevented = returnTrue; if ( e && !this.isSimulated ) { e.preventDefault(); } }, stopPropagation: function() { var e = this.originalEvent; this.isPropagationStopped = returnTrue; if ( e && !this.isSimulated ) { e.stopPropagation(); } }, stopImmediatePropagation: function() { var e = this.originalEvent; this.isImmediatePropagationStopped = returnTrue; if ( e && !this.isSimulated ) { e.stopImmediatePropagation(); } this.stopPropagation(); } }; // Includes all common event props including KeyEvent and MouseEvent specific props jQuery.each( { altKey: true, bubbles: true, cancelable: true, changedTouches: true, ctrlKey: true, detail: true, eventPhase: true, metaKey: true, pageX: true, pageY: true, shiftKey: true, view: true, "char": true, code: true, charCode: true, key: true, keyCode: true, button: true, buttons: true, clientX: true, clientY: true, offsetX: true, offsetY: true, pointerId: true, pointerType: true, screenX: true, screenY: true, targetTouches: true, toElement: true, touches: true, which: true }, 
jQuery.event.addProp ); jQuery.each( { focus: "focusin", blur: "focusout" }, function( type, delegateType ) { jQuery.event.special[ type ] = { // Utilize native event if possible so blur/focus sequence is correct setup: function() { // Claim the first handler // dataPriv.set( this, "focus", ... ) // dataPriv.set( this, "blur", ... ) leverageNative( this, type, expectSync ); // Return false to allow normal processing in the caller return false; }, trigger: function() { // Force setup before trigger leverageNative( this, type ); // Return non-false to allow normal event-path propagation return true; }, // Suppress native focus or blur as it's already being fired // in leverageNative. _default: function() { return true; }, delegateType: delegateType }; } ); // Create mouseenter/leave events using mouseover/out and event-time checks // so that event delegation works in jQuery. // Do the same for pointerenter/pointerleave and pointerover/pointerout // // Support: Safari 7 only // Safari sends mouseenter too often; see: // https://bugs.chromium.org/p/chromium/issues/detail?id=470258 // for the description of the bug (it existed in older Chrome versions as well). jQuery.each( { mouseenter: "mouseover", mouseleave: "mouseout", pointerenter: "pointerover", pointerleave: "pointerout" }, function( orig, fix ) { jQuery.event.special[ orig ] = { delegateType: fix, bindType: fix, handle: function( event ) { var ret, target = this, related = event.relatedTarget, handleObj = event.handleObj; // For mouseenter/leave call the handler if related is outside the target. // NB: No relatedTarget if the mouse left/entered the browser window if ( !related || ( related !== target && !jQuery.contains( target, related ) ) ) { event.type = handleObj.origType; ret = handleObj.handler.apply( this, arguments ); event.type = fix; } return ret; } }; } ); jQuery.fn.extend( { on: function( types, selector, data, fn ) { return on( this, types, selector, data, fn ); }, one: function( types, selector, data, fn ) { return on( this, types, selector, data, fn, 1 ); }, off: function( types, selector, fn ) { var handleObj, type; if ( types && types.preventDefault && types.handleObj ) { // ( event ) dispatched jQuery.Event handleObj = types.handleObj; jQuery( types.delegateTarget ).off( handleObj.namespace ? handleObj.origType + "." + handleObj.namespace : handleObj.origType, handleObj.selector, handleObj.handler ); return this; } if ( typeof types === "object" ) { // ( types-object [, selector] ) for ( type in types ) { this.off( type, selector, types[ type ] ); } return this; } if ( selector === false || typeof selector === "function" ) { // ( types [, fn] ) fn = selector; selector = undefined; } if ( fn === false ) { fn = returnFalse; } return this.each( function() { jQuery.event.remove( this, types, fn, selector ); } ); } } ); var // Support: IE <=10 - 11, Edge 12 - 13 only // In IE/Edge using regex groups here causes severe slowdowns. // See https://connect.microsoft.com/IE/feedback/details/1736512/ rnoInnerhtml = /<script|<style|<link/i, // checked="checked" or checked rchecked = /checked\s*(?:[^=]|=\s*.checked.)/i, rcleanScript = /^\s*<!\[CDATA\[|\]\]>\s*$/g; // Prefer a tbody over its parent table for containing new rows function manipulationTarget( elem, content ) { if ( nodeName( elem, "table" ) && nodeName( content.nodeType !== 11 ?
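// --------------------------------------------------------------------------
// Illustrative usage sketch (editor addition; "#menu" and "li" are
// hypothetical). The mouseenter -> mouseover mapping defined above is what
// lets non-bubbling enter/leave events work with delegated .on() handlers:
//
//     jQuery( "#menu" ).on( "mouseenter", "li", function() {
//         jQuery( this ).addClass( "hover" );
//     } );
//
// Internally the handler is bound as "mouseover" (bindType/delegateType) and
// only fires when event.relatedTarget lies outside the matched <li>.
// --------------------------------------------------------------------------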
content : content.firstChild, "tr" ) ) { return jQuery( elem ).children( "tbody" )[ 0 ] || elem; } return elem; } // Replace/restore the type attribute of script elements for safe DOM manipulation function disableScript( elem ) { elem.type = ( elem.getAttribute( "type" ) !== null ) + "/" + elem.type; return elem; } function restoreScript( elem ) { if ( ( elem.type || "" ).slice( 0, 5 ) === "true/" ) { elem.type = elem.type.slice( 5 ); } else { elem.removeAttribute( "type" ); } return elem; } function cloneCopyEvent( src, dest ) { var i, l, type, pdataOld, udataOld, udataCur, events; if ( dest.nodeType !== 1 ) { return; } // 1. Copy private data: events, handlers, etc. if ( dataPriv.hasData( src ) ) { pdataOld = dataPriv.get( src ); events = pdataOld.events; if ( events ) { dataPriv.remove( dest, "handle events" ); for ( type in events ) { for ( i = 0, l = events[ type ].length; i < l; i++ ) { jQuery.event.add( dest, type, events[ type ][ i ] ); } } } } // 2. Copy user data if ( dataUser.hasData( src ) ) { udataOld = dataUser.access( src ); udataCur = jQuery.extend( {}, udataOld ); dataUser.set( dest, udataCur ); } } // Fix IE bugs, see support tests function fixInput( src, dest ) { var nodeName = dest.nodeName.toLowerCase(); // Fails to persist the checked state of a cloned checkbox or radio button. if ( nodeName === "input" && rcheckableType.test( src.type ) ) { dest.checked = src.checked; // Fails to return the selected option to the default selected state when cloning options } else if ( nodeName === "input" || nodeName === "textarea" ) { dest.defaultValue = src.defaultValue; } } function domManip( collection, args, callback, ignored ) { // Flatten any nested arrays args = flat( args ); var fragment, first, scripts, hasScripts, node, doc, i = 0, l = collection.length, iNoClone = l - 1, value = args[ 0 ], valueIsFunction = isFunction( value ); // We can't cloneNode fragments that contain checked, in WebKit if ( valueIsFunction || ( l > 1 && typeof value === "string" && !support.checkClone && rchecked.test( value ) ) ) { return collection.each( function( index ) { var self = collection.eq( index ); if ( valueIsFunction ) { args[ 0 ] = value.call( this, index, self.html() ); } domManip( self, args, callback, ignored ); } ); } if ( l ) { fragment = buildFragment( args, collection[ 0 ].ownerDocument, false, collection, ignored ); first = fragment.firstChild; if ( fragment.childNodes.length === 1 ) { fragment = first; } // Require either new content or an interest in ignored elements to invoke the callback if ( first || ignored ) { scripts = jQuery.map( getAll( fragment, "script" ), disableScript ); hasScripts = scripts.length; // Use the original fragment for the last item // instead of the first because it can end up // being emptied incorrectly in certain situations (#8070). 
for ( ; i < l; i++ ) { node = fragment; if ( i !== iNoClone ) { node = jQuery.clone( node, true, true ); // Keep references to cloned scripts for later restoration if ( hasScripts ) { // Support: Android <=4.0 only, PhantomJS 1 only // push.apply(_, arraylike) throws on ancient WebKit jQuery.merge( scripts, getAll( node, "script" ) ); } } callback.call( collection[ i ], node, i ); } if ( hasScripts ) { doc = scripts[ scripts.length - 1 ].ownerDocument; // Reenable scripts jQuery.map( scripts, restoreScript ); // Evaluate executable scripts on first document insertion for ( i = 0; i < hasScripts; i++ ) { node = scripts[ i ]; if ( rscriptType.test( node.type || "" ) && !dataPriv.access( node, "globalEval" ) && jQuery.contains( doc, node ) ) { if ( node.src && ( node.type || "" ).toLowerCase() !== "module" ) { // Optional AJAX dependency, but won't run scripts if not present if ( jQuery._evalUrl && !node.noModule ) { jQuery._evalUrl( node.src, { nonce: node.nonce || node.getAttribute( "nonce" ) }, doc ); } } else { DOMEval( node.textContent.replace( rcleanScript, "" ), node, doc ); } } } } } } return collection; } function remove( elem, selector, keepData ) { var node, nodes = selector ? jQuery.filter( selector, elem ) : elem, i = 0; for ( ; ( node = nodes[ i ] ) != null; i++ ) { if ( !keepData && node.nodeType === 1 ) { jQuery.cleanData( getAll( node ) ); } if ( node.parentNode ) { if ( keepData && isAttached( node ) ) { setGlobalEval( getAll( node, "script" ) ); } node.parentNode.removeChild( node ); } } return elem; } jQuery.extend( { htmlPrefilter: function( html ) { return html; }, clone: function( elem, dataAndEvents, deepDataAndEvents ) { var i, l, srcElements, destElements, clone = elem.cloneNode( true ), inPage = isAttached( elem ); // Fix IE cloning issues if ( !support.noCloneChecked && ( elem.nodeType === 1 || elem.nodeType === 11 ) && !jQuery.isXMLDoc( elem ) ) { // We eschew Sizzle here for performance reasons: https://jsperf.com/getall-vs-sizzle/2 destElements = getAll( clone ); srcElements = getAll( elem ); for ( i = 0, l = srcElements.length; i < l; i++ ) { fixInput( srcElements[ i ], destElements[ i ] ); } } // Copy the events from the original to the clone if ( dataAndEvents ) { if ( deepDataAndEvents ) { srcElements = srcElements || getAll( elem ); destElements = destElements || getAll( clone ); for ( i = 0, l = srcElements.length; i < l; i++ ) { cloneCopyEvent( srcElements[ i ], destElements[ i ] ); } } else { cloneCopyEvent( elem, clone ); } } // Preserve script evaluation history destElements = getAll( clone, "script" ); if ( destElements.length > 0 ) { setGlobalEval( destElements, !inPage && getAll( elem, "script" ) ); } // Return the cloned set return clone; }, cleanData: function( elems ) { var data, elem, type, special = jQuery.event.special, i = 0; for ( ; ( elem = elems[ i ] ) !== undefined; i++ ) { if ( acceptData( elem ) ) { if ( ( data = elem[ dataPriv.expando ] ) ) { if ( data.events ) { for ( type in data.events ) { if ( special[ type ] ) { jQuery.event.remove( elem, type ); // This is a shortcut to avoid jQuery.event.remove's overhead } else { jQuery.removeEvent( elem, type, data.handle ); } } } // Support: Chrome <=35 - 45+ // Assign undefined instead of using delete, see Data#remove elem[ dataPriv.expando ] = undefined; } if ( elem[ dataUser.expando ] ) { // Support: Chrome <=35 - 45+ // Assign undefined instead of using delete, see Data#remove elem[ dataUser.expando ] = undefined; } } } } } ); jQuery.fn.extend( { detach: function( selector ) { return 
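// --------------------------------------------------------------------------
// Illustrative usage sketch (editor addition; the button and handler are
// hypothetical). jQuery.clone() with dataAndEvents re-adds handlers via
// cloneCopyEvent, while remove()/cleanData() strips them; detach() (below)
// passes keepData so a re-inserted node keeps its events:
//
//     var $btn = jQuery( "<button>Go</button>" ).on( "click", doWork );
//     var $copy = $btn.clone( true );   // $copy also reacts to "click"
//     $btn.detach();                    // removed from DOM, data retained
//     $copy.remove();                   // removed AND handlers cleaned up
// --------------------------------------------------------------------------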
remove( this, selector, true ); }, remove: function( selector ) { return remove( this, selector ); }, text: function( value ) { return access( this, function( value ) { return value === undefined ? jQuery.text( this ) : this.empty().each( function() { if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { this.textContent = value; } } ); }, null, value, arguments.length ); }, append: function() { return domManip( this, arguments, function( elem ) { if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { var target = manipulationTarget( this, elem ); target.appendChild( elem ); } } ); }, prepend: function() { return domManip( this, arguments, function( elem ) { if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { var target = manipulationTarget( this, elem ); target.insertBefore( elem, target.firstChild ); } } ); }, before: function() { return domManip( this, arguments, function( elem ) { if ( this.parentNode ) { this.parentNode.insertBefore( elem, this ); } } ); }, after: function() { return domManip( this, arguments, function( elem ) { if ( this.parentNode ) { this.parentNode.insertBefore( elem, this.nextSibling ); } } ); }, empty: function() { var elem, i = 0; for ( ; ( elem = this[ i ] ) != null; i++ ) { if ( elem.nodeType === 1 ) { // Prevent memory leaks jQuery.cleanData( getAll( elem, false ) ); // Remove any remaining nodes elem.textContent = ""; } } return this; }, clone: function( dataAndEvents, deepDataAndEvents ) { dataAndEvents = dataAndEvents == null ? false : dataAndEvents; deepDataAndEvents = deepDataAndEvents == null ? dataAndEvents : deepDataAndEvents; return this.map( function() { return jQuery.clone( this, dataAndEvents, deepDataAndEvents ); } ); }, html: function( value ) { return access( this, function( value ) { var elem = this[ 0 ] || {}, i = 0, l = this.length; if ( value === undefined && elem.nodeType === 1 ) { return elem.innerHTML; } // See if we can take a shortcut and just use innerHTML if ( typeof value === "string" && !rnoInnerhtml.test( value ) && !wrapMap[ ( rtagName.exec( value ) || [ "", "" ] )[ 1 ].toLowerCase() ] ) { value = jQuery.htmlPrefilter( value ); try { for ( ; i < l; i++ ) { elem = this[ i ] || {}; // Remove element nodes and prevent memory leaks if ( elem.nodeType === 1 ) { jQuery.cleanData( getAll( elem, false ) ); elem.innerHTML = value; } } elem = 0; // If using innerHTML throws an exception, use the fallback method } catch ( e ) {} } if ( elem ) { this.empty().append( value ); } }, null, value, arguments.length ); }, replaceWith: function() { var ignored = []; // Make the changes, replacing each non-ignored context element with the new content return domManip( this, arguments, function( elem ) { var parent = this.parentNode; if ( jQuery.inArray( this, ignored ) < 0 ) { jQuery.cleanData( getAll( this ) ); if ( parent ) { parent.replaceChild( elem, this ); } } // Force callback invocation }, ignored ); } } ); jQuery.each( { appendTo: "append", prependTo: "prepend", insertBefore: "before", insertAfter: "after", replaceAll: "replaceWith" }, function( name, original ) { jQuery.fn[ name ] = function( selector ) { var elems, ret = [], insert = jQuery( selector ), last = insert.length - 1, i = 0; for ( ; i <= last; i++ ) { elems = i === last ? 
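// --------------------------------------------------------------------------
// Illustrative usage sketch (editor addition; ".box" is hypothetical). The
// insertion wrappers built here (appendTo, prependTo, ...) clone the source
// for every target except the last, as the `i === last` test shows:
//
//     // With three ".box" elements, two clones plus the original <p> are
//     // inserted, and the returned set contains all three insertions.
//     jQuery( "<p>note</p>" ).appendTo( ".box" );
// --------------------------------------------------------------------------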
this : this.clone( true ); jQuery( insert[ i ] )[ original ]( elems ); // Support: Android <=4.0 only, PhantomJS 1 only // .get() because push.apply(_, arraylike) throws on ancient WebKit push.apply( ret, elems.get() ); } return this.pushStack( ret ); }; } ); var rnumnonpx = new RegExp( "^(" + pnum + ")(?!px)[a-z%]+$", "i" ); var getStyles = function( elem ) { // Support: IE <=11 only, Firefox <=30 (#15098, #14150) // IE throws on elements created in popups // FF meanwhile throws on frame elements through "defaultView.getComputedStyle" var view = elem.ownerDocument.defaultView; if ( !view || !view.opener ) { view = window; } return view.getComputedStyle( elem ); }; var swap = function( elem, options, callback ) { var ret, name, old = {}; // Remember the old values, and insert the new ones for ( name in options ) { old[ name ] = elem.style[ name ]; elem.style[ name ] = options[ name ]; } ret = callback.call( elem ); // Revert the old values for ( name in options ) { elem.style[ name ] = old[ name ]; } return ret; }; var rboxStyle = new RegExp( cssExpand.join( "|" ), "i" ); ( function() { // Executing both pixelPosition & boxSizingReliable tests require only one layout // so they're executed at the same time to save the second computation. function computeStyleTests() { // This is a singleton, we need to execute it only once if ( !div ) { return; } container.style.cssText = "position:absolute;left:-11111px;width:60px;" + "margin-top:1px;padding:0;border:0"; div.style.cssText = "position:relative;display:block;box-sizing:border-box;overflow:scroll;" + "margin:auto;border:1px;padding:1px;" + "width:60%;top:1%"; documentElement.appendChild( container ).appendChild( div ); var divStyle = window.getComputedStyle( div ); pixelPositionVal = divStyle.top !== "1%"; // Support: Android 4.0 - 4.3 only, Firefox <=3 - 44 reliableMarginLeftVal = roundPixelMeasures( divStyle.marginLeft ) === 12; // Support: Android 4.0 - 4.3 only, Safari <=9.1 - 10.1, iOS <=7.0 - 9.3 // Some styles come back with percentage values, even though they shouldn't div.style.right = "60%"; pixelBoxStylesVal = roundPixelMeasures( divStyle.right ) === 36; // Support: IE 9 - 11 only // Detect misreporting of content dimensions for box-sizing:border-box elements boxSizingReliableVal = roundPixelMeasures( divStyle.width ) === 36; // Support: IE 9 only // Detect overflow:scroll screwiness (gh-3699) // Support: Chrome <=64 // Don't get tricked when zoom affects offsetWidth (gh-4029) div.style.position = "absolute"; scrollboxSizeVal = roundPixelMeasures( div.offsetWidth / 3 ) === 12; documentElement.removeChild( container ); // Nullify the div so it wouldn't be stored in the memory and // it will also be a sign that checks already performed div = null; } function roundPixelMeasures( measure ) { return Math.round( parseFloat( measure ) ); } var pixelPositionVal, boxSizingReliableVal, scrollboxSizeVal, pixelBoxStylesVal, reliableTrDimensionsVal, reliableMarginLeftVal, container = document.createElement( "div" ), div = document.createElement( "div" ); // Finish early in limited (non-browser) environments if ( !div.style ) { return; } // Support: IE <=9 - 11 only // Style of cloned element affects source element cloned (#8908) div.style.backgroundClip = "content-box"; div.cloneNode( true ).style.backgroundClip = ""; support.clearCloneStyle = div.style.backgroundClip === "content-box"; jQuery.extend( support, { boxSizingReliable: function() { computeStyleTests(); return boxSizingReliableVal; }, pixelBoxStyles: function() { 
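// --------------------------------------------------------------------------
// Illustrative usage sketch (editor addition; `elem` is hypothetical). The
// swap() helper above temporarily applies styles, runs a callback, and
// restores the originals; the width/height css hooks later pair it with
// cssShow to measure hidden elements:
//
//     var width = swap( elem,
//         { position: "absolute", visibility: "hidden", display: "block" },
//         function() { return elem.offsetWidth; } );
// --------------------------------------------------------------------------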
computeStyleTests(); return pixelBoxStylesVal; }, pixelPosition: function() { computeStyleTests(); return pixelPositionVal; }, reliableMarginLeft: function() { computeStyleTests(); return reliableMarginLeftVal; }, scrollboxSize: function() { computeStyleTests(); return scrollboxSizeVal; }, // Support: IE 9 - 11+, Edge 15 - 18+ // IE/Edge misreport `getComputedStyle` of table rows with width/height // set in CSS while `offset*` properties report correct values. // Behavior in IE 9 is more subtle than in newer versions & it passes // some versions of this test; make sure not to make it pass there! // // Support: Firefox 70+ // Only Firefox includes border widths // in computed dimensions. (gh-4529) reliableTrDimensions: function() { var table, tr, trChild, trStyle; if ( reliableTrDimensionsVal == null ) { table = document.createElement( "table" ); tr = document.createElement( "tr" ); trChild = document.createElement( "div" ); table.style.cssText = "position:absolute;left:-11111px;border-collapse:separate"; tr.style.cssText = "border:1px solid"; // Support: Chrome 86+ // Height set through cssText does not get applied. // Computed height then comes back as 0. tr.style.height = "1px"; trChild.style.height = "9px"; // Support: Android 8 Chrome 86+ // In our bodyBackground.html iframe, // display for all div elements is set to "inline", // which causes a problem only in Android 8 Chrome 86. // Ensuring the div is display: block // gets around this issue. trChild.style.display = "block"; documentElement .appendChild( table ) .appendChild( tr ) .appendChild( trChild ); trStyle = window.getComputedStyle( tr ); reliableTrDimensionsVal = ( parseInt( trStyle.height, 10 ) + parseInt( trStyle.borderTopWidth, 10 ) + parseInt( trStyle.borderBottomWidth, 10 ) ) === tr.offsetHeight; documentElement.removeChild( table ); } return reliableTrDimensionsVal; } } ); } )(); function curCSS( elem, name, computed ) { var width, minWidth, maxWidth, ret, // Support: Firefox 51+ // Retrieving style before computed somehow // fixes an issue with getting wrong values // on detached elements style = elem.style; computed = computed || getStyles( elem ); // getPropertyValue is needed for: // .css('filter') (IE 9 only, #12537) // .css('--customProperty) (#3144) if ( computed ) { ret = computed.getPropertyValue( name ) || computed[ name ]; if ( ret === "" && !isAttached( elem ) ) { ret = jQuery.style( elem, name ); } // A tribute to the "awesome hack by Dean Edwards" // Android Browser returns percentage for some values, // but width seems to be reliably pixels. // This is against the CSSOM draft spec: // https://drafts.csswg.org/cssom/#resolved-values if ( !support.pixelBoxStyles() && rnumnonpx.test( ret ) && rboxStyle.test( name ) ) { // Remember the original values width = style.width; minWidth = style.minWidth; maxWidth = style.maxWidth; // Put in the new values to get a computed value out style.minWidth = style.maxWidth = style.width = ret; ret = computed.width; // Revert the changed values style.width = width; style.minWidth = minWidth; style.maxWidth = maxWidth; } } return ret !== undefined ? // Support: IE <=9 - 11 only // IE returns zIndex value as an integer. ret + "" : ret; } function addGetHookIf( conditionFn, hookFn ) { // Define the hook, we'll check on the first run if it's really needed. return { get: function() { if ( conditionFn() ) { // Hook not needed (or it's not possible to use it due // to missing dependency), remove it. 
delete this.get; return; } // Hook needed; redefine it so that the support test is not executed again. return ( this.get = hookFn ).apply( this, arguments ); } }; } var cssPrefixes = [ "Webkit", "Moz", "ms" ], emptyStyle = document.createElement( "div" ).style, vendorProps = {}; // Return a vendor-prefixed property or undefined function vendorPropName( name ) { // Check for vendor prefixed names var capName = name[ 0 ].toUpperCase() + name.slice( 1 ), i = cssPrefixes.length; while ( i-- ) { name = cssPrefixes[ i ] + capName; if ( name in emptyStyle ) { return name; } } } // Return a potentially-mapped jQuery.cssProps or vendor prefixed property function finalPropName( name ) { var final = jQuery.cssProps[ name ] || vendorProps[ name ]; if ( final ) { return final; } if ( name in emptyStyle ) { return name; } return vendorProps[ name ] = vendorPropName( name ) || name; } var // Swappable if display is none or starts with table // except "table", "table-cell", or "table-caption" // See here for display values: https://developer.mozilla.org/en-US/docs/CSS/display rdisplayswap = /^(none|table(?!-c[ea]).+)/, rcustomProp = /^--/, cssShow = { position: "absolute", visibility: "hidden", display: "block" }, cssNormalTransform = { letterSpacing: "0", fontWeight: "400" }; function setPositiveNumber( _elem, value, subtract ) { // Any relative (+/-) values have already been // normalized at this point var matches = rcssNum.exec( value ); return matches ? // Guard against undefined "subtract", e.g., when used as in cssHooks Math.max( 0, matches[ 2 ] - ( subtract || 0 ) ) + ( matches[ 3 ] || "px" ) : value; } function boxModelAdjustment( elem, dimension, box, isBorderBox, styles, computedVal ) { var i = dimension === "width" ? 1 : 0, extra = 0, delta = 0; // Adjustment may not be necessary if ( box === ( isBorderBox ? 
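// --------------------------------------------------------------------------
// Worked example (editor addition, numbers hypothetical). For width, this
// function walks cssExpand indices 1 and 3 ("Right"/"Left"). Given a
// content-box element with width:100px, padding:10px, border:2px per side:
//
//     content box = 100
//     padding box = 100 + 10 + 10         = 120
//     border box  = 100 + 10 + 10 + 2 + 2 = 124
//
// Requesting "border" on that content-box element yields delta = +24, while
// requesting "content" on a border-box element of the same metrics yields
// delta = -24.
// --------------------------------------------------------------------------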
"border" : "content" ) ) { return 0; } for ( ; i < 4; i += 2 ) { // Both box models exclude margin if ( box === "margin" ) { delta += jQuery.css( elem, box + cssExpand[ i ], true, styles ); } // If we get here with a content-box, we're seeking "padding" or "border" or "margin" if ( !isBorderBox ) { // Add padding delta += jQuery.css( elem, "padding" + cssExpand[ i ], true, styles ); // For "border" or "margin", add border if ( box !== "padding" ) { delta += jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); // But still keep track of it otherwise } else { extra += jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); } // If we get here with a border-box (content + padding + border), we're seeking "content" or // "padding" or "margin" } else { // For "content", subtract padding if ( box === "content" ) { delta -= jQuery.css( elem, "padding" + cssExpand[ i ], true, styles ); } // For "content" or "padding", subtract border if ( box !== "margin" ) { delta -= jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); } } } // Account for positive content-box scroll gutter when requested by providing computedVal if ( !isBorderBox && computedVal >= 0 ) { // offsetWidth/offsetHeight is a rounded sum of content, padding, scroll gutter, and border // Assuming integer scroll gutter, subtract the rest and round down delta += Math.max( 0, Math.ceil( elem[ "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ) ] - computedVal - delta - extra - 0.5 // If offsetWidth/offsetHeight is unknown, then we can't determine content-box scroll gutter // Use an explicit zero to avoid NaN (gh-3964) ) ) || 0; } return delta; } function getWidthOrHeight( elem, dimension, extra ) { // Start with computed style var styles = getStyles( elem ), // To avoid forcing a reflow, only fetch boxSizing if we need it (gh-4322). // Fake content-box until we know it's needed to know the true value. boxSizingNeeded = !support.boxSizingReliable() || extra, isBorderBox = boxSizingNeeded && jQuery.css( elem, "boxSizing", false, styles ) === "border-box", valueIsBorderBox = isBorderBox, val = curCSS( elem, dimension, styles ), offsetProp = "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ); // Support: Firefox <=54 // Return a confounding non-pixel value or feign ignorance, as appropriate. if ( rnumnonpx.test( val ) ) { if ( !extra ) { return val; } val = "auto"; } // Support: IE 9 - 11 only // Use offsetWidth/offsetHeight for when box sizing is unreliable. // In those cases, the computed value can be trusted to be border-box. if ( ( !support.boxSizingReliable() && isBorderBox || // Support: IE 10 - 11+, Edge 15 - 18+ // IE/Edge misreport `getComputedStyle` of table rows with width/height // set in CSS while `offset*` properties report correct values. // Interestingly, in some cases IE 9 doesn't suffer from this issue. !support.reliableTrDimensions() && nodeName( elem, "tr" ) || // Fall back to offsetWidth/offsetHeight when value is "auto" // This happens for inline elements with no explicit setting (gh-3571) val === "auto" || // Support: Android <=4.1 - 4.3 only // Also use offsetWidth/offsetHeight for misreported inline dimensions (gh-3602) !parseFloat( val ) && jQuery.css( elem, "display", false, styles ) === "inline" ) && // Make sure the element is visible & connected elem.getClientRects().length ) { isBorderBox = jQuery.css( elem, "boxSizing", false, styles ) === "border-box"; // Where available, offsetWidth/offsetHeight approximate border box dimensions. 
// Where not available (e.g., SVG), assume unreliable box-sizing and interpret the // retrieved value as a content box dimension. valueIsBorderBox = offsetProp in elem; if ( valueIsBorderBox ) { val = elem[ offsetProp ]; } } // Normalize "" and auto val = parseFloat( val ) || 0; // Adjust for the element's box model return ( val + boxModelAdjustment( elem, dimension, extra || ( isBorderBox ? "border" : "content" ), valueIsBorderBox, styles, // Provide the current computed size to request scroll gutter calculation (gh-3589) val ) ) + "px"; } jQuery.extend( { // Add in style property hooks for overriding the default // behavior of getting and setting a style property cssHooks: { opacity: { get: function( elem, computed ) { if ( computed ) { // We should always get a number back from opacity var ret = curCSS( elem, "opacity" ); return ret === "" ? "1" : ret; } } } }, // Don't automatically add "px" to these possibly-unitless properties cssNumber: { "animationIterationCount": true, "columnCount": true, "fillOpacity": true, "flexGrow": true, "flexShrink": true, "fontWeight": true, "gridArea": true, "gridColumn": true, "gridColumnEnd": true, "gridColumnStart": true, "gridRow": true, "gridRowEnd": true, "gridRowStart": true, "lineHeight": true, "opacity": true, "order": true, "orphans": true, "widows": true, "zIndex": true, "zoom": true }, // Add in properties whose names you wish to fix before // setting or getting the value cssProps: {}, // Get and set the style property on a DOM Node style: function( elem, name, value, extra ) { // Don't set styles on text and comment nodes if ( !elem || elem.nodeType === 3 || elem.nodeType === 8 || !elem.style ) { return; } // Make sure that we're working with the right name var ret, type, hooks, origName = camelCase( name ), isCustomProp = rcustomProp.test( name ), style = elem.style; // Make sure that we're working with the right name. We don't // want to query the value if it is a CSS custom property // since they are user-defined. if ( !isCustomProp ) { name = finalPropName( origName ); } // Gets hook for the prefixed version, then unprefixed version hooks = jQuery.cssHooks[ name ] || jQuery.cssHooks[ origName ]; // Check if we're setting a value if ( value !== undefined ) { type = typeof value; // Convert "+=" or "-=" to relative numbers (#7345) if ( type === "string" && ( ret = rcssNum.exec( value ) ) && ret[ 1 ] ) { value = adjustCSS( elem, name, ret ); // Fixes bug #9237 type = "number"; } // Make sure that null and NaN values aren't set (#7116) if ( value == null || value !== value ) { return; } // If a number was passed in, add the unit (except for certain CSS properties) // The isCustomProp check can be removed in jQuery 4.0 when we only auto-append // "px" to a few hardcoded values. if ( type === "number" && !isCustomProp ) { value += ret && ret[ 3 ] || ( jQuery.cssNumber[ origName ] ? 
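// --------------------------------------------------------------------------
// Illustrative usage sketch (editor addition; `elem` is hypothetical). The
// cssNumber table above decides whether a bare number gets "px" appended by
// the setter logic below:
//
//     jQuery( elem ).css( "width", 400 );     // sets "400px"
//     jQuery( elem ).css( "opacity", 0.5 );   // stays unitless: "0.5"
//     jQuery( elem ).css( "zIndex", 10 );     // stays unitless: "10"
// --------------------------------------------------------------------------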
"" : "px" ); } // background-* props affect original clone's values if ( !support.clearCloneStyle && value === "" && name.indexOf( "background" ) === 0 ) { style[ name ] = "inherit"; } // If a hook was provided, use that value, otherwise just set the specified value if ( !hooks || !( "set" in hooks ) || ( value = hooks.set( elem, value, extra ) ) !== undefined ) { if ( isCustomProp ) { style.setProperty( name, value ); } else { style[ name ] = value; } } } else { // If a hook was provided get the non-computed value from there if ( hooks && "get" in hooks && ( ret = hooks.get( elem, false, extra ) ) !== undefined ) { return ret; } // Otherwise just get the value from the style object return style[ name ]; } }, css: function( elem, name, extra, styles ) { var val, num, hooks, origName = camelCase( name ), isCustomProp = rcustomProp.test( name ); // Make sure that we're working with the right name. We don't // want to modify the value if it is a CSS custom property // since they are user-defined. if ( !isCustomProp ) { name = finalPropName( origName ); } // Try prefixed name followed by the unprefixed name hooks = jQuery.cssHooks[ name ] || jQuery.cssHooks[ origName ]; // If a hook was provided get the computed value from there if ( hooks && "get" in hooks ) { val = hooks.get( elem, true, extra ); } // Otherwise, if a way to get the computed value exists, use that if ( val === undefined ) { val = curCSS( elem, name, styles ); } // Convert "normal" to computed value if ( val === "normal" && name in cssNormalTransform ) { val = cssNormalTransform[ name ]; } // Make numeric if forced or a qualifier was provided and val looks numeric if ( extra === "" || extra ) { num = parseFloat( val ); return extra === true || isFinite( num ) ? num || 0 : val; } return val; } } ); jQuery.each( [ "height", "width" ], function( _i, dimension ) { jQuery.cssHooks[ dimension ] = { get: function( elem, computed, extra ) { if ( computed ) { // Certain elements can have dimension info if we invisibly show them // but it must have a current display style that would benefit return rdisplayswap.test( jQuery.css( elem, "display" ) ) && // Support: Safari 8+ // Table columns in Safari have non-zero offsetWidth & zero // getBoundingClientRect().width unless display is changed. // Support: IE <=11 only // Running getBoundingClientRect on a disconnected node // in IE throws an error. ( !elem.getClientRects().length || !elem.getBoundingClientRect().width ) ? swap( elem, cssShow, function() { return getWidthOrHeight( elem, dimension, extra ); } ) : getWidthOrHeight( elem, dimension, extra ); } }, set: function( elem, value, extra ) { var matches, styles = getStyles( elem ), // Only read styles.position if the test has a chance to fail // to avoid forcing a reflow. scrollboxSizeBuggy = !support.scrollboxSize() && styles.position === "absolute", // To avoid forcing a reflow, only fetch boxSizing if we need it (gh-3991) boxSizingNeeded = scrollboxSizeBuggy || extra, isBorderBox = boxSizingNeeded && jQuery.css( elem, "boxSizing", false, styles ) === "border-box", subtract = extra ? 
boxModelAdjustment( elem, dimension, extra, isBorderBox, styles ) : 0; // Account for unreliable border-box dimensions by comparing offset* to computed and // faking a content-box to get border and padding (gh-3699) if ( isBorderBox && scrollboxSizeBuggy ) { subtract -= Math.ceil( elem[ "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ) ] - parseFloat( styles[ dimension ] ) - boxModelAdjustment( elem, dimension, "border", false, styles ) - 0.5 ); } // Convert to pixels if value adjustment is needed if ( subtract && ( matches = rcssNum.exec( value ) ) && ( matches[ 3 ] || "px" ) !== "px" ) { elem.style[ dimension ] = value; value = jQuery.css( elem, dimension ); } return setPositiveNumber( elem, value, subtract ); } }; } ); jQuery.cssHooks.marginLeft = addGetHookIf( support.reliableMarginLeft, function( elem, computed ) { if ( computed ) { return ( parseFloat( curCSS( elem, "marginLeft" ) ) || elem.getBoundingClientRect().left - swap( elem, { marginLeft: 0 }, function() { return elem.getBoundingClientRect().left; } ) ) + "px"; } } ); // These hooks are used by animate to expand properties jQuery.each( { margin: "", padding: "", border: "Width" }, function( prefix, suffix ) { jQuery.cssHooks[ prefix + suffix ] = { expand: function( value ) { var i = 0, expanded = {}, // Assumes a single number if not a string parts = typeof value === "string" ? value.split( " " ) : [ value ]; for ( ; i < 4; i++ ) { expanded[ prefix + cssExpand[ i ] + suffix ] = parts[ i ] || parts[ i - 2 ] || parts[ 0 ]; } return expanded; } }; if ( prefix !== "margin" ) { jQuery.cssHooks[ prefix + suffix ].set = setPositiveNumber; } } ); jQuery.fn.extend( { css: function( name, value ) { return access( this, function( elem, name, value ) { var styles, len, map = {}, i = 0; if ( Array.isArray( name ) ) { styles = getStyles( elem ); len = name.length; for ( ; i < len; i++ ) { map[ name[ i ] ] = jQuery.css( elem, name[ i ], false, styles ); } return map; } return value !== undefined ? jQuery.style( elem, name, value ) : jQuery.css( elem, name ); }, name, value, arguments.length > 1 ); } } ); function Tween( elem, options, prop, end, easing ) { return new Tween.prototype.init( elem, options, prop, end, easing ); } jQuery.Tween = Tween; Tween.prototype = { constructor: Tween, init: function( elem, options, prop, end, easing, unit ) { this.elem = elem; this.prop = prop; this.easing = easing || jQuery.easing._default; this.options = options; this.start = this.now = this.cur(); this.end = end; this.unit = unit || ( jQuery.cssNumber[ prop ] ? "" : "px" ); }, cur: function() { var hooks = Tween.propHooks[ this.prop ]; return hooks && hooks.get ? hooks.get( this ) : Tween.propHooks._default.get( this ); }, run: function( percent ) { var eased, hooks = Tween.propHooks[ this.prop ]; if ( this.options.duration ) { this.pos = eased = jQuery.easing[ this.easing ]( percent, this.options.duration * percent, 0, 1, this.options.duration ); } else { this.pos = eased = percent; } this.now = ( this.end - this.start ) * eased + this.start; if ( this.options.step ) { this.options.step.call( this.elem, this.now, this ); } if ( hooks && hooks.set ) { hooks.set( this ); } else { Tween.propHooks._default.set( this ); } return this; } }; Tween.prototype.init.prototype = Tween.prototype; Tween.propHooks = { _default: { get: function( tween ) { var result; // Use a property on the element directly when it is not a DOM element, // or when there is no matching style property that exists. 
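// --------------------------------------------------------------------------
// Illustrative usage sketch (editor addition; "#panel" and the returned
// values are hypothetical). The Array.isArray branch in jQuery.fn.css above
// supports reading several computed properties in one call:
//
//     jQuery( "#panel" ).css( [ "width", "height" ] );
//     // -> { width: "200px", height: "100px" }
// --------------------------------------------------------------------------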
if ( tween.elem.nodeType !== 1 || tween.elem[ tween.prop ] != null && tween.elem.style[ tween.prop ] == null ) { return tween.elem[ tween.prop ]; } // Passing an empty string as a 3rd parameter to .css will automatically // attempt a parseFloat and fallback to a string if the parse fails. // Simple values such as "10px" are parsed to Float; // complex values such as "rotate(1rad)" are returned as-is. result = jQuery.css( tween.elem, tween.prop, "" ); // Empty strings, null, undefined and "auto" are converted to 0. return !result || result === "auto" ? 0 : result; }, set: function( tween ) { // Use step hook for back compat. // Use cssHook if its there. // Use .style if available and use plain properties where available. if ( jQuery.fx.step[ tween.prop ] ) { jQuery.fx.step[ tween.prop ]( tween ); } else if ( tween.elem.nodeType === 1 && ( jQuery.cssHooks[ tween.prop ] || tween.elem.style[ finalPropName( tween.prop ) ] != null ) ) { jQuery.style( tween.elem, tween.prop, tween.now + tween.unit ); } else { tween.elem[ tween.prop ] = tween.now; } } } }; // Support: IE <=9 only // Panic based approach to setting things on disconnected nodes Tween.propHooks.scrollTop = Tween.propHooks.scrollLeft = { set: function( tween ) { if ( tween.elem.nodeType && tween.elem.parentNode ) { tween.elem[ tween.prop ] = tween.now; } } }; jQuery.easing = { linear: function( p ) { return p; }, swing: function( p ) { return 0.5 - Math.cos( p * Math.PI ) / 2; }, _default: "swing" }; jQuery.fx = Tween.prototype.init; // Back compat <1.8 extension point jQuery.fx.step = {}; var fxNow, inProgress, rfxtypes = /^(?:toggle|show|hide)$/, rrun = /queueHooks$/; function schedule() { if ( inProgress ) { if ( document.hidden === false && window.requestAnimationFrame ) { window.requestAnimationFrame( schedule ); } else { window.setTimeout( schedule, jQuery.fx.interval ); } jQuery.fx.tick(); } } // Animations created synchronously will run synchronously function createFxNow() { window.setTimeout( function() { fxNow = undefined; } ); return ( fxNow = Date.now() ); } // Generate parameters to create a standard animation function genFx( type, includeWidth ) { var which, i = 0, attrs = { height: type }; // If we include width, step value is 1 to do all cssExpand values, // otherwise step value is 2 to skip over Left and Right includeWidth = includeWidth ? 
1 : 0; for ( ; i < 4; i += 2 - includeWidth ) { which = cssExpand[ i ]; attrs[ "margin" + which ] = attrs[ "padding" + which ] = type; } if ( includeWidth ) { attrs.opacity = attrs.width = type; } return attrs; } function createTween( value, prop, animation ) { var tween, collection = ( Animation.tweeners[ prop ] || [] ).concat( Animation.tweeners[ "*" ] ), index = 0, length = collection.length; for ( ; index < length; index++ ) { if ( ( tween = collection[ index ].call( animation, prop, value ) ) ) { // We're done with this property return tween; } } } function defaultPrefilter( elem, props, opts ) { var prop, value, toggle, hooks, oldfire, propTween, restoreDisplay, display, isBox = "width" in props || "height" in props, anim = this, orig = {}, style = elem.style, hidden = elem.nodeType && isHiddenWithinTree( elem ), dataShow = dataPriv.get( elem, "fxshow" ); // Queue-skipping animations hijack the fx hooks if ( !opts.queue ) { hooks = jQuery._queueHooks( elem, "fx" ); if ( hooks.unqueued == null ) { hooks.unqueued = 0; oldfire = hooks.empty.fire; hooks.empty.fire = function() { if ( !hooks.unqueued ) { oldfire(); } }; } hooks.unqueued++; anim.always( function() { // Ensure the complete handler is called before this completes anim.always( function() { hooks.unqueued--; if ( !jQuery.queue( elem, "fx" ).length ) { hooks.empty.fire(); } } ); } ); } // Detect show/hide animations for ( prop in props ) { value = props[ prop ]; if ( rfxtypes.test( value ) ) { delete props[ prop ]; toggle = toggle || value === "toggle"; if ( value === ( hidden ? "hide" : "show" ) ) { // Pretend to be hidden if this is a "show" and // there is still data from a stopped show/hide if ( value === "show" && dataShow && dataShow[ prop ] !== undefined ) { hidden = true; // Ignore all other no-op show/hide data } else { continue; } } orig[ prop ] = dataShow && dataShow[ prop ] || jQuery.style( elem, prop ); } } // Bail out if this is a no-op like .hide().hide() propTween = !jQuery.isEmptyObject( props ); if ( !propTween && jQuery.isEmptyObject( orig ) ) { return; } // Restrict "overflow" and "display" styles during box animations if ( isBox && elem.nodeType === 1 ) { // Support: IE <=9 - 11, Edge 12 - 15 // Record all 3 overflow attributes because IE does not infer the shorthand // from identically-valued overflowX and overflowY and Edge just mirrors // the overflowX value there. opts.overflow = [ style.overflow, style.overflowX, style.overflowY ]; // Identify a display type, preferring old show/hide data over the CSS cascade restoreDisplay = dataShow && dataShow.display; if ( restoreDisplay == null ) { restoreDisplay = dataPriv.get( elem, "display" ); } display = jQuery.css( elem, "display" ); if ( display === "none" ) { if ( restoreDisplay ) { display = restoreDisplay; } else { // Get nonempty value(s) by temporarily forcing visibility showHide( [ elem ], true ); restoreDisplay = elem.style.display || restoreDisplay; display = jQuery.css( elem, "display" ); showHide( [ elem ] ); } } // Animate inline elements as inline-block if ( display === "inline" || display === "inline-block" && restoreDisplay != null ) { if ( jQuery.css( elem, "float" ) === "none" ) { // Restore the original display value at the end of pure show/hide animations if ( !propTween ) { anim.done( function() { style.display = restoreDisplay; } ); if ( restoreDisplay == null ) { display = style.display; restoreDisplay = display === "none" ? 
"" : display; } } style.display = "inline-block"; } } } if ( opts.overflow ) { style.overflow = "hidden"; anim.always( function() { style.overflow = opts.overflow[ 0 ]; style.overflowX = opts.overflow[ 1 ]; style.overflowY = opts.overflow[ 2 ]; } ); } // Implement show/hide animations propTween = false; for ( prop in orig ) { // General show/hide setup for this element animation if ( !propTween ) { if ( dataShow ) { if ( "hidden" in dataShow ) { hidden = dataShow.hidden; } } else { dataShow = dataPriv.access( elem, "fxshow", { display: restoreDisplay } ); } // Store hidden/visible for toggle so `.stop().toggle()` "reverses" if ( toggle ) { dataShow.hidden = !hidden; } // Show elements before animating them if ( hidden ) { showHide( [ elem ], true ); } /* eslint-disable no-loop-func */ anim.done( function() { /* eslint-enable no-loop-func */ // The final step of a "hide" animation is actually hiding the element if ( !hidden ) { showHide( [ elem ] ); } dataPriv.remove( elem, "fxshow" ); for ( prop in orig ) { jQuery.style( elem, prop, orig[ prop ] ); } } ); } // Per-property setup propTween = createTween( hidden ? dataShow[ prop ] : 0, prop, anim ); if ( !( prop in dataShow ) ) { dataShow[ prop ] = propTween.start; if ( hidden ) { propTween.end = propTween.start; propTween.start = 0; } } } } function propFilter( props, specialEasing ) { var index, name, easing, value, hooks; // camelCase, specialEasing and expand cssHook pass for ( index in props ) { name = camelCase( index ); easing = specialEasing[ name ]; value = props[ index ]; if ( Array.isArray( value ) ) { easing = value[ 1 ]; value = props[ index ] = value[ 0 ]; } if ( index !== name ) { props[ name ] = value; delete props[ index ]; } hooks = jQuery.cssHooks[ name ]; if ( hooks && "expand" in hooks ) { value = hooks.expand( value ); delete props[ name ]; // Not quite $.extend, this won't overwrite existing keys. 
// Reusing 'index' because we have the correct "name" for ( index in value ) { if ( !( index in props ) ) { props[ index ] = value[ index ]; specialEasing[ index ] = easing; } } } else { specialEasing[ name ] = easing; } } } function Animation( elem, properties, options ) { var result, stopped, index = 0, length = Animation.prefilters.length, deferred = jQuery.Deferred().always( function() { // Don't match elem in the :animated selector delete tick.elem; } ), tick = function() { if ( stopped ) { return false; } var currentTime = fxNow || createFxNow(), remaining = Math.max( 0, animation.startTime + animation.duration - currentTime ), // Support: Android 2.3 only // Archaic crash bug won't allow us to use `1 - ( 0.5 || 0 )` (#12497) temp = remaining / animation.duration || 0, percent = 1 - temp, index = 0, length = animation.tweens.length; for ( ; index < length; index++ ) { animation.tweens[ index ].run( percent ); } deferred.notifyWith( elem, [ animation, percent, remaining ] ); // If there's more to do, yield if ( percent < 1 && length ) { return remaining; } // If this was an empty animation, synthesize a final progress notification if ( !length ) { deferred.notifyWith( elem, [ animation, 1, 0 ] ); } // Resolve the animation and report its conclusion deferred.resolveWith( elem, [ animation ] ); return false; }, animation = deferred.promise( { elem: elem, props: jQuery.extend( {}, properties ), opts: jQuery.extend( true, { specialEasing: {}, easing: jQuery.easing._default }, options ), originalProperties: properties, originalOptions: options, startTime: fxNow || createFxNow(), duration: options.duration, tweens: [], createTween: function( prop, end ) { var tween = jQuery.Tween( elem, animation.opts, prop, end, animation.opts.specialEasing[ prop ] || animation.opts.easing ); animation.tweens.push( tween ); return tween; }, stop: function( gotoEnd ) { var index = 0, // If we are going to the end, we want to run all the tweens // otherwise we skip this part length = gotoEnd ? 
animation.tweens.length : 0; if ( stopped ) { return this; } stopped = true; for ( ; index < length; index++ ) { animation.tweens[ index ].run( 1 ); } // Resolve when we played the last frame; otherwise, reject if ( gotoEnd ) { deferred.notifyWith( elem, [ animation, 1, 0 ] ); deferred.resolveWith( elem, [ animation, gotoEnd ] ); } else { deferred.rejectWith( elem, [ animation, gotoEnd ] ); } return this; } } ), props = animation.props; propFilter( props, animation.opts.specialEasing ); for ( ; index < length; index++ ) { result = Animation.prefilters[ index ].call( animation, elem, props, animation.opts ); if ( result ) { if ( isFunction( result.stop ) ) { jQuery._queueHooks( animation.elem, animation.opts.queue ).stop = result.stop.bind( result ); } return result; } } jQuery.map( props, createTween, animation ); if ( isFunction( animation.opts.start ) ) { animation.opts.start.call( elem, animation ); } // Attach callbacks from options animation .progress( animation.opts.progress ) .done( animation.opts.done, animation.opts.complete ) .fail( animation.opts.fail ) .always( animation.opts.always ); jQuery.fx.timer( jQuery.extend( tick, { elem: elem, anim: animation, queue: animation.opts.queue } ) ); return animation; } jQuery.Animation = jQuery.extend( Animation, { tweeners: { "*": [ function( prop, value ) { var tween = this.createTween( prop, value ); adjustCSS( tween.elem, prop, rcssNum.exec( value ), tween ); return tween; } ] }, tweener: function( props, callback ) { if ( isFunction( props ) ) { callback = props; props = [ "*" ]; } else { props = props.match( rnothtmlwhite ); } var prop, index = 0, length = props.length; for ( ; index < length; index++ ) { prop = props[ index ]; Animation.tweeners[ prop ] = Animation.tweeners[ prop ] || []; Animation.tweeners[ prop ].unshift( callback ); } }, prefilters: [ defaultPrefilter ], prefilter: function( callback, prepend ) { if ( prepend ) { Animation.prefilters.unshift( callback ); } else { Animation.prefilters.push( callback ); } } } ); jQuery.speed = function( speed, easing, fn ) { var opt = speed && typeof speed === "object" ? 
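// --------------------------------------------------------------------------
// Illustrative extension sketch (editor addition; the property name and
// interpolation are hypothetical). jQuery.Animation.tweener above registers
// per-property tween factories that createTween consults before falling back
// to the generic "*" tweener:
//
//     jQuery.Animation.tweener( "borderSpacing", function( prop, value ) {
//         var tween = this.createTween( prop, value );
//         // custom start/end handling could go here
//         return tween;
//     } );
// --------------------------------------------------------------------------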
jQuery.extend( {}, speed ) : { complete: fn || !fn && easing || isFunction( speed ) && speed, duration: speed, easing: fn && easing || easing && !isFunction( easing ) && easing }; // Go to the end state if fx are off if ( jQuery.fx.off ) { opt.duration = 0; } else { if ( typeof opt.duration !== "number" ) { if ( opt.duration in jQuery.fx.speeds ) { opt.duration = jQuery.fx.speeds[ opt.duration ]; } else { opt.duration = jQuery.fx.speeds._default; } } } // Normalize opt.queue - true/undefined/null -> "fx" if ( opt.queue == null || opt.queue === true ) { opt.queue = "fx"; } // Queueing opt.old = opt.complete; opt.complete = function() { if ( isFunction( opt.old ) ) { opt.old.call( this ); } if ( opt.queue ) { jQuery.dequeue( this, opt.queue ); } }; return opt; }; jQuery.fn.extend( { fadeTo: function( speed, to, easing, callback ) { // Show any hidden elements after setting opacity to 0 return this.filter( isHiddenWithinTree ).css( "opacity", 0 ).show() // Animate to the value specified .end().animate( { opacity: to }, speed, easing, callback ); }, animate: function( prop, speed, easing, callback ) { var empty = jQuery.isEmptyObject( prop ), optall = jQuery.speed( speed, easing, callback ), doAnimation = function() { // Operate on a copy of prop so per-property easing won't be lost var anim = Animation( this, jQuery.extend( {}, prop ), optall ); // Empty animations, or finishing resolves immediately if ( empty || dataPriv.get( this, "finish" ) ) { anim.stop( true ); } }; doAnimation.finish = doAnimation; return empty || optall.queue === false ? this.each( doAnimation ) : this.queue( optall.queue, doAnimation ); }, stop: function( type, clearQueue, gotoEnd ) { var stopQueue = function( hooks ) { var stop = hooks.stop; delete hooks.stop; stop( gotoEnd ); }; if ( typeof type !== "string" ) { gotoEnd = clearQueue; clearQueue = type; type = undefined; } if ( clearQueue ) { this.queue( type || "fx", [] ); } return this.each( function() { var dequeue = true, index = type != null && type + "queueHooks", timers = jQuery.timers, data = dataPriv.get( this ); if ( index ) { if ( data[ index ] && data[ index ].stop ) { stopQueue( data[ index ] ); } } else { for ( index in data ) { if ( data[ index ] && data[ index ].stop && rrun.test( index ) ) { stopQueue( data[ index ] ); } } } for ( index = timers.length; index--; ) { if ( timers[ index ].elem === this && ( type == null || timers[ index ].queue === type ) ) { timers[ index ].anim.stop( gotoEnd ); dequeue = false; timers.splice( index, 1 ); } } // Start the next in the queue if the last step wasn't forced. // Timers currently will call their complete callbacks, which // will dequeue but only if they were gotoEnd. if ( dequeue || !gotoEnd ) { jQuery.dequeue( this, type ); } } ); }, finish: function( type ) { if ( type !== false ) { type = type || "fx"; } return this.each( function() { var index, data = dataPriv.get( this ), queue = data[ type + "queue" ], hooks = data[ type + "queueHooks" ], timers = jQuery.timers, length = queue ? 
queue.length : 0; // Enable finishing flag on private data data.finish = true; // Empty the queue first jQuery.queue( this, type, [] ); if ( hooks && hooks.stop ) { hooks.stop.call( this, true ); } // Look for any active animations, and finish them for ( index = timers.length; index--; ) { if ( timers[ index ].elem === this && timers[ index ].queue === type ) { timers[ index ].anim.stop( true ); timers.splice( index, 1 ); } } // Look for any animations in the old queue and finish them for ( index = 0; index < length; index++ ) { if ( queue[ index ] && queue[ index ].finish ) { queue[ index ].finish.call( this ); } } // Turn off finishing flag delete data.finish; } ); } } ); jQuery.each( [ "toggle", "show", "hide" ], function( _i, name ) { var cssFn = jQuery.fn[ name ]; jQuery.fn[ name ] = function( speed, easing, callback ) { return speed == null || typeof speed === "boolean" ? cssFn.apply( this, arguments ) : this.animate( genFx( name, true ), speed, easing, callback ); }; } ); // Generate shortcuts for custom animations jQuery.each( { slideDown: genFx( "show" ), slideUp: genFx( "hide" ), slideToggle: genFx( "toggle" ), fadeIn: { opacity: "show" }, fadeOut: { opacity: "hide" }, fadeToggle: { opacity: "toggle" } }, function( name, props ) { jQuery.fn[ name ] = function( speed, easing, callback ) { return this.animate( props, speed, easing, callback ); }; } ); jQuery.timers = []; jQuery.fx.tick = function() { var timer, i = 0, timers = jQuery.timers; fxNow = Date.now(); for ( ; i < timers.length; i++ ) { timer = timers[ i ]; // Run the timer and safely remove it when done (allowing for external removal) if ( !timer() && timers[ i ] === timer ) { timers.splice( i--, 1 ); } } if ( !timers.length ) { jQuery.fx.stop(); } fxNow = undefined; }; jQuery.fx.timer = function( timer ) { jQuery.timers.push( timer ); jQuery.fx.start(); }; jQuery.fx.interval = 13; jQuery.fx.start = function() { if ( inProgress ) { return; } inProgress = true; schedule(); }; jQuery.fx.stop = function() { inProgress = null; }; jQuery.fx.speeds = { slow: 600, fast: 200, // Default speed _default: 400 }; // Based off of the plugin by Clint Helfers, with permission. // https://web.archive.org/web/20100324014747/http://blindsignals.com/index.php/2009/07/jquery-delay/ jQuery.fn.delay = function( time, type ) { time = jQuery.fx ? 
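// --------------------------------------------------------------------------
// Illustrative usage sketch (editor addition; "#msg" is hypothetical). Named
// durations resolve through jQuery.fx.speeds, both in jQuery.speed and in
// the .delay() implementation here:
//
//     jQuery( "#msg" ).slideDown( "slow" )   // 600 ms
//         .delay( "fast" )                   // 200 ms pause on the fx queue
//         .fadeOut( 400 );                   // explicit milliseconds
// --------------------------------------------------------------------------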
jQuery.fx.speeds[ time ] || time : time; type = type || "fx"; return this.queue( type, function( next, hooks ) { var timeout = window.setTimeout( next, time ); hooks.stop = function() { window.clearTimeout( timeout ); }; } ); }; ( function() { var input = document.createElement( "input" ), select = document.createElement( "select" ), opt = select.appendChild( document.createElement( "option" ) ); input.type = "checkbox"; // Support: Android <=4.3 only // Default value for a checkbox should be "on" support.checkOn = input.value !== ""; // Support: IE <=11 only // Must access selectedIndex to make default options select support.optSelected = opt.selected; // Support: IE <=11 only // An input loses its value after becoming a radio input = document.createElement( "input" ); input.value = "t"; input.type = "radio"; support.radioValue = input.value === "t"; } )(); var boolHook, attrHandle = jQuery.expr.attrHandle; jQuery.fn.extend( { attr: function( name, value ) { return access( this, jQuery.attr, name, value, arguments.length > 1 ); }, removeAttr: function( name ) { return this.each( function() { jQuery.removeAttr( this, name ); } ); } } ); jQuery.extend( { attr: function( elem, name, value ) { var ret, hooks, nType = elem.nodeType; // Don't get/set attributes on text, comment and attribute nodes if ( nType === 3 || nType === 8 || nType === 2 ) { return; } // Fallback to prop when attributes are not supported if ( typeof elem.getAttribute === "undefined" ) { return jQuery.prop( elem, name, value ); } // Attribute hooks are determined by the lowercase version // Grab necessary hook if one is defined if ( nType !== 1 || !jQuery.isXMLDoc( elem ) ) { hooks = jQuery.attrHooks[ name.toLowerCase() ] || ( jQuery.expr.match.bool.test( name ) ? boolHook : undefined ); } if ( value !== undefined ) { if ( value === null ) { jQuery.removeAttr( elem, name ); return; } if ( hooks && "set" in hooks && ( ret = hooks.set( elem, value, name ) ) !== undefined ) { return ret; } elem.setAttribute( name, value + "" ); return value; } if ( hooks && "get" in hooks && ( ret = hooks.get( elem, name ) ) !== null ) { return ret; } ret = jQuery.find.attr( elem, name ); // Non-existent attributes return null, we normalize to undefined return ret == null ? 
undefined : ret; }, attrHooks: { type: { set: function( elem, value ) { if ( !support.radioValue && value === "radio" && nodeName( elem, "input" ) ) { var val = elem.value; elem.setAttribute( "type", value ); if ( val ) { elem.value = val; } return value; } } } }, removeAttr: function( elem, value ) { var name, i = 0, // Attribute names can contain non-HTML whitespace characters // https://html.spec.whatwg.org/multipage/syntax.html#attributes-2 attrNames = value && value.match( rnothtmlwhite ); if ( attrNames && elem.nodeType === 1 ) { while ( ( name = attrNames[ i++ ] ) ) { elem.removeAttribute( name ); } } } } ); // Hooks for boolean attributes boolHook = { set: function( elem, value, name ) { if ( value === false ) { // Remove boolean attributes when set to false jQuery.removeAttr( elem, name ); } else { elem.setAttribute( name, name ); } return name; } }; jQuery.each( jQuery.expr.match.bool.source.match( /\w+/g ), function( _i, name ) { var getter = attrHandle[ name ] || jQuery.find.attr; attrHandle[ name ] = function( elem, name, isXML ) { var ret, handle, lowercaseName = name.toLowerCase(); if ( !isXML ) { // Avoid an infinite loop by temporarily removing this function from the getter handle = attrHandle[ lowercaseName ]; attrHandle[ lowercaseName ] = ret; ret = getter( elem, name, isXML ) != null ? lowercaseName : null; attrHandle[ lowercaseName ] = handle; } return ret; }; } ); var rfocusable = /^(?:input|select|textarea|button)$/i, rclickable = /^(?:a|area)$/i; jQuery.fn.extend( { prop: function( name, value ) { return access( this, jQuery.prop, name, value, arguments.length > 1 ); }, removeProp: function( name ) { return this.each( function() { delete this[ jQuery.propFix[ name ] || name ]; } ); } } ); jQuery.extend( { prop: function( elem, name, value ) { var ret, hooks, nType = elem.nodeType; // Don't get/set properties on text, comment and attribute nodes if ( nType === 3 || nType === 8 || nType === 2 ) { return; } if ( nType !== 1 || !jQuery.isXMLDoc( elem ) ) { // Fix name and attach hooks name = jQuery.propFix[ name ] || name; hooks = jQuery.propHooks[ name ]; } if ( value !== undefined ) { if ( hooks && "set" in hooks && ( ret = hooks.set( elem, value, name ) ) !== undefined ) { return ret; } return ( elem[ name ] = value ); } if ( hooks && "get" in hooks && ( ret = hooks.get( elem, name ) ) !== null ) { return ret; } return elem[ name ]; }, propHooks: { tabIndex: { get: function( elem ) { // Support: IE <=9 - 11 only // elem.tabIndex doesn't always return the // correct value when it hasn't been explicitly set // https://web.archive.org/web/20141116233347/http://fluidproject.org/blog/2008/01/09/getting-setting-and-removing-tabindex-values-with-javascript/ // Use proper attribute retrieval(#12072) var tabindex = jQuery.find.attr( elem, "tabindex" ); if ( tabindex ) { return parseInt( tabindex, 10 ); } if ( rfocusable.test( elem.nodeName ) || rclickable.test( elem.nodeName ) && elem.href ) { return 0; } return -1; } } }, propFix: { "for": "htmlFor", "class": "className" } } ); // Support: IE <=11 only // Accessing the selectedIndex property // forces the browser to respect setting selected // on the option // The getter ensures a default option is selected // when in an optgroup // eslint rule "no-unused-expressions" is disabled for this code // since it considers such accessions noop if ( !support.optSelected ) { jQuery.propHooks.selected = { get: function( elem ) { /* eslint no-unused-expressions: "off" */ var parent = elem.parentNode; if ( parent && parent.parentNode ) { 
parent.parentNode.selectedIndex; } return null; }, set: function( elem ) { /* eslint no-unused-expressions: "off" */ var parent = elem.parentNode; if ( parent ) { parent.selectedIndex; if ( parent.parentNode ) { parent.parentNode.selectedIndex; } } } }; } jQuery.each( [ "tabIndex", "readOnly", "maxLength", "cellSpacing", "cellPadding", "rowSpan", "colSpan", "useMap", "frameBorder", "contentEditable" ], function() { jQuery.propFix[ this.toLowerCase() ] = this; } ); // Strip and collapse whitespace according to HTML spec // https://infra.spec.whatwg.org/#strip-and-collapse-ascii-whitespace function stripAndCollapse( value ) { var tokens = value.match( rnothtmlwhite ) || []; return tokens.join( " " ); } function getClass( elem ) { return elem.getAttribute && elem.getAttribute( "class" ) || ""; } function classesToArray( value ) { if ( Array.isArray( value ) ) { return value; } if ( typeof value === "string" ) { return value.match( rnothtmlwhite ) || []; } return []; } jQuery.fn.extend( { addClass: function( value ) { var classes, elem, cur, curValue, clazz, j, finalValue, i = 0; if ( isFunction( value ) ) { return this.each( function( j ) { jQuery( this ).addClass( value.call( this, j, getClass( this ) ) ); } ); } classes = classesToArray( value ); if ( classes.length ) { while ( ( elem = this[ i++ ] ) ) { curValue = getClass( elem ); cur = elem.nodeType === 1 && ( " " + stripAndCollapse( curValue ) + " " ); if ( cur ) { j = 0; while ( ( clazz = classes[ j++ ] ) ) { if ( cur.indexOf( " " + clazz + " " ) < 0 ) { cur += clazz + " "; } } // Only assign if different to avoid unneeded rendering. finalValue = stripAndCollapse( cur ); if ( curValue !== finalValue ) { elem.setAttribute( "class", finalValue ); } } } } return this; }, removeClass: function( value ) { var classes, elem, cur, curValue, clazz, j, finalValue, i = 0; if ( isFunction( value ) ) { return this.each( function( j ) { jQuery( this ).removeClass( value.call( this, j, getClass( this ) ) ); } ); } if ( !arguments.length ) { return this.attr( "class", "" ); } classes = classesToArray( value ); if ( classes.length ) { while ( ( elem = this[ i++ ] ) ) { curValue = getClass( elem ); // This expression is here for better compressibility (see addClass) cur = elem.nodeType === 1 && ( " " + stripAndCollapse( curValue ) + " " ); if ( cur ) { j = 0; while ( ( clazz = classes[ j++ ] ) ) { // Remove *all* instances while ( cur.indexOf( " " + clazz + " " ) > -1 ) { cur = cur.replace( " " + clazz + " ", " " ); } } // Only assign if different to avoid unneeded rendering. finalValue = stripAndCollapse( cur ); if ( curValue !== finalValue ) { elem.setAttribute( "class", finalValue ); } } } } return this; }, toggleClass: function( value, stateVal ) { var type = typeof value, isValidValue = type === "string" || Array.isArray( value ); if ( typeof stateVal === "boolean" && isValidValue ) { return stateVal ? 
this.addClass( value ) : this.removeClass( value ); } if ( isFunction( value ) ) { return this.each( function( i ) { jQuery( this ).toggleClass( value.call( this, i, getClass( this ), stateVal ), stateVal ); } ); } return this.each( function() { var className, i, self, classNames; if ( isValidValue ) { // Toggle individual class names i = 0; self = jQuery( this ); classNames = classesToArray( value ); while ( ( className = classNames[ i++ ] ) ) { // Check each className given, space separated list if ( self.hasClass( className ) ) { self.removeClass( className ); } else { self.addClass( className ); } } // Toggle whole class name } else if ( value === undefined || type === "boolean" ) { className = getClass( this ); if ( className ) { // Store className if set dataPriv.set( this, "__className__", className ); } // If the element has a class name or if we're passed `false`, // then remove the whole classname (if there was one, the above saved it). // Otherwise bring back whatever was previously saved (if anything), // falling back to the empty string if nothing was stored. if ( this.setAttribute ) { this.setAttribute( "class", className || value === false ? "" : dataPriv.get( this, "__className__" ) || "" ); } } } ); }, hasClass: function( selector ) { var className, elem, i = 0; className = " " + selector + " "; while ( ( elem = this[ i++ ] ) ) { if ( elem.nodeType === 1 && ( " " + stripAndCollapse( getClass( elem ) ) + " " ).indexOf( className ) > -1 ) { return true; } } return false; } } ); var rreturn = /\r/g; jQuery.fn.extend( { val: function( value ) { var hooks, ret, valueIsFunction, elem = this[ 0 ]; if ( !arguments.length ) { if ( elem ) { hooks = jQuery.valHooks[ elem.type ] || jQuery.valHooks[ elem.nodeName.toLowerCase() ]; if ( hooks && "get" in hooks && ( ret = hooks.get( elem, "value" ) ) !== undefined ) { return ret; } ret = elem.value; // Handle most common string cases if ( typeof ret === "string" ) { return ret.replace( rreturn, "" ); } // Handle cases where value is null/undef or number return ret == null ? "" : ret; } return; } valueIsFunction = isFunction( value ); return this.each( function( i ) { var val; if ( this.nodeType !== 1 ) { return; } if ( valueIsFunction ) { val = value.call( this, i, jQuery( this ).val() ); } else { val = value; } // Treat null/undefined as ""; convert numbers to string if ( val == null ) { val = ""; } else if ( typeof val === "number" ) { val += ""; } else if ( Array.isArray( val ) ) { val = jQuery.map( val, function( value ) { return value == null ? "" : value + ""; } ); } hooks = jQuery.valHooks[ this.type ] || jQuery.valHooks[ this.nodeName.toLowerCase() ]; // If set returns undefined, fall back to normal setting if ( !hooks || !( "set" in hooks ) || hooks.set( this, val, "value" ) === undefined ) { this.value = val; } } ); } } ); jQuery.extend( { valHooks: { option: { get: function( elem ) { var val = jQuery.find.attr( elem, "value" ); return val != null ? val : // Support: IE <=10 - 11 only // option.text throws exceptions (#14686, #14858) // Strip and collapse whitespace // https://html.spec.whatwg.org/#strip-and-collapse-whitespace stripAndCollapse( jQuery.text( elem ) ); } }, select: { get: function( elem ) { var value, option, i, options = elem.options, index = elem.selectedIndex, one = elem.type === "select-one", values = one ? null : [], max = one ? index + 1 : options.length; if ( index < 0 ) { i = max; } else { i = one ? 
index : 0; } // Loop through all the selected options for ( ; i < max; i++ ) { option = options[ i ]; // Support: IE <=9 only // IE8-9 doesn't update selected after form reset (#2551) if ( ( option.selected || i === index ) && // Don't return options that are disabled or in a disabled optgroup !option.disabled && ( !option.parentNode.disabled || !nodeName( option.parentNode, "optgroup" ) ) ) { // Get the specific value for the option value = jQuery( option ).val(); // We don't need an array for one selects if ( one ) { return value; } // Multi-Selects return an array values.push( value ); } } return values; }, set: function( elem, value ) { var optionSet, option, options = elem.options, values = jQuery.makeArray( value ), i = options.length; while ( i-- ) { option = options[ i ]; /* eslint-disable no-cond-assign */ if ( option.selected = jQuery.inArray( jQuery.valHooks.option.get( option ), values ) > -1 ) { optionSet = true; } /* eslint-enable no-cond-assign */ } // Force browsers to behave consistently when non-matching value is set if ( !optionSet ) { elem.selectedIndex = -1; } return values; } } } } ); // Radios and checkboxes getter/setter jQuery.each( [ "radio", "checkbox" ], function() { jQuery.valHooks[ this ] = { set: function( elem, value ) { if ( Array.isArray( value ) ) { return ( elem.checked = jQuery.inArray( jQuery( elem ).val(), value ) > -1 ); } } }; if ( !support.checkOn ) { jQuery.valHooks[ this ].get = function( elem ) { return elem.getAttribute( "value" ) === null ? "on" : elem.value; }; } } ); // Return jQuery for attributes-only inclusion support.focusin = "onfocusin" in window; var rfocusMorph = /^(?:focusinfocus|focusoutblur)$/, stopPropagationCallback = function( e ) { e.stopPropagation(); }; jQuery.extend( jQuery.event, { trigger: function( event, data, elem, onlyHandlers ) { var i, cur, tmp, bubbleType, ontype, handle, special, lastElement, eventPath = [ elem || document ], type = hasOwn.call( event, "type" ) ? event.type : event, namespaces = hasOwn.call( event, "namespace" ) ? event.namespace.split( "." ) : []; cur = lastElement = tmp = elem = elem || document; // Don't do events on text and comment nodes if ( elem.nodeType === 3 || elem.nodeType === 8 ) { return; } // focus/blur morphs to focusin/out; ensure we're not firing them right now if ( rfocusMorph.test( type + jQuery.event.triggered ) ) { return; } if ( type.indexOf( "." ) > -1 ) { // Namespaced trigger; create a regexp to match event type in handle() namespaces = type.split( "." ); type = namespaces.shift(); namespaces.sort(); } ontype = type.indexOf( ":" ) < 0 && "on" + type; // Caller can pass in a jQuery.Event object, Object, or just an event type string event = event[ jQuery.expando ] ? event : new jQuery.Event( type, typeof event === "object" && event ); // Trigger bitmask: & 1 for native handlers; & 2 for jQuery (always true) event.isTrigger = onlyHandlers ? 2 : 3; event.namespace = namespaces.join( "." ); event.rnamespace = event.namespace ? new RegExp( "(^|\\.)" + namespaces.join( "\\.(?:.*\\.|)" ) + "(\\.|$)" ) : null; // Clean up the event in case it is being reused event.result = undefined; if ( !event.target ) { event.target = elem; } // Clone any incoming data and prepend the event, creating the handler arg list data = data == null ? 
[ event ] : jQuery.makeArray( data, [ event ] ); // Allow special events to draw outside the lines special = jQuery.event.special[ type ] || {}; if ( !onlyHandlers && special.trigger && special.trigger.apply( elem, data ) === false ) { return; } // Determine event propagation path in advance, per W3C events spec (#9951) // Bubble up to document, then to window; watch for a global ownerDocument var (#9724) if ( !onlyHandlers && !special.noBubble && !isWindow( elem ) ) { bubbleType = special.delegateType || type; if ( !rfocusMorph.test( bubbleType + type ) ) { cur = cur.parentNode; } for ( ; cur; cur = cur.parentNode ) { eventPath.push( cur ); tmp = cur; } // Only add window if we got to document (e.g., not plain obj or detached DOM) if ( tmp === ( elem.ownerDocument || document ) ) { eventPath.push( tmp.defaultView || tmp.parentWindow || window ); } } // Fire handlers on the event path i = 0; while ( ( cur = eventPath[ i++ ] ) && !event.isPropagationStopped() ) { lastElement = cur; event.type = i > 1 ? bubbleType : special.bindType || type; // jQuery handler handle = ( dataPriv.get( cur, "events" ) || Object.create( null ) )[ event.type ] && dataPriv.get( cur, "handle" ); if ( handle ) { handle.apply( cur, data ); } // Native handler handle = ontype && cur[ ontype ]; if ( handle && handle.apply && acceptData( cur ) ) { event.result = handle.apply( cur, data ); if ( event.result === false ) { event.preventDefault(); } } } event.type = type; // If nobody prevented the default action, do it now if ( !onlyHandlers && !event.isDefaultPrevented() ) { if ( ( !special._default || special._default.apply( eventPath.pop(), data ) === false ) && acceptData( elem ) ) { // Call a native DOM method on the target with the same name as the event. // Don't do default actions on window, that's where global variables be (#6170) if ( ontype && isFunction( elem[ type ] ) && !isWindow( elem ) ) { // Don't re-trigger an onFOO event when we call its FOO() method tmp = elem[ ontype ]; if ( tmp ) { elem[ ontype ] = null; } // Prevent re-triggering of the same event, since we already bubbled it above jQuery.event.triggered = type; if ( event.isPropagationStopped() ) { lastElement.addEventListener( type, stopPropagationCallback ); } elem[ type ](); if ( event.isPropagationStopped() ) { lastElement.removeEventListener( type, stopPropagationCallback ); } jQuery.event.triggered = undefined; if ( tmp ) { elem[ ontype ] = tmp; } } } } return event.result; }, // Piggyback on a donor event to simulate a different one // Used only for `focus(in | out)` events simulate: function( type, elem, event ) { var e = jQuery.extend( new jQuery.Event(), event, { type: type, isSimulated: true } ); jQuery.event.trigger( e, null, elem ); } } ); jQuery.fn.extend( { trigger: function( type, data ) { return this.each( function() { jQuery.event.trigger( type, data, this ); } ); }, triggerHandler: function( type, data ) { var elem = this[ 0 ]; if ( elem ) { return jQuery.event.trigger( type, data, elem, true ); } } } ); // Support: Firefox <=44 // Firefox doesn't have focus(in | out) events // Related ticket - https://bugzilla.mozilla.org/show_bug.cgi?id=687787 // // Support: Chrome <=48 - 49, Safari <=9.0 - 9.1 // focus(in | out) events fire after focus & blur events, // which is spec violation - http://www.w3.org/TR/DOM-Level-3-Events/#events-focusevent-event-order // Related ticket - https://bugs.chromium.org/p/chromium/issues/detail?id=449857 if ( !support.focusin ) { jQuery.each( { focus: "focusin", blur: "focusout" }, function( orig, fix ) 
{ // Attach a single capturing handler on the document while someone wants focusin/focusout var handler = function( event ) { jQuery.event.simulate( fix, event.target, jQuery.event.fix( event ) ); }; jQuery.event.special[ fix ] = { setup: function() { // Handle: regular nodes (via `this.ownerDocument`), window // (via `this.document`) & document (via `this`). var doc = this.ownerDocument || this.document || this, attaches = dataPriv.access( doc, fix ); if ( !attaches ) { doc.addEventListener( orig, handler, true ); } dataPriv.access( doc, fix, ( attaches || 0 ) + 1 ); }, teardown: function() { var doc = this.ownerDocument || this.document || this, attaches = dataPriv.access( doc, fix ) - 1; if ( !attaches ) { doc.removeEventListener( orig, handler, true ); dataPriv.remove( doc, fix ); } else { dataPriv.access( doc, fix, attaches ); } } }; } ); } var location = window.location; var nonce = { guid: Date.now() }; var rquery = ( /\?/ ); // Cross-browser xml parsing jQuery.parseXML = function( data ) { var xml, parserErrorElem; if ( !data || typeof data !== "string" ) { return null; } // Support: IE 9 - 11 only // IE throws on parseFromString with invalid input. try { xml = ( new window.DOMParser() ).parseFromString( data, "text/xml" ); } catch ( e ) {} parserErrorElem = xml && xml.getElementsByTagName( "parsererror" )[ 0 ]; if ( !xml || parserErrorElem ) { jQuery.error( "Invalid XML: " + ( parserErrorElem ? jQuery.map( parserErrorElem.childNodes, function( el ) { return el.textContent; } ).join( "\n" ) : data ) ); } return xml; }; var rbracket = /\[\]$/, rCRLF = /\r?\n/g, rsubmitterTypes = /^(?:submit|button|image|reset|file)$/i, rsubmittable = /^(?:input|select|textarea|keygen)/i; function buildParams( prefix, obj, traditional, add ) { var name; if ( Array.isArray( obj ) ) { // Serialize array item. jQuery.each( obj, function( i, v ) { if ( traditional || rbracket.test( prefix ) ) { // Treat each array item as a scalar. add( prefix, v ); } else { // Item is non-scalar (array or object), encode its numeric index. buildParams( prefix + "[" + ( typeof v === "object" && v != null ? i : "" ) + "]", v, traditional, add ); } } ); } else if ( !traditional && toType( obj ) === "object" ) { // Serialize object item. for ( name in obj ) { buildParams( prefix + "[" + name + "]", obj[ name ], traditional, add ); } } else { // Serialize scalar item. add( prefix, obj ); } } // Serialize an array of form elements or a set of // key/values into a query string jQuery.param = function( a, traditional ) { var prefix, s = [], add = function( key, valueOrFunction ) { // If value is a function, invoke it and use its return value var value = isFunction( valueOrFunction ) ? valueOrFunction() : valueOrFunction; s[ s.length ] = encodeURIComponent( key ) + "=" + encodeURIComponent( value == null ? "" : value ); }; if ( a == null ) { return ""; } // If an array was passed in, assume that it is an array of form elements. if ( Array.isArray( a ) || ( a.jquery && !jQuery.isPlainObject( a ) ) ) { // Serialize the form elements jQuery.each( a, function() { add( this.name, this.value ); } ); } else { // If traditional, encode the "old" way (the way 1.3.2 or older // did it), otherwise encode params recursively. 
for ( prefix in a ) { buildParams( prefix, a[ prefix ], traditional, add ); } } // Return the resulting serialization return s.join( "&" ); }; jQuery.fn.extend( { serialize: function() { return jQuery.param( this.serializeArray() ); }, serializeArray: function() { return this.map( function() { // Can add propHook for "elements" to filter or add form elements var elements = jQuery.prop( this, "elements" ); return elements ? jQuery.makeArray( elements ) : this; } ).filter( function() { var type = this.type; // Use .is( ":disabled" ) so that fieldset[disabled] works return this.name && !jQuery( this ).is( ":disabled" ) && rsubmittable.test( this.nodeName ) && !rsubmitterTypes.test( type ) && ( this.checked || !rcheckableType.test( type ) ); } ).map( function( _i, elem ) { var val = jQuery( this ).val(); if ( val == null ) { return null; } if ( Array.isArray( val ) ) { return jQuery.map( val, function( val ) { return { name: elem.name, value: val.replace( rCRLF, "\r\n" ) }; } ); } return { name: elem.name, value: val.replace( rCRLF, "\r\n" ) }; } ).get(); } } ); var r20 = /%20/g, rhash = /#.*$/, rantiCache = /([?&])_=[^&]*/, rheaders = /^(.*?):[ \t]*([^\r\n]*)$/mg, // #7653, #8125, #8152: local protocol detection rlocalProtocol = /^(?:about|app|app-storage|.+-extension|file|res|widget):$/, rnoContent = /^(?:GET|HEAD)$/, rprotocol = /^\/\//, /* Prefilters * 1) They are useful to introduce custom dataTypes (see ajax/jsonp.js for an example) * 2) These are called: * - BEFORE asking for a transport * - AFTER param serialization (s.data is a string if s.processData is true) * 3) key is the dataType * 4) the catchall symbol "*" can be used * 5) execution will start with transport dataType and THEN continue down to "*" if needed */ prefilters = {}, /* Transports bindings * 1) key is the dataType * 2) the catchall symbol "*" can be used * 3) selection will start with transport dataType and THEN go to "*" if needed */ transports = {}, // Avoid comment-prolog char sequence (#10098); must appease lint and evade compression allTypes = "*/".concat( "*" ), // Anchor tag for parsing the document origin originAnchor = document.createElement( "a" ); originAnchor.href = location.href; // Base "constructor" for jQuery.ajaxPrefilter and jQuery.ajaxTransport function addToPrefiltersOrTransports( structure ) { // dataTypeExpression is optional and defaults to "*" return function( dataTypeExpression, func ) { if ( typeof dataTypeExpression !== "string" ) { func = dataTypeExpression; dataTypeExpression = "*"; } var dataType, i = 0, dataTypes = dataTypeExpression.toLowerCase().match( rnothtmlwhite ) || []; if ( isFunction( func ) ) { // For each dataType in the dataTypeExpression while ( ( dataType = dataTypes[ i++ ] ) ) { // Prepend if requested if ( dataType[ 0 ] === "+" ) { dataType = dataType.slice( 1 ) || "*"; ( structure[ dataType ] = structure[ dataType ] || [] ).unshift( func ); // Otherwise append } else { ( structure[ dataType ] = structure[ dataType ] || [] ).push( func ); } } } }; } // Base inspection function for prefilters and transports function inspectPrefiltersOrTransports( structure, options, originalOptions, jqXHR ) { var inspected = {}, seekingTransport = ( structure === transports ); function inspect( dataType ) { var selected; inspected[ dataType ] = true; jQuery.each( structure[ dataType ] || [], function( _, prefilterOrFactory ) { var dataTypeOrTransport = prefilterOrFactory( options, originalOptions, jqXHR ); if ( typeof dataTypeOrTransport === "string" && !seekingTransport && !inspected[ 
dataTypeOrTransport ] ) { options.dataTypes.unshift( dataTypeOrTransport ); inspect( dataTypeOrTransport ); return false; } else if ( seekingTransport ) { return !( selected = dataTypeOrTransport ); } } ); return selected; } return inspect( options.dataTypes[ 0 ] ) || !inspected[ "*" ] && inspect( "*" ); } // A special extend for ajax options // that takes "flat" options (not to be deep extended) // Fixes #9887 function ajaxExtend( target, src ) { var key, deep, flatOptions = jQuery.ajaxSettings.flatOptions || {}; for ( key in src ) { if ( src[ key ] !== undefined ) { ( flatOptions[ key ] ? target : ( deep || ( deep = {} ) ) )[ key ] = src[ key ]; } } if ( deep ) { jQuery.extend( true, target, deep ); } return target; } /* Handles responses to an ajax request: * - finds the right dataType (mediates between content-type and expected dataType) * - returns the corresponding response */ function ajaxHandleResponses( s, jqXHR, responses ) { var ct, type, finalDataType, firstDataType, contents = s.contents, dataTypes = s.dataTypes; // Remove auto dataType and get content-type in the process while ( dataTypes[ 0 ] === "*" ) { dataTypes.shift(); if ( ct === undefined ) { ct = s.mimeType || jqXHR.getResponseHeader( "Content-Type" ); } } // Check if we're dealing with a known content-type if ( ct ) { for ( type in contents ) { if ( contents[ type ] && contents[ type ].test( ct ) ) { dataTypes.unshift( type ); break; } } } // Check to see if we have a response for the expected dataType if ( dataTypes[ 0 ] in responses ) { finalDataType = dataTypes[ 0 ]; } else { // Try convertible dataTypes for ( type in responses ) { if ( !dataTypes[ 0 ] || s.converters[ type + " " + dataTypes[ 0 ] ] ) { finalDataType = type; break; } if ( !firstDataType ) { firstDataType = type; } } // Or just use first one finalDataType = finalDataType || firstDataType; } // If we found a dataType // We add the dataType to the list if needed // and return the corresponding response if ( finalDataType ) { if ( finalDataType !== dataTypes[ 0 ] ) { dataTypes.unshift( finalDataType ); } return responses[ finalDataType ]; } } /* Chain conversions given the request and the original response * Also sets the responseXXX fields on the jqXHR instance */ function ajaxConvert( s, response, jqXHR, isSuccess ) { var conv2, current, conv, tmp, prev, converters = {}, // Work with a copy of dataTypes in case we need to modify it for conversion dataTypes = s.dataTypes.slice(); // Create converters map with lowercased keys if ( dataTypes[ 1 ] ) { for ( conv in s.converters ) { converters[ conv.toLowerCase() ] = s.converters[ conv ]; } } current = dataTypes.shift(); // Convert to each sequential dataType while ( current ) { if ( s.responseFields[ current ] ) { jqXHR[ s.responseFields[ current ] ] = response; } // Apply the dataFilter if provided if ( !prev && isSuccess && s.dataFilter ) { response = s.dataFilter( response, s.dataType ); } prev = current; current = dataTypes.shift(); if ( current ) { // There's only work to do if current dataType is non-auto if ( current === "*" ) { current = prev; // Convert response if prev dataType is non-auto and differs from current } else if ( prev !== "*" && prev !== current ) { // Seek a direct converter conv = converters[ prev + " " + current ] || converters[ "* " + current ]; // If none found, seek a pair if ( !conv ) { for ( conv2 in converters ) { // If conv2 outputs current tmp = conv2.split( " " ); if ( tmp[ 1 ] === current ) { // If prev can be converted to accepted input conv = converters[ prev + " " + 
tmp[ 0 ] ] || converters[ "* " + tmp[ 0 ] ]; if ( conv ) { // Condense equivalence converters if ( conv === true ) { conv = converters[ conv2 ]; // Otherwise, insert the intermediate dataType } else if ( converters[ conv2 ] !== true ) { current = tmp[ 0 ]; dataTypes.unshift( tmp[ 1 ] ); } break; } } } } // Apply converter (if not an equivalence) if ( conv !== true ) { // Unless errors are allowed to bubble, catch and return them if ( conv && s.throws ) { response = conv( response ); } else { try { response = conv( response ); } catch ( e ) { return { state: "parsererror", error: conv ? e : "No conversion from " + prev + " to " + current }; } } } } } } return { state: "success", data: response }; } jQuery.extend( { // Counter for holding the number of active queries active: 0, // Last-Modified header cache for next request lastModified: {}, etag: {}, ajaxSettings: { url: location.href, type: "GET", isLocal: rlocalProtocol.test( location.protocol ), global: true, processData: true, async: true, contentType: "application/x-www-form-urlencoded; charset=UTF-8", /* timeout: 0, data: null, dataType: null, username: null, password: null, cache: null, throws: false, traditional: false, headers: {}, */ accepts: { "*": allTypes, text: "text/plain", html: "text/html", xml: "application/xml, text/xml", json: "application/json, text/javascript" }, contents: { xml: /\bxml\b/, html: /\bhtml/, json: /\bjson\b/ }, responseFields: { xml: "responseXML", text: "responseText", json: "responseJSON" }, // Data converters // Keys separate source (or catchall "*") and destination types with a single space converters: { // Convert anything to text "* text": String, // Text to html (true = no transformation) "text html": true, // Evaluate text as a json expression "text json": JSON.parse, // Parse text as xml "text xml": jQuery.parseXML }, // For options that shouldn't be deep extended: // you can add your own custom options here if // and when you create one that shouldn't be // deep extended (see ajaxExtend) flatOptions: { url: true, context: true } }, // Creates a full fledged settings object into target // with both ajaxSettings and settings fields. // If target is omitted, writes into ajaxSettings. ajaxSetup: function( target, settings ) { return settings ? // Building a settings object ajaxExtend( ajaxExtend( target, jQuery.ajaxSettings ), settings ) : // Extending ajaxSettings ajaxExtend( jQuery.ajaxSettings, target ); }, ajaxPrefilter: addToPrefiltersOrTransports( prefilters ), ajaxTransport: addToPrefiltersOrTransports( transports ), // Main method ajax: function( url, options ) { // If url is an object, simulate pre-1.5 signature if ( typeof url === "object" ) { options = url; url = undefined; } // Force options to be an object options = options || {}; var transport, // URL without anti-cache param cacheURL, // Response headers responseHeadersString, responseHeaders, // timeout handle timeoutTimer, // Url cleanup var urlAnchor, // Request state (becomes false upon send and true upon completion) completed, // To know if global events are to be dispatched fireGlobals, // Loop variable i, // uncached part of the url uncached, // Create the final options object s = jQuery.ajaxSetup( {}, options ), // Callbacks context callbackContext = s.context || s, // Context for global events is callbackContext if it is a DOM node or jQuery collection globalEventContext = s.context && ( callbackContext.nodeType || callbackContext.jquery ) ? 
jQuery( callbackContext ) : jQuery.event, // Deferreds deferred = jQuery.Deferred(), completeDeferred = jQuery.Callbacks( "once memory" ), // Status-dependent callbacks statusCode = s.statusCode || {}, // Headers (they are sent all at once) requestHeaders = {}, requestHeadersNames = {}, // Default abort message strAbort = "canceled", // Fake xhr jqXHR = { readyState: 0, // Builds headers hashtable if needed getResponseHeader: function( key ) { var match; if ( completed ) { if ( !responseHeaders ) { responseHeaders = {}; while ( ( match = rheaders.exec( responseHeadersString ) ) ) { responseHeaders[ match[ 1 ].toLowerCase() + " " ] = ( responseHeaders[ match[ 1 ].toLowerCase() + " " ] || [] ) .concat( match[ 2 ] ); } } match = responseHeaders[ key.toLowerCase() + " " ]; } return match == null ? null : match.join( ", " ); }, // Raw string getAllResponseHeaders: function() { return completed ? responseHeadersString : null; }, // Caches the header setRequestHeader: function( name, value ) { if ( completed == null ) { name = requestHeadersNames[ name.toLowerCase() ] = requestHeadersNames[ name.toLowerCase() ] || name; requestHeaders[ name ] = value; } return this; }, // Overrides response content-type header overrideMimeType: function( type ) { if ( completed == null ) { s.mimeType = type; } return this; }, // Status-dependent callbacks statusCode: function( map ) { var code; if ( map ) { if ( completed ) { // Execute the appropriate callbacks jqXHR.always( map[ jqXHR.status ] ); } else { // Lazy-add the new callbacks in a way that preserves old ones for ( code in map ) { statusCode[ code ] = [ statusCode[ code ], map[ code ] ]; } } } return this; }, // Cancel the request abort: function( statusText ) { var finalText = statusText || strAbort; if ( transport ) { transport.abort( finalText ); } done( 0, finalText ); return this; } }; // Attach deferreds deferred.promise( jqXHR ); // Add protocol if not provided (prefilters might expect it) // Handle falsy url in the settings object (#10093: consistency with old signature) // We also use the url parameter if available s.url = ( ( url || s.url || location.href ) + "" ) .replace( rprotocol, location.protocol + "//" ); // Alias method option to type as per ticket #12004 s.type = options.method || options.type || s.method || s.type; // Extract dataTypes list s.dataTypes = ( s.dataType || "*" ).toLowerCase().match( rnothtmlwhite ) || [ "" ]; // A cross-domain request is in order when the origin doesn't match the current origin. if ( s.crossDomain == null ) { urlAnchor = document.createElement( "a" ); // Support: IE <=8 - 11, Edge 12 - 15 // IE throws exception on accessing the href property if url is malformed, // e.g. 
http://example.com:80x/ try { urlAnchor.href = s.url; // Support: IE <=8 - 11 only // Anchor's host property isn't correctly set when s.url is relative urlAnchor.href = urlAnchor.href; s.crossDomain = originAnchor.protocol + "//" + originAnchor.host !== urlAnchor.protocol + "//" + urlAnchor.host; } catch ( e ) { // If there is an error parsing the URL, assume it is crossDomain, // it can be rejected by the transport if it is invalid s.crossDomain = true; } } // Convert data if not already a string if ( s.data && s.processData && typeof s.data !== "string" ) { s.data = jQuery.param( s.data, s.traditional ); } // Apply prefilters inspectPrefiltersOrTransports( prefilters, s, options, jqXHR ); // If request was aborted inside a prefilter, stop there if ( completed ) { return jqXHR; } // We can fire global events as of now if asked to // Don't fire events if jQuery.event is undefined in an AMD-usage scenario (#15118) fireGlobals = jQuery.event && s.global; // Watch for a new set of requests if ( fireGlobals && jQuery.active++ === 0 ) { jQuery.event.trigger( "ajaxStart" ); } // Uppercase the type s.type = s.type.toUpperCase(); // Determine if request has content s.hasContent = !rnoContent.test( s.type ); // Save the URL in case we're toying with the If-Modified-Since // and/or If-None-Match header later on // Remove hash to simplify url manipulation cacheURL = s.url.replace( rhash, "" ); // More options handling for requests with no content if ( !s.hasContent ) { // Remember the hash so we can put it back uncached = s.url.slice( cacheURL.length ); // If data is available and should be processed, append data to url if ( s.data && ( s.processData || typeof s.data === "string" ) ) { cacheURL += ( rquery.test( cacheURL ) ? "&" : "?" ) + s.data; // #9682: remove data so that it's not used in an eventual retry delete s.data; } // Add or update anti-cache param if needed if ( s.cache === false ) { cacheURL = cacheURL.replace( rantiCache, "$1" ); uncached = ( rquery.test( cacheURL ) ? "&" : "?" ) + "_=" + ( nonce.guid++ ) + uncached; } // Put hash and anti-cache on the URL that will be requested (gh-1732) s.url = cacheURL + uncached; // Change '%20' to '+' if this is encoded form body content (gh-2658) } else if ( s.data && s.processData && ( s.contentType || "" ).indexOf( "application/x-www-form-urlencoded" ) === 0 ) { s.data = s.data.replace( r20, "+" ); } // Set the If-Modified-Since and/or If-None-Match header, if in ifModified mode. if ( s.ifModified ) { if ( jQuery.lastModified[ cacheURL ] ) { jqXHR.setRequestHeader( "If-Modified-Since", jQuery.lastModified[ cacheURL ] ); } if ( jQuery.etag[ cacheURL ] ) { jqXHR.setRequestHeader( "If-None-Match", jQuery.etag[ cacheURL ] ); } } // Set the correct header, if data is being sent if ( s.data && s.hasContent && s.contentType !== false || options.contentType ) { jqXHR.setRequestHeader( "Content-Type", s.contentType ); } // Set the Accepts header for the server, depending on the dataType jqXHR.setRequestHeader( "Accept", s.dataTypes[ 0 ] && s.accepts[ s.dataTypes[ 0 ] ] ? s.accepts[ s.dataTypes[ 0 ] ] + ( s.dataTypes[ 0 ] !== "*" ? 
", " + allTypes + "; q=0.01" : "" ) : s.accepts[ "*" ] ); // Check for headers option for ( i in s.headers ) { jqXHR.setRequestHeader( i, s.headers[ i ] ); } // Allow custom headers/mimetypes and early abort if ( s.beforeSend && ( s.beforeSend.call( callbackContext, jqXHR, s ) === false || completed ) ) { // Abort if not done already and return return jqXHR.abort(); } // Aborting is no longer a cancellation strAbort = "abort"; // Install callbacks on deferreds completeDeferred.add( s.complete ); jqXHR.done( s.success ); jqXHR.fail( s.error ); // Get transport transport = inspectPrefiltersOrTransports( transports, s, options, jqXHR ); // If no transport, we auto-abort if ( !transport ) { done( -1, "No Transport" ); } else { jqXHR.readyState = 1; // Send global event if ( fireGlobals ) { globalEventContext.trigger( "ajaxSend", [ jqXHR, s ] ); } // If request was aborted inside ajaxSend, stop there if ( completed ) { return jqXHR; } // Timeout if ( s.async && s.timeout > 0 ) { timeoutTimer = window.setTimeout( function() { jqXHR.abort( "timeout" ); }, s.timeout ); } try { completed = false; transport.send( requestHeaders, done ); } catch ( e ) { // Rethrow post-completion exceptions if ( completed ) { throw e; } // Propagate others as results done( -1, e ); } } // Callback for when everything is done function done( status, nativeStatusText, responses, headers ) { var isSuccess, success, error, response, modified, statusText = nativeStatusText; // Ignore repeat invocations if ( completed ) { return; } completed = true; // Clear timeout if it exists if ( timeoutTimer ) { window.clearTimeout( timeoutTimer ); } // Dereference transport for early garbage collection // (no matter how long the jqXHR object will be used) transport = undefined; // Cache response headers responseHeadersString = headers || ""; // Set readyState jqXHR.readyState = status > 0 ? 4 : 0; // Determine if successful isSuccess = status >= 200 && status < 300 || status === 304; // Get response data if ( responses ) { response = ajaxHandleResponses( s, jqXHR, responses ); } // Use a noop converter for missing script but not if jsonp if ( !isSuccess && jQuery.inArray( "script", s.dataTypes ) > -1 && jQuery.inArray( "json", s.dataTypes ) < 0 ) { s.converters[ "text script" ] = function() {}; } // Convert no matter what (that way responseXXX fields are always set) response = ajaxConvert( s, response, jqXHR, isSuccess ); // If successful, handle type chaining if ( isSuccess ) { // Set the If-Modified-Since and/or If-None-Match header, if in ifModified mode. 
if ( s.ifModified ) { modified = jqXHR.getResponseHeader( "Last-Modified" ); if ( modified ) { jQuery.lastModified[ cacheURL ] = modified; } modified = jqXHR.getResponseHeader( "etag" ); if ( modified ) { jQuery.etag[ cacheURL ] = modified; } } // if no content if ( status === 204 || s.type === "HEAD" ) { statusText = "nocontent"; // if not modified } else if ( status === 304 ) { statusText = "notmodified"; // If we have data, let's convert it } else { statusText = response.state; success = response.data; error = response.error; isSuccess = !error; } } else { // Extract error from statusText and normalize for non-aborts error = statusText; if ( status || !statusText ) { statusText = "error"; if ( status < 0 ) { status = 0; } } } // Set data for the fake xhr object jqXHR.status = status; jqXHR.statusText = ( nativeStatusText || statusText ) + ""; // Success/Error if ( isSuccess ) { deferred.resolveWith( callbackContext, [ success, statusText, jqXHR ] ); } else { deferred.rejectWith( callbackContext, [ jqXHR, statusText, error ] ); } // Status-dependent callbacks jqXHR.statusCode( statusCode ); statusCode = undefined; if ( fireGlobals ) { globalEventContext.trigger( isSuccess ? "ajaxSuccess" : "ajaxError", [ jqXHR, s, isSuccess ? success : error ] ); } // Complete completeDeferred.fireWith( callbackContext, [ jqXHR, statusText ] ); if ( fireGlobals ) { globalEventContext.trigger( "ajaxComplete", [ jqXHR, s ] ); // Handle the global AJAX counter if ( !( --jQuery.active ) ) { jQuery.event.trigger( "ajaxStop" ); } } } return jqXHR; }, getJSON: function( url, data, callback ) { return jQuery.get( url, data, callback, "json" ); }, getScript: function( url, callback ) { return jQuery.get( url, undefined, callback, "script" ); } } ); jQuery.each( [ "get", "post" ], function( _i, method ) { jQuery[ method ] = function( url, data, callback, type ) { // Shift arguments if data argument was omitted if ( isFunction( data ) ) { type = type || callback; callback = data; data = undefined; } // The url can be an options object (which then must have .url) return jQuery.ajax( jQuery.extend( { url: url, type: method, dataType: type, data: data, success: callback }, jQuery.isPlainObject( url ) && url ) ); }; } ); jQuery.ajaxPrefilter( function( s ) { var i; for ( i in s.headers ) { if ( i.toLowerCase() === "content-type" ) { s.contentType = s.headers[ i ] || ""; } } } ); jQuery._evalUrl = function( url, options, doc ) { return jQuery.ajax( { url: url, // Make this explicit, since user can override this through ajaxSetup (#11264) type: "GET", dataType: "script", cache: true, async: false, global: false, // Only evaluate the response if it is successful (gh-4126) // dataFilter is not invoked for failure responses, so using it instead // of the default converter is kludgy but it works. 
converters: { "text script": function() {} }, dataFilter: function( response ) { jQuery.globalEval( response, options, doc ); } } ); }; jQuery.fn.extend( { wrapAll: function( html ) { var wrap; if ( this[ 0 ] ) { if ( isFunction( html ) ) { html = html.call( this[ 0 ] ); } // The elements to wrap the target around wrap = jQuery( html, this[ 0 ].ownerDocument ).eq( 0 ).clone( true ); if ( this[ 0 ].parentNode ) { wrap.insertBefore( this[ 0 ] ); } wrap.map( function() { var elem = this; while ( elem.firstElementChild ) { elem = elem.firstElementChild; } return elem; } ).append( this ); } return this; }, wrapInner: function( html ) { if ( isFunction( html ) ) { return this.each( function( i ) { jQuery( this ).wrapInner( html.call( this, i ) ); } ); } return this.each( function() { var self = jQuery( this ), contents = self.contents(); if ( contents.length ) { contents.wrapAll( html ); } else { self.append( html ); } } ); }, wrap: function( html ) { var htmlIsFunction = isFunction( html ); return this.each( function( i ) { jQuery( this ).wrapAll( htmlIsFunction ? html.call( this, i ) : html ); } ); }, unwrap: function( selector ) { this.parent( selector ).not( "body" ).each( function() { jQuery( this ).replaceWith( this.childNodes ); } ); return this; } } ); jQuery.expr.pseudos.hidden = function( elem ) { return !jQuery.expr.pseudos.visible( elem ); }; jQuery.expr.pseudos.visible = function( elem ) { return !!( elem.offsetWidth || elem.offsetHeight || elem.getClientRects().length ); }; jQuery.ajaxSettings.xhr = function() { try { return new window.XMLHttpRequest(); } catch ( e ) {} }; var xhrSuccessStatus = { // File protocol always yields status code 0, assume 200 0: 200, // Support: IE <=9 only // #1450: sometimes IE returns 1223 when it should be 204 1223: 204 }, xhrSupported = jQuery.ajaxSettings.xhr(); support.cors = !!xhrSupported && ( "withCredentials" in xhrSupported ); support.ajax = xhrSupported = !!xhrSupported; jQuery.ajaxTransport( function( options ) { var callback, errorCallback; // Cross domain only allowed if supported through XMLHttpRequest if ( support.cors || xhrSupported && !options.crossDomain ) { return { send: function( headers, complete ) { var i, xhr = options.xhr(); xhr.open( options.type, options.url, options.async, options.username, options.password ); // Apply custom fields if provided if ( options.xhrFields ) { for ( i in options.xhrFields ) { xhr[ i ] = options.xhrFields[ i ]; } } // Override mime type if needed if ( options.mimeType && xhr.overrideMimeType ) { xhr.overrideMimeType( options.mimeType ); } // X-Requested-With header // For cross-domain requests, seeing as conditions for a preflight are // akin to a jigsaw puzzle, we simply never set it to be sure. // (it can always be set on a per-request basis or even using ajaxSetup) // For same-domain requests, won't change header if already provided. 
if ( !options.crossDomain && !headers[ "X-Requested-With" ] ) { headers[ "X-Requested-With" ] = "XMLHttpRequest"; } // Set headers for ( i in headers ) { xhr.setRequestHeader( i, headers[ i ] ); } // Callback callback = function( type ) { return function() { if ( callback ) { callback = errorCallback = xhr.onload = xhr.onerror = xhr.onabort = xhr.ontimeout = xhr.onreadystatechange = null; if ( type === "abort" ) { xhr.abort(); } else if ( type === "error" ) { // Support: IE <=9 only // On a manual native abort, IE9 throws // errors on any property access that is not readyState if ( typeof xhr.status !== "number" ) { complete( 0, "error" ); } else { complete( // File: protocol always yields status 0; see #8605, #14207 xhr.status, xhr.statusText ); } } else { complete( xhrSuccessStatus[ xhr.status ] || xhr.status, xhr.statusText, // Support: IE <=9 only // IE9 has no XHR2 but throws on binary (trac-11426) // For XHR2 non-text, let the caller handle it (gh-2498) ( xhr.responseType || "text" ) !== "text" || typeof xhr.responseText !== "string" ? { binary: xhr.response } : { text: xhr.responseText }, xhr.getAllResponseHeaders() ); } } }; }; // Listen to events xhr.onload = callback(); errorCallback = xhr.onerror = xhr.ontimeout = callback( "error" ); // Support: IE 9 only // Use onreadystatechange to replace onabort // to handle uncaught aborts if ( xhr.onabort !== undefined ) { xhr.onabort = errorCallback; } else { xhr.onreadystatechange = function() { // Check readyState before timeout as it changes if ( xhr.readyState === 4 ) { // Allow onerror to be called first, // but that will not handle a native abort // Also, save errorCallback to a variable // as xhr.onerror cannot be accessed window.setTimeout( function() { if ( callback ) { errorCallback(); } } ); } }; } // Create the abort callback callback = callback( "abort" ); try { // Do send the request (this may raise an exception) xhr.send( options.hasContent && options.data || null ); } catch ( e ) { // #14683: Only rethrow if this hasn't been notified as an error yet if ( callback ) { throw e; } } }, abort: function() { if ( callback ) { callback(); } } }; } } ); // Prevent auto-execution of scripts when no explicit dataType was provided (See gh-2432) jQuery.ajaxPrefilter( function( s ) { if ( s.crossDomain ) { s.contents.script = false; } } ); // Install script dataType jQuery.ajaxSetup( { accepts: { script: "text/javascript, application/javascript, " + "application/ecmascript, application/x-ecmascript" }, contents: { script: /\b(?:java|ecma)script\b/ }, converters: { "text script": function( text ) { jQuery.globalEval( text ); return text; } } } ); // Handle cache's special case and crossDomain jQuery.ajaxPrefilter( "script", function( s ) { if ( s.cache === undefined ) { s.cache = false; } if ( s.crossDomain ) { s.type = "GET"; } } ); // Bind script tag hack transport jQuery.ajaxTransport( "script", function( s ) { // This transport only deals with cross domain or forced-by-attrs requests if ( s.crossDomain || s.scriptAttrs ) { var script, callback; return { send: function( _, complete ) { script = jQuery( " Skip to contents



mvgam R package logo · Stan logo

MultiVariate (Dynamic) Generalized Additive Models

The goal of mvgam is to fit Bayesian Dynamic Generalized Additive Models (DGAMs) that can include highly flexible nonlinear predictor effects for both process and observation components. The package does this by relying on functionality from the impressive brms and mgcv packages, which allows mvgam to fit a wide range of models.

Installation

Install the stable version from CRAN using install.packages('mvgam'), or install the development version from GitHub using devtools::install_github("nicholasjclark/mvgam"). Note that to actually condition models with MCMC sampling, the Stan software must be installed (along with either rstan and/or cmdstanr). Only rstan is listed as a dependency of mvgam to keep installation simple. If you wish to fit models using mvgam, please refer to the installation instructions for Stan with rstan, or for Stan with cmdstanr. You will need a fairly recent version of Stan (preferably 2.29 or above) to ensure all the model syntax is recognized. We highly recommend using Cmdstan through the cmdstanr interface as the backend, as Cmdstan is easier to install, is more up to date with new features, and uses less memory than Rstan. See the documentation from the Cmdstan team for more information.
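As a quick reference, the steps above might be collected into a single setup script (a minimal sketch; the cmdstanr installation route shown here follows the Stan team's standard instructions and may need adjusting for your platform):

# Stable release from CRAN
install.packages('mvgam')

# Or the development version from GitHub
# (install.packages('devtools') first if needed)
devtools::install_github("nicholasjclark/mvgam")

# Recommended backend: Cmdstan via the cmdstanr interface
install.packages('cmdstanr', repos = c('https://stan-dev.r-universe.dev', getOption('repos')))
cmdstanr::install_cmdstan()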

Introductory seminar

Cheatsheet

mvgam usage cheatsheet

Getting started

mvgam was originally designed to analyse and forecast non-negative integer-valued data (counts), which are traditionally challenging to analyse with existing time-series packages. Further development has added support for a growing number of observation families that extend to other types of data.

See ?mvgam_families for more information. Below is a simple example for simulating and modelling proportional data with Beta observations over a set of seasonal series with independent Gaussian Process dynamic trends:

set.seed(100)
data <- sim_mvgam(
  family = betar(),
  T = 80,
  trend_model = GP(),
  prop_trend = 0.5, 
  seasonality = 'shared'
)

Plot the series to see how they evolve over time

plot_mvgam_series(
  data = data$data_train, 
  series = 'all'
)
Visualizing multivariate proportional time series using the mvgam R package

Fit a State-Space GAM to these series that uses a hierarchical cyclic seasonal smooth term to capture variation in seasonality among series. The model also includes series-specific latent Gaussian Processes with squared exponential covariance functions to capture temporal dynamics

mod <- mvgam(
  y ~ s(season, bs = 'cc', k = 7) +
    s(season, by = series, m = 1, k = 5),
  trend_model = GP(),
  data = data$data_train,
  newdata = data$data_test,
  family = betar()
)

Plot the estimated posterior hindcast and forecast distributions for each series

library(patchwork)
fc <- forecast(mod)
wrap_plots(
  plot(fc, series = 1), 
  plot(fc, series = 2), 
  plot(fc, series = 3), 
  ncol = 2
)
Forecasting multivariate time series with Dynamic Generalized Additive Models

Various S3 functions can be used to inspect parameter estimates, plot smooth functions and residuals, and evaluate models through posterior predictive checks or forecast comparisons. Please see the package documentation for more detailed examples.
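For example (a minimal sketch, assuming the fitted model mod from above; see the package documentation for the full set of methods):

summary(mod)                   # parameter estimates and sampler diagnostics
plot(mod, type = 'smooths')    # plot estimated smooth functions
plot(mod, type = 'residuals')  # randomized quantile residual diagnostics
pp_check(mod)                  # posterior predictive checks
loo(mod)                       # approximate leave-one-out cross-validation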

Vignettes

You can set build_vignettes = TRUE when installing, but be aware that this will slow down the installation drastically. Instead, you can always access the rendered vignettes online at https://nicholasjclark.github.io/mvgam/articles/
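If you do want local copies, something like the following should work (build_vignettes is a standard argument of devtools::install_github; expect a much longer install):

devtools::install_github("nicholasjclark/mvgam", build_vignettes = TRUE)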

Citing mvgam and related software

When using any software, please make sure to appropriately acknowledge the hard work that developers and maintainers put into making these packages available. Citations are currently the best way to formally acknowledge this work (but feel free to ⭐ the repo as well), so we highly encourage you to cite any packages that you rely on for your research.

When using mvgam, please cite the following:

Clark, N.J. and Wells, K. (2023). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. DOI: https://doi.org/10.1111/2041-210X.13974

As mvgam acts as an interface to Stan, please additionally cite:

Carpenter B., Gelman A., Hoffman M. D., Lee D., Goodrich B., Betancourt M., Brubaker M., Guo J., Li P., and Riddell A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software. 76(1). DOI: https://doi.org/10.18637/jss.v076.i01

mvgam relies on several other R packages and, of course, on R itself. To find out how to cite R and its packages, use citation(). There are some features of mvgam which specifically rely on certain packages. The most important of these is the generation of data necessary to estimate smoothing splines and Gaussian Processes, which rely on the mgcv, brms and splines2 packages. The rstan and cmdstanr packages together with Rcpp make Stan conveniently accessible in R. If you use some of these features, please also consider citing the related packages.
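For instance (a minimal sketch; citation() is base R, and how_to_cite() expects a fitted mvgam object such as mod from the example above):

citation()          # how to cite R itself
citation('mvgam')   # how to cite the package
how_to_cite(mod)    # scaffold methods description from a fitted model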

Getting help

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub. Please also feel free to use the mvgam Discussion Board to search for or post discussion topics related to the package, and check the mvgam changelog for recent upgrades the package has incorporated.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please reach out if you are interested (n.clark’at’uq.edu.au). Other contributions are also very welcome, but please see The Contributor Instructions for general guidelines. Note that by participating in this project you agree to abide by the terms of its Contributor Code of Conduct.

================================================ FILE: docs/news/index.html ================================================ Changelog • mvgam

mvgam 1.1.5 (not yet on CRAN)

New functionalities

  • Converted several more plotting functions to return ggplot objects in place of base R plots
  • Added four new types to the pp_check() function to allow more targeted investigations of randomized quantile residual distributions

Bug fixes

  • Bug fix to ensure piecewise trends are extrapolated the correct number of timepoints when forecasting using the forecast() function

mvgam 1.1.4

CRAN release: 2025-02-19

New functionalities

  • Added the how_to_cite.mvgam() function to generate a scaffold methods description of fitted models, which can hopefully make it easier for users to fully describe their programming environment
  • Improved various plotting functions by returning ggplot objects in place of base plots (thanks to @mhollanders #38)
  • Added the brier score (score = 'brier') as an option in score.mvgam_forecast() for scoring forecasts of binary variables when using family = bernoulli() (#80)
  • Added augment() function to add residuals and fitted values to an mvgam object’s observed data (thanks to @swpease #83)
  • Added support for approximate gp() effects with more than one covariate and with different kernel functions (#79)
  • Added function jsdgam() to estimate Joint Species Distribution Models in which both the latent factors and the observation model components can include any of mvgam’s complex linear predictor effects. Also added a function residual_cor() to compute residual correlation, covariance and precision matrices from jsdgam models. See ?mvgam::jsdgam and ?mvgam::residual_cor for details
  • Added a stability.mvgam() method to compute stability metrics from models fit with Vector Autoregressive dynamics (#21 and #76)
  • Added functionality to estimate hierarchical error correlations when using multivariate latent process models and when the data are nested among levels of a relevant grouping factor (#75); see ?mvgam::AR for an example
  • Added ZMVN() error models for estimating Zero-Mean Multivariate Normal errors; convenient for working with non time-series data where latent residuals are expected to be correlated (such as when fitting Joint Species Distribution Models); see ?mvgam::ZMVN for examples
  • Added a fevd.mvgam() method to compute forecast error variance decompositions from models fit with Vector Autoregressive dynamics (#21 and #76)

Deprecations

  • Arguments use_stan, jags_path, data_train, data_test, adapt_delta, max_treedepth and drift have been removed from primary functions to streamline documentation and reflect the package’s mission to phase out ‘JAGS’ as a backend. Both adapt_delta and max_treedepth should now be supplied in a named list() to the new argument control, as sketched below
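A minimal sketch of the new control argument (the formula and data names here are illustrative only):

mod <- mvgam(y ~ s(season),
             data = dat,
             control = list(adapt_delta = 0.95,
                            max_treedepth = 12))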

Bug fixes

  • Bug fix to ensure marginaleffects::comparisons functions appropriately recognise internal rowid variables
  • Updates to ensure ensemble provides appropriate weighting of forecast draws (#98)
  • Not necessarily a “bug fix”, but this update removes several dependencies to lighten installation and improve efficiency of the workflow (#93)
  • Fixed a minor bug in the way trend_map recognises levels of the series factor
  • Bug fix to ensure lfo_cv recognises the actual times in time, just in case the user supplies data that doesn’t start at t = 1. Also updated documentation to better reflect this
  • Bug fix to ensure update.mvgam captures any knots or trend_knots arguments that were passed to the original model call

mvgam 1.1.3

CRAN release: 2024-09-04

New functionalities

  • Allow intercepts to be included in process models when trend_formula is supplied. This breaks the assumption that the process has to be zero-centred, adding more modelling flexibility but also potentially inducing nonidentifiabilities with respect to any observation model intercepts. Thoughtful priors are a must for these models
  • Added standata.mvgam_prefit, stancode.mvgam and stancode.mvgam_prefit methods for better alignment with ‘brms’ workflows
  • Added ‘gratia’ to Enhancements to allow popular methods such as draw() to be used for ‘mvgam’ models if ‘gratia’ is already installed
  • Added an ensemble.mvgam_forecast() method to generate evenly weighted combinations of probabilistic forecast distributions
  • Added an irf.mvgam() method to compute Generalized and Orthogonalized Impulse Response Functions (IRFs) from models fit with Vector Autoregressive dynamics

Deprecations

  • The drift argument has been deprecated. It is now recommended that users include a parametric fixed effect of time in their respective GAM formulae to capture any expected drift effects, as sketched below
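A minimal sketch of the recommended replacement (formula and data names are illustrative only):

# previously requested via drift = TRUE
mod <- mvgam(y ~ time + s(season),
             data = dat)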

Bug fixes

  • Added a new check to ensure that exception messages are only suppressed by the silent argument if the user’s version of ‘cmdstanr’ is adequate
  • Updated dependency for ‘brms’ to version >= ‘2.21.0’ so that read_csv_as_stanfit can be imported, which should future-proof the conversion of ‘cmdstanr’ models to stanfit objects (#70)

mvgam 1.1.2

CRAN release: 2024-07-01

New functionalities

  • Added options for silencing some of the ‘Stan’ compiler and modeling messages using the silent argument in mvgam()
  • Moved a number of packages from ‘Depends’ to ‘Imports’ for simpler package loading and fewer potential masking conflicts
  • Improved efficiency of the model initialisation by tweaking parameters of the underlying ‘mgcv’ gam object’s convergence criteria, resulting in much faster model setups
  • Added an option to use trend_model = 'None' in State-Space models, increasing flexibility by ensuring the process error evolves as white noise (#51)
  • Added an option to use the non-centred parameterisation for some autoregressive trend models, which speeds up mixing most of the time
  • Updated support for multithreading so that all observation families (apart from nmix()) can now be modeled with multiple threads
  • Changed default priors on autoregressive coefficients (AR1, AR2, AR3) to enforce stationarity, which is a much more sensible prior in the majority of contexts

Bug fixes

mvgam 1.1.1

CRAN release: 2024-05-10

New functionalities

mvgam 1.1.0

CRAN release: 2024-05-06

  • First release of mvgam to CRAN
================================================ FILE: docs/pkgdown.js ================================================

/* http://gregfranko.com/blog/jquery-best-practices/ */
(function($) {
  $(function() {
    $('nav.navbar').headroom();

    Toc.init({
      $nav: $("#toc"),
      $scope: $("main h2, main h3, main h4, main h5, main h6")
    });

    if ($('#toc').length) {
      $('body').scrollspy({
        target: '#toc',
        offset: $("nav.navbar").outerHeight() + 1
      });
    }

    // Activate popovers
    $('[data-bs-toggle="popover"]').popover({
      container: 'body',
      html: true,
      trigger: 'focus',
      placement: "top",
      sanitize: false,
    });

    $('[data-bs-toggle="tooltip"]').tooltip();

    /* Clipboard --------------------------*/
    function changeTooltipMessage(element, msg) {
      var tooltipOriginalTitle = element.getAttribute('data-original-title');
      element.setAttribute('data-original-title', msg);
      $(element).tooltip('show');
      element.setAttribute('data-original-title', tooltipOriginalTitle);
    }

    if (ClipboardJS.isSupported()) {
      $(document).ready(function() {
        var copyButton = "";
        $("div.sourceCode").addClass("hasCopyButton");

        // Insert copy buttons:
        $(copyButton).prependTo(".hasCopyButton");

        // Initialize tooltips:
        $('.btn-copy-ex').tooltip({container: 'body'});

        // Initialize clipboard:
        var clipboard = new ClipboardJS('[data-clipboard-copy]', {
          text: function(trigger) {
            return trigger.parentNode.textContent.replace(/\n#>[^\n]*/g, "");
          }
        });

        clipboard.on('success', function(e) {
          changeTooltipMessage(e.trigger, 'Copied!');
          e.clearSelection();
        });

        clipboard.on('error', function(e) {
          changeTooltipMessage(e.trigger, 'Press Ctrl+C or Command+C to copy');
        });
      });
    }

    /* Search marking --------------------------*/
    var url = new URL(window.location.href);
    var toMark = url.searchParams.get("q");
    var mark = new Mark("main#main");
    if (toMark) {
      mark.mark(toMark, {
        accuracy: {
          value: "complementary",
          limiters: [",", ".", ":", "/"],
        }
      });
    }

    /* Search --------------------------*/
    /* Adapted from https://github.com/rstudio/bookdown/blob/2d692ba4b61f1e466c92e78fd712b0ab08c11d31/inst/resources/bs4_book/bs4_book.js#L25 */
    // Initialise search index on focus
    var fuse;
    $("#search-input").focus(async function(e) {
      if (fuse) {
        return;
      }
      $(e.target).addClass("loading");
      var response = await fetch($("#search-input").data("search-index"));
      var data = await response.json();
      var options = {
        keys: ["what", "text", "code"],
        ignoreLocation: true,
        threshold: 0.1,
        includeMatches: true,
        includeScore: true,
      };
      fuse = new Fuse(data, options);
      $(e.target).removeClass("loading");
    });

    // Use algolia autocomplete
    var options = {
      autoselect: true,
      debug: true,
      hint: false,
      minLength: 2,
    };
    var q;
    async function searchFuse(query, callback) {
      await fuse;
      var items;
      if (!fuse) {
        items = [];
      } else {
        q = query;
        var results = fuse.search(query, { limit: 20 });
        items = results
          .filter((x) => x.score <= 0.75)
          .map((x) => x.item);
        if (items.length === 0) {
          items = [{
            dir: "Sorry 😿",
            previous_headings: "",
            title: "No results found.",
            what: "No results found.",
            path: window.location.href
          }];
        }
      }
      callback(items);
    }

    $("#search-input").autocomplete(options, [
      {
        name: "content",
        source: searchFuse,
        templates: {
          suggestion: (s) => {
            if (s.title == s.what) {
              return `${s.dir} > ${s.title}`;
            } else if (s.previous_headings == "") {
              return `${s.dir} > ${s.title} > ${s.what}`;
            } else {
              return `${s.dir} > ${s.title} > ${s.previous_headings} > ${s.what}`;
            }
          },
        },
      },
    ]).on('autocomplete:selected', function(event, s) {
      window.location.href = s.path + "?q=" + q + "#" + s.id;
    });
  });
})(window.jQuery || window.$)

================================================ FILE: docs/pkgdown.yml ================================================

pandoc: 3.1.1
pkgdown: 2.0.7
pkgdown_sha: ~
articles:
  data_in_mvgam: data_in_mvgam.html
  forecast_evaluation: forecast_evaluation.html
  mvgam_overview: mvgam_overview.html
  nmixtures: nmixtures.html
  shared_states: shared_states.html
  time_varying_effects: time_varying_effects.html
  trend_formulas: trend_formulas.html
last_built: 2024-03-12T03:54Z
urls:
  reference: https://nicholasjclark.github.io/mvgam/reference
  article: https://nicholasjclark.github.io/mvgam/articles

================================================ FILE: docs/reference/GP.html ================================================ Specify dynamic Gaussian process trends in mvgam models — GP • mvgam

Set up low-rank approximate Gaussian Process trend models using Hilbert basis expansions in mvgam. This function does not evaluate its arguments – it exists purely to help set up a model with particular GP trend models.

Usage

GP(...)

Arguments

...

unused

Value

An object of class mvgam_trend, which contains a list of arguments to be interpreted by the parsing functions in mvgam

Details

A GP trend is estimated for each series using Hilbert space approximate Gaussian Processes. In mvgam, latent squared exponential GP trends are approximated using, by default, 20 basis functions and a multiplicative factor of c = 5/4. This saves considerable computation compared to fitting full GPs while still adequately estimating the GP alpha and rho parameters.
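A minimal usage sketch (assuming a long-format data frame dat of the kind used throughout these reference pages):

mod <- mvgam(y ~ 1,
             trend_model = GP(),
             data = dat,
             chains = 2)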

================================================ FILE: docs/reference/RW.html ================================================ Specify autoregressive dynamic processes in mvgam — RW • mvgam

Set up autoregressive or autoregressive moving average trend models in mvgam. These functions do not evaluate their arguments – they exist purely to help set up a model with particular autoregressive trend models.

Usage

RW(ma = FALSE, cor = FALSE, gr = NA, subgr = NA)

AR(p = 1, ma = FALSE, cor = FALSE, gr = NA, subgr = NA)

CAR(p = 1)

VAR(ma = FALSE, cor = FALSE, gr = NA, subgr = NA)

Arguments

ma

Logical. Include moving average terms of order 1? Default is FALSE.

cor

Logical. Include correlated process errors as part of a multivariate normal process model? If TRUE and if n_series > 1 in the supplied data, a fully structured covariance matrix will be estimated for the process errors. Default is FALSE.

gr

An optional grouping variable, which must be a factor in the supplied data, for setting up hierarchical residual correlation structures. If specified, this will automatically set cor = TRUE and set up a model where the residual correlations for a specific level of gr are modelled hierarchically: \(\Omega_{group} = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local}\), where \(\Omega_{global}\) is a global correlation matrix, \(\Omega_{group, local}\) is a local deviation correlation matrix and \(\alpha_{cor}\) is a weighting parameter controlling how strongly the group-level correlation matrix \(\Omega_{group}\) is shrunk towards the global correlation matrix \(\Omega_{global}\) (larger values of \(\alpha_{cor}\) indicate a greater degree of shrinkage, i.e. a greater degree of partial pooling). If gr is supplied then subgr must also be supplied

subgr

A subgrouping factor variable specifying which element in data represents the different time series. Defaults to series. Note that models using the hierarchical correlations, where the subgr time series are measured in each level of gr, should not include a series element in data; this element will be created internally from the supplied gr and subgr variables. For example, if you are modelling temporal counts for a group of species (labelled as species in data) across three different geographical regions (labelled as region), and you would like the residuals to be correlated within regions, then you should specify gr = region and subgr = species. Internally, mvgam() will create the series element for the data using: series = interaction(group, subgroup, drop = TRUE)

p

A non-negative integer specifying the autoregressive (AR) order. Default is 1. Cannot currently be larger than 3 for AR terms, and cannot be anything other than 1 for continuous time AR (CAR) terms

Value

An object of class mvgam_trend, which contains a list of arguments to be interpreted by the parsing functions in mvgam

Examples

# \donttest{
# A short example to illustrate CAR(1) models
# Function to simulate CAR1 data with seasonality
sim_corcar1 = function(n = 120,
                      phi = 0.5,
                      sigma = 1,
                      sigma_obs = 0.75){
# Sample irregularly spaced time intervals
time_dis <- c(0, runif(n - 1, -0.1, 1))
time_dis[time_dis < 0] <- 0; time_dis <- time_dis * 5

# Set up the latent dynamic process
x <- vector(length = n); x[1] <- -0.3
for(i in 2:n){
 # zero-distances will cause problems in sampling, so mvgam uses a
 # minimum threshold; this simulation function emulates that process
 if(time_dis[i] == 0){
   x[i] <- rnorm(1, mean = (phi ^ 1e-12) * x[i - 1], sd = sigma)
  } else {
    x[i] <- rnorm(1, mean = (phi ^ time_dis[i]) * x[i - 1], sd = sigma)
  }
}

# Add 12-month seasonality
cov1 <- sin(2 * pi * (1 : n) / 12); cov2 <- cos(2 * pi * (1 : n) / 12)
beta1 <- runif(1, 0.3, 0.7); beta2 <- runif(1, 0.2, 0.5)
seasonality <- beta1 * cov1 + beta2 * cov2

# Take Gaussian observations with error and return
data.frame(y = rnorm(n, mean = x + seasonality, sd = sigma_obs),
           season = rep(1:12, 20)[1:n],
           time = cumsum(time_dis))
}

# Sample two time series
dat <- rbind(dplyr::bind_cols(sim_corcar1(phi = 0.65,
                                         sigma_obs = 0.55),
                             data.frame(series = 'series1')),
            dplyr::bind_cols(sim_corcar1(phi = 0.8,
                             sigma_obs = 0.35),
                             data.frame(series = 'series2'))) %>%
      dplyr::mutate(series = as.factor(series))

# mvgam with CAR(1) trends and series-level seasonal smooths; the
# State-Space representation (using trend_formula) will be more efficient
mod <- mvgam(formula = y ~ 1,
            trend_formula = ~ s(season, bs = 'cc',
                                k = 5, by = trend),
            trend_model = CAR(),
            data = dat,
            family = gaussian(),
            samples = 300,
            chains = 2,
            silent = 2)

# View usual summaries and plots
summary(mod)
#> GAM observation formula:
#> y ~ 1
#> <environment: 0x000001d61abb4b00>
#> 
#> GAM process formula:
#> ~s(season, bs = "cc", k = 5, by = trend)
#> <environment: 0x000001d61abb4b00>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> CAR()
#> 
#> 
#> N process models:
#> 2 
#> 
#> N series:
#> 2 
#> 
#> N timepoints:
#> 120 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 800; warmup = 500; thin = 1 
#> Total post-warmup draws = 600
#> 
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.32 0.61  0.93 1.02    25
#> sigma_obs[2] 0.17 0.30  0.54 1.08    21
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5%   50%  97.5% Rhat n_eff
#> (Intercept) -2.1 -0.86 -0.026 2.18     3
#> 
#> Process model AR parameter estimates:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.20 0.56  0.85 1.03    61
#> ar1[2] 0.69 0.80  0.89 1.00   183
#> 
#> Process error parameter estimates:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.64 0.95   1.2 1.01    39
#> sigma[2] 0.73 0.89   1.1 1.02    87
#> 
#> GAM process model coefficient (beta) estimates:
#>                                2.5%   50%  97.5% Rhat n_eff
#> (Intercept)_trend             -0.14  0.71  2.000  2.1     3
#> s(season):trendtrend1.1_trend -0.17  0.18  0.560  1.0   423
#> s(season):trendtrend1.2_trend -0.75 -0.30  0.120  1.0   522
#> s(season):trendtrend1.3_trend -0.86 -0.46 -0.044  1.0   488
#> s(season):trendtrend2.1_trend -0.13  0.24  0.700  1.0   388
#> s(season):trendtrend2.2_trend -0.95 -0.41  0.076  1.0   398
#> s(season):trendtrend2.3_trend -1.00 -0.50 -0.094  1.0   491
#> 
#> Approximate significance of GAM process smooths:
#>                         edf Ref.df Chi.sq p-value  
#> s(season):seriestrend1 2.29      3   10.5   0.061 .
#> s(season):seriestrend2 2.19      3   15.1   0.051 .
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhats above 1.05 found for 485 parameters
#>  *Diagnose further to investigate why the chains have not mixed
#> 11 of 600 iterations ended with a divergence (1.8333%)
#>  *Try running with larger adapt_delta to remove the divergences
#> 0 of 600 iterations saturated the maximum tree depth of 10 (0%)
#> Chain 1: E-FMI = 0.1507
#> Chain 2: E-FMI = 0.1239
#>  *E-FMI below 0.2 indicates you may need to reparameterize your model
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 19 11:59:08 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model
conditional_effects(mod, type = 'expected')

plot(mod, type = 'trend', series = 1)

plot(mod, type = 'trend', series = 2)

plot(mod, type = 'residuals', series = 1)

plot(mod, type = 'residuals', series = 2)


# Now an example illustrating hierarchical dynamics
set.seed(123)
# Simulate three species monitored in three different
# regions, where dynamics can potentially vary across regions
simdat1 <- sim_mvgam(trend_model = VAR(cor = TRUE),
                    prop_trend = 0.95,
                    n_series = 3,
                    mu = c(1, 2, 3))
simdat2 <- sim_mvgam(trend_model = VAR(cor = TRUE),
                    prop_trend = 0.95,
                    n_series = 3,
                    mu = c(1, 2, 3))
simdat3 <- sim_mvgam(trend_model = VAR(cor = TRUE),
                    prop_trend = 0.95,
                    n_series = 3,
                    mu = c(1, 2, 3))

# Set up the data but DO NOT include 'series'
all_dat <- rbind(simdat1$data_train %>%
                  dplyr::mutate(region = 'qld'),
                simdat2$data_train %>%
                  dplyr::mutate(region = 'nsw'),
                simdat3$data_train %>%
                  dplyr::mutate(region = 'vic')) %>%
dplyr::mutate(species = gsub('series', 'species', series),
              species = as.factor(species),
              region = as.factor(region)) %>%
dplyr::arrange(series, time) %>%
dplyr::select(-series)

# Check priors for a hierarchical AR1 model
get_mvgam_priors(formula = y ~ species,
                trend_model = AR(gr = region, subgr = species),
                data = all_dat)
#>                                param_name param_length                    param_info
#> 1                             (Intercept)            1                   (Intercept)
#> 2                        speciesspecies_2            1 speciesspecies_2 fixed effect
#> 3                        speciesspecies_3            1 speciesspecies_3 fixed effect
#> 4 vector<lower=-1,upper=1>[n_series] ar1;            9         trend AR1 coefficient
#> 5        vector<lower=0>[n_series] sigma;            9                      trend sd
#>                                    prior                   example_change new_lowerbound new_upperbound
#> 1  (Intercept) ~ student_t(3, 1.9, 2.5);      (Intercept) ~ normal(0, 1);             NA             NA
#> 2 speciesspecies_2 ~ student_t(3, 0, 2); speciesspecies_2 ~ normal(0, 1);             NA             NA
#> 3 speciesspecies_3 ~ student_t(3, 0, 2); speciesspecies_3 ~ normal(0, 1);             NA             NA
#> 4                    ar1 ~ std_normal();       ar1 ~ normal(-0.79, 0.86);             NA             NA
#> 5          sigma ~ student_t(3, 0, 2.5);       sigma ~ exponential(0.37);             NA             NA

# Fit the model
mod <- mvgam(formula = y ~ species,
            trend_model = AR(gr = region, subgr = species),
            data = all_dat,
            chains = 2,
            silent = 2)

# Check standard outputs
summary(mod)
#> GAM formula:
#> y ~ species
#> <environment: 0x000001d61abb4b00>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR(gr = region, subgr = species)
#> 
#> 
#> N series:
#> 9 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>                  2.5%  50% 97.5% Rhat n_eff
#> (Intercept)      0.90 1.10   1.3 1.00   175
#> speciesspecies_2 0.79 0.99   1.2 1.02   332
#> speciesspecies_3 1.60 1.90   2.1 1.01   125
#> 
#> Latent trend parameter AR estimates:
#>            2.5%    50%  97.5% Rhat n_eff
#> ar1[1]    0.220  0.480  0.700 1.01   202
#> ar1[2]    0.120  0.330  0.550 1.02   157
#> ar1[3]   -0.017  0.370  0.680 1.03    54
#> ar1[4]    0.230  0.640  0.870 1.02   136
#> ar1[5]   -0.190  0.074  0.300 1.00   146
#> ar1[6]   -0.430 -0.220 -0.015 1.00   120
#> ar1[7]   -0.034  0.290  0.610 1.00   168
#> ar1[8]    0.570  0.730  0.890 1.00   178
#> ar1[9]    0.330  0.560  0.780 1.00   111
#> sigma[1]  0.810  1.000  1.300 1.00   531
#> sigma[2]  0.650  0.800  0.970 1.00   613
#> sigma[3]  0.820  0.970  1.200 1.00   358
#> sigma[4]  0.310  0.480  0.710 1.03   117
#> sigma[5]  0.630  0.760  0.900 1.00   495
#> sigma[6]  0.680  0.790  0.960 1.00   666
#> sigma[7]  0.610  0.790  1.000 1.00   256
#> sigma[8]  0.560  0.700  0.910 1.00   556
#> sigma[9]  0.690  0.820  1.000 1.00  1267
#> 
#> Hierarchical correlation weighting parameter (alpha_cor) estimates:
#>            2.5%   50% 97.5% Rhat n_eff
#> alpha_cor 0.016 0.075  0.21    1   447
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 19 12:00:03 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model
conditional_effects(mod, type = 'link')


# Inspect posterior estimates for the correlation weighting parameter
mcmc_plot(mod, variable = 'alpha_cor', type = 'hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# }
================================================ FILE: docs/reference/ZMVN.html ================================================ Specify correlated residual processes in mvgam — ZMVN • mvgam

Set up latent correlated multivariate Gaussian residual processes in mvgam. This function does not evaluate its arguments – it exists purely to help set up a model with particular error processes.

Usage

ZMVN(unit = time, gr = NA, subgr = series)

Arguments

unit

The unquoted name of the variable that represents the unit of analysis in data over which latent residuals should be correlated. This variable should be either a numeric or integer variable in the supplied data. Defaults to time to be consistent with other functionalities in mvgam, though note that the data need not be time series in this case. See examples below for further details and explanations

gr

An optional grouping variable, which must be a factor in the supplied data, for setting up hierarchical residual correlation structures. If specified, this will automatically set up a model where the residual correlations for a specific level of gr are modelled hierarchically: \(\Omega_{group} = p\Omega_{global} + (1 - p)\Omega_{group, local}\), where \(\Omega_{global}\) is a global correlation matrix, \(\Omega_{group, local}\) is a local deviation correlation matrix and \(p\) is a weighting parameter controlling how strongly the group-level correlation matrix \(\Omega_{group}\) is shrunk towards the global correlation matrix \(\Omega_{global}\). If gr is supplied then subgr must also be supplied

subgr

A subgrouping factor variable specifying which element in data represents the different observational units. Defaults to series to be consistent with other functionalities in mvgam, though the data need not be time series in this case. Models that use the hierarchical correlations (by supplying a value for gr) should not include a series element in data; this element will be created internally from the supplied gr and subgr variables. For example, if you are modelling counts for a group of species (labelled as species in the data) across sampling sites (labelled as site in the data) in three different geographical regions (labelled as region), and you would like the residuals to be correlated within regions, then you should specify unit = site, gr = region, and subgr = species. Internally, mvgam() will appropriately order the data by unit (in this case, by site) and create the series element for the data using something like: series = as.factor(paste0(group, '_', subgroup))

Value

An object of class mvgam_trend, which contains a list of arguments to be interpreted by the parsing functions in mvgam

Examples

# \donttest{
# Simulate counts of four species over ten sampling locations
site_dat <- data.frame(site = rep(1:10, 4),
                      species = as.factor(sort(rep(letters[1:4], 10))),
                      y = c(NA, rpois(39, 3)))
head(site_dat)
#>   site species  y
#> 1    1       a NA
#> 2    2       a  2
#> 3    3       a  0
#> 4    4       a  1
#> 5    5       a  3
#> 6    6       a  4

# Set up a correlated residual (i.e. Joint Species Distribution) model,
# where 'site' represents the unit of analysis
trend_model <- ZMVN(unit = site, subgr = species)
mod <- mvgam(y ~ species,
            trend_model = ZMVN(unit = site,
                               subgr = species),
            data = site_dat,
            chains = 2,
            silent = 2)

# Inspect the estimated species-species residual covariances
mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


# A hierarchical correlation example; set up correlated counts
# for three species across two sampling locations
Sigma <- matrix(c(1, -0.4, 0.5,
                 -0.4, 1, 0.3,
                 0.5, 0.3, 1),
               byrow = TRUE,
               nrow = 3)
Sigma
#>      [,1] [,2] [,3]
#> [1,]  1.0 -0.4  0.5
#> [2,] -0.4  1.0  0.3
#> [3,]  0.5  0.3  1.0

make_site_dat = function(...){
 errors <- mgcv::rmvn(n = 30,
                      mu = c(0.6, 0.8, 1.8),
                      V = Sigma)
 site_dat <- do.call(rbind, lapply(1:3, function(spec){
   data.frame(y = rpois(30,
                        lambda = exp(errors[, spec])),
              species = paste0('species',
                               spec),
              site = 1:30)
}))
site_dat
}

site_dat <- rbind(make_site_dat() %>%
                   dplyr::mutate(group = 'group1'),
                 make_site_dat() %>%
                   dplyr::mutate(group = 'group2')) %>%
   dplyr::mutate(species = as.factor(species),
                 group = as.factor(group))

# Fit the hierarchical correlated residual model
mod <- mvgam(y ~ species,
            trend_model = ZMVN(unit = site,
                               gr = group,
                               subgr = species),
            data = site_dat)
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 3 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 4 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 3 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 4 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 3 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 4 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 4 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 3 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 3 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 4 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 4 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 3 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 4 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 3 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 4 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 3 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 4 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 4 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 3 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 4 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 2.3 seconds.
#> Chain 4 finished in 2.2 seconds.
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 3 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 2.3 seconds.
#> Chain 3 finished in 2.3 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 2.3 seconds.
#> Total execution time: 2.5 seconds.
#> 

# Inspect the estimated species-species residual covariances
mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# }

================================================ FILE: docs/reference/add_residuals.mvgam.html ================================================ Calculate randomized quantile residuals for mvgam objects — add_residuals.mvgam • mvgam

Calculate randomized quantile residuals for mvgam objects

Usage

add_residuals(object, ...)

# S3 method for class 'mvgam'
add_residuals(object, ...)

Arguments

object

list object of class mvgam. See mvgam()

...

unused

Value

A list object of class mvgam with residuals included in the 'resids' slot

Details

For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics. If the fitted model is appropriate then Dunn-Smyth residuals will be standard normal in distribution and no autocorrelation will be evident. When a particular observation is missing, the residual is calculated by comparing independent draws from the model's posterior distribution.
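A minimal sketch, assuming a fitted model object mod as in the other reference examples:

mod <- add_residuals(mod)
plot(mod, type = 'residuals', series = 1)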

================================================ FILE: docs/reference/add_tweedie_lines.html ================================================ Tweedie JAGS modifications — add_tweedie_lines • mvgam

Tweedie JAGS modifications

Usage

add_tweedie_lines(model_file, upper_bounds)

Arguments

model_file

A template JAGS model file to be modified

upper_bounds

Optional upper bounds for the truncated observation likelihood

Value

A modified JAGS model file

================================================ FILE: docs/reference/all_neon_tick_data.html ================================================ NEON Amblyomma and Ixodes tick abundance survey data — all_neon_tick_data • mvgam

A dataset containing time series of Amblyomma americanum and Ixodes scapularis nymph abundances at NEON sites

Usage

all_neon_tick_data

Format

A tibble/dataframe containing covariate information alongside the main fields of:

Year

Year of sampling

epiWeek

Epidemiological week of sampling

plot_ID

NEON plot ID for survey location

siteID

NEON site ID for survey location

amblyomma_americanum

Counts of A. americanum nymphs

ixodes_scapularis

Counts of I. scapularis nymphs

================================================ FILE: docs/reference/augment.mvgam.html ================================================ Augment an mvgam object's data — augment.mvgam • mvgam

Add fits and residuals to the data, implementing the generic augment from the package broom.

Usage

# S3 method for class 'mvgam'
augment(x, robust = FALSE, probs = c(0.025, 0.975), ...)

Arguments

x

An object of class mvgam.

robust

If FALSE (the default) the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If TRUE, the median and the median absolute deviation (MAD) are applied instead.

probs

The percentiles to be computed by the quantile function.

...

Unused, included for generic consistency only.

Value

A list or tibble (see details) combining:

  • The data supplied to mvgam().

  • The outcome variable, named as .observed.

  • The fitted backcasts, along with their variability and credible bounds.

  • The residuals, along with their variability and credible bounds.

Details

A list is returned if class(x$obs_data) == 'list'; otherwise a tibble is returned. The contents of either object are the same.

The arguments robust and probs are applied to both the fit and residuals calls (see fitted.mvgam() and residuals.mvgam() for details).

Examples

# \donttest{
set.seed(0)
dat <- sim_mvgam(T = 80,
                 n_series = 3,
                 mu = 2,
                 trend_model = AR(p = 1),
                 prop_missing = 0.1,
                 prop_trend = 0.6)

mod1 <- mvgam(formula = y ~ s(season, bs = 'cc', k = 6),
              data = dat$data_train,
              trend_model = AR(),
              family = poisson(),
              noncentred = TRUE,
              chains = 2,
              silent = 2)

augment(mod1, robust = TRUE, probs = c(0.25, 0.75))
#> # A tibble: 180 × 14
#>        y season  year series    time .observed .fitted .fit.variability
#>    <int>  <int> <int> <fct>    <int>     <int>   <dbl>            <dbl>
#>  1     4      1     1 series_1     1         4    4.47             2.91
#>  2    NA      1     1 series_2     1        NA    6.00             3.72
#>  3     4      1     1 series_3     1         4    4.38             2.99
#>  4     5      2     1 series_1     2         5    4.59             2.89
#>  5     2      2     1 series_2     2         2    3.53             3.42
#>  6    NA      2     1 series_3     2        NA    4.46             3.69
#>  7     7      3     1 series_1     3         7    8.11             3.87
#>  8    12      3     1 series_2     3        12   11.2              5.49
#>  9     4      3     1 series_3     3         4    5.14             2.89
#> 10    39      4     1 series_1     4        39   36.0             28.1 
#> # ℹ 170 more rows
#> # ℹ 6 more variables: .fit.cred.low <dbl>, .fit.cred.high <dbl>, .resid <dbl>,
#> #   .resid.variability <dbl>, .resid.cred.low <dbl>, .resid.cred.high <dbl>
# }

================================================ FILE: docs/reference/code.html ================================================ Stan code and data objects for mvgam models — code • mvgam

Generate Stan code and data objects for mvgam models

Usage

code(object)

# S3 method for class 'mvgam_prefit'
stancode(object, ...)

# S3 method for class 'mvgam'
stancode(object, ...)

# S3 method for class 'mvgam_prefit'
standata(object, ...)

Arguments

object

An object of class mvgam or mvgam_prefit, returned from a call to mvgam

...

ignored

Value

Either a character string containing the fully commented Stan code to fit an mvgam model, or a named list containing the data objects needed to fit the model in Stan.

Examples

# \donttest{
simdat <- sim_mvgam()
mod <- mvgam(y ~ s(season) +
               s(time, by = series),
             family = poisson(),
             data = simdat$data_train,
             run_model = FALSE)

# View Stan model code
stancode(mod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[9, 18] S1; // mgcv smooth penalty matrix S1
#>   matrix[9, 18] S2; // mgcv smooth penalty matrix S2
#>   matrix[9, 18] S3; // mgcv smooth penalty matrix S3
#>   matrix[9, 18] S4; // mgcv smooth penalty matrix S4
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 0, 2.5);
#>   
#>   // prior for s(season)...
#>   b_raw[2 : 10] ~ multi_normal_prec(zero[2 : 10],
#>                                     S1[1 : 9, 1 : 9] * lambda[1]
#>                                     + S1[1 : 9, 10 : 18] * lambda[2]);
#>   
#>   // prior for s(time):seriesseries_1...
#>   b_raw[11 : 19] ~ multi_normal_prec(zero[11 : 19],
#>                                      S2[1 : 9, 1 : 9] * lambda[3]
#>                                      + S2[1 : 9, 10 : 18] * lambda[4]);
#>   
#>   // prior for s(time):seriesseries_2...
#>   b_raw[20 : 28] ~ multi_normal_prec(zero[20 : 28],
#>                                      S3[1 : 9, 1 : 9] * lambda[5]
#>                                      + S3[1 : 9, 10 : 18] * lambda[6]);
#>   
#>   // prior for s(time):seriesseries_3...
#>   b_raw[29 : 37] ~ multi_normal_prec(zero[29 : 37],
#>                                      S4[1 : 9, 1 : 9] * lambda[7]
#>                                      + S4[1 : 9, 10 : 18] * lambda[8]);
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   {
#>     // likelihood functions
#>     flat_ys ~ poisson_log_glm(flat_xs, 0.0, b);
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }
#> 
#> 

# View Stan model data
sdata <- standata(mod)
str(sdata)
#> List of 21
#>  $ y           : num [1:75, 1:3] 1 0 0 1 1 2 0 0 1 3 ...
#>  $ n           : int 75
#>  $ X           : num [1:225, 1:37] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:37] "X.Intercept." "V2" "V3" "V4" ...
#>  $ S1          : num [1:9, 1:18] 3.819 0.796 1.727 -0.323 2.686 ...
#>  $ zero        : num [1:37] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ S2          : num [1:9, 1:18] 8.555 -1.22 4.352 0.822 -5.594 ...
#>  $ S3          : num [1:9, 1:18] 8.555 -1.22 4.352 0.822 -5.594 ...
#>  $ S4          : num [1:9, 1:18] 8.555 -1.22 4.352 0.822 -5.594 ...
#>  $ p_coefs     : Named num 0
#>   ..- attr(*, "names")= chr "(Intercept)"
#>  $ p_taus      : num 1.33
#>  $ ytimes      : int [1:75, 1:3] 1 4 7 10 13 16 19 22 25 28 ...
#>  $ n_series    : int 3
#>  $ sp          : Named num [1:8] 0.368 0.368 0.368 0.368 0.368 ...
#>   ..- attr(*, "names")= chr [1:8] "s(season)1" "s(season)2" "s(time):seriesseries_11" "s(time):seriesseries_12" ...
#>  $ y_observed  : num [1:75, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ total_obs   : int 225
#>  $ num_basis   : int 37
#>  $ n_sp        : num 8
#>  $ n_nonmissing: int 225
#>  $ obs_ind     : int [1:225] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ flat_ys     : num [1:225] 1 0 0 1 1 2 0 0 1 3 ...
#>  $ flat_xs     : num [1:225, 1:37] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:37] "X.Intercept." "V2" "V3" "V4" ...
#>  - attr(*, "trend_model")= chr "None"
# }

================================================ FILE: docs/reference/conditional_effects.mvgam.html ================================================ Display conditional effects of predictors for mvgam models — conditional_effects.mvgam • mvgam

Display conditional effects of one or more numeric and/or categorical predictors in models of class mvgam and jsdgam, including two-way interaction effects.

Usage

# S3 method for class 'mvgam'
conditional_effects(
  x,
  effects = NULL,
  type = "response",
  points = FALSE,
  rug = FALSE,
  ...
)

# S3 method for class 'mvgam_conditional_effects'
plot(x, plot = TRUE, ask = FALSE, ...)

# S3 method for class 'mvgam_conditional_effects'
print(x, ...)

Arguments

x

Object of class mvgam, jsdgam or mvgam_conditional_effects

effects

An optional character vector naming effects (main effects or interactions) for which to compute conditional plots. Interactions are specified by a : between variable names. If NULL (the default), plots are generated for all main effects and two-way interactions estimated in the model. When specifying effects manually, all two-way interactions (including grouping variables) may be plotted even if not originally modeled.

type

character specifying the scale of predictions; response is the default, as shown in the usage above. When this has the value link, the linear predictor is calculated on the link scale. If expected is used, predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. When response is used, the predictions take uncertainty in the observation process into account to return predictions on the outcome scale. Two special cases are also allowed: type latent_N will return the estimated latent abundances from an N-mixture distribution, while type detection will return the estimated detection probability from an N-mixture distribution

points

Logical. Indicates if the original data points should be added to the plot, but only if type == 'response'. Default is FALSE.

rug

Logical. Indicates if tick marks should be displayed on the axes to mark the distribution of the raw data, but only if type == 'response'. Default is FALSE.

...

other arguments to pass to plot_predictions

plot

Logical; indicates if plots should be plotted directly in the active graphic device. Defaults to TRUE.

ask

Logical. Indicates if the user is prompted before a new page is plotted. Only used if plot is TRUE. Default is FALSE.

Value

conditional_effects returns an object of class mvgam_conditional_effects, a named list with one slot per effect containing a ggplot object that can be further customized using the ggplot2 package. The corresponding plot method will draw these plots in the active graphics device

Details

This function acts as a wrapper to the more flexible plot_predictions. When creating conditional_effects for a particular predictor (or interaction of two predictors), one has to choose the values of all other predictors to condition on. By default, the mean is used for continuous variables and the reference category is used for factors. Use plot_predictions to change these and create more bespoke conditional effects plots.
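For instance, a more bespoke plot that conditions on chosen predictor values might look like the following sketch (the predictor names are illustrative only; see the marginaleffects documentation for the full set of options):

marginaleffects::plot_predictions(mod,
                                  condition = list(season = 1:12,
                                                   series = 'series_1'),
                                  type = 'response')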

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate some data
simdat <- sim_mvgam(family = poisson(),
                    seasonality = 'hierarchical')

# Fit a model
mod <- mvgam(y ~ s(season, by = series, k = 5) + year:series,
             family = poisson(),
             data = simdat$data_train,
             chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> In file included from stan/lib/stan_math/lib/tbb_2020.3/include/tbb/tbb_profiling.h:123,
#>                  from stan/lib/stan_math/lib/tbb_2020.3/include/tbb/task.h:36,
#>                  from stan/lib/stan_math/lib/tbb_2020.3/include/tbb/task_arena.h:23,
#>                  from stan/lib/stan_math/stan/math/prim/core/init_threadpool_tbb.hpp:18,
#>                  from stan/lib/stan_math/stan/math/prim/core.hpp:4,
#>                  from stan/lib/stan_math/stan/math/rev/core/Eigen_NumTraits.hpp:5,
#>                  from stan/lib/stan_math/stan/math/rev/core/typedefs.hpp:7,
#>                  from stan/lib/stan_math/stan/math/rev/core/chainable_object.hpp:6,
#>                  from stan/lib/stan_math/stan/math/rev/core.hpp:10,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:10,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/Rtmpodm0uo/model-16186d0b7d25.hpp:2:
#> stan/lib/stan_math/lib/tbb_2020.3/include/tbb/atomic.h:422:24: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
#>   422 |     constexpr atomic<T>(const atomic<T>& rhs): internal::atomic_impl<T>(rhs) {}
#>       |                        ^
#> stan/lib/stan_math/lib/tbb_2020.3/include/tbb/atomic.h:422:24: note: remove the '< >'
#> [the same -Wtemplate-id-cdtor warning repeats for each __TBB_DECL_ATOMIC() type instantiation in atomic.h; repeated compiler output omitted]
#> In file included from stan/lib/stan_math/stan/math/prim/prob/normal_ccdf_log.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:243,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16:
#> stan/lib/stan_math/stan/math/prim/prob/normal_lccdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::normal_lccdf(const T_y&, const T_loc&, const T_scale&)':
#> stan/lib/stan_math/stan/math/prim/prob/normal_lccdf.hpp:68: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>    68 |     } else if (scaled_diff > 8.25 * INV_SQRT_TWO) {
#> stan/lib/stan_math/stan/math/prim/prob/normal_lccdf.hpp:68: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 finished in 0.7 seconds.
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 0.8 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.7 seconds.
#> Total execution time: 1.0 seconds.
#> 

# Plot all main effects on the response scale
conditional_effects(mod)



# Change the prediction interval to 70% using plot_predictions() argument
# 'conf_level'
conditional_effects(mod, conf_level = 0.7)



# Plot all main effects on the link scale
conditional_effects(mod, type = 'link')



# Works the same for smooth terms, including smooth interactions
set.seed(0)
dat <- mgcv::gamSim(1, n = 200, scale = 2)
#> Gu & Wahba 4 term additive model
mod <- mvgam(y ~ te(x0, x1, k = 5) + s(x2, k = 6) + s(x3, k = 6),
             data = dat,
             family = gaussian(),
             chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> In file included from stan/lib/stan_math/lib/tbb_2020.3/include/tbb/tbb_profiling.h:123,
#>                  from stan/lib/stan_math/lib/tbb_2020.3/include/tbb/task.h:36,
#>                  from stan/lib/stan_math/lib/tbb_2020.3/include/tbb/task_arena.h:23,
#>                  from stan/lib/stan_math/stan/math/prim/core/init_threadpool_tbb.hpp:18,
#>                  from stan/lib/stan_math/stan/math/prim/core.hpp:4,
#>                  from stan/lib/stan_math/stan/math/rev/core/Eigen_NumTraits.hpp:5,
#>                  from stan/lib/stan_math/stan/math/rev/core/typedefs.hpp:7,
#>                  from stan/lib/stan_math/stan/math/rev/core/chainable_object.hpp:6,
#>                  from stan/lib/stan_math/stan/math/rev/core.hpp:10,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:10,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/Rtmpodm0uo/model-16185a69130a.hpp:2:
#> stan/lib/stan_math/lib/tbb_2020.3/include/tbb/atomic.h:422:24: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
#>   422 |     constexpr atomic<T>(const atomic<T>& rhs): internal::atomic_impl<T>(rhs) {}
#>       |                        ^
#> stan/lib/stan_math/lib/tbb_2020.3/include/tbb/atomic.h:422:24: note: remove the '< >'
#> [the same -Wtemplate-id-cdtor warning repeats for each __TBB_DECL_ATOMIC() type instantiation in atomic.h; repeated compiler output omitted]
#> In file included from stan/lib/stan_math/stan/math/prim/prob/normal_ccdf_log.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:243,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16:
#> stan/lib/stan_math/stan/math/prim/prob/normal_lccdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::normal_lccdf(const T_y&, const T_loc&, const T_scale&)':
#> stan/lib/stan_math/stan/math/prim/prob/normal_lccdf.hpp:68: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>    68 |     } else if (scaled_diff > 8.25 * INV_SQRT_TWO) {
#> stan/lib/stan_math/stan/math/prim/prob/normal_lccdf.hpp:68: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 2.5 seconds.
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 2.8 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 2.7 seconds.
#> Total execution time: 2.9 seconds.
#> 
conditional_effects(mod)



conditional_effects(mod, conf_level = 0.5, type = 'link')




# ggplot objects can be modified and combined with the help of many
# additional packages. Here is an example using the patchwork package

# Simulate some nonlinear data
dat <- mgcv::gamSim(1, n = 200, scale = 2)
#> Gu & Wahba 4 term additive model
mod <- mvgam(y ~ s(x1, bs = 'moi') +
               te(x0, x2),
             data = dat,
             family = gaussian(),
             chains = 2,
             silent = 2)
#> In file included from stan/lib/stan_math/lib/tbb_2020.3/include/tbb/tbb_profiling.h:123,
#>                  from stan/lib/stan_math/lib/tbb_2020.3/include/tbb/task.h:36,
#>                  from stan/lib/stan_math/lib/tbb_2020.3/include/tbb/task_arena.h:23,
#>                  from stan/lib/stan_math/stan/math/prim/core/init_threadpool_tbb.hpp:18,
#>                  from stan/lib/stan_math/stan/math/prim/core.hpp:4,
#>                  from stan/lib/stan_math/stan/math/rev/core/Eigen_NumTraits.hpp:5,
#>                  from stan/lib/stan_math/stan/math/rev/core/typedefs.hpp:7,
#>                  from stan/lib/stan_math/stan/math/rev/core/chainable_object.hpp:6,
#>                  from stan/lib/stan_math/stan/math/rev/core.hpp:10,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:10,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/Rtmpodm0uo/model-1618244b33d7.hpp:2:
#> stan/lib/stan_math/lib/tbb_2020.3/include/tbb/atomic.h:422:24: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
#>   422 |     constexpr atomic<T>(const atomic<T>& rhs): internal::atomic_impl<T>(rhs) {}
#>       |                        ^
#> stan/lib/stan_math/lib/tbb_2020.3/include/tbb/atomic.h:422:24: note: remove the '< >'
#> [the same -Wtemplate-id-cdtor warning repeats for each __TBB_DECL_ATOMIC() type instantiation in atomic.h; repeated compiler output omitted]
#> In file included from stan/lib/stan_math/stan/math/prim/prob/normal_ccdf_log.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:243,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16:
#> stan/lib/stan_math/stan/math/prim/prob/normal_lccdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::normal_lccdf(const T_y&, const T_loc&, const T_scale&)':
#> stan/lib/stan_math/stan/math/prim/prob/normal_lccdf.hpp:68: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>    68 |     } else if (scaled_diff > 8.25 * INV_SQRT_TWO) {
#> stan/lib/stan_math/stan/math/prim/prob/normal_lccdf.hpp:68: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory

# Extract the list of ggplot conditional_effect plots
m <- plot(conditional_effects(mod), plot = FALSE)

# Add custom labels and arrange plots together using patchwork::wrap_plots()
library(patchwork)
#> Warning: package ‘patchwork’ was built under R version 4.4.2
library(ggplot2)
#> Warning: package ‘ggplot2’ was built under R version 4.4.2
wrap_plots(m[[1]] + labs(title = 's(x1, bs = "moi")'),
           m[[2]] + labs(title = 'te(x0, x2)'))

# }
================================================ FILE: docs/reference/dynamic.html ================================================ Defining dynamic coefficients in mvgam formulae — dynamic

Set up time-varying (dynamic) coefficients for use in mvgam models. Currently, only low-rank Gaussian Process smooths are available for estimating the dynamics of the time-varying coefficient.

Usage

dynamic(variable, k, rho = 5, stationary = TRUE, scale = TRUE)

Arguments

variable

The variable that the dynamic smooth will be a function of

k

Optional number of basis functions for computing approximate GPs. If missing, k will be set as large as possible to accurately estimate the nonlinear function

rho

Either a positive numeric stating the length scale to be used for approximating the squared exponential Gaussian Process smooth (see gp.smooth for details) or missing, in which case the length scale will be estimated by setting up a Hilbert space approximate GP

stationary

Logical. If TRUE (the default) and rho is supplied, the latent Gaussian Process smooth will not have a linear trend component. If FALSE, a linear trend in the covariate is added to the Gaussian Process smooth. Leave at TRUE if you do not believe the coefficient is evolving with much trend, as the linear component of the basis functions can be hard to penalize to zero. This sometimes causes divergence issues in Stan. See gp.smooth for details. Ignored if rho is missing (in which case a Hilbert space approximate GP is used)

scale

Logical; If TRUE (the default) and rho is missing, predictors are scaled so that the maximum Euclidean distance between two points is 1. This often improves sampling speed and convergence. Scaling also affects the estimated length-scale parameters in that they resemble those of scaled predictors (not of the original predictors) if scale is TRUE.
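
As a rough illustration of this rescaling (a sketch of the idea described above, not the package's internal code; scale_to_unit_dist is a hypothetical helper name):

scale_to_unit_dist <- function(x) {
  # divide by the maximum pairwise Euclidean distance so that
  # the largest distance between any two points becomes 1
  x / max(dist(x))
}
x <- rnorm(50, sd = 3)
max(dist(scale_to_unit_dist(x)))  # returns 1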

Value

a list object for internal usage in 'mvgam'

Details

mvgam currently sets up dynamic coefficients as low-rank squared exponential Gaussian Process smooths via the call s(time, by = variable, bs = "gp", m = c(2, rho, 2)). These smooths, if specified with reasonable values for the length scale parameter, will give more realistic out-of-sample forecasts than standard splines such as thin plate or cubic. But the user must set the value for rho, as there is currently no support for estimating this value in mgcv. This may not be too much of a problem, as estimating latent length scales is often difficult anyway. The rho parameter should be thought of as a prior on the smoothness of the latent dynamic coefficient function (higher values of rho lead to smoother functions with more temporal covariance structure). Values of k are set automatically to ensure enough basis functions are used to approximate the expected wiggliness of the underlying dynamic function (k will increase as rho decreases).
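
To make this expansion concrete, here is a small sketch pairing a dynamic() call with the smooth it generates. The expanded form is copied from the summary() output in the Examples below, where stationary = TRUE yields a negative first entry of m and k is chosen automatically:

# Formula written with the dynamic() wrapper
out ~ dynamic(predictor, rho = 8, stationary = TRUE)

# Equivalent expanded mgcv-style formula, as reported by summary()
# in the Examples below (k was chosen automatically as 27)
out ~ s(time, by = predictor, bs = "gp", m = c(-2, 8, 2), k = 27)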

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate a time-varying coefficient
# (as a Gaussian Process with length scale = 10)
set.seed(1111)
N <- 200

# A function to simulate from a squared exponential Gaussian Process
sim_gp <- function(N, c, alpha, rho) {
  Sigma <- alpha ^ 2 *
    exp(-0.5 * ((outer(1:N, 1:N, "-") / rho) ^ 2)) +
    diag(1e-9, N)
  c + mgcv::rmvn(1,
                 mu = rep(0, N),
                 V = Sigma)
}

beta <- sim_gp(alpha = 0.75,
               rho = 10,
               c = 0.5,
               N = N)
plot(beta, type = 'l', lwd = 3,
     bty = 'l', xlab = 'Time',
     ylab = 'Coefficient',
     col = 'darkred')


# Simulate the predictor as a standard normal
predictor <- rnorm(N, sd = 1)

# Simulate a Gaussian outcome variable
out <- rnorm(N, mean = 4 + beta * predictor,
             sd = 0.25)
time <- seq_along(predictor)
plot(out, type = 'l', lwd = 3,
     bty = 'l', xlab = 'Time', ylab = 'Outcome',
     col = 'darkred')


# Gather into a data.frame and fit a dynamic coefficient model
data <- data.frame(out, predictor, time)

# Split into training and testing
data_train <- data[1:190,]
data_test <- data[191:200,]

# Fit a model using the dynamic function
mod <- mvgam(out ~
               # mis-specify the length scale slightly as this
               # won't be known in practice
               dynamic(predictor, rho = 8, stationary = TRUE),
             family = gaussian(),
             data = data_train,
             chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c3e562719.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 0.8 seconds.
#> Chain 2 finished in 0.8 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.8 seconds.
#> Total execution time: 1.0 seconds.
#> 

# Inspect the summary
summary(mod)
#> GAM formula:
#> out ~ s(time, by = predictor, bs = "gp", m = c(-2, 8, 2), k = 27)
#> <environment: 0x000001d631e8b030>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 190 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.22 0.24  0.27    1   717
#> 
#> GAM coefficient (beta) estimates:
#>                         2.5%    50%  97.5% Rhat n_eff
#> (Intercept)           3.9000  4.000  4.000 1.00   789
#> s(time):predictor.1  -0.5100 -0.022  0.520 1.02   145
#> s(time):predictor.2   0.6200  0.660  0.710 1.01   501
#> s(time):predictor.3   0.1600  0.340  0.540 1.02   158
#> s(time):predictor.4  -0.3400 -0.290 -0.240 1.00   862
#> s(time):predictor.5   0.0130  0.130  0.250 1.02   150
#> s(time):predictor.6  -0.7200 -0.670 -0.620 1.00   851
#> s(time):predictor.7  -0.3600 -0.270 -0.170 1.01   182
#> s(time):predictor.8  -0.2300 -0.170 -0.120 1.00   835
#> s(time):predictor.9   0.1500  0.250  0.340 1.02   193
#> s(time):predictor.10 -0.1900 -0.130 -0.069 1.01   721
#> s(time):predictor.11  0.0120  0.100  0.200 1.01   232
#> s(time):predictor.12 -0.4300 -0.360 -0.280 1.00   884
#> s(time):predictor.13  0.0080  0.110  0.220 1.01   274
#> s(time):predictor.14 -0.2600 -0.160 -0.081 1.00   725
#> s(time):predictor.15 -0.0730  0.051  0.170 1.02   212
#> s(time):predictor.16  0.0095  0.130  0.230 1.00   669
#> s(time):predictor.17  0.0820  0.220  0.380 1.00   451
#> s(time):predictor.18 -0.0870  0.042  0.180 1.00   904
#> s(time):predictor.19  0.0150  0.200  0.380 1.00   591
#> s(time):predictor.20 -0.2400 -0.050  0.150 1.01   715
#> s(time):predictor.21 -0.1900  0.039  0.280 1.00   506
#> s(time):predictor.22 -0.4300 -0.180  0.120 1.00   788
#> s(time):predictor.23 -0.0550  0.260  0.560 1.00   689
#> s(time):predictor.24 -0.4200 -0.100  0.250 1.00   752
#> s(time):predictor.25 -0.2300  0.180  0.620 1.01   890
#> s(time):predictor.26 -0.7000 -0.230  0.220 1.00   825
#> s(time):predictor.27 -0.1000  0.470  1.100 1.02   144
#> 
#> Approximate significance of GAM smooths:
#>                    edf Ref.df Chi.sq p-value    
#> s(time):predictor 14.1     27    272  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 19 11:28:07 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model

# Plot the time-varying coefficient estimates
plot(mod, type = 'smooths')


# Extrapolate the coefficient forward in time
plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = 'dashed', lwd = 2)

# Overlay the true simulated time-varying coefficient
lines(beta, lwd = 2.5, col = 'white')
lines(beta, lwd = 2)

# }
================================================ FILE: docs/reference/ensemble.mvgam_forecast.html ================================================ Combine forecasts from mvgam models into evenly weighted ensembles — ensemble.mvgam_forecast

Generate evenly weighted ensemble forecast distributions from mvgam_forecast objects

Usage

ensemble(object, ...)

# S3 method for class 'mvgam_forecast'
ensemble(object, ..., ndraws = 5000)

Arguments

object

list object of class mvgam_forecast. See forecast.mvgam()

...

More mvgam_forecast objects.

ndraws

Positive integer specifying the number of draws to use from each forecast distribution for creating the ensemble. If some of the ensemble members have fewer draws than ndraws, their forecast distributions will be resampled with replacement to achieve the correct number of draws

Value

An object of class mvgam_forecast containing the ensemble predictions. This object can be readily used with the supplied S3 functions plot and score

Details

It is widely recognised in the forecasting literature that combining forecasts from different models often results in improved forecast accuracy. The simplest way to create an ensemble is to use evenly weighted combinations of forecasts from the different models. This is straightforward to do in a Bayesian setting with mvgam as the posterior MCMC draws contained in each mvgam_forecast object will already implicitly capture correlations among the temporal posterior predictions.
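
As a rough sketch of the idea (an illustration, not the package's implementation; even_ensemble, draws1 and draws2 are hypothetical names), an evenly weighted ensemble pools an equal number of posterior forecast draws from each member, resampling with replacement when a member has fewer draws than requested:

# 'draws1' and 'draws2' stand in for (draws x horizon) matrices of
# posterior forecast draws from two competing models
even_ensemble <- function(draws1, draws2, ndraws = 5000) {
  take <- function(d, n) {
    # resample rows with replacement only if this member
    # has fewer draws than requested
    d[sample(nrow(d), n, replace = nrow(d) < n), , drop = FALSE]
  }
  rbind(take(draws1, ndraws / 2), take(draws2, ndraws / 2))
}

d1 <- matrix(rnorm(2000 * 25), nrow = 2000)  # 2000 draws, 25 horizons
d2 <- matrix(rnorm(1000 * 25), nrow = 1000)  # fewer draws: resampled
dim(even_ensemble(d1, d2))  # 5000 x 25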

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate some series and fit a few competing dynamic models
set.seed(1)
simdat <- sim_mvgam(n_series = 1,
                    prop_trend = 0.6,
                    mu = 1)

plot_mvgam_series(data = simdat$data_train,
                 newdata = simdat$data_test)


m1 <- mvgam(y ~ 1,
            trend_formula = ~ time +
              s(season, bs = 'cc', k = 9),
            trend_model = AR(p = 1),
            noncentred = TRUE,
            data = simdat$data_train,
            newdata = simdat$data_test,
            chains = 2,
            silent = 2)
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c74e52e4c.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory

m2 <- mvgam(y ~ time,
            trend_model = RW(),
            noncentred = TRUE,
            data = simdat$data_train,
            newdata = simdat$data_test,
            chains = 2,
            silent = 2)
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c1777634c.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory

# Calculate forecast distributions for each model
fc1 <- forecast(m1)
fc2 <- forecast(m2)

# Generate the ensemble forecast
ensemble_fc <- ensemble(fc1, fc2)

# Plot forecasts
plot(fc1)
#> Out of sample DRPS:
#> 42.791261

plot(fc2)
#> Out of sample DRPS:
#> 48.658331

plot(ensemble_fc)
#> Out of sample DRPS:
#> 44.3088696


# Score forecasts
score(fc1)
#> $series_1
#>       score in_interval interval_width eval_horizon score_type
#> 1  3.433228           1            0.9            1       crps
#> 2  1.105110           1            0.9            2       crps
#> 3  1.995765           1            0.9            3       crps
#> 4  1.447371           1            0.9            4       crps
#> 5  0.977251           1            0.9            5       crps
#> 6  1.419407           1            0.9            6       crps
#> 7  2.022923           1            0.9            7       crps
#> 8  1.727456           1            0.9            8       crps
#> 9  1.147758           1            0.9            9       crps
#> 10 1.200552           1            0.9           10       crps
#> 11 1.318235           1            0.9           11       crps
#> 12 1.700447           1            0.9           12       crps
#> 13 1.018621           1            0.9           13       crps
#> 14 0.988125           1            0.9           14       crps
#> 15 0.896207           1            0.9           15       crps
#> 16 1.007965           1            0.9           16       crps
#> 17 1.042871           1            0.9           17       crps
#> 18 4.183073           1            0.9           18       crps
#> 19 1.335348           1            0.9           19       crps
#> 20 2.246848           1            0.9           20       crps
#> 21 1.614939           1            0.9           21       crps
#> 22 1.568674           1            0.9           22       crps
#> 23 4.672876           1            0.9           23       crps
#> 24 1.362186           1            0.9           24       crps
#> 25 1.358025           1            0.9           25       crps
#> 
#> $all_series
#>       score eval_horizon score_type
#> 1  3.433228            1   sum_crps
#> 2  1.105110            2   sum_crps
#> 3  1.995765            3   sum_crps
#> 4  1.447371            4   sum_crps
#> 5  0.977251            5   sum_crps
#> 6  1.419407            6   sum_crps
#> 7  2.022923            7   sum_crps
#> 8  1.727456            8   sum_crps
#> 9  1.147758            9   sum_crps
#> 10 1.200552           10   sum_crps
#> 11 1.318235           11   sum_crps
#> 12 1.700447           12   sum_crps
#> 13 1.018621           13   sum_crps
#> 14 0.988125           14   sum_crps
#> 15 0.896207           15   sum_crps
#> 16 1.007965           16   sum_crps
#> 17 1.042871           17   sum_crps
#> 18 4.183073           18   sum_crps
#> 19 1.335348           19   sum_crps
#> 20 2.246848           20   sum_crps
#> 21 1.614939           21   sum_crps
#> 22 1.568674           22   sum_crps
#> 23 4.672876           23   sum_crps
#> 24 1.362186           24   sum_crps
#> 25 1.358025           25   sum_crps
#> 
score(fc2)
#> $series_1
#>       score in_interval interval_width eval_horizon score_type
#> 1  4.171226           1            0.9            1       crps
#> 2  1.547056           1            0.9            2       crps
#> 3  3.124789           0            0.9            3       crps
#> 4  0.999647           1            0.9            4       crps
#> 5  2.435121           1            0.9            5       crps
#> 6  2.610246           1            0.9            6       crps
#> 7  1.928458           1            0.9            7       crps
#> 8  0.961976           1            0.9            8       crps
#> 9  1.625352           1            0.9            9       crps
#> 10 1.302485           1            0.9           10       crps
#> 11 1.584327           1            0.9           11       crps
#> 12 1.754281           1            0.9           12       crps
#> 13 1.051481           1            0.9           13       crps
#> 14 1.081281           1            0.9           14       crps
#> 15 1.217669           1            0.9           15       crps
#> 16 1.177959           1            0.9           16       crps
#> 17 1.259938           1            0.9           17       crps
#> 18 2.725221           1            0.9           18       crps
#> 19 1.230060           1            0.9           19       crps
#> 20 3.820228           1            0.9           20       crps
#> 21 2.635251           1            0.9           21       crps
#> 22 1.648463           1            0.9           22       crps
#> 23 4.030445           1            0.9           23       crps
#> 24 1.360764           1            0.9           24       crps
#> 25 1.374607           1            0.9           25       crps
#> 
#> $all_series
#>       score eval_horizon score_type
#> 1  4.171226            1   sum_crps
#> 2  1.547056            2   sum_crps
#> 3  3.124789            3   sum_crps
#> 4  0.999647            4   sum_crps
#> 5  2.435121            5   sum_crps
#> 6  2.610246            6   sum_crps
#> 7  1.928458            7   sum_crps
#> 8  0.961976            8   sum_crps
#> 9  1.625352            9   sum_crps
#> 10 1.302485           10   sum_crps
#> 11 1.584327           11   sum_crps
#> 12 1.754281           12   sum_crps
#> 13 1.051481           13   sum_crps
#> 14 1.081281           14   sum_crps
#> 15 1.217669           15   sum_crps
#> 16 1.177959           16   sum_crps
#> 17 1.259938           17   sum_crps
#> 18 2.725221           18   sum_crps
#> 19 1.230060           19   sum_crps
#> 20 3.820228           20   sum_crps
#> 21 2.635251           21   sum_crps
#> 22 1.648463           22   sum_crps
#> 23 4.030445           23   sum_crps
#> 24 1.360764           24   sum_crps
#> 25 1.374607           25   sum_crps
#> 
score(ensemble_fc)
#> $series_1
#>        score in_interval interval_width eval_horizon score_type
#> 1  3.8263402           1            0.9            1       crps
#> 2  1.3036929           1            0.9            2       crps
#> 3  2.5519035           1            0.9            3       crps
#> 4  1.0739452           1            0.9            4       crps
#> 5  1.6133805           1            0.9            5       crps
#> 6  1.9180633           1            0.9            6       crps
#> 7  1.9954492           1            0.9            7       crps
#> 8  1.2471014           1            0.9            8       crps
#> 9  1.2738834           1            0.9            9       crps
#> 10 1.1792346           1            0.9           10       crps
#> 11 1.4541516           1            0.9           11       crps
#> 12 1.6823855           1            0.9           12       crps
#> 13 1.0212761           1            0.9           13       crps
#> 14 1.0489788           1            0.9           14       crps
#> 15 0.9839557           1            0.9           15       crps
#> 16 0.9631836           1            0.9           16       crps
#> 17 0.9966610           1            0.9           17       crps
#> 18 3.4294286           1            0.9           18       crps
#> 19 1.2707383           1            0.9           19       crps
#> 20 2.9218745           1            0.9           20       crps
#> 21 2.0465904           1            0.9           21       crps
#> 22 1.4433498           1            0.9           22       crps
#> 23 4.2497062           1            0.9           23       crps
#> 24 1.3876704           1            0.9           24       crps
#> 25 1.4259249           1            0.9           25       crps
#> 
#> $all_series
#>        score eval_horizon score_type
#> 1  3.8263402            1   sum_crps
#> 2  1.3036929            2   sum_crps
#> 3  2.5519035            3   sum_crps
#> 4  1.0739452            4   sum_crps
#> 5  1.6133805            5   sum_crps
#> 6  1.9180633            6   sum_crps
#> 7  1.9954492            7   sum_crps
#> 8  1.2471014            8   sum_crps
#> 9  1.2738834            9   sum_crps
#> 10 1.1792346           10   sum_crps
#> 11 1.4541516           11   sum_crps
#> 12 1.6823855           12   sum_crps
#> 13 1.0212761           13   sum_crps
#> 14 1.0489788           14   sum_crps
#> 15 0.9839557           15   sum_crps
#> 16 0.9631836           16   sum_crps
#> 17 0.9966610           17   sum_crps
#> 18 3.4294286           18   sum_crps
#> 19 1.2707383           19   sum_crps
#> 20 2.9218745           20   sum_crps
#> 21 2.0465904           21   sum_crps
#> 22 1.4433498           22   sum_crps
#> 23 4.2497062           23   sum_crps
#> 24 1.3876704           24   sum_crps
#> 25 1.4259249           25   sum_crps
#> 
# }
================================================ FILE: docs/reference/evaluate_mvgams.html ================================================ Evaluate forecasts from fitted mvgam objects — evaluate_mvgams • mvgam

Evaluate forecasts from fitted mvgam objects

Usage

eval_mvgam(
  object,
  n_samples = 5000,
  eval_timepoint = 3,
  fc_horizon = 3,
  n_cores = 1,
  score = "drps",
  log = FALSE,
  weights
)

roll_eval_mvgam(
  object,
  n_evaluations = 5,
  evaluation_seq,
  n_samples = 5000,
  fc_horizon = 3,
  n_cores = 1,
  score = "drps",
  log = FALSE,
  weights
)

compare_mvgams(
  model1,
  model2,
  n_samples = 1000,
  fc_horizon = 3,
  n_evaluations = 10,
  n_cores = 1,
  score = "drps",
  log = FALSE,
  weights
)

Arguments

object

list object returned from mvgam

n_samples

integer specifying the number of samples to generate from the model's posterior distribution

eval_timepoint

integer indexing the timepoint that represents our last 'observed' set of outcome data

fc_horizon

integer specifying the length of the forecast horizon for evaluating forecasts

n_cores

Deprecated. Parallel processing is no longer supported

score

character specifying the type of ranked probability score to use for evaluation. Options are: variogram, drps or crps

log

logical. Should the forecasts and truths be logged prior to scoring? This is often appropriate for comparing performance of models when series vary in their observation ranges

weights

optional vector of weights (where length(weights) == n_series) for weighting pairwise correlations when evaluating the variogram score for multivariate forecasts. Useful for down-weighting series that have larger magnitude observations or that are of less interest when forecasting. Ignored if score != 'variogram'

n_evaluations

integer specifying the total number of evaluations to perform

evaluation_seq

Optional integer sequence specifying the exact set of timepoints for evaluating the model's forecasts. This sequence cannot have values < 3 or > max(training timepoints) - fc_horizon

model1

list object returned from mvgam representing the first model to be evaluated

model2

list object returned from mvgam representing the second model to be evaluated

Value

For eval_mvgam, a list object containing information on specific evaluations for each series (if using drps or crps as the score) or a vector of scores when using variogram.

For roll_eval_mvgam, a list object containing information on specific evaluations for each series as well as a total evaluation summary (obtained by summing the forecast scores across series and averaging the interval coverages at each evaluation)

For compare_mvgams, a series of plots comparing forecast Rank Probability Scores for each competing model. A lower score is preferred. Note however that it is possible to select a model that would ultimately perform poorly in true out-of-sample forecasting. For example, if a wiggly smooth function of 'year' is included in the model, this function will be learned prior to evaluating the rolling window forecasts and the model could generate very tight predictions as a result. But when forecasting ahead to timepoints the model has not seen (i.e. next year), the smooth function will end up extrapolating, sometimes in very strange and unexpected ways. It is therefore recommended to only use smooth functions for covariates that are adequately measured in the data (e.g. 'seasonality') to reduce possible extrapolation of smooths, and to let the latent trends in the mvgam model capture any temporal dependencies in the data. These trends are time series models and so will provide much more stable forecasts.

Details

eval_mvgam may be useful when both exact leave-future-out cross-validation (which requires repeatedly refitting the model using update.mvgam) and approximate leave-future-out cross-validation using lfo_cv are impractical. The function generates a set of samples representing fixed parameters estimated from the full mvgam model and latent trend states at a given point in time. The trends are rolled forward a total of fc_horizon timesteps according to their estimated state space dynamics to generate an 'out-of-sample' forecast that is evaluated against the true observations in the horizon window. This function therefore simulates a situation where the model's parameters had already been estimated but we have only observed data up to the evaluation timepoint and would like to generate forecasts from the latent trends estimated up to that timepoint. Evaluation involves calculating an appropriate Rank Probability Score and a binary indicator for whether or not the true value lies within the forecast's 90% prediction interval.

roll_eval_mvgam sets up a sequence of evaluation timepoints along a rolling window and iteratively calls eval_mvgam to evaluate 'out-of-sample' forecasts. Evaluation involves calculating the Rank Probability Scores and a binary indicator for whether or not the true value lies within the forecast's 90% prediction interval

compare_mvgams automates the evaluation to compare two fitted models using rolling window forecast evaluation and provides a series of summary plots to facilitate model selection. It is essentially a wrapper for roll_eval_mvgam
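A minimal sketch of how these two functions relate (assuming a fitted model such as the mod_ar2 object created in the Examples below); argument values here are illustrative only:

# Score a single 3-step forecast made from evaluation timepoint 30
single_eval <- eval_mvgam(mod_ar2,
                          eval_timepoint = 30,
                          fc_horizon = 3,
                          n_samples = 1000,
                          score = 'drps')

# Repeat the evaluation across five timepoints along a rolling window
roll_eval <- roll_eval_mvgam(mod_ar2,
                             n_evaluations = 5,
                             fc_horizon = 3,
                             n_samples = 1000,
                             score = 'drps')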

See also

Examples

# \donttest{
# Simulate from a Poisson-AR2 model with a seasonal smooth
set.seed(1)
dat <- sim_mvgam(T = 75,
                n_series = 1,
                prop_trend = 0.75,
                trend_model = AR(p = 2),
                family = poisson())


# Fit an appropriate model
mod_ar2 <- mvgam(y ~ s(season, bs = 'cc'),
                trend_model = AR(p = 2),
                family = poisson(),
                data = dat$data_train,
                newdata = dat$data_test,
                chains = 2,
                silent = 2)

# Fit a less appropriate model
mod_rw <- mvgam(y ~ 1,
               trend_model = RW(),
               family = poisson(),
               data = dat$data_train,
               newdata = dat$data_test,
               chains = 2,
               silent = 2)

# Compare Discrete Ranked Probability Scores for the testing period
fc_ar2 <- forecast(mod_ar2)
fc_rw <- forecast(mod_rw)
score_ar2 <- score(fc_ar2, score = 'drps')
score_rw <- score(fc_rw, score = 'drps')
sum(score_ar2$series_1$score)
#> [1] 19.81778
sum(score_rw$series_1$score)
#> [1] 145.5815

# Use rolling evaluation for approximate comparisons of 3-step ahead
# forecasts across the training period
compare_mvgams(mod_ar2,
              mod_rw,
              fc_horizon = 3,
              n_samples = 1000,
              n_evaluations = 5)
#> RPS summaries per model (lower is better)
#>             Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#> Model 1 0.990209 1.546376 2.102542 2.439294 3.163837 4.225131
#> Model 2 1.883123 2.212336 2.541548 2.692843 3.097704 3.653859
#> 
#> 90% interval coverages per model (closer to 0.9 is better)
#> Model 1 0.9333333 
#> Model 2 1




# Now use approximate leave-future-out CV to compare
# rolling forecasts; start at time point 40 to reduce
# computational time and to ensure enough data is available
# for estimating model parameters
lfo_ar2 <- lfo_cv(mod_ar2,
                 min_t = 40,
                 fc_horizon = 3,
                 silent = 2)
lfo_rw <- lfo_cv(mod_rw,
                min_t = 40,
                fc_horizon = 3,
                silent = 2)

# Plot Pareto-K values and ELPD estimates
plot(lfo_ar2)

plot(lfo_rw)


# Proportion of timepoints in which AR2 model gives
# better forecasts
length(which((lfo_ar2$elpds - lfo_rw$elpds) > 0)) /
      length(lfo_ar2$elpds)
#> [1] 0.6923077

# A higher total ELPD is preferred
lfo_ar2$sum_ELPD
#> [1] -59.60248
lfo_rw$sum_ELPD
#> [1] -76.47689
# }
================================================ FILE: docs/reference/fevd.mvgam.html ================================================ Calculate latent VAR forecast error variance decompositions — fevd.mvgam • mvgam

Compute forecast error variance decompositions from mvgam models with Vector Autoregressive dynamics

Usage

fevd(object, ...)

# S3 method for class 'mvgam'
fevd(object, h = 1, ...)

Arguments

object

list object of class mvgam resulting from a call to mvgam() that used a Vector Autoregressive latent process model (either as VAR(cor = FALSE) or VAR(cor = TRUE))

...

ignored

h

Positive integer specifying the forecast horizon over which to calculate the forecast error variance decomposition

Value

An object of class mvgam_fevd containing the posterior forecast error variance decompositions. This object can be used with the supplied S3 plot method

Details

A forecast error variance decomposition is useful for quantifying the amount of information each series in a Vector Autoregression contributes to the forecast distributions of the other series in the autoregression. This function calculates the forecast error variance decomposition using the orthogonalised impulse response coefficient matrices \(\Psi_h\), which can be used to quantify the contribution of series \(j\) to the h-step forecast error variance of series \(k\): $$ \sigma_k^2(h) = \sum_{j=1}^K(\psi_{kj, 0}^2 + \ldots + \psi_{kj, h-1}^2) $$ If the orthogonalised impulse responses \((\psi_{kj, 0}^2 + \ldots + \psi_{kj, h-1}^2)\) are divided by the variance of the forecast error \(\sigma_k^2(h)\), this yields an interpretable percentage representing how much of the forecast error variance for \(k\) can be explained by an exogenous shock to \(j\).
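The following base R sketch illustrates this calculation. Here Psi is a hypothetical list of K x K orthogonalised impulse response coefficient matrices \(\Psi_0, \ldots, \Psi_{h-1}\); it is not an object returned by mvgam:

fevd_proportions <- function(Psi) {
  # Accumulate squared orthogonalised IRF coefficients across horizons
  K <- nrow(Psi[[1]])
  contrib <- matrix(0, K, K)
  for (Psi_i in Psi) {
    contrib <- contrib + Psi_i^2
  }
  # Row sums give sigma_k^2(h); dividing yields the proportion of the
  # h-step forecast error variance of series k attributable to series j
  contrib / rowSums(contrib)
}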

References

Lütkepohl, H (2006). New Introduction to Multiple Time Series Analysis. Springer, New York.

See also

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate some time series that follow a latent VAR(1) process
simdat <- sim_mvgam(family = gaussian(),
                    n_series = 4,
                    trend_model = VAR(cor = TRUE),
                    prop_trend = 1)
plot_mvgam_series(data = simdat$data_train, series = 'all')


# Fit a model that uses a latent VAR(1)
mod <- mvgam(y ~ -1,
             trend_formula = ~ 1,
             trend_model = VAR(cor = TRUE),
             family = gaussian(),
             data = simdat$data_train,
             chains = 2,
             silent = 2)

# Calculate forecast error variance decompositions for each series
fevds <- fevd(mod, h = 12)

# Plot median contributions to forecast error variance
plot(fevds)

# }
================================================ FILE: docs/reference/fitted.mvgam.html ================================================ Expected values of the posterior predictive distribution for mvgam objects — fitted.mvgam • mvgam

This method extracts posterior estimates of the fitted values (i.e. the actual predictions, including estimates for any trend states, that were obtained when fitting the model). It also includes an option for obtaining summaries of the computed draws.

Usage

# S3 method for class 'mvgam'
fitted(
  object,
  process_error = TRUE,
  scale = c("response", "linear"),
  summary = TRUE,
  robust = FALSE,
  probs = c(0.025, 0.975),
  ...
)

Arguments

object

An object of class mvgam

process_error

Logical. If TRUE and a dynamic trend model was fit, expected uncertainty in the process model is accounted for by using draws from the latent trend SD parameters. If FALSE, uncertainty in the latent trend component is ignored when calculating predictions

scale

Either "response" or "linear". If "response", results are returned on the scale of the response variable. If "linear", results are returned on the scale of the linear predictor term, that is without applying the inverse link function or other transformations.

summary

Should summary statistics be returned instead of the raw values? Default is TRUE.

robust

If FALSE (the default) the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If TRUE, the median and the median absolute deviation (MAD) are applied instead. Only used if summary is TRUE.

probs

The percentiles to be computed by the quantile function. Only used if summary is TRUE.

...

Further arguments passed to prepare_predictions that control several aspects of data validation and prediction.

Value

An array of predicted mean response values. If summary = FALSE the output resembles that of posterior_epred.mvgam and predict.mvgam.

If summary = TRUE the output is an n_observations x E matrix. The number of summary statistics E is equal to 2 + length(probs): The Estimate column contains point estimates (either mean or median depending on argument robust), while the Est.Error column contains uncertainty estimates (either standard deviation or median absolute deviation depending on argument robust). The remaining columns starting with Q contain quantile estimates as specified via argument probs.
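As a sketch of how these summary columns relate to the raw draws (assuming the defaults robust = FALSE and probs = c(0.025, 0.975), and that the draws are returned as a draws x observations matrix), the summary matrix could be reconstructed manually:

draws <- fitted(mod, summary = FALSE)
manual_summary <- cbind(
  Estimate  = apply(draws, 2, mean),
  Est.Error = apply(draws, 2, sd),
  Q2.5      = apply(draws, 2, quantile, probs = 0.025),
  Q97.5     = apply(draws, 2, quantile, probs = 0.975)
)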

Details

This method gives the actual fitted values from the model (i.e. what you will see if you generate hindcasts from the fitted model using hindcast.mvgam with type = 'expected'). These predictions can be overly precise if a flexible dynamic trend component was included in the model. This is in contrast to the set of predict functions (i.e. posterior_epred.mvgam or predict.mvgam), which will assume any dynamic trend component has reached stationarity when returning hypothetical predictions

See also

Examples

# \donttest{
# Simulate some data and fit a model
simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc'),
            trend_model = AR(),
            data = simdat$data_train,
            chains = 2,
            silent = 2)

# Extract fitted values (posterior expectations)
expectations <- fitted(mod)
str(expectations)
#>  num [1:75, 1:4] 0.264 0.254 0.467 0.498 1.066 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
# }
================================================ FILE: docs/reference/forecast.mvgam.html ================================================ Extract or compute hindcasts and forecasts for a fitted mvgam object — forecast.mvgam • mvgam

Extract or compute hindcasts and forecasts for a fitted mvgam object

Usage

forecast(object, ...)

# S3 method for class 'mvgam'
forecast(object, newdata, data_test, n_cores = 1, type = "response", ...)

Arguments

object

list object of class mvgam or jsdgam. See mvgam()

...

Ignored

newdata

Optional dataframe or list of test data containing the same variables that were included in the original data used to fit the model. If included, the covariate information in newdata will be used to generate forecasts from the fitted model equations. If this same newdata was originally included in the call to mvgam, then forecasts have already been produced by the generative model and these will simply be extracted. However, if no newdata was supplied to the original model call, an assumption is made that the newdata supplied here comes sequentially after the data supplied in the original model (i.e. we assume there is no time gap between the last observation of series 1 in the original data and the first observation for series 1 in newdata)

data_test

Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows

n_cores

Deprecated. Parallel processing is no longer supported

type

The type of prediction to compute. Options are:

  • link: the linear predictor is calculated on the link scale

  • expected: predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process

  • response (the default for this method): predictions take uncertainty in the observation process into account to return predictions on the outcome scale

  • variance: the variance of the response with respect to the mean (mean-variance relationship) is returned

  • terms: each component of the linear predictor is returned separately in the form of a list (possibly with standard errors, if summary = TRUE); this includes parametric model components, followed by each smooth component, but excludes any offset and any intercept

Two special cases are also allowed: type latent_N will return the estimated latent abundances from an N-mixture distribution, while type detection will return the estimated detection probability from an N-mixture distribution

Value

An object of class mvgam_forecast containing hindcast and forecast distributions. See mvgam_forecast-class for details.

Details

Posterior predictions are drawn from the fitted mvgam and used to simulate a forecast distribution

See also

Examples

# \donttest{
simdat <- sim_mvgam(n_series = 3, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             noncentred = TRUE,
             data = simdat$data_train,
             chains = 2,
             silent = 2)

# Hindcasts on response scale
hc <- hindcast(mod)
str(hc)
#> List of 15
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc", k = 6)
#>   .. ..- attr(*, ".Environment")=<environment: 0x000002e052d652a8> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ trend_model       :List of 7
#>   ..$ trend_model: chr "AR1"
#>   ..$ ma         : logi FALSE
#>   ..$ cor        : logi FALSE
#>   ..$ unit       : chr "time"
#>   ..$ gr         : chr "NA"
#>   ..$ subgr      : chr "series"
#>   ..$ label      : language AR()
#>   ..- attr(*, "class")= chr "mvgam_trend"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : chr [1:3] "series_1" "series_2" "series_3"
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:75] 1 0 0 0 0 1 0 1 0 2 ...
#>   ..$ series_2: int [1:75] 5 1 0 0 0 0 0 0 0 0 ...
#>   ..$ series_3: int [1:75] 3 1 0 0 0 0 2 0 0 2 ...
#>  $ train_times       : int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations : NULL
#>  $ test_times        : NULL
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:1000, 1:75] 2 1 4 0 1 2 2 1 1 1 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>   ..$ series_2: num [1:1000, 1:75] 3 1 2 2 3 0 0 2 3 6 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,2]" "ypred[2,2]" "ypred[3,2]" "ypred[4,2]" ...
#>   ..$ series_3: num [1:1000, 1:75] 2 0 1 1 0 5 1 2 3 1 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,3]" "ypred[2,3]" "ypred[3,3]" "ypred[4,3]" ...
#>  $ forecasts         : NULL
#>  - attr(*, "class")= chr "mvgam_forecast"
plot(hc, series = 1)
#> No non-missing values in test_observations; cannot calculate forecast score

plot(hc, series = 2)
#> No non-missing values in test_observations; cannot calculate forecast score

plot(hc, series = 3)
#> No non-missing values in test_observations; cannot calculate forecast score


# Forecasts on response scale
fc <- forecast(mod,
               newdata = simdat$data_test)
str(fc)
#> List of 16
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc", k = 6)
#>   .. ..- attr(*, ".Environment")=<environment: 0x000002e052d652a8> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ family_pars       : NULL
#>  $ trend_model       :List of 7
#>   ..$ trend_model: chr "AR1"
#>   ..$ ma         : logi FALSE
#>   ..$ cor        : logi FALSE
#>   ..$ unit       : chr "time"
#>   ..$ gr         : chr "NA"
#>   ..$ subgr      : chr "series"
#>   ..$ label      : language AR()
#>   ..- attr(*, "class")= chr "mvgam_trend"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : Factor w/ 3 levels "series_1","series_2",..: 1 2 3
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:75] 1 0 0 0 0 1 0 1 0 2 ...
#>   ..$ series_2: int [1:75] 5 1 0 0 0 0 0 0 0 0 ...
#>   ..$ series_3: int [1:75] 3 1 0 0 0 0 2 0 0 2 ...
#>  $ train_times       : int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations :List of 3
#>   ..$ series_1: int [1:25] 1 0 0 0 0 1 0 1 3 0 ...
#>   ..$ series_2: int [1:25] 0 2 0 1 1 0 1 1 2 2 ...
#>   ..$ series_3: int [1:25] 1 0 2 3 1 1 2 3 15 1 ...
#>  $ test_times        : int [1:25] 76 77 78 79 80 81 82 83 84 85 ...
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:1000, 1:75] 2 1 4 0 1 2 2 1 1 1 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>   ..$ series_2: num [1:1000, 1:75] 3 1 2 2 3 0 0 2 3 6 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,2]" "ypred[2,2]" "ypred[3,2]" "ypred[4,2]" ...
#>   ..$ series_3: num [1:1000, 1:75] 2 0 1 1 0 5 1 2 3 1 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,3]" "ypred[2,3]" "ypred[3,3]" "ypred[4,3]" ...
#>  $ forecasts         :List of 3
#>   ..$ series_1: int [1:1000, 1:25] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ series_2: int [1:1000, 1:25] 1 1 0 0 0 0 0 0 0 0 ...
#>   ..$ series_3: int [1:1000, 1:25] 0 0 2 0 1 1 0 3 0 0 ...
#>  - attr(*, "class")= chr "mvgam_forecast"
plot(fc, series = 1)
#> Out of sample DRPS:
#> 9.396333

plot(fc, series = 2)
#> Out of sample DRPS:
#> 7.449717

plot(fc, series = 3)
#> Out of sample DRPS:
#> 31.423124


# Forecasts as expectations
fc <- forecast(mod,
               newdata = simdat$data_test,
               type = 'expected')
plot(fc, series = 1)

plot(fc, series = 2)

plot(fc, series = 3)


# Dynamic trend extrapolations
fc <- forecast(mod,
               newdata = simdat$data_test,
               type = 'trend')
plot(fc, series = 1)

plot(fc, series = 2)

plot(fc, series = 3)

# }
================================================ FILE: docs/reference/formula.mvgam.html ================================================ Extract formulae from mvgam objects — formula.mvgam • mvgam

Extract formulae from mvgam objects

Usage

# S3 method for class 'mvgam'
formula(x, trend_effects = FALSE, ...)

# S3 method for class 'mvgam_prefit'
formula(x, trend_effects = FALSE, ...)

Arguments

x

mvgam, jsdgam or mvgam_prefit object

trend_effects

logical, return the formula from the observation model (if FALSE) or from the underlying process model (if TRUE)

...

Ignored

Value

A formula object
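A brief usage sketch (mod is assumed to be a fitted mvgam object that included a trend_formula):

# Formula from the observation model
formula(mod)

# Formula from the latent process model
formula(mod, trend_effects = TRUE)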

Author

Nicholas J Clark

================================================ FILE: docs/reference/get_monitor_pars.html ================================================ Return parameters to monitor during modelling — get_monitor_pars • mvgam

Return parameters to monitor during modelling

Usage

get_monitor_pars(family, smooths_included = TRUE, use_lv, trend_model, drift)

Arguments

family

character

smooths_included

Logical. Are smooth terms included in the model formula?

use_lv

Logical (use latent variable trends or not)

trend_model

The type of trend model used

drift

Logical (was a drift term estimated or not)

Value

A string of parameters to monitor
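A hypothetical call matching the Usage above (the argument values are illustrative only and assume the character-based trend labels used elsewhere in the package, e.g. 'AR1'):

get_monitor_pars(family = 'poisson',
                 smooths_included = TRUE,
                 use_lv = FALSE,
                 trend_model = 'AR1',
                 drift = FALSE)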

================================================ FILE: docs/reference/get_mvgam_priors.html ================================================ Extract information on default prior distributions for an mvgam model — get_mvgam_priors • mvgam

This function lists the parameters that can have their prior distributions changed for a given model, as well as listing their default distributions

Usage

get_mvgam_priors(
  formula,
  trend_formula,
  factor_formula,
  knots,
  trend_knots,
  trend_model = "None",
  family = poisson(),
  data,
  unit = time,
  species = series,
  use_lv = FALSE,
  n_lv,
  trend_map,
  ...
)

Arguments

formula

A formula object specifying the GAM observation model formula. These are exactly like the formula for a GLM except that smooth terms, s(), te(), ti(), t2(), as well as time-varying dynamic() terms, nonparametric gp() terms and offsets using offset(), can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these). In nmix() family models, the formula is used to set up a linear predictor for the detection probability. Details of the formula syntax used by mvgam can be found in mvgam_formulae

trend_formula

An optional formula object specifying the GAM process model formula. If supplied, a linear predictor will be modelled for the latent trends to capture process model evolution separately from the observation model. Should not have a response variable specified on the left-hand side of the formula (i.e. a valid option would be ~ season + s(year)). Also note that you should not use the identifier series in this formula to specify effects that vary across time series. Instead you should use trend. This will ensure that models in which a trend_map is supplied will still work consistently (i.e. by allowing effects to vary across process models, even when some time series share the same underlying process model). This feature is only currently available for RW(), AR() and VAR() trend models. In nmix() family models, the trend_formula is used to set up a linear predictor for the underlying latent abundance. Be aware that it can be very challenging to simultaneously estimate intercept parameters for both the observation model (captured by formula) and the process model (captured by trend_formula). Users are recommended to drop one of these using the - 1 convention in the formula right hand side.

factor_formula

Can be supplied instead of trend_formula to match syntax from jsdgam

knots

An optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). Different terms can use different numbers of knots, unless they share a covariate

trend_knots

As for knots above, this is an optional list of knot values for smooth functions within the trend_formula

trend_model

character or function specifying the time series dynamics for the latent trend. Options are:

  • None (no latent trend component; i.e. the GAM component is all that contributes to the linear predictor, and the observation process is the only source of error; similar to what is estimated by gam)

  • ZMVN or ZMVN() (Zero-Mean Multivariate Normal; only available in Stan)

  • 'RW' or RW()

  • 'AR1' or AR(p = 1)

  • 'AR2' or AR(p = 2)

  • 'AR3' or AR(p = 3)

  • 'CAR1' or CAR(p = 1)

  • 'VAR1' or VAR() (only available in Stan)

  • 'PWlogistic', 'PWlinear' or PW() (only available in Stan)

  • 'GP' or GP() (Gaussian Process with squared exponential kernel; only available in Stan)

For all trend types apart from ZMVN(), GP(), CAR() and PW(), moving average and/or correlated process error terms can also be estimated (for example, RW(cor = TRUE) will set up a multivariate Random Walk if n_series > 1). It is also possible for many multivariate trends to estimate hierarchical correlations if the data are structured among levels of a relevant grouping factor. See mvgam_trends for more details and see ZMVN for an example.
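For illustration, a couple of ways these options might be supplied (a sketch only; the formula and data objects are placeholders standing in for your own):

# Independent AR(2) dynamics for each series
get_mvgam_priors(y ~ s(season, bs = 'cc'),
                 trend_model = AR(p = 2),
                 data = dat$data_train)

# A multivariate Random Walk with correlated process errors
get_mvgam_priors(y ~ s(season, bs = 'cc'),
                 trend_model = RW(cor = TRUE),
                 data = dat$data_train)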

family

family specifying the exponential observation family for the series. Currently supported families are:

  • gaussian() for real-valued data

  • betar() for proportional data on (0,1)

  • lognormal() for non-negative real-valued data

  • student_t() for real-valued data

  • Gamma() for non-negative real-valued data

  • bernoulli() for binary data

  • poisson() for count data

  • nb() for overdispersed count data

  • binomial() for count data with imperfect detection when the number of trials is known; note that the cbind() function must be used to bind the discrete observations and the discrete number of trials (see the sketch after this list)

  • beta_binomial() as for binomial() but allows for overdispersion

  • nmix() for count data with imperfect detection when the number of trials is unknown and should be modeled via a State-Space N-Mixture model. The latent states are Poisson, capturing the 'true' latent abundance, while the observation process is Binomial to account for imperfect detection. See mvgam_families for an example of how to use this family

Default is poisson(). See mvgam_families for more details
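As noted for binomial() and beta_binomial() above, the observations and trial sizes must be bound together with cbind() on the left-hand side of the formula. A minimal sketch (successes and trials are hypothetical column names in the supplied data):

get_mvgam_priors(cbind(successes, trials) ~ s(season, bs = 'cc'),
                 family = binomial(),
                 data = dat$data_train)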

data

A dataframe or list containing the model response variable and covariates required by the GAM formula and optional trend_formula. Most models should include columns:

  • series (a factor index of the series IDs; the number of levels should be identical to the number of unique series labels (i.e. n_series = length(levels(data$series))))

  • time (numeric or integer index of the time point for each observation). For most dynamic trend types available in mvgam (see argument trend_model), time should be measured in discrete, regularly spaced intervals (i.e. c(1, 2, 3, ...)). However you can use irregularly spaced intervals if using trend_model = CAR(1), though note that any temporal intervals that are exactly 0 will be adjusted to a very small number (1e-12) to prevent sampling errors. See an example of CAR() trends in CAR

Note however that there are special cases where these identifiers are not needed. For example, models with hierarchical temporal correlation processes (e.g. AR(gr = region, subgr = species)) should NOT include a series identifier, as this will be constructed internally (see mvgam_trends and AR for details). mvgam can also fit models that do not include a time variable if there are no temporal dynamic structures included (i.e. trend_model = 'None' or trend_model = ZMVN()). data should also include any other variables to be included in the linear predictor of formula

unit

The unquoted name of the variable that represents the unit of analysis in data over which latent residuals should be correlated. This variable should be either a numeric or integer variable in the supplied data. Defaults to time to be consistent with other functionalities in mvgam, though note that the data need not be time series in this case. See examples below for further details and explanations

species

The unquoted name of the factor variable that indexes the different response units in data (usually 'species' in a JSDM). Defaults to series to be consistent with other mvgam models

use_lv

logical. If TRUE, use dynamic factors to estimate series' latent trends in a reduced dimension format. Only available for RW(), AR() and GP() trend models. Defaults to FALSE

n_lv

integer specifying the number of latent dynamic factors to use if use_lv == TRUE. Cannot be > n_series. Defaults arbitrarily to min(2, floor(n_series / 2))

trend_map

Optional data.frame specifying which series should depend on which latent trends. Useful for allowing multiple series to depend on the same latent trend process, but with different observation processes. If supplied, a latent factor model is set up by setting use_lv = TRUE and using the mapping to set up the shared trends. Needs to have column names series and trend, with integer values in the trend column to state which trend each series should depend on. The series column should have a single unique entry for each series in the data (names should perfectly match factor levels of the series variable in data). Note that if this is supplied, the intercept parameter in the process model will NOT be automatically suppressed. Not yet supported for models in which the latent factors evolve in continuous time (CAR()). See examples for details
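A sketch of a valid trend_map for three series in which the first two share a single latent trend (the series names here are assumed to match the factor levels of the series variable in data):

trend_map <- data.frame(series = c('series_1', 'series_2', 'series_3'),
                        trend = c(1, 1, 2))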

...

Not currently used

Value

Either a data.frame containing the prior definitions (if any suitable priors can be altered by the user) or NULL, indicating that no priors in the model can be modified through the mvgam interface

Details

Users can supply a model formula, prior to fitting the model, so that default priors can be inspected and altered. To make alterations, change the contents of the prior column and supply this data.frame to the mvgam function using the argument priors. If using Stan as the backend, users can also modify the parameter bounds by modifying the new_lowerbound and/or new_upperbound columns. This will be necessary when using restrictive distributions for some parameters, such as a Beta distribution for the trend sd parameters (Beta only has support on (0,1), so the upper bound cannot be above 1). Another option is to make use of the prior modification functions in brms (i.e. prior) to change prior distributions and bounds (just use the name of the parameter that you'd like to change as the class argument; see examples below)
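As described above, a brms-style prior can be supplied in place of the edited data.frame; a minimal sketch that uses the parameter name as the class argument (shown with run_model = FALSE so only the generated Stan code changes):

library(brms)
mod_brmsprior <- mvgam(y ~ s(series, bs = 're') +
                       s(season, bs = 'cc') - 1,
                       family = nb(),
                       data = dat$data_train,
                       trend_model = AR(p = 2),
                       priors = prior(normal(0, 0.25), class = ar2),
                       run_model = FALSE)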

Note

Only the prior, new_lowerbound and/or new_upperbound columns of the output should be altered when defining the user-defined priors for the mvgam model. Use only if you are familiar with the underlying probabilistic programming language. There are no sanity checks done to ensure that the code is legal (i.e. to check that lower bounds are smaller than upper bounds, for example)

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate three integer-valued time series
library(mvgam)
dat <- sim_mvgam(trend_rel = 0.5)

# Get a model file that uses default mvgam priors for inspection (not always necessary,
# but this can be useful for testing whether your updated priors are written correctly)
mod_default <- mvgam(y ~ s(series, bs = 're') +
              s(season, bs = 'cc') - 1,
              family = nb(),
              data = dat$data_train,
              trend_model = AR(p = 2),
              run_model = FALSE)
#> Your model may benefit from using "noncentred = TRUE"

# Inspect the model file with default mvgam priors
code(mod_default)
#> // Stan model code generated by package mvgam
#> functions {
#>   vector rep_each(vector x, int K) {
#>     int N = rows(x);
#>     vector[N * K] y;
#>     int pos = 1;
#>     for (n in 1 : N) {
#>       for (k in 1 : K) {
#>         y[pos] = x[n];
#>         pos += 1;
#>       }
#>     }
#>     return y;
#>   }
#> }
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[8, 8] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#>   
#>   // negative binomial overdispersion
#>   vector<lower=0>[n_series] phi_inv;
#>   
#>   // latent trend AR1 terms
#>   vector<lower=-1, upper=1>[n_series] ar1;
#>   
#>   // latent trend AR2 terms
#>   vector<lower=-1, upper=1>[n_series] ar2;
#>   
#>   // latent trend variance parameters
#>   vector<lower=0>[n_series] sigma;
#>   
#>   // latent trends
#>   matrix[n, n_series] trend;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 8] = b_raw[1 : 8];
#>   b[9 : 11] = mu_raw[1] + b_raw[9 : 11] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ student_t(3, 0, 2.5);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ std_normal();
#>   
#>   // prior for s(season)...
#>   b_raw[1 : 8] ~ multi_normal_prec(zero[1 : 8], S1[1 : 8, 1 : 8] * lambda[1]);
#>   
#>   // prior (non-centred) for s(series)...
#>   b_raw[9 : 11] ~ std_normal();
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   ar2 ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for overdispersion parameters
#>   phi_inv ~ student_t(3, 0, 0.1);
#>   
#>   // priors for latent trend variance parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   
#>   // trend estimates
#>   trend[1, 1 : n_series] ~ normal(0, sigma);
#>   trend[2, 1 : n_series] ~ normal(trend[1, 1 : n_series] * ar1, sigma);
#>   for (s in 1 : n_series) {
#>     trend[3 : n, s] ~ normal(ar1[s] * trend[2 : (n - 1), s]
#>                              + ar2[s] * trend[1 : (n - 2), s], sigma[s]);
#>   }
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     array[n_nonmissing] real flat_phis;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]);
#>     flat_ys ~ neg_binomial_2(exp(append_col(flat_xs, flat_trends)
#>                                  * append_row(b, 1.0)),
#>                              inv(flat_phis));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_series] tau;
#>   array[n, n_series] int ypred;
#>   matrix[n, n_series] phi_vec;
#>   vector[n_series] phi;
#>   phi = inv(phi_inv);
#>   for (s in 1 : n_series) {
#>     phi_vec[1 : n, s] = rep_vector(phi[s], n);
#>   }
#>   rho = log(lambda);
#>   for (s in 1 : n_series) {
#>     tau[s] = pow(sigma[s], -2.0);
#>   }
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = neg_binomial_2_rng(exp(mus[1 : n, s]), phi_vec[1 : n, s]);
#>   }
#> }
#> 
#> 

# Look at which priors can be updated in mvgam
test_priors <- get_mvgam_priors(y ~ s(series, bs = 're') +
                              s(season, bs = 'cc') - 1,
                              family = nb(),
                              data = dat$data_train,
                              trend_model = AR(p = 2))
test_priors
#>                                param_name param_length
#> 1           vector<lower=0>[n_sp] lambda;            2
#> 2                       vector[1] mu_raw;            1
#> 3           vector<lower=0>[1] sigma_raw;            1
#> 4 vector<lower=-1,upper=1>[n_series] ar1;            3
#> 5 vector<lower=-1,upper=1>[n_series] ar2;            3
#> 6        vector<lower=0>[n_series] sigma;            3
#> 7      vector<lower=0>[n_series] phi_inv;            3
#>                    param_info                             prior
#> 1 s(season) smooth parameters           lambda ~ normal(5, 30);
#> 2          s(series) pop mean            mu_raw ~ std_normal();
#> 3            s(series) pop sd sigma_raw ~ student_t(3, 0, 2.5);
#> 4       trend AR1 coefficient               ar1 ~ std_normal();
#> 5       trend AR2 coefficient               ar2 ~ std_normal();
#> 6                    trend sd     sigma ~ student_t(3, 0, 2.5);
#> 7    inverse of NB dispersion   phi_inv ~ student_t(3, 0, 0.1);
#>                  example_change new_lowerbound new_upperbound
#> 1   lambda ~ exponential(0.78);             NA             NA
#> 2  mu_raw ~ normal(0.08, 0.87);             NA             NA
#> 3 sigma_raw ~ exponential(0.5);             NA             NA
#> 4      ar1 ~ normal(0.17, 0.9);             NA             NA
#> 5    ar2 ~ normal(-0.21, 0.98);             NA             NA
#> 6    sigma ~ exponential(0.11);             NA             NA
#> 7 phi_inv ~ normal(-0.98, 0.7);             NA             NA

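# As a hedged aside (see ?get_mvgam_priors), the 'new_lowerbound' and
# 'new_upperbound' columns can be edited in the same way to change a
# parameter's bounds; the values below are illustrative only, e.g.
# test_priors$new_lowerbound[4] <- -0.5
# test_priors$new_upperbound[4] <- 0.5
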
# Make a few changes; first, change the population mean for the series-level
# random intercepts
test_priors$prior[2] <- 'mu_raw ~ normal(0.2, 0.5);'

# Now use stronger regularisation for the series-level AR2 coefficients
test_priors$prior[5] <- 'ar2 ~ normal(0, 0.25);'

# Check that the changes are made to the model file without any warnings by
# setting 'run_model = FALSE'
mod <- mvgam(y ~ s(series, bs = 're') +
            s(season, bs = 'cc') - 1,
            family = nb(),
            data = dat$data_train,
            trend_model = AR(p = 2),
            priors = test_priors,
            run_model = FALSE)
#> Your model may benefit from using "noncentred = TRUE"
code(mod)
#> // Stan model code generated by package mvgam
#> functions {
#>   vector rep_each(vector x, int K) {
#>     int N = rows(x);
#>     vector[N * K] y;
#>     int pos = 1;
#>     for (n in 1 : N) {
#>       for (k in 1 : K) {
#>         y[pos] = x[n];
#>         pos += 1;
#>       }
#>     }
#>     return y;
#>   }
#> }
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[8, 8] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#>   
#>   // negative binomial overdispersion
#>   vector<lower=0>[n_series] phi_inv;
#>   
#>   // latent trend AR1 terms
#>   vector<lower=-1, upper=1>[n_series] ar1;
#>   
#>   // latent trend AR2 terms
#>   vector<lower=-1, upper=1>[n_series] ar2;
#>   
#>   // latent trend variance parameters
#>   vector<lower=0>[n_series] sigma;
#>   
#>   // latent trends
#>   matrix[n, n_series] trend;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 8] = b_raw[1 : 8];
#>   b[9 : 11] = mu_raw[1] + b_raw[9 : 11] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ student_t(3, 0, 2.5);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ normal(0.2, 0.5);
#>   
#>   // prior for s(season)...
#>   b_raw[1 : 8] ~ multi_normal_prec(zero[1 : 8], S1[1 : 8, 1 : 8] * lambda[1]);
#>   
#>   // prior (non-centred) for s(series)...
#>   b_raw[9 : 11] ~ std_normal();
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   ar2 ~ normal(0, 0.25);
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for overdispersion parameters
#>   phi_inv ~ student_t(3, 0, 0.1);
#>   
#>   // priors for latent trend variance parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   
#>   // trend estimates
#>   trend[1, 1 : n_series] ~ normal(0, sigma);
#>   trend[2, 1 : n_series] ~ normal(trend[1, 1 : n_series] * ar1, sigma);
#>   for (s in 1 : n_series) {
#>     trend[3 : n, s] ~ normal(ar1[s] * trend[2 : (n - 1), s]
#>                              + ar2[s] * trend[1 : (n - 2), s], sigma[s]);
#>   }
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     array[n_nonmissing] real flat_phis;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]);
#>     flat_ys ~ neg_binomial_2(exp(append_col(flat_xs, flat_trends)
#>                                  * append_row(b, 1.0)),
#>                              inv(flat_phis));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_series] tau;
#>   array[n, n_series] int ypred;
#>   matrix[n, n_series] phi_vec;
#>   vector[n_series] phi;
#>   phi = inv(phi_inv);
#>   for (s in 1 : n_series) {
#>     phi_vec[1 : n, s] = rep_vector(phi[s], n);
#>   }
#>   rho = log(lambda);
#>   for (s in 1 : n_series) {
#>     tau[s] = pow(sigma[s], -2.0);
#>   }
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = neg_binomial_2_rng(exp(mus[1 : n, s]), phi_vec[1 : n, s]);
#>   }
#> }
#> 
#> 

# No warnings; the model is now ready for fitting in the usual way, with the
# addition of the 'priors' argument

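# A minimal sketch (not run here): the full fit is the same call with
# 'run_model = FALSE' removed, e.g.
# mod <- mvgam(y ~ s(series, bs = 're') +
#             s(season, bs = 'cc') - 1,
#             family = nb(),
#             data = dat$data_train,
#             trend_model = AR(p = 2),
#             priors = test_priors)
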
# The same can be done using 'brms' functions. Here we will also change the ar1
# prior and put bounds on the AR coefficients to encourage stationarity; the
# target parameter is specified using the 'class' argument in all brms prior
# functions
brmsprior <- c(prior(normal(0.2, 0.5), class = mu_raw),
              prior(normal(0, 0.25), class = ar1, lb = -1, ub = 1),
              prior(normal(0, 0.25), class = ar2, lb = -1, ub = 1))
brmsprior
#>             prior  class coef group resp dpar nlpar   lb   ub source
#>  normal(0.2, 0.5) mu_raw                            <NA> <NA>   user
#>   normal(0, 0.25)    ar1                              -1    1   user
#>   normal(0, 0.25)    ar2                              -1    1   user

mod <- mvgam(y ~ s(series, bs = 're') +
            s(season, bs = 'cc') - 1,
          family = nb(),
          data = dat$data_train,
          trend_model = AR(p = 2),
          priors = brmsprior,
          run_model = FALSE)
#> Your model may benefit from using "noncentred = TRUE"
code(mod)
#> // Stan model code generated by package mvgam
#> functions {
#>   vector rep_each(vector x, int K) {
#>     int N = rows(x);
#>     vector[N * K] y;
#>     int pos = 1;
#>     for (n in 1 : N) {
#>       for (k in 1 : K) {
#>         y[pos] = x[n];
#>         pos += 1;
#>       }
#>     }
#>     return y;
#>   }
#> }
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[8, 8] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#>   
#>   // negative binomial overdispersion
#>   vector<lower=0>[n_series] phi_inv;
#>   
#>   // latent trend AR1 terms
#>   vector<lower=-1, upper=1>[n_series] ar1;
#>   
#>   // latent trend AR2 terms
#>   vector<lower=-1, upper=1>[n_series] ar2;
#>   
#>   // latent trend variance parameters
#>   vector<lower=0>[n_series] sigma;
#>   
#>   // latent trends
#>   matrix[n, n_series] trend;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 8] = b_raw[1 : 8];
#>   b[9 : 11] = mu_raw[1] + b_raw[9 : 11] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ student_t(3, 0, 2.5);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ normal(0.2, 0.5);
#>   
#>   // prior for s(season)...
#>   b_raw[1 : 8] ~ multi_normal_prec(zero[1 : 8], S1[1 : 8, 1 : 8] * lambda[1]);
#>   
#>   // prior (non-centred) for s(series)...
#>   b_raw[9 : 11] ~ std_normal();
#>   
#>   // priors for AR parameters
#>   ar1 ~ normal(0, 0.25);
#>   ar2 ~ normal(0, 0.25);
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for overdispersion parameters
#>   phi_inv ~ student_t(3, 0, 0.1);
#>   
#>   // priors for latent trend variance parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   
#>   // trend estimates
#>   trend[1, 1 : n_series] ~ normal(0, sigma);
#>   trend[2, 1 : n_series] ~ normal(trend[1, 1 : n_series] * ar1, sigma);
#>   for (s in 1 : n_series) {
#>     trend[3 : n, s] ~ normal(ar1[s] * trend[2 : (n - 1), s]
#>                              + ar2[s] * trend[1 : (n - 2), s], sigma[s]);
#>   }
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     array[n_nonmissing] real flat_phis;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]);
#>     flat_ys ~ neg_binomial_2(exp(append_col(flat_xs, flat_trends)
#>                                  * append_row(b, 1.0)),
#>                              inv(flat_phis));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_series] tau;
#>   array[n, n_series] int ypred;
#>   matrix[n, n_series] phi_vec;
#>   vector[n_series] phi;
#>   phi = inv(phi_inv);
#>   for (s in 1 : n_series) {
#>     phi_vec[1 : n, s] = rep_vector(phi[s], n);
#>   }
#>   rho = log(lambda);
#>   for (s in 1 : n_series) {
#>     tau[s] = pow(sigma[s], -2.0);
#>   }
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = neg_binomial_2_rng(exp(mus[1 : n, s]), phi_vec[1 : n, s]);
#>   }
#> }
#> 
#> 

# Look at what is returned when an incorrectly spelled parameter name is used
test_priors$prior[5] <- 'ar2_bananas ~ normal(0, 0.25);'
mod <- mvgam(y ~ s(series, bs = 're') +
            s(season, bs = 'cc') - 1,
            family = nb(),
            data = dat$data_train,
            trend_model = AR(p = 2),
            priors = test_priors,
            run_model = FALSE)
#> Warning: no match found in model_file for parameter: ar2_bananas
#> Your model may benefit from using "noncentred = TRUE"
code(mod)
#> // Stan model code generated by package mvgam
#> functions {
#>   vector rep_each(vector x, int K) {
#>     int N = rows(x);
#>     vector[N * K] y;
#>     int pos = 1;
#>     for (n in 1 : N) {
#>       for (k in 1 : K) {
#>         y[pos] = x[n];
#>         pos += 1;
#>       }
#>     }
#>     return y;
#>   }
#> }
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[8, 8] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#>   
#>   // negative binomial overdispersion
#>   vector<lower=0>[n_series] phi_inv;
#>   
#>   // latent trend AR1 terms
#>   vector<lower=-1, upper=1>[n_series] ar1;
#>   
#>   // latent trend AR2 terms
#>   vector<lower=-1, upper=1>[n_series] ar2;
#>   
#>   // latent trend variance parameters
#>   vector<lower=0>[n_series] sigma;
#>   
#>   // latent trends
#>   matrix[n, n_series] trend;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 8] = b_raw[1 : 8];
#>   b[9 : 11] = mu_raw[1] + b_raw[9 : 11] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ student_t(3, 0, 2.5);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ normal(0.2, 0.5);
#>   
#>   // prior for s(season)...
#>   b_raw[1 : 8] ~ multi_normal_prec(zero[1 : 8], S1[1 : 8, 1 : 8] * lambda[1]);
#>   
#>   // prior (non-centred) for s(series)...
#>   b_raw[9 : 11] ~ std_normal();
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   ar2 ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for overdispersion parameters
#>   phi_inv ~ student_t(3, 0, 0.1);
#>   
#>   // priors for latent trend variance parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   
#>   // trend estimates
#>   trend[1, 1 : n_series] ~ normal(0, sigma);
#>   trend[2, 1 : n_series] ~ normal(trend[1, 1 : n_series] * ar1, sigma);
#>   for (s in 1 : n_series) {
#>     trend[3 : n, s] ~ normal(ar1[s] * trend[2 : (n - 1), s]
#>                              + ar2[s] * trend[1 : (n - 2), s], sigma[s]);
#>   }
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     array[n_nonmissing] real flat_phis;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]);
#>     flat_ys ~ neg_binomial_2(exp(append_col(flat_xs, flat_trends)
#>                                  * append_row(b, 1.0)),
#>                              inv(flat_phis));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_series] tau;
#>   array[n, n_series] int ypred;
#>   matrix[n, n_series] phi_vec;
#>   vector[n_series] phi;
#>   phi = inv(phi_inv);
#>   for (s in 1 : n_series) {
#>     phi_vec[1 : n, s] = rep_vector(phi[s], n);
#>   }
#>   rho = log(lambda);
#>   for (s in 1 : n_series) {
#>     tau[s] = pow(sigma[s], -2.0);
#>   }
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = neg_binomial_2_rng(exp(mus[1 : n, s]), phi_vec[1 : n, s]);
#>   }
#> }
#> 
#> 

# Example of changing parametric (fixed effect) priors
simdat <- sim_mvgam()

# Add a fake covariate
simdat$data_train$cov <- rnorm(NROW(simdat$data_train))

priors <- get_mvgam_priors(y ~ cov + s(season),
                          data = simdat$data_train,
                          family = poisson(),
                          trend_model = AR())

# Change priors for the intercept and fake covariate effects
priors$prior[1] <- '(Intercept) ~ normal(0, 1);'
priors$prior[2] <- 'cov ~ normal(0, 0.1);'

mod2 <- mvgam(y ~ cov + s(season),
             data = simdat$data_train,
             trend_model = AR(),
             family = poisson(),
             priors = priors,
             run_model = FALSE)
#> Your model may benefit from using "noncentred = TRUE"
code(mod2)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[9, 18] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // latent trend AR1 terms
#>   vector<lower=-1, upper=1>[n_series] ar1;
#>   
#>   // latent trend variance parameters
#>   vector<lower=0>[n_series] sigma;
#>   
#>   // latent trends
#>   matrix[n, n_series] trend;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ normal(0, 1);
#>   
#>   // prior for cov...
#>   b_raw[2] ~ normal(0, 0.1);
#>   
#>   // prior for s(season)...
#>   b_raw[3 : 11] ~ multi_normal_prec(zero[3 : 11],
#>                                     S1[1 : 9, 1 : 9] * lambda[1]
#>                                     + S1[1 : 9, 10 : 18] * lambda[2]);
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for latent trend variance parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   
#>   // trend estimates
#>   trend[1, 1 : n_series] ~ normal(0, sigma);
#>   for (s in 1 : n_series) {
#>     trend[2 : n, s] ~ normal(ar1[s] * trend[1 : (n - 1), s], sigma[s]);
#>   }
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_series] tau;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   for (s in 1 : n_series) {
#>     tau[s] = pow(sigma[s], -2.0);
#>   }
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }
#> 
#> 

# Likewise using 'brms' utilities; note that you can use Intercept rather than
# `(Intercept)` to change priors on the intercept
brmsprior <- c(prior(normal(0.2, 0.5), class = cov),
              prior(normal(0, 0.25), class = Intercept))
brmsprior
#>             prior     class coef group resp dpar nlpar   lb   ub source
#>  normal(0.2, 0.5)       cov                            <NA> <NA>   user
#>   normal(0, 0.25) Intercept                            <NA> <NA>   user

mod2 <- mvgam(y ~ cov + s(season),
             data = simdat$data_train,
             trend_model = AR(),
             family = poisson(),
             priors = brmsprior,
             run_model = FALSE)
#> Your model may benefit from using "noncentred = TRUE"
code(mod2)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[9, 18] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // latent trend AR1 terms
#>   vector<lower=-1, upper=1>[n_series] ar1;
#>   
#>   // latent trend variance parameters
#>   vector<lower=0>[n_series] sigma;
#>   
#>   // latent trends
#>   matrix[n, n_series] trend;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ normal(0, 0.25);
#>   
#>   // prior for cov...
#>   b_raw[2] ~ normal(0.2, 0.5);
#>   
#>   // prior for s(season)...
#>   b_raw[3 : 11] ~ multi_normal_prec(zero[3 : 11],
#>                                     S1[1 : 9, 1 : 9] * lambda[1]
#>                                     + S1[1 : 9, 10 : 18] * lambda[2]);
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for latent trend variance parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   
#>   // trend estimates
#>   trend[1, 1 : n_series] ~ normal(0, sigma);
#>   for (s in 1 : n_series) {
#>     trend[2 : n, s] ~ normal(ar1[s] * trend[1 : (n - 1), s], sigma[s]);
#>   }
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_series] tau;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   for (s in 1 : n_series) {
#>     tau[s] = pow(sigma[s], -2.0);
#>   }
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }
#> 
#> 

# The "class = 'b'" shortcut can be used to put the same prior on all
# 'fixed' effect coefficients (apart from any intercepts)
set.seed(0)
dat <- mgcv::gamSim(1, n = 200, scale = 2)
#> Gu & Wahba 4 term additive model
dat$time <- 1:NROW(dat)
mod <- mvgam(y ~ x0 + x1 + s(x2) + s(x3),
            priors = prior(normal(0, 0.75), class = 'b'),
            data = dat,
            family = gaussian(),
            run_model = FALSE)
code(mod)
#> // Stan model code generated by package mvgam
#> functions {
#>   vector rep_each(vector x, int K) {
#>     int N = rows(x);
#>     vector[N * K] y;
#>     int pos = 1;
#>     for (n in 1 : N) {
#>       for (k in 1 : K) {
#>         y[pos] = x[n];
#>         pos += 1;
#>       }
#>     }
#>     return y;
#>   }
#> }
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[9, 18] S1; // mgcv smooth penalty matrix S1
#>   matrix[9, 18] S2; // mgcv smooth penalty matrix S2
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   vector[n_nonmissing] flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // gaussian observation error
#>   vector<lower=0>[n_series] sigma_obs;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 7.4, 3.7);
#>   
#>   // prior for x0...
#>   b_raw[2] ~ normal(0, 0.75);
#>   
#>   // prior for x1...
#>   b_raw[3] ~ normal(0, 0.75);
#>   
#>   // prior for s(x2)...
#>   b_raw[4 : 12] ~ multi_normal_prec(zero[4 : 12],
#>                                     S1[1 : 9, 1 : 9] * lambda[1]
#>                                     + S1[1 : 9, 10 : 18] * lambda[2]);
#>   
#>   // prior for s(x3)...
#>   b_raw[13 : 21] ~ multi_normal_prec(zero[13 : 21],
#>                                      S2[1 : 9, 1 : 9] * lambda[3]
#>                                      + S2[1 : 9, 10 : 18] * lambda[4]);
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for observation error parameters
#>   sigma_obs ~ student_t(3, 0, 3.7);
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_sigma_obs;
#>     flat_sigma_obs = rep_each(sigma_obs, n)[obs_ind];
#>     flat_ys ~ normal_id_glm(flat_xs, 0.0, b, flat_sigma_obs);
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] sigma_obs_vec;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   array[n, n_series] real ypred;
#>   rho = log(lambda);
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     sigma_obs_vec[1 : n, s] = rep_vector(sigma_obs[s], n);
#>   }
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]];
#>     ypred[1 : n, s] = normal_rng(mus[1 : n, s], sigma_obs_vec[1 : n, s]);
#>   }
#> }
#> 
#> 
# }
================================================ FILE: docs/reference/gratia_mvgam_enhancements.html ================================================ Enhance post-processing of mvgam models using gratia functionality — gratia_mvgam_enhancements • mvgam

These evaluation and plotting functions exist to allow some popular gratia methods to work with mvgam or jsdgam models

Usage

drawDotmvgam(
  object,
  trend_effects = FALSE,
  data = NULL,
  select = NULL,
  parametric = FALSE,
  terms = NULL,
  residuals = FALSE,
  scales = c("free", "fixed"),
  ci_level = 0.95,
  n = 100,
  n_3d = 16,
  n_4d = 4,
  unconditional = FALSE,
  overall_uncertainty = TRUE,
  constant = NULL,
  fun = NULL,
  dist = 0.1,
  rug = TRUE,
  contour = TRUE,
  grouped_by = FALSE,
  ci_alpha = 0.2,
  ci_col = "black",
  smooth_col = "black",
  resid_col = "steelblue3",
  contour_col = "black",
  n_contour = NULL,
  partial_match = FALSE,
  discrete_colour = NULL,
  discrete_fill = NULL,
  continuous_colour = NULL,
  continuous_fill = NULL,
  position = "identity",
  angle = NULL,
  ncol = NULL,
  nrow = NULL,
  guides = "keep",
  widths = NULL,
  heights = NULL,
  crs = NULL,
  default_crs = NULL,
  lims_method = "cross",
  wrap = TRUE,
  envir = environment(formula(object)),
  ...
)

eval_smoothDothilbertDotsmooth(
  smooth,
  model,
  n = 100,
  n_3d = NULL,
  n_4d = NULL,
  data = NULL,
  unconditional = FALSE,
  overall_uncertainty = TRUE,
  dist = NULL,
  ...
)

eval_smoothDotmodDotsmooth(
  smooth,
  model,
  n = 100,
  n_3d = NULL,
  n_4d = NULL,
  data = NULL,
  unconditional = FALSE,
  overall_uncertainty = TRUE,
  dist = NULL,
  ...
)

eval_smoothDotmoiDotsmooth(
  smooth,
  model,
  n = 100,
  n_3d = NULL,
  n_4d = NULL,
  data = NULL,
  unconditional = FALSE,
  overall_uncertainty = TRUE,
  dist = NULL,
  ...
)

Arguments

object

a fitted mvgam, the result of a call to mvgam().

trend_effects

logical specifying whether smooth terms from the trend_formula should be drawn. If FALSE, only terms from the observation formula are drawn. If TRUE, only terms from the trend_formula are drawn.

data

a data frame of covariate values at which to evaluate the model's smooth functions.

select

character, logical, or numeric; which smooths to plot. If NULL, the default, then all model smooths are drawn. Character select matches the labels for smooths as shown for example in the output from summary(object). Logical select operates as per numeric select in the order that smooths are stored.

parametric

logical; plot parametric terms also? Note that select is used for selecting which smooths to plot. The terms argument is used to select which parametric effects are plotted. The default, as with mgcv::plot.gam(), is to not draw parametric effects.

terms

character; which model parametric terms should be drawn? The default of NULL will plot all parametric terms that can be drawn.

residuals

currently ignored for mvgam models.

scales

character; should all univariate smooths be plotted with the same y-axis scale? If scales = "free", the default, each univariate smooth has its own y-axis scale. If scales = "fixed", a common y axis scale is used for all univariate smooths.

Currently does not affect the y-axis scale of plots of the parametric terms.

ci_level

numeric between 0 and 1; the coverage of the credible interval.

n

numeric; the number of points over the range of the covariate at which to evaluate the smooth.

n_3d, n_4d

numeric; the number of points over the range of the last covariate in a 3D or 4D smooth. The default is NULL, which achieves the standard behaviour of using n points over the range of all covariates, resulting in n^d evaluation points, where d is the dimension of the smooth. For d > 2 this can result in very many evaluation points and slow performance. For smooths of d > 4, the value of n_4d will be used for all dimensions > 4, unless this is NULL, in which case the default behaviour (using n for all dimensions) will be observed.

unconditional

ignored for mvgam models as all appropriate uncertainties are already included in the posterior estimates.

overall_uncertainty

ignored for mvgam models as all appropriate uncertainties are already included in the posterior estimates.

constant

numeric; a constant to add to the estimated values of the smooth. constant, if supplied, will be added to the estimated value before the confidence band is computed.

fun

function; a function that will be applied to the estimated values and confidence interval before plotting. Can be a function or the name of a function. Function fun will be applied after adding any constant, if provided.

dist

numeric; if greater than 0, this is used to determine when a location is too far from data to be plotted when plotting 2-D smooths. The data are scaled into the unit square before deciding what to exclude, and dist is a distance within the unit square. See mgcv::exclude.too.far() for further details.

rug

logical; draw a rug plot at the bottom of each plot for 1-D smooths or plot locations of data for higher dimensions.

contour

logical; should contours be drawn on the plot using ggplot2::geom_contour()?

grouped_by

logical; should factor by smooths be drawn as one panel per level of the factor (FALSE, the default), or should the individual smooths be combined into a single panel containing all levels (TRUE)?

ci_alpha

numeric; alpha transparency for confidence or simultaneous interval.

ci_col

colour specification for the confidence/credible intervals band. Affects the fill of the interval.

smooth_col

colour specification for the smooth line.

resid_col

colour specification for residual points. Ignored.

contour_col

colour specification for contour lines.

n_contour

numeric; the number of contour bins. Will result in n_contour - 1 contour lines being drawn. See ggplot2::geom_contour().

partial_match

logical; should smooths be selected by partial matches with select? If TRUE, select can only be a single string to match against.

discrete_colour

a suitable colour scale to be used when plotting discrete variables.

discrete_fill

a suitable fill scale to be used when plotting discrete variables.

continuous_colour

a suitable colour scale to be used when plotting continuous variables.

continuous_fill

a suitable fill scale to be used when plotting continuous variables.

position

Position adjustment, either as a string, or the result of a call to a position adjustment function.

angle

numeric; the angle at which the x axis tick labels are to be drawn, passed to the angle argument of ggplot2::guide_axis().

ncol, nrow

numeric; the number of rows and columns over which to spread the plots.

guides

character; one of "keep" (the default), "collect", or "auto". Passed to patchwork::plot_layout()

widths, heights

The relative widths and heights of each column and row in the grid. Will get repeated to match the dimensions of the grid. If there is more than 1 plot and widths = NULL, the value of widths will be set internally to widths = 1 to accommodate plots of smooths that use a fixed aspect ratio.

crs

the coordinate reference system (CRS) to use for the plot. All data will be projected into this CRS. See ggplot2::coord_sf() for details.

default_crs

the coordinate reference system (CRS) to use for the non-sf layers in the plot. If left at the default NULL, the CRS used is 4326 (WGS84), which is appropriate for spline-on-the-sphere smooths, which are parameterized in terms of latitude and longitude as coordinates. See ggplot2::coord_sf() for more details.

lims_method

character; affects how the axis limits are determined. See ggplot2::coord_sf(). Be careful; in testing of some examples, changing this to "orthogonal" for example with the chlorophyll-a example from Simon Wood's GAM book quickly used up all the RAM in my test system and the OS killed R. This could be incorrect usage on my part; right now the grid of points at which SOS smooths are evaluated (if not supplied by the user) can produce invalid coordinates for the corners of tiles as the grid is generated for tile centres without respect to the spacing of those tiles.

wrap

logical; wrap plots as a patchwork? If FALSE, a list of ggplot objects is returned, 1 per term plotted.

envir

an environment to look up the data within.

...

additional arguments passed to other methods.

smooth

a smooth object of class "gp.smooth" (returned from a model using either the dynamic() function or the gp() function) or of class "moi.smooth" or "mod.smooth" (returned from a model using the 'moi' or 'mod' basis).

model

a fitted mgcv model of class gam or bam.

Details

These methods allow mvgam models to be enhanced if users have the gratia package installed, making the popular draw() function available for plotting partial effects of mvgam smooth functions using ggplot2::ggplot() utilities.

Author

Nicholas J Clark

Examples

# \donttest{
# Fit a simple GAM and draw partial effects of smooths using gratia
set.seed(0)
dat <- mgcv::gamSim(1, n = 200, scale = 2)
#> Gu & Wahba 4 term additive model
mod <- mvgam(y ~ s(x1, bs = 'moi') +
              te(x0, x2),
             data = dat,
             family = gaussian(),
             chains = 2,
             silent = 2)
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c4cff7578.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory

if(require("gratia")){
 gratia::draw(mod)
}
#> Loading required package: gratia
#> Warning: package ‘gratia’ was built under R version 4.3.3
#> 
#> Attaching package: ‘gratia’
#> The following object is masked from ‘package:mvgam’:
#> 
#>     add_residuals

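# A hedged sketch (assuming a model fitted with a separate trend_formula):
# smooths from the trend_formula can instead be drawn by setting
# 'trend_effects = TRUE', e.g.
# if(require("gratia")){
#  gratia::draw(mod, trend_effects = TRUE)
# }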

# }
================================================ FILE: docs/reference/hindcast.mvgam.html ================================================ Extract hindcasts for a fitted mvgam object — hindcast.mvgam • mvgam

Extract hindcasts for a fitted mvgam object

Usage

hindcast(object, ...)

# S3 method for mvgam
hindcast(object, type = "response", ...)

Arguments

object

list object of class mvgam or jsdgam. See mvgam()

...

Ignored

type

When this has the value link, the linear predictor is calculated on the link scale. If expected is used, predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. When response (the default) is used, the predictions take uncertainty in the observation process into account to return predictions on the outcome scale. When variance is used, the variance of the response with respect to the mean (mean-variance relationship) is returned. When type = "terms", each component of the linear predictor is returned separately in the form of a list (possibly with standard errors, if summary = TRUE): this includes parametric model components, followed by each smooth component, but excludes any offset and any intercept. Two special cases are also allowed: type latent_N will return the estimated latent abundances from an N-mixture distribution, while type detection will return the estimated detection probability from an N-mixture distribution

Value

An object of class mvgam_forecast containing hindcast distributions. See mvgam_forecast-class for details.

Details

Posterior retrodictions are drawn from the fitted mvgam and organized into a convenient format
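
As a minimal sketch (assuming a fitted model mod, as in the Examples below), hindcasts can also be extracted on the link scale:

hc_link <- hindcast(mod, type = 'link')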

See also

Examples

# \donttest{
simdat <- sim_mvgam(n_series = 3, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(),
             noncentred = TRUE,
             data = simdat$data_train,
             chains = 2,
             silent = 2)

# Hindcasts on response scale
hc <- hindcast(mod)
str(hc)
#> List of 15
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc")
#>   .. ..- attr(*, ".Environment")=<environment: 0x000002e05bda8280> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ trend_model       :List of 7
#>   ..$ trend_model: chr "AR1"
#>   ..$ ma         : logi FALSE
#>   ..$ cor        : logi FALSE
#>   ..$ unit       : chr "time"
#>   ..$ gr         : chr "NA"
#>   ..$ subgr      : chr "series"
#>   ..$ label      : language AR()
#>   ..- attr(*, "class")= chr "mvgam_trend"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : chr [1:3] "series_1" "series_2" "series_3"
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:75] 10 3 2 1 0 0 1 2 1 2 ...
#>   ..$ series_2: int [1:75] 0 0 0 2 0 1 1 0 0 1 ...
#>   ..$ series_3: int [1:75] 6 0 2 0 0 0 0 2 0 0 ...
#>  $ train_times       : int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations : NULL
#>  $ test_times        : NULL
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:1000, 1:75] 9 5 3 10 9 11 7 5 14 9 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>   ..$ series_2: num [1:1000, 1:75] 4 1 1 3 3 4 7 0 5 4 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,2]" "ypred[2,2]" "ypred[3,2]" "ypred[4,2]" ...
#>   ..$ series_3: num [1:1000, 1:75] 4 3 1 5 3 1 2 1 2 4 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,3]" "ypred[2,3]" "ypred[3,3]" "ypred[4,3]" ...
#>  $ forecasts         : NULL
#>  - attr(*, "class")= chr "mvgam_forecast"
plot(hc, series = 1)
#> No non-missing values in test_observations; cannot calculate forecast score

plot(hc, series = 2)
#> No non-missing values in test_observations; cannot calculate forecast score

plot(hc, series = 3)
#> No non-missing values in test_observations; cannot calculate forecast score


# Hindcasts as expectations
hc <- hindcast(mod, type = 'expected')
str(hc)
#> List of 15
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc")
#>   .. ..- attr(*, ".Environment")=<environment: 0x000002e05bda8280> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ trend_model       :List of 7
#>   ..$ trend_model: chr "AR1"
#>   ..$ ma         : logi FALSE
#>   ..$ cor        : logi FALSE
#>   ..$ unit       : chr "time"
#>   ..$ gr         : chr "NA"
#>   ..$ subgr      : chr "series"
#>   ..$ label      : language AR()
#>   ..- attr(*, "class")= chr "mvgam_trend"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "expected"
#>  $ series_names      : chr [1:3] "series_1" "series_2" "series_3"
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:75] 10 3 2 1 0 0 1 2 1 2 ...
#>   ..$ series_2: int [1:75] 0 0 0 2 0 1 1 0 0 1 ...
#>   ..$ series_3: int [1:75] 6 0 2 0 0 0 0 2 0 0 ...
#>  $ train_times       : int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations : NULL
#>  $ test_times        : NULL
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:1000, 1:75] 6.61 5.21 3.68 7.64 6.25 ...
#>   ..$ series_2: num [1:1000, 1:75] 1.93 1.59 3.11 2.15 3.29 ...
#>   ..$ series_3: num [1:1000, 1:75] 6.29 2.44 3.93 4.99 2.72 ...
#>  $ forecasts         : NULL
#>  - attr(*, "class")= chr "mvgam_forecast"
plot(hc, series = 1)

plot(hc, series = 2)

plot(hc, series = 3)


# Estimated latent trends
hc <- hindcast(mod, type = 'trend')
str(hc)
#> List of 15
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc")
#>   .. ..- attr(*, ".Environment")=<environment: 0x000002e05bda8280> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ trend_model       :List of 7
#>   ..$ trend_model: chr "AR1"
#>   ..$ ma         : logi FALSE
#>   ..$ cor        : logi FALSE
#>   ..$ unit       : chr "time"
#>   ..$ gr         : chr "NA"
#>   ..$ subgr      : chr "series"
#>   ..$ label      : language AR()
#>   ..- attr(*, "class")= chr "mvgam_trend"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "trend"
#>  $ series_names      : chr [1:3] "series_1" "series_2" "series_3"
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:75] 10 3 2 1 0 0 1 2 1 2 ...
#>   ..$ series_2: int [1:75] 0 0 0 2 0 1 1 0 0 1 ...
#>   ..$ series_3: int [1:75] 6 0 2 0 0 0 0 2 0 0 ...
#>  $ train_times       : int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations : NULL
#>  $ test_times        : NULL
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:1000, 1:75] 0.641 0.5 0.232 0.923 0.622 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "trend[1,1]" "trend[2,1]" "trend[3,1]" "trend[4,1]" ...
#>   ..$ series_2: num [1:1000, 1:75] -0.5912 -0.6872 0.0625 -0.3461 -0.0187 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "trend[1,2]" "trend[2,2]" "trend[3,2]" "trend[4,2]" ...
#>   ..$ series_3: num [1:1000, 1:75] 0.591 -0.26 0.298 0.497 -0.212 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "trend[1,3]" "trend[2,3]" "trend[3,3]" "trend[4,3]" ...
#>  $ forecasts         : NULL
#>  - attr(*, "class")= chr "mvgam_forecast"
plot(hc, series = 1)

plot(hc, series = 2)

plot(hc, series = 3)

# }
================================================ FILE: docs/reference/how_to_cite.mvgam.html ================================================ Generate a methods description for mvgam models — how_to_cite.mvgam • mvgam

Create a brief but fully referenced methods description, along with a useful list of references, for fitted mvgam and jsdgam models

Usage

how_to_cite(object, ...)

# S3 method for class 'mvgam'
how_to_cite(object, ...)

Arguments

object

list object of class mvgam resulting from a call to mvgam() or jsdgam()

...

ignored

Value

An object of class how_to_cite containing a text description of the methods as well as lists of both primary and additional references

Details

This function uses the model's structure to come up with a very basic but hopefully useful methods description that can help users to appropriately acknowledge the hard work of developers and champion open science. Please do not consider the text returned by this function to be a completely adequate methods section; it is only meant to get you started.

See also

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate 4 time series with hierarchical seasonality
# and a VAR(1) dynamic process
set.seed(0)
simdat <- sim_mvgam(seasonality = 'hierarchical',
                    trend_model = VAR(cor = TRUE),
                    family = gaussian())

# Fit an appropriate model
mod1 <- mvgam(y ~ s(season, bs = 'cc', k = 6),
              data = simdat$data_train,
              family = gaussian(),
              trend_model = VAR(cor = TRUE),
              chains = 2,
              silent = 2)
how_to_cite(mod1)
#> Methods text skeleton
#> We used the R package mvgam (version 1.1.4; Clark & Wells, 2023) to construct, fit and interrogate the model. mvgam fits Bayesian State-Space models that can include flexible predictor effects in both the process and observation components by incorporating functionalities from the brms (Burkner 2017), mgcv (Wood 2017) and splines2 (Wang & Yan, 2023) packages. To encourage stability and prevent forecast variance from increasing indefinitely, we enforced stationarity of the Vector Autoregressive process following methods described by Heaps (2023). The mvgam-constructed model and observed data were passed to the probabilistic programming environment Stan (version 2.34.1; Carpenter et al. 2017, Stan Development Team 2025), specifically through the cmdstanr interface (Gabry & Cesnovar, 2021). We ran 2 Hamiltonian Monte Carlo chains for 500 warmup iterations and 500 sampling iterations for joint posterior estimation. Rank normalized split Rhat (Vehtari et al. 2021) and effective sample sizes were used to monitor convergence.
#> 
#> Primary references
#> [1] "Clark, NJ and Wells K (2022). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 14, 771-784. doi.org/10.1111/2041-210X.13974"                                                  
#> [2] "Burkner, PC (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1-28. doi:10.18637/jss.v080.i01"                                                                                                  
#> [3] "Wood, SN (2017). Generalized Additive Models: An Introduction with R (2nd edition). Chapman and Hall/CRC."                                                                                                                                                  
#> [4] "Wang W and Yan J (2021). Shape-Restricted Regression Splines with R Package splines2. Journal of Data Science, 19(3), 498-517. doi:10.6339/21-JDS1020 https://doi.org/10.6339/21-JDS1020."                                                                  
#> [5] "Heaps, SE (2023). Enforcing stationarity through the prior in vector autoregressions. Journal of Computational and Graphical Statistics 32, 74-83."                                                                                                         
#> [6] "Carpenter, B, Gelman, A, Hoffman, MD, Lee, D, Goodrich, B, Betancourt, M, Brubaker, M, Guo, J, Li, P and Riddell, A (2017). Stan: A probabilistic programming language. Journal of Statistical Software 76."                                                
#> [7] "Gabry J, Cesnovar R, Johnson A, and Bronder S (2025). cmdstanr: R Interface to 'CmdStan'. https://mc-stan.org/cmdstanr/, https://discourse.mc-stan.org."                                                                                                    
#> [8] "Vehtari A, Gelman A, Simpson D, Carpenter B, and Burkner P (2021). Rank-normalization, folding, and localization: An improved Rhat for assessing convergence of MCMC (with discussion). Bayesian Analysis 16(2) 667-718. https://doi.org/10.1214/20-BA1221."
#> 
#> Other useful references
#> [1] "Arel-Bundock, V, Greifer, N, and Heiss, A (2024). How to interpret statistical models using marginaleffects for R and Python. Journal of Statistical Software, 111(9), 1-32. https://doi.org/10.18637/jss.v111.i09"                               
#> [2] "Gabry J, Simpson D, Vehtari A, Betancourt M, and Gelman A (2019). Visualization in Bayesian workflow. Journal of the Royal Statatistical Society A, 182, 389-402. doi:10.1111/rssa.12378."                                                        
#> [3] "Vehtari A, Gelman A, and Gabry J (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413-1432. doi:10.1007/s11222-016-9696-4."                                              
#> [4] "Burkner, PC, Gabry, J, and Vehtari, A. (2020). Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation, 90(14), 2499-2523. https://doi.org/10.1080/00949655.2020.1783262"

# For a GP example, simulate data using the mgcv package
dat <- mgcv::gamSim(1, n = 30, scale = 2)
#> Gu & Wahba 4 term additive model

# Fit a model that uses an approximate GP from the brms package
mod2 <- mvgam(y ~ gp(x2, k = 12),
              data = dat,
              family = gaussian(),
              chains = 2,
              silent = 2)
#> Warning: gp effects in mvgam cannot yet handle autogrouping
#> resetting all instances of 'gr = TRUE' to 'gr = FALSE'
#> This warning is displayed once per session.
how_to_cite(mod2)
#> Methods text skeleton
#> We used the R package mvgam (version 1.1.4; Clark & Wells, 2023) to construct, fit and interrogate the model. mvgam fits Bayesian State-Space models that can include flexible predictor effects in both the process and observation components by incorporating functionalities from the brms (Burkner 2017), mgcv (Wood 2017) and splines2 (Wang & Yan, 2023) packages. Gaussian Process functional effects were estimated using a low-rank Hilbert space approximation following methods described by Riutort-Mayol et al. (2023). The mvgam-constructed model and observed data were passed to the probabilistic programming environment Stan (version 2.34.1; Carpenter et al. 2017, Stan Development Team 2025), specifically through the cmdstanr interface (Gabry & Cesnovar, 2021). We ran 2 Hamiltonian Monte Carlo chains for 500 warmup iterations and 500 sampling iterations for joint posterior estimation. Rank normalized split Rhat (Vehtari et al. 2021) and effective sample sizes were used to monitor convergence.
#> 
#> Primary references
#> [1] "Clark, NJ and Wells K (2022). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 14, 771-784. doi.org/10.1111/2041-210X.13974"                                                  
#> [2] "Burkner, PC (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1-28. doi:10.18637/jss.v080.i01"                                                                                                  
#> [3] "Wood, SN (2017). Generalized Additive Models: An Introduction with R (2nd edition). Chapman and Hall/CRC."                                                                                                                                                  
#> [4] "Wang W and Yan J (2021). Shape-Restricted Regression Splines with R Package splines2. Journal of Data Science, 19(3), 498-517. doi:10.6339/21-JDS1020 https://doi.org/10.6339/21-JDS1020."                                                                  
#> [5] "Riutort-Mayol, G, Burkner, PC, Andersen, MR, Solin, A and Vehtari, A (2023). Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Statistics and Computing 33, 1. https://doi.org/10.1007/s11222-022-10167-2"     
#> [6] "Carpenter, B, Gelman, A, Hoffman, MD, Lee, D, Goodrich, B, Betancourt, M, Brubaker, M, Guo, J, Li, P and Riddell, A (2017). Stan: A probabilistic programming language. Journal of Statistical Software 76."                                                
#> [7] "Gabry J, Cesnovar R, Johnson A, and Bronder S (2025). cmdstanr: R Interface to 'CmdStan'. https://mc-stan.org/cmdstanr/, https://discourse.mc-stan.org."                                                                                                    
#> [8] "Vehtari A, Gelman A, Simpson D, Carpenter B, and Burkner P (2021). Rank-normalization, folding, and localization: An improved Rhat for assessing convergence of MCMC (with discussion). Bayesian Analysis 16(2) 667-718. https://doi.org/10.1214/20-BA1221."
#> 
#> Other useful references
#> [1] "Arel-Bundock, V, Greifer, N, and Heiss, A (2024). How to interpret statistical models using marginaleffects for R and Python. Journal of Statistical Software, 111(9), 1-32. https://doi.org/10.18637/jss.v111.i09"                               
#> [2] "Gabry J, Simpson D, Vehtari A, Betancourt M, and Gelman A (2019). Visualization in Bayesian workflow. Journal of the Royal Statatistical Society A, 182, 389-402. doi:10.1111/rssa.12378."                                                        
#> [3] "Vehtari A, Gelman A, and Gabry J (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413-1432. doi:10.1007/s11222-016-9696-4."                                              
#> [4] "Burkner, PC, Gabry, J, and Vehtari, A. (2020). Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation, 90(14), 2499-2523. https://doi.org/10.1080/00949655.2020.1783262"
# }
================================================ FILE: docs/reference/index-mvgam.html ================================================ Index mvgam objects — index-mvgam • mvgam

Index mvgam objects

Usage

# S3 method for class 'mvgam'
variables(x, ...)

Arguments

x

list object returned from mvgam. See mvgam()

...

Arguments passed to individual methods (if applicable).

Value

A list object of the variables that can be extracted, along with their aliases

Examples

# \donttest{
simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1')
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             data = simdat$data_train,
             chains = 2,
             silent = 2)
variables(mod)
#> $observation_pars
#> NULL
#> 
#> $observation_linpreds
#>    orig_name alias
#> 1   mus[1,1]    NA
#> 2   mus[2,1]    NA
#> 3   mus[3,1]    NA
#> 4   mus[4,1]    NA
#> 5   mus[5,1]    NA
#> 6   mus[6,1]    NA
#> 7   mus[7,1]    NA
#> 8   mus[8,1]    NA
#> 9   mus[9,1]    NA
#> 10 mus[10,1]    NA
#> 11 mus[11,1]    NA
#> 12 mus[12,1]    NA
#> 13 mus[13,1]    NA
#> 14 mus[14,1]    NA
#> 15 mus[15,1]    NA
#> 16 mus[16,1]    NA
#> 17 mus[17,1]    NA
#> 18 mus[18,1]    NA
#> 19 mus[19,1]    NA
#> 20 mus[20,1]    NA
#> 21 mus[21,1]    NA
#> 22 mus[22,1]    NA
#> 23 mus[23,1]    NA
#> 24 mus[24,1]    NA
#> 25 mus[25,1]    NA
#> 26 mus[26,1]    NA
#> 27 mus[27,1]    NA
#> 28 mus[28,1]    NA
#> 29 mus[29,1]    NA
#> 30 mus[30,1]    NA
#> 31 mus[31,1]    NA
#> 32 mus[32,1]    NA
#> 33 mus[33,1]    NA
#> 34 mus[34,1]    NA
#> 35 mus[35,1]    NA
#> 36 mus[36,1]    NA
#> 37 mus[37,1]    NA
#> 38 mus[38,1]    NA
#> 39 mus[39,1]    NA
#> 40 mus[40,1]    NA
#> 41 mus[41,1]    NA
#> 42 mus[42,1]    NA
#> 43 mus[43,1]    NA
#> 44 mus[44,1]    NA
#> 45 mus[45,1]    NA
#> 46 mus[46,1]    NA
#> 47 mus[47,1]    NA
#> 48 mus[48,1]    NA
#> 49 mus[49,1]    NA
#> 50 mus[50,1]    NA
#> 51 mus[51,1]    NA
#> 52 mus[52,1]    NA
#> 53 mus[53,1]    NA
#> 54 mus[54,1]    NA
#> 55 mus[55,1]    NA
#> 56 mus[56,1]    NA
#> 57 mus[57,1]    NA
#> 58 mus[58,1]    NA
#> 59 mus[59,1]    NA
#> 60 mus[60,1]    NA
#> 61 mus[61,1]    NA
#> 62 mus[62,1]    NA
#> 63 mus[63,1]    NA
#> 64 mus[64,1]    NA
#> 65 mus[65,1]    NA
#> 66 mus[66,1]    NA
#> 67 mus[67,1]    NA
#> 68 mus[68,1]    NA
#> 69 mus[69,1]    NA
#> 70 mus[70,1]    NA
#> 71 mus[71,1]    NA
#> 72 mus[72,1]    NA
#> 73 mus[73,1]    NA
#> 74 mus[74,1]    NA
#> 75 mus[75,1]    NA
#> 
#> $observation_betas
#>   orig_name       alias
#> 1      b[1] (Intercept)
#> 2      b[2] s(season).1
#> 3      b[3] s(season).2
#> 4      b[4] s(season).3
#> 5      b[5] s(season).4
#> 
#> $observation_smoothpars
#>   orig_name         alias
#> 1    rho[1] s(season)_rho
#> 
#> $observation_re_params
#> NULL
#> 
#> $posterior_preds
#>      orig_name alias
#> 1   ypred[1,1]    NA
#> 2   ypred[2,1]    NA
#> 3   ypred[3,1]    NA
#> 4   ypred[4,1]    NA
#> 5   ypred[5,1]    NA
#> 6   ypred[6,1]    NA
#> 7   ypred[7,1]    NA
#> 8   ypred[8,1]    NA
#> 9   ypred[9,1]    NA
#> 10 ypred[10,1]    NA
#> 11 ypred[11,1]    NA
#> 12 ypred[12,1]    NA
#> 13 ypred[13,1]    NA
#> 14 ypred[14,1]    NA
#> 15 ypred[15,1]    NA
#> 16 ypred[16,1]    NA
#> 17 ypred[17,1]    NA
#> 18 ypred[18,1]    NA
#> 19 ypred[19,1]    NA
#> 20 ypred[20,1]    NA
#> 21 ypred[21,1]    NA
#> 22 ypred[22,1]    NA
#> 23 ypred[23,1]    NA
#> 24 ypred[24,1]    NA
#> 25 ypred[25,1]    NA
#> 26 ypred[26,1]    NA
#> 27 ypred[27,1]    NA
#> 28 ypred[28,1]    NA
#> 29 ypred[29,1]    NA
#> 30 ypred[30,1]    NA
#> 31 ypred[31,1]    NA
#> 32 ypred[32,1]    NA
#> 33 ypred[33,1]    NA
#> 34 ypred[34,1]    NA
#> 35 ypred[35,1]    NA
#> 36 ypred[36,1]    NA
#> 37 ypred[37,1]    NA
#> 38 ypred[38,1]    NA
#> 39 ypred[39,1]    NA
#> 40 ypred[40,1]    NA
#> 41 ypred[41,1]    NA
#> 42 ypred[42,1]    NA
#> 43 ypred[43,1]    NA
#> 44 ypred[44,1]    NA
#> 45 ypred[45,1]    NA
#> 46 ypred[46,1]    NA
#> 47 ypred[47,1]    NA
#> 48 ypred[48,1]    NA
#> 49 ypred[49,1]    NA
#> 50 ypred[50,1]    NA
#> 51 ypred[51,1]    NA
#> 52 ypred[52,1]    NA
#> 53 ypred[53,1]    NA
#> 54 ypred[54,1]    NA
#> 55 ypred[55,1]    NA
#> 56 ypred[56,1]    NA
#> 57 ypred[57,1]    NA
#> 58 ypred[58,1]    NA
#> 59 ypred[59,1]    NA
#> 60 ypred[60,1]    NA
#> 61 ypred[61,1]    NA
#> 62 ypred[62,1]    NA
#> 63 ypred[63,1]    NA
#> 64 ypred[64,1]    NA
#> 65 ypred[65,1]    NA
#> 66 ypred[66,1]    NA
#> 67 ypred[67,1]    NA
#> 68 ypred[68,1]    NA
#> 69 ypred[69,1]    NA
#> 70 ypred[70,1]    NA
#> 71 ypred[71,1]    NA
#> 72 ypred[72,1]    NA
#> 73 ypred[73,1]    NA
#> 74 ypred[74,1]    NA
#> 75 ypred[75,1]    NA
#> 
#> $trend_pars
#>   orig_name alias
#> 1    ar1[1]    NA
#> 2  sigma[1]    NA
#> 
#> $trend_linpreds
#> NULL
#> 
#> $trend_betas
#> NULL
#> 
#> $trend_smoothpars
#> NULL
#> 
#> $trend_re_params
#> NULL
#> 
#> $trends
#>      orig_name alias
#> 1   trend[1,1]    NA
#> 2   trend[2,1]    NA
#> 3   trend[3,1]    NA
#> 4   trend[4,1]    NA
#> 5   trend[5,1]    NA
#> 6   trend[6,1]    NA
#> 7   trend[7,1]    NA
#> 8   trend[8,1]    NA
#> 9   trend[9,1]    NA
#> 10 trend[10,1]    NA
#> 11 trend[11,1]    NA
#> 12 trend[12,1]    NA
#> 13 trend[13,1]    NA
#> 14 trend[14,1]    NA
#> 15 trend[15,1]    NA
#> 16 trend[16,1]    NA
#> 17 trend[17,1]    NA
#> 18 trend[18,1]    NA
#> 19 trend[19,1]    NA
#> 20 trend[20,1]    NA
#> 21 trend[21,1]    NA
#> 22 trend[22,1]    NA
#> 23 trend[23,1]    NA
#> 24 trend[24,1]    NA
#> 25 trend[25,1]    NA
#> 26 trend[26,1]    NA
#> 27 trend[27,1]    NA
#> 28 trend[28,1]    NA
#> 29 trend[29,1]    NA
#> 30 trend[30,1]    NA
#> 31 trend[31,1]    NA
#> 32 trend[32,1]    NA
#> 33 trend[33,1]    NA
#> 34 trend[34,1]    NA
#> 35 trend[35,1]    NA
#> 36 trend[36,1]    NA
#> 37 trend[37,1]    NA
#> 38 trend[38,1]    NA
#> 39 trend[39,1]    NA
#> 40 trend[40,1]    NA
#> 41 trend[41,1]    NA
#> 42 trend[42,1]    NA
#> 43 trend[43,1]    NA
#> 44 trend[44,1]    NA
#> 45 trend[45,1]    NA
#> 46 trend[46,1]    NA
#> 47 trend[47,1]    NA
#> 48 trend[48,1]    NA
#> 49 trend[49,1]    NA
#> 50 trend[50,1]    NA
#> 51 trend[51,1]    NA
#> 52 trend[52,1]    NA
#> 53 trend[53,1]    NA
#> 54 trend[54,1]    NA
#> 55 trend[55,1]    NA
#> 56 trend[56,1]    NA
#> 57 trend[57,1]    NA
#> 58 trend[58,1]    NA
#> 59 trend[59,1]    NA
#> 60 trend[60,1]    NA
#> 61 trend[61,1]    NA
#> 62 trend[62,1]    NA
#> 63 trend[63,1]    NA
#> 64 trend[64,1]    NA
#> 65 trend[65,1]    NA
#> 66 trend[66,1]    NA
#> 67 trend[67,1]    NA
#> 68 trend[68,1]    NA
#> 69 trend[69,1]    NA
#> 70 trend[70,1]    NA
#> 71 trend[71,1]    NA
#> 72 trend[72,1]    NA
#> 73 trend[73,1]    NA
#> 74 trend[74,1]    NA
#> 75 trend[75,1]    NA
#> 
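# The returned index maps Stan's internal parameter names to their readable
# aliases; a usage sketch (base R only) based on the output above:
idx <- variables(mod)
idx$observation_betas$alias      # '(Intercept)', 's(season).1', ...
idx$observation_betas$orig_name  # the matching Stan names: 'b[1]', 'b[2]', ...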
# }
================================================ FILE: docs/reference/index.html ================================================ Package index • mvgam

All functions

add_residuals()
Calculate randomized quantile residuals for mvgam objects
all_neon_tick_data
NEON Amblyomma and Ixodes tick abundance survey data
augment(<mvgam>)
Augment an mvgam object's data
code() stancode(<mvgam_prefit>) stancode(<mvgam>) standata(<mvgam_prefit>)
Stan code and data objects for mvgam models
conditional_effects(<mvgam>) plot(<mvgam_conditional_effects>) print(<mvgam_conditional_effects>)
Display conditional effects of predictors for mvgam models
dynamic()
Defining dynamic coefficients in mvgam formulae
ensemble()
Combine forecasts from mvgam models into evenly weighted ensembles
eval_mvgam() roll_eval_mvgam() compare_mvgams()
Evaluate forecasts from fitted mvgam objects
fevd()
Calculate latent VAR forecast error variance decompositions
fitted(<mvgam>)
Expected values of the posterior predictive distribution for mvgam objects
forecast()
Extract or compute hindcasts and forecasts for a fitted mvgam object
formula(<mvgam>) formula(<mvgam_prefit>)
Extract formulae from mvgam objects
get_mvgam_priors()
Extract information on default prior distributions for an mvgam model
GP()
Specify dynamic Gaussian process trends in mvgam models
draw(<mvgam>) eval_smooth(<hilbert.smooth>) eval_smooth(<mod.smooth>) eval_smooth(<moi.smooth>)
Enhance post-processing of mvgam models using gratia functionality
hindcast()
Extract hindcasts for a fitted mvgam object
how_to_cite()
Generate a methods description for mvgam models
variables(<mvgam>)
Index mvgam objects
irf()
Calculate latent VAR impulse response functions
jsdgam()
Fit Joint Species Distribution Models in mvgam
lfo_cv()
Approximate leave-future-out cross-validation of fitted mvgam objects
logLik(<mvgam>)
Compute pointwise Log-Likelihoods from fitted mvgam objects
loo(<mvgam>) loo_compare(<mvgam>)
LOO information criteria for mvgam models
lv_correlations()
Calculate trend correlations based on latent factor loadings for mvgam models
mcmc_plot(<mvgam>)
MCMC plots of mvgam parameters, as implemented in bayesplot
model.frame(<mvgam>) model.frame(<mvgam_prefit>)
Extract model.frame from a fitted mvgam object
smooth.construct(<moi.smooth.spec>) smooth.construct(<mod.smooth.spec>) Predict.matrix(<moi.smooth>) Predict.matrix(<mod.smooth>)
Monotonic splines in mvgam models
mvgam-class
Fitted mvgam object description
mvgam()
Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series
nuts_params(<mvgam>) log_posterior(<mvgam>) rhat(<mvgam>) neff_ratio(<mvgam>)
Extract diagnostic quantities of mvgam models
as.data.frame(<mvgam>) as.matrix(<mvgam>) as.array(<mvgam>) as_draws(<mvgam>) as_draws_matrix(<mvgam>) as_draws_df(<mvgam>) as_draws_array(<mvgam>) as_draws_list(<mvgam>) as_draws_rvars(<mvgam>)
Extract posterior draws from fitted mvgam objects
tweedie() student_t() betar() nb() lognormal() student() bernoulli() beta_binomial() nmix()
Supported mvgam families
mvgam_fevd-class
mvgam_fevd object description
mvgam_forecast-class
mvgam_forecast object description
mvgam_formulae
Details of formula specifications in mvgam models
mvgam_irf-class
mvgam_irf object description
get_coef(<mvgam>) set_coef(<mvgam>) get_vcov(<mvgam>) get_predict(<mvgam>) get_data(<mvgam>) get_data(<mvgam_prefit>) find_predictors(<mvgam>) find_predictors(<mvgam_prefit>)
Helper functions for marginaleffects calculations in mvgam models
mvgam_trends
Supported latent trend models in mvgam
pairs(<mvgam>)
Create a matrix of output plots from an mvgam object
PW()
Specify piecewise linear or logistic trends in mvgam models
plot(<mvgam>)
Default plots for mvgam models
plot(<mvgam_fevd>)
Plot forecast error variance decompositions from an mvgam_fevd object
plot(<mvgam_irf>)
Plot impulse responses from an mvgam_irf object
plot(<mvgam_lfo>)
Plot Pareto-k and ELPD values from an mvgam_lfo object
plot_mvgam_factors()
Latent factor summaries for a fitted mvgam object
plot_mvgam_fc() plot(<mvgam_forecast>)
Plot posterior forecast predictions from mvgam models
plot_mvgam_pterms()
Plot parametric term partial effects for mvgam models
plot_mvgam_randomeffects()
Plot random effect terms from mvgam models
plot_mvgam_resids()
Residual diagnostics for a fitted mvgam object
plot_mvgam_series()
Plot observed time series used for mvgam modelling
plot_mvgam_smooth()
Plot smooth terms from mvgam models
plot_mvgam_trend()
Plot latent trend predictions from mvgam models
plot_mvgam_uncertainty()
Plot forecast uncertainty contributions from mvgam models
portal_data
Portal Project rodent capture survey data
posterior_epred(<mvgam>)
Draws from the expected value of the posterior predictive distribution for mvgam objects
posterior_linpred(<mvgam>)
Posterior draws of the linear predictor for mvgam objects
posterior_predict(<mvgam>)
Draws from the posterior predictive distribution for mvgam objects
ppc()
Plot conditional posterior predictive checks from mvgam models
pp_check(<mvgam>)
Posterior Predictive Checks for mvgam models
predict(<mvgam>)
Predict from a fitted mvgam model
print(<mvgam>)
Summary for a fitted mvgam object
residuals(<mvgam>)
Posterior draws of residuals from mvgam models
residual_cor()
Extract residual correlations based on latent factors from a fitted jsdgam
RW() AR() CAR() VAR()
Specify autoregressive dynamic processes in mvgam
score()
Compute probabilistic forecast scores for mvgam models
series_to_mvgam()
Convert timeseries object to format necessary for mvgam models
sim_mvgam()
Simulate a set of time series for modelling in mvgam
stability()
Calculate measures of latent VAR community stability
summary(<mvgam>) summary(<mvgam_prefit>) coef(<mvgam>)
Summary for a fitted mvgam model
update(<mvgam>) update(<jsdgam>)
Update an existing mvgam model object
ZMVN()
Specify correlated residual processes in mvgam
================================================ FILE: docs/reference/irf.mvgam.html ================================================ Calculate latent VAR impulse response functions — irf.mvgam • mvgam

Compute Generalized or Orthogonalized Impulse Response Functions (IRFs) from mvgam models with Vector Autoregressive dynamics

Usage

irf(object, ...)

# S3 method for class 'mvgam'
irf(object, h = 1, cumulative = FALSE, orthogonal = FALSE, ...)

Arguments

object

list object of class mvgam resulting from a call to mvgam() that used a Vector Autoregressive latent process model (either as VAR(cor = FALSE) or VAR(cor = TRUE))

...

Ignored

h

Positive integer specifying the forecast horizon over which to calculate the IRF

cumulative

Logical flag indicating whether the IRF should be cumulative

orthogonal

Logical flag indicating whether orthogonalized IRFs should be calculated. Note that the order of the variables matters when calculating these

Value

An object of class mvgam_irf containing the posterior IRFs. This object can be used with the supplied S3 plot function

Details

Generalized or Orthogonalized Impulse Response Functions can be computed using the posterior estimates of Vector Autoregressive parameters. This function generates a positive "shock" for a target process at time t = 0 and then calculates how each of the remaining processes in the latent VAR are expected to respond over the forecast horizon h. The function computes IRFs for all processes in the object and returns them in an array that can be plotted using the S3 plot function. To inspect community-level metrics of stability using latent VAR processes, you can use the related stability function.
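As a rough illustration of the underlying idea (a simplified sketch only, not the generalized IRF computation that mvgam performs on posterior draws), the responses of a VAR(1) with a known coefficient matrix to a unit shock in the first process can be traced by repeated multiplication:

# Simplified sketch: trace a unit shock through a VAR(1).
# 'A' is a hypothetical stable coefficient matrix, not one from a fitted model
A <- matrix(c(0.5, 0.2,
              -0.1, 0.4), nrow = 2, byrow = TRUE)
shock <- c(1, 0)
h <- 12
responses <- matrix(NA, nrow = h, ncol = 2)
state <- shock
for (t in seq_len(h)) {
  state <- A %*% state          # propagate the shock one step forward
  responses[t, ] <- state
}
responses                       # expected response of each process over h steps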

References

PH Pesaran & Shin Yongcheol (1998). Generalized impulse response analysis in linear multivariate models. Economics Letters 58: 17–29.

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate some time series that follow a latent VAR(1) process
simdat <- sim_mvgam(family = gaussian(),
                    n_series = 4,
                    trend_model = VAR(cor = TRUE),
                    prop_trend = 1)
plot_mvgam_series(data = simdat$data_train, series = 'all')


# Fit a model that uses a latent VAR(1)
mod <- mvgam(y ~ -1,
             trend_formula = ~ 1,
             trend_model = VAR(cor = TRUE),
             family = gaussian(),
             data = simdat$data_train,
             chains = 2,
             silent = 2)

# Calculate Generalized IRFs for each series
irfs <- irf(mod, h = 12, cumulative = FALSE)

# Plot them
plot(irfs, series = 1)

plot(irfs, series = 2)

plot(irfs, series = 3)

# }
================================================ FILE: docs/reference/jsdgam.html ================================================ Fit Joint Species Distribution Models in mvgam — jsdgam • mvgam

This function sets up a Joint Species Distribution Model whereby the residual associations among species can be modelled in a reduced-rank format using a set of latent factors. The factor specification is extremely flexible, allowing users to include spatial, temporal or any other type of predictor effects to more efficiently capture unmodelled residual associations, while the observation model can also be highly flexible (including all smooth, GP and other effects that mvgam can handle)

Usage

jsdgam(
  formula,
  factor_formula = ~-1,
  knots,
  factor_knots,
  data,
  newdata,
  family = poisson(),
  unit = time,
  species = series,
  share_obs_params = FALSE,
  priors,
  n_lv = 2,
  backend = getOption("brms.backend", "cmdstanr"),
  algorithm = getOption("brms.algorithm", "sampling"),
  control = list(max_treedepth = 10, adapt_delta = 0.8),
  chains = 4,
  burnin = 500,
  samples = 500,
  thin = 1,
  parallel = TRUE,
  threads = 1,
  silent = 1,
  run_model = TRUE,
  return_model_data = FALSE,
  ...
)

Arguments

formula

A formula object specifying the GAM observation model formula. These are exactly like the formula for a GLM except that smooth terms, s(), te(), ti(), t2(), as well as time-varying dynamic() terms, nonparametric gp() terms and offsets using offset(), can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these). Details of the formula syntax used by mvgam can be found in mvgam_formulae

factor_formula

A formula object specifying the linear predictor effects for the latent factors. Use by = trend within calls to functional terms (i.e. s(), te(), ti(), t2(), dynamic(), or gp()) to ensure that each factor captures a different axis of variation. See the example below as an illustration

knots

An optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). Different terms can use different numbers of knots, unless they share a covariate

factor_knots

An optional list containing user specified knot values to be used for basis construction of any smooth terms in factor_formula. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). Different terms can use different numbers of knots, unless they share a covariate

data

A dataframe or list containing the model response variable and covariates required by the GAM formula and factor_formula objects

newdata

Optional dataframe or list of test data containing the same variables as in data. If included, the observations in variable y will be set to NA when fitting the model so that posterior simulations can be obtained

family

family specifying the observation family for the outcomes. Default is poisson(). See mvgam_families for details of the currently supported families

unit

The unquoted name of the variable that represents the unit of analysis in data over which latent residuals should be correlated. This should be a numeric or integer variable in the supplied data. Defaults to time to be consistent with other functionalities in mvgam, though note that the data need not be time series in this case. See examples below for further details and explanations

species

The unquoted name of the factor variable that indexes the different response units in data (usually 'species' in a JSDM). Defaults to series to be consistent with other mvgam models

share_obs_params

logical. If TRUE and the family has additional family-specific observation parameters (e.g. variance components in student_t() or gaussian(), or dispersion parameters in nb() or betar()), these parameters will be shared across all outcome variables. This is handy if you have multiple outcomes (time series in most mvgam models) that you believe share some properties, such as being from the same species over different spatial units. Default is FALSE.

priors

An optional data.frame with prior definitions (in Stan syntax) or, preferentially, a vector containing objects of class brmsprior (see prior for details). See get_mvgam_priors for more information on changing default prior distributions

n_lv

integer specifying the number of latent factors to use for modelling residual associations. Cannot be > n_species. Defaults arbitrarily to 2

backend

Character string naming the package to use as the backend for fitting the Stan model. Options are "cmdstanr" (the default) or "rstan". Can be set globally for the current R session via the "brms.backend" option (see options). Details on the rstan and cmdstanr packages are available at https://mc-stan.org/rstan/ and https://mc-stan.org/cmdstanr/, respectively

algorithm

Character string naming the estimation approach to use. Options are "sampling" for MCMC (the default), "meanfield" for variational inference with factorized normal distributions, "fullrank" for variational inference with a multivariate normal distribution, "laplace" for a Laplace approximation (only available when using cmdstanr as the backend) or "pathfinder" for the pathfinder algorithm (only currently available when using cmdstanr as the backend). Can be set globally for the current R session via the "brms.algorithm" option (see options). Limited testing suggests that "meanfield" performs best out of the non-MCMC approximations for dynamic GAMs, possibly because of the difficulties estimating covariances among the many spline parameters and latent trend parameters. But rigorous testing has not been carried out

control

A named list for controlling the sampler's behaviour. Valid elements include max_treedepth, adapt_delta and init

chains

integer specifying the number of parallel chains for the model. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')

burnin

integer specifying the number of warmup iterations of the Markov chain to run to tune sampling algorithms. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')

samples

integer specifying the number of post-warmup iterations of the Markov chain to run for sampling the posterior distribution

thin

Thinning interval for monitors. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')

parallel

logical specifying whether multiple cores should be used for generating MCMC simulations in parallel. If TRUE, the number of cores to use will be min(c(chains, parallel::detectCores() - 1))

threads

integer. Experimental option to use multithreading for within-chain parallelisation in Stan. We recommend its use only if you are experienced with Stan's reduce_sum function and have a slow-running model that cannot be sped up by any other means. Currently works for all families when using cmdstanr as the backend

silent

Verbosity level between 0 and 2. If 1 (the default), most of the informational messages of compiler and sampler are suppressed. If 2, even more messages are suppressed. The actual sampling progress is still printed. Set refresh = 0 to turn this off as well. If using backend = "rstan" you can also set open_progress = FALSE to prevent opening additional progress bars.

run_model

logical. If FALSE, the model is not fitted but instead the function will return the model file and the data / initial values that are needed to fit the model outside of mvgam

return_model_data

logical. If TRUE, the list of data that is needed to fit the model is returned, along with the initial values for smooth and AR parameters, once the model is fitted. This will be helpful if users wish to modify the model file to add other stochastic elements that are not currently available in mvgam. Default is FALSE to reduce the size of the returned object, unless run_model == FALSE

...

Other arguments to pass to mvgam

Value

A list object of class mvgam containing model output, the text representation of the model file, the mgcv model output (for easily generating simulations at unsampled covariate values), Dunn-Smyth residuals for each species and key information needed for other functions in the package. See mvgam-class for details. Use methods(class = "mvgam") for an overview on available methods

Details

Joint Species Distribution Models allow for responses of multiple species to be learned hierarchically, whereby responses to environmental variables in formula can be partially pooled and any latent, unmodelled residual associations can also be learned. In mvgam, both of these effects can be modelled with the full power of latent factor Hierarchical GAMs, providing unmatched flexibility to model full communities of species. When calling jsdgam, an initial State-Space model using trend = 'None' is set up and then modified to include the latent factors and their linear predictors. Consequently, you can inspect priors for these models using get_mvgam_priors by supplying the relevant formula, factor_formula, data and family arguments and keeping the default trend = 'None'.

In a JSDGAM, the expectation of response \(Y_{ij}\) is modelled with

$$g(\mu_{ij}) = X_i\beta + u_i\theta_j,$$

where \(g(.)\) is a known link function, \(X\) is a design matrix of linear predictors (with associated \(\beta\) coefficients), \(u\) are \(n_{lv}\)-variate latent factors (\(n_{lv} \ll n_{species}\)) and \(\theta_j\) are species-specific loadings on the latent factors. The design matrix \(X\) and \(\beta\) coefficients are constructed and modelled using formula and can contain any of mvgam's predictor effects, including random intercepts and slopes, multidimensional penalized smooths and GP effects. The factor loadings \(\theta_j\) are constrained for identifiability but can be used to reconstruct an estimate of the species' residual variance-covariance matrix using \(\Theta \Theta'\) (see the example below and residual_cor() for details). The latent factors are further modelled using $$u_i \sim \text{Normal}(Q_i\beta_{factor}, 1),$$ where the second design matrix \(Q\) and associated \(\beta_{factor}\) coefficients are constructed and modelled using factor_formula. Again, the effects that make up this linear predictor can contain any of mvgam's allowed predictor effects, providing enormous flexibility for modelling species' communities.
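To make the algebra concrete, the following sketch (hypothetical simulation code, not mvgam's internals) assembles the linear predictor \(g(\mu_{ij}) = X_i\beta + u_i\theta_j\) for simulated matrices and shows how the loadings imply a residual covariance via \(\Theta \Theta'\):

# Hypothetical sketch of the JSDGAM linear predictor; all objects simulated
set.seed(1)
n_units <- 50; n_species <- 6; n_lv <- 2
X <- cbind(1, rnorm(n_units))                       # observation design matrix
beta <- c(2, 0.5)                                   # shared coefficients
u <- matrix(rnorm(n_units * n_lv), n_units, n_lv)   # latent factors
Theta <- matrix(rnorm(n_species * n_lv, sd = 0.5),
                n_species, n_lv)                    # species-specific loadings

# g(mu) = X %*% beta (broadcast over species) + u %*% t(Theta)
eta <- matrix(X %*% beta, n_units, n_species) + u %*% t(Theta)

# The loadings imply the species' residual covariance matrix
round(Theta %*% t(Theta), 2)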

References

Nicholas J Clark & Konstans Wells (2020). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. 14:3, 771-784.

David I Warton, F Guillaume Blanchet, Robert B O’Hara, Otso Ovaskainen, Sara Taskinen, Steven C Walker & Francis KC Hui (2015). So many variables: joint modeling in community ecology. Trends in Ecology & Evolution 30:12, 766-779.

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate latent count data for 500 spatial locations and 10 species
set.seed(0)
N_points <- 500
N_species <- 10

# Species-level intercepts (on the log scale)
alphas <- runif(N_species, 2, 2.25)

# Simulate a covariate and species-level responses to it
temperature <- rnorm(N_points)
betas <- runif(N_species, -0.5, 0.5)

# Simulate points uniformly over a space
lon <- runif(N_points, min = 150, max = 155)
lat <- runif(N_points, min = -20, max = -19)

# Set up spatial basis functions as a tensor product of lat and lon
sm <- mgcv::smoothCon(mgcv::te(lon, lat, k = 5),
                      data = data.frame(lon, lat),
                      knots = NULL)[[1]]

# The design matrix for this smooth is in the 'X' slot
des_mat <- sm$X
dim(des_mat)
#> [1] 500  25

# Function to generate a random covariance matrix where all variables
# have unit variance (i.e. diagonals are all 1)
random_Sigma = function(N){
  # Build a random lower-triangular Cholesky factor whose rows have unit
  # norm, so the implied covariance matrix has 1s along its diagonal
  L_Omega <- matrix(0, N, N)
  L_Omega[1, 1] <- 1
  for (i in 2:N) {
    bound <- 1
    for (j in 1:(i - 1)) {
      L_Omega[i, j] <- runif(1, -sqrt(bound), sqrt(bound))
      bound <- bound - L_Omega[i, j]^2
    }
    L_Omega[i, i] <- sqrt(bound)
  }
  Sigma <- L_Omega %*% t(L_Omega)
  return(Sigma)
}

# Simulate a variance-covariance matrix for the correlations among
# basis coefficients
Sigma <- random_Sigma(N = NCOL(des_mat))

# Now simulate the species-level basis coefficients hierarchically, where
# spatial basis function correlations are a convex sum of a base correlation
# matrix and a species-level correlation matrix
basis_coefs <- matrix(NA, nrow = N_species, ncol = NCOL(Sigma))
base_field <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = Sigma)
for(t in 1:N_species){
  corOmega <- (cov2cor(Sigma) * 0.7) +
                 (0.3 * cov2cor(random_Sigma(N = NCOL(des_mat))))
  basis_coefs[t, ] <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = corOmega)
}

# Simulate the latent spatial processes
st_process <- do.call(rbind, lapply(seq_len(N_species), function(t){
  data.frame(lat = lat,
             lon = lon,
             species = paste0('species_', t),
             temperature = temperature,
             process = alphas[t] +
               betas[t] * temperature +
               des_mat %*% basis_coefs[t,])
}))

# Now take noisy observations at some of the points (60)
obs_points <- sample(1:N_points, size = 60, replace = FALSE)
obs_points <- data.frame(lat = lat[obs_points],
                         lon = lon[obs_points],
                         site = 1:60)

# Keep only the process data at these points
st_process %>%
  dplyr::inner_join(obs_points, by = c('lat', 'lon')) %>%
  # now take noisy Poisson observations of the process
  dplyr::mutate(count = rpois(NROW(.), lambda = exp(process))) %>%
  dplyr::mutate(species = factor(species,
                                 levels = paste0('species_', 1:N_species))) %>%
  dplyr::group_by(lat, lon) -> dat

# View the count distributions for each species
library(ggplot2)
ggplot(dat, aes(x = count)) +
  geom_histogram() +
  facet_wrap(~ species, scales = 'free')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


ggplot(dat, aes(x = lon, y = lat, col = log(count + 1))) +
  geom_point(size = 2.25) +
  facet_wrap(~ species, scales = 'free') +
  scale_color_viridis_c() +
  theme_classic()


# Inspect default priors for a joint species model with three spatial factors
priors <- get_mvgam_priors(formula = count ~
                            # Environmental model includes random slopes for
                            # a linear effect of temperature
                            s(species, bs = 're', by = temperature),

                          # Each factor estimates a different nonlinear spatial process, using
                          # 'by = trend' as in other mvgam State-Space models
                          factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1,
                          n_lv = 3,

                          # The data and grouping variables
                          data = dat,
                          unit = site,
                          species = species,

                          # Poisson observations
                          family = poisson())
#> Warning: gp effects in mvgam cannot yet handle autogrouping
#> resetting all instances of 'gr = TRUE' to 'gr = FALSE'
#> This warning is displayed once per session.
head(priors)
#>                                            param_name param_length
#> 1                                         (Intercept)            1
#> 2                                   vector[1] mu_raw;            1
#> 3                       vector<lower=0>[1] sigma_raw;            1
#> 4 real<lower=0> alpha_gp_trend(lon, lat):trendtrend1;            1
#> 5 real<lower=0> alpha_gp_trend(lon, lat):trendtrend2;            1
#> 6 real<lower=0> alpha_gp_trend(lon, lat):trendtrend3;            1
#>                                    param_info
#> 1                                 (Intercept)
#> 2             s(species):temperature pop mean
#> 3               s(species):temperature pop sd
#> 4 gp(lon, lat):trendtrend1 marginal deviation
#> 5 gp(lon, lat):trendtrend2 marginal deviation
#> 6 gp(lon, lat):trendtrend3 marginal deviation
#>                                                          prior
#> 1                        (Intercept) ~ student_t(3, 2.1, 2.5);
#> 2                                       mu_raw ~ std_normal();
#> 3                            sigma_raw ~ student_t(3, 0, 2.5);
#> 4 alpha_gp_trend(lon, lat):trendtrend1 ~ student_t(3, 0, 2.5);
#> 5 alpha_gp_trend(lon, lat):trendtrend2 ~ student_t(3, 0, 2.5);
#> 6 alpha_gp_trend(lon, lat):trendtrend3 ~ student_t(3, 0, 2.5);
#>                                            example_change new_lowerbound
#> 1                             (Intercept) ~ normal(0, 1);           <NA>
#> 2                            mu_raw ~ normal(0.65, 0.15);           <NA>
#> 3                          sigma_raw ~ exponential(0.27);           <NA>
#> 4 alpha_gp_trend(lon, lat):trendtrend1 ~ normal(0, 0.86);           <NA>
#> 5 alpha_gp_trend(lon, lat):trendtrend2 ~ normal(0, 0.84);           <NA>
#> 6 alpha_gp_trend(lon, lat):trendtrend3 ~ normal(0, 0.78);           <NA>
#>   new_upperbound
#> 1           <NA>
#> 2           <NA>
#> 3           <NA>
#> 4           <NA>
#> 5           <NA>
#> 6           <NA>

# Fit a JSDM that estimates hierarchical temperature responses
# and that uses three latent spatial factors
mod <- jsdgam(formula = count ~
                # Environmental model includes random slopes for a
                # linear effect of temperature
                s(species, bs = 're', by = temperature),

              # Each factor estimates a different nonlinear spatial process, using
              # 'by = trend' as in other mvgam State-Space models
              factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1,
              n_lv = 3,

              # Change default priors for fixed random effect variances and
              # factor GP marginal deviations to standard normal
              priors = c(prior(std_normal(),
                               class = sigma_raw),
                         prior(std_normal(),
                               class = `alpha_gp_trend(lon, lat):trendtrend1`),
                         prior(std_normal(),
                               class = `alpha_gp_trend(lon, lat):trendtrend2`),
                         prior(std_normal(),
                               class = `alpha_gp_trend(lon, lat):trendtrend3`)),

              # The data and the grouping variables
              data = dat,
              unit = site,
              species = species,

              # Poisson observations
              family = poisson(),
              chains = 2,
              silent = 2)

# Plot species-level intercept estimates
plot_predictions(mod, condition = 'species',
                 type = 'link')


# Plot species' hierarchical responses to temperature
plot_predictions(mod, condition = c('temperature', 'species', 'species'),
                 type = 'link')


# Plot posterior median estimates of the latent spatial factors
plot(mod, type = 'smooths', trend_effects = TRUE)


# Or using gratia, if you have it installed
if(requireNamespace('gratia', quietly = TRUE)){
  gratia::draw(mod, trend_effects = TRUE)
}


# Calculate residual spatial correlations
post_cors <- residual_cor(mod)
#> Package `corpcor` required for this function to work.
names(post_cors)
#>  [1] "cor"        "cor_lower"  "cor_upper"  "sig_cor"    "cov"       
#>  [6] "prec"       "prec_lower" "prec_upper" "sig_prec"   "trace"     
# Look at lower and upper credible interval estimates for
# some of the estimated correlations
post_cors$cor[1:5, 1:5]
#>            species_1  species_2   species_3  species_4   species_5
#> species_1  1.0000000  0.7570458  0.69634691  0.7576608 -0.26420750
#> species_2  0.7570458  1.0000000  0.19415381  0.4180474 -0.49248276
#> species_3  0.6963469  0.1941538  1.00000000  0.8941231 -0.07249719
#> species_4  0.7576608  0.4180474  0.89412308  1.0000000 -0.47278141
#> species_5 -0.2642075 -0.4924828 -0.07249719 -0.4727814  1.00000000
post_cors$cor_upper[1:5, 1:5]
#>           species_1  species_2 species_3  species_4  species_5
#> species_1 1.0000000  0.9999387 0.9718005  0.9585023  0.1490273
#> species_2 0.9999387  1.0000000 0.5509244  0.7210487 -0.2036591
#> species_3 0.9718005  0.5509244 1.0000000  0.9754868  0.2749775
#> species_4 0.9585023  0.7210487 0.9754868  1.0000000 -0.1509840
#> species_5 0.1490273 -0.2036591 0.2749775 -0.1509840  1.0000000
post_cors$cor_lower[1:5, 1:5]
#>            species_1   species_2  species_3   species_4  species_5
#> species_1  1.0000000  0.43440522  0.1996586  0.35785727 -0.6422784
#> species_2  0.4344052  1.00000000 -0.1869907  0.04720916 -0.7303211
#> species_3  0.1996586 -0.18699072  1.0000000  0.74860157 -0.4280180
#> species_4  0.3578573  0.04720916  0.7486016  1.00000000 -0.7389906
#> species_5 -0.6422784 -0.73032110 -0.4280180 -0.73899063  1.0000000
# A quick and dirty plot of the posterior median correlations
image(post_cors$cor)


# Posterior predictive checks and ELPD-LOO can ascertain model fit
pp_check(mod, type = "pit_ecdf_grouped",
         group = "species", ndraws = 100)

loo(mod)
#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
#> 
#> Computed from 1000 by 600 log-likelihood matrix.
#> 
#>          Estimate    SE
#> elpd_loo  -2194.9  50.4
#> p_loo       426.8  26.0
#> looic      4389.8 100.8
#> ------
#> MCSE of elpd_loo is NA.
#> MCSE and ESS estimates assume MCMC draws (r_eff in [0.0, 1.5]).
#> 
#> Pareto k diagnostic values:
#>                           Count Pct.    Min. ESS
#> (-Inf, 0.67]   (good)     436   72.7%   1       
#>    (0.67, 1]   (bad)      136   22.7%   <NA>    
#>     (1, Inf)   (very bad)  28    4.7%   <NA>    
#> See help('pareto-k-diagnostic') for details.

# Forecast log(counts) for entire region (site value doesn't matter as long
# as each spatial location has a different and unique site identifier);
# note this calculation takes a few minutes because of the need to calculate
# draws from the stochastic latent factors
newdata <- st_process %>%
                   dplyr::mutate(species = factor(species,
                                                  levels = paste0('species_',
                                                                  1:N_species))) %>%
                   dplyr::group_by(lat, lon) %>%
                   dplyr::mutate(site = dplyr::cur_group_id()) %>%
                   dplyr::ungroup()
preds <- predict(mod, newdata = newdata)

# Plot the median log(count) predictions on a grid
newdata$log_count <- preds[,1]
ggplot(newdata, aes(x = lon, y = lat, col = log_count)) +
  geom_point(size = 1.5) +
  facet_wrap(~ species, scales = 'free') +
  scale_color_viridis_c() +
  theme_classic()

# }
================================================ FILE: docs/reference/lfo_cv.mvgam.html ================================================ Approximate leave-future-out cross-validation of fitted mvgam objects — lfo_cv.mvgam • mvgam

Approximate leave-future-out cross-validation of fitted mvgam objects

Usage

lfo_cv(object, ...)

# S3 method for class 'mvgam'
lfo_cv(
  object,
  data,
  min_t,
  fc_horizon = 1,
  pareto_k_threshold = 0.7,
  silent = 1,
  ...
)

Arguments

object

list object of class mvgam. See mvgam()

...

Ignored

data

A dataframe or list containing the model response variable and covariates required by the GAM formula. Should include columns: 'series' (a character or factor index of the series IDs) and 'time' (a numeric index of the time point for each observation). Any other variables to be included in the linear predictor of formula must also be present

min_t

Integer specifying the minimum training time required before making predictions from the data. Default is either the 30th timepoint in the observational data or whatever training time allows for at least 10 lfo-cv calculations, if possible. This value is essentially arbitrary, so it is highly recommended to change it to something more suitable for the data and models being evaluated.

fc_horizon

Integer specifying the number of time steps ahead for evaluating forecasts

pareto_k_threshold

Proportion specifying the threshold over which the Pareto shape parameter is considered unstable, triggering a model refit. Default is 0.7

silent

Verbosity level between 0 and 2. If 1 (the default), most of the informational messages of compiler and sampler are suppressed. If 2, even more messages are suppressed. The actual sampling progress is still printed. Set refresh = 0 to turn this off as well. If using backend = "rstan" you can also set open_progress = FALSE to prevent opening additional progress bars.

Value

A list of class mvgam_lfo containing the approximate ELPD scores, the Pareto-k shape values and the specified pareto_k_threshold

Details

Approximate leave-future-out cross-validation uses an expanding training window scheme to evaluate a model on its forecasting ability. The steps used in this function mirror those laid out in the lfo vignette from the loo package, written by Paul Bürkner, Jonah Gabry and Aki Vehtari.

First, we refit the model using the first min_t observations to perform a single exact fc_horizon-ahead forecast step. This forecast is evaluated against the min_t + fc_horizon out-of-sample observations using the Expected Log Predictive Density (ELPD).

Next, we approximate each successive round of expanding-window forecasts by moving forward one step at a time for i in 1:N_evaluations and re-weighting draws from the model's posterior predictive distribution using Pareto Smoothed Importance Sampling (PSIS). In each iteration i, PSIS weights are obtained for the next observation that would have been included in the model if we had re-fit (i.e. the last observation that would have been in the training data, or min_t + i). If these importance ratios are stable, we consider the approximation adequate and use the re-weighted posterior's forecast to evaluate the next holdout set of testing observations ((min_t + i + 1):(min_t + i + fc_horizon)).

At some point the importance ratio variability will become too large and importance sampling will fail. This is indicated by the estimated shape parameter k of the generalized Pareto distribution crossing the threshold pareto_k_threshold. Only then do we refit the model using all of the observations up to the time of the failure. We then restart the process and iterate forward until the next refit is triggered (Bürkner et al. 2020).
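This scheme can be summarised with a minimal, self-contained R sketch; refit_model(), pareto_k() and forecast_elpd() below are hypothetical placeholders (not mvgam functions) standing in for a full model refit, a PSIS stability check and an ELPD forecast evaluation:

# Hypothetical placeholders only; a real implementation would refit the
# mvgam model and compute PSIS weights / ELPD from posterior draws
refit_model   <- function(t_end) list(last_refit = t_end)
pareto_k      <- function(fit, t) runif(1, 0, 0.9)
forecast_elpd <- function(fit, t, h) rnorm(1, mean = -2)

lfo_sketch <- function(N, min_t, fc_horizon = 1, pareto_k_threshold = 0.7) {
  # Single exact refit using the first min_t observations
  fit <- refit_model(min_t)
  elpds <- numeric(0)
  for (i in seq(min_t, N - fc_horizon)) {
    # Refit with all data up to time i only when importance ratios fail
    if (pareto_k(fit, i) > pareto_k_threshold) {
      fit <- refit_model(i)
    }
    # Evaluate the fc_horizon-ahead forecast for the next holdout set
    elpds <- c(elpds, forecast_elpd(fit, i, fc_horizon))
  }
  elpds
}
set.seed(1)
lfo_sketch(N = 75, min_t = 40, fc_horizon = 3)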

References

Paul-Christian Bürkner, Jonah Gabry & Aki Vehtari (2020). Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation, 90:14, 2499-2523.

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate from a Poisson-AR2 model with a seasonal smooth
set.seed(100)
dat <- sim_mvgam(T = 75,
                n_series = 1,
                prop_trend = 0.75,
                trend_model = 'AR2',
                family = poisson())

# Plot the time series
plot_mvgam_series(data = dat$data_train,
                 newdata = dat$data_test,
                 series = 1)


# Fit an appropriate model
mod_ar2 <- mvgam(y ~ s(season, bs = 'cc', k = 6),
               trend_model = AR(p = 2),
               family = poisson(),
               data = dat$data_train,
               newdata = dat$data_test,
               chains = 2,
               silent = 2)

# Fit a less appropriate model
mod_rw <- mvgam(y ~ s(season, bs = 'cc', k = 6),
              trend_model = RW(),
              family = poisson(),
              data = dat$data_train,
              newdata = dat$data_test,
              chains = 2,
              silent = 2)

# Compare Discrete Ranked Probability Scores for the testing period
fc_ar2 <- forecast(mod_ar2)
fc_rw <- forecast(mod_rw)
score_ar2 <- score(fc_ar2, score = 'drps')
score_rw <- score(fc_rw, score = 'drps')
sum(score_ar2$series_1$score)
#> [1] 23.30538
sum(score_rw$series_1$score)
#> [1] 758.3094

# Now use approximate leave-future-out CV to compare
# rolling forecasts; start at time point 40 to reduce
# computational time and to ensure enough data is available
# for estimating model parameters
lfo_ar2 <- lfo_cv(mod_ar2,
                 min_t = 40,
                 fc_horizon = 3,
                 silent = 2)
lfo_rw <- lfo_cv(mod_rw,
                min_t = 40,
                fc_horizon = 3,
                silent = 2)

# Plot Pareto-K values and ELPD estimates
plot(lfo_ar2)

plot(lfo_rw)


# Proportion of timepoints in which AR2 model gives better forecasts
length(which((lfo_ar2$elpds - lfo_rw$elpds) > 0)) /
      length(lfo_ar2$elpds)
#> [1] 1

# A higher total ELPD is preferred
lfo_ar2$sum_ELPD
#> [1] -77.51777
lfo_rw$sum_ELPD
#> [1] -95.0276
# }
================================================ FILE: docs/reference/logLik.mvgam.html ================================================ Compute pointwise Log-Likelihoods from fitted mvgam objects — logLik.mvgam • mvgam

Compute pointwise Log-Likelihoods from fitted mvgam objects

Usage

# S3 method for class 'mvgam'
logLik(object, linpreds, newdata, family_pars, include_forecast = TRUE, ...)

Arguments

object

list object of class mvgam or jsdgam

linpreds

Optional matrix of linear predictor draws to use for calculating pointwise log-likelihoods

newdata

Optional data.frame or list object specifying which series each column in linpreds belongs to. If linpreds is supplied, then newdata must also be supplied

family_pars

Optional list containing posterior draws of family-specific parameters (i.e. shape, scale or overdispersion parameters). Required if linpreds and newdata are supplied

include_forecast

Logical. If newdata were fed to the model to compute forecasts, should the log-likelihood draws for these observations also be returned. Defaults to TRUE

...

Ignored

Value

A matrix of dimension n_samples x n_observations containing the pointwise log-likelihood draws for all observations in newdata. If no newdata is supplied, log-likelihood draws are returned for all observations that were originally fed to the model (training observations and, if supplied to the original model via the newdata argument in mvgam, testing observations)

Examples

# \donttest{
# Simulate some data and fit a model
simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1')
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             data = simdat$data_train,
             chains = 2,
             silent = 2)

# Extract logLikelihood values
lls <- logLik(mod)
str(lls)
#>  num [1:1000, 1:75] -0.263 -0.125 -0.205 -0.192 -0.318 ...
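
# A sketch of downstream use: the draws x observations matrix can be passed
# directly to loo's PSIS machinery (loo will warn that r_eff is unspecified)
loo::loo(lls)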
# }
================================================ FILE: docs/reference/loo.mvgam.html ================================================ LOO information criteria for mvgam models — loo.mvgam • mvgam

Extract the LOOIC (leave-one-out information criterion) using loo::loo()

Usage

# S3 method for class 'mvgam'
loo(x, incl_dynamics = TRUE, ...)

# S3 method for class 'mvgam'
loo_compare(x, ..., model_names = NULL, incl_dynamics = TRUE)

Arguments

x

Object of class mvgam

incl_dynamics

Logical; indicates if any latent dynamic structures that were included in the model should be considered when calculating in-sample log-likelihoods. Defaults to TRUE

...

More mvgam objects.

model_names

If NULL (the default) will use model names derived from deparsing the call. Otherwise will use the passed values as model names.

Value

For loo.mvgam, an object of class psis_loo (see loo::loo() for details). For loo_compare.mvgam, an object of class compare.loo (see loo::loo_compare() for details)

Details

When comparing two (or more) fitted mvgam models, we can estimate the difference in their in-sample predictive accuracies using the Expected Log Predictive Density (ELPD). This metric can be approximated using Pareto Smoothed Importance Sampling (PSIS), which re-weights posterior draws to approximate the predictions a model might have made for a given datapoint had that datapoint not been included in the original model fit (i.e. as if we had run a leave-one-out cross-validation and then made a prediction for the held-out datapoint). See loo::loo() and loo::loo_compare() for further information on how this importance sampling works.

There are two fundamentally different ways to calculate ELPD from mvgam models that include dynamic latent processes (i.e. 'trend models'). The first is to use the predictions that were generated when estimating these latent processes, by setting incl_dynamics = TRUE. This works in the same way as setting incl_autocor = TRUE in brms::prepare_predictions(). But it may also be desirable to compare predictions by treating the dynamic processes as nuisance parameters that we wish to account for when making inferences about other processes in the model (i.e. the linear predictor effects). Setting incl_dynamics = FALSE accomplishes this by ignoring the dynamic processes when making predictions. This option matches what mvgam's prediction functions return (i.e. predict.mvgam, ppc, pp_check.mvgam, posterior_epred.mvgam) and will be far less forgiving of models that overfit the training data through highly flexible dynamic processes (such as Random Walks, for example).

However, setting incl_dynamics = FALSE will often result in less stable Pareto k diagnostics for models with dynamic trends, making ELPD comparisons difficult and unstable. It is therefore recommended to generally stick with incl_dynamics = TRUE when comparing models based on in-sample fits, and then to use forecast evaluations for further scrutiny of models (see for example forecast.mvgam, score.mvgam_forecast and lfo_cv)
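
As a quick illustration (a sketch only, assuming mod is a fitted mvgam with a latent trend):

# Condition on the latent dynamic states (generally more stable Pareto k)
loo(mod, incl_dynamics = TRUE)

# Treat the latent states as nuisance parameters; this matches the
# predictions returned by predict.mvgam and posterior_epred.mvgam
loo(mod, incl_dynamics = FALSE)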

Examples

# \donttest{
# Simulate 4 time series with hierarchical seasonality
# and independent AR1 dynamic processes
set.seed(111)
simdat <- sim_mvgam(seasonality = 'hierarchical',
                   trend_model = AR(),
                   family = gaussian())

# Fit a model with shared seasonality
mod1 <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             data = rbind(simdat$data_train,
             simdat$data_test),
             family = gaussian(),
             chains = 2,
             silent = 2)

# Inspect the model and calculate LOO
conditional_effects(mod1)

mc.cores.def <- getOption('mc.cores')
options(mc.cores = 1)
loo(mod1)
#> 
#> Computed from 1000 by 300 log-likelihood matrix.
#> 
#>          Estimate   SE
#> elpd_loo   -363.7 11.1
#> p_loo         6.9  0.5
#> looic       727.4 22.2
#> ------
#> MCSE of elpd_loo is 0.1.
#> MCSE and ESS estimates assume MCMC draws (r_eff in [0.8, 2.1]).
#> 
#> All Pareto k estimates are good (k < 0.67).
#> See help('pareto-k-diagnostic') for details.

# Now fit a model with hierarchical seasonality
mod2 <- update(mod1,
              formula = y ~ s(season, bs = 'cc', k = 6) +
              s(season, series, bs = 'fs',
              xt = list(bs = 'cc'), k = 4),
              chains = 2,
              silent = 2)
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
conditional_effects(mod2)


loo(mod2)
#> 
#> Computed from 1000 by 300 log-likelihood matrix.
#> 
#>          Estimate   SE
#> elpd_loo   -308.9 11.4
#> p_loo        12.4  1.0
#> looic       617.7 22.7
#> ------
#> MCSE of elpd_loo is 0.1.
#> MCSE and ESS estimates assume MCMC draws (r_eff in [0.8, 1.3]).
#> 
#> All Pareto k estimates are good (k < 0.67).
#> See help('pareto-k-diagnostic') for details.

# Now add AR1 dynamic errors to mod2
mod3 <- update(mod2,
              trend_model = AR(),
              chains = 2,
              silent = 2)
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
conditional_effects(mod3)


plot(mod3, type = 'trend')

loo(mod3)
#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
#> 
#> Computed from 1000 by 300 log-likelihood matrix.
#> 
#>          Estimate   SE
#> elpd_loo   -235.3  9.5
#> p_loo       181.9  7.2
#> looic       470.7 19.1
#> ------
#> MCSE of elpd_loo is NA.
#> MCSE and ESS estimates assume MCMC draws (r_eff in [0.0, 0.1]).
#> 
#> Pareto k diagnostic values:
#>                           Count Pct.    Min. ESS
#> (-Inf, 0.67]   (good)     152   50.7%   1       
#>    (0.67, 1]   (bad)      137   45.7%   <NA>    
#>     (1, Inf)   (very bad)  11    3.7%   <NA>    
#> See help('pareto-k-diagnostic') for details.

# Compare models using LOO
loo_compare(mod1, mod2, mod3)
#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
#>      elpd_diff se_diff
#> mod3    0.0       0.0 
#> mod2  -73.5       5.1 
#> mod1 -128.3       9.6 
options(mc.cores = mc.cores.def)

# Compare forecast abilities using an expanding training window and
# forecasting ahead 1 timepoint from each window; the first window includes
# the first 92 timepoints (of the 100 that were simulated)
max(mod2$obs_data$time)
#> [1] 100
lfo_mod2 <- lfo_cv(mod2, min_t = 92)
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> (chain iteration progress omitted)
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 1.0 seconds.
#> Total execution time: 1.2 seconds.
#> 
lfo_mod3 <- lfo_cv(mod3, min_t = 92)
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Your model may benefit from using "noncentred = TRUE"
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> (chain iteration progress omitted)
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 16.6 seconds.
#> Total execution time: 17.0 seconds.
#> 
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Your model may benefit from using "noncentred = TRUE"
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> (chain iteration progress omitted)
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 12.2 seconds.
#> Total execution time: 17.0 seconds.
#> 
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Your model may benefit from using "noncentred = TRUE"
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> (chain iteration progress omitted)
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 12.4 seconds.
#> Total execution time: 16.5 seconds.
#> 
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Warning: model has repeated 1-d smooths of same variable.
#> Your model may benefit from using "noncentred = TRUE"
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> (chain iteration progress omitted)
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 19.9 seconds.
#> Total execution time: 29.7 seconds.
#> 

# Take the difference in forecast ELPDs; a model with higher ELPD is preferred,
# so negative values here indicate that mod3 gave better forecasts for a particular
# out of sample timepoint
plot(y = lfo_mod2$elpds - lfo_mod3$elpds,
    x = lfo_mod2$eval_timepoints, pch = 16,
    ylab = 'ELPD_mod2 - ELPD_mod3',
    xlab = 'Evaluation timepoint')
abline(h = 0, lty = 'dashed')

# }
================================================ FILE: docs/reference/lv_correlations.html ================================================ Calculate trend correlations based on latent factor loadings for mvgam models — lv_correlations • mvgam

This function uses samples of latent trends for each series from a fitted mvgam model to calculate correlations among series' trends

Usage

lv_correlations(object)

Arguments

object

list object of class mvgam

Value

A list object containing the mean posterior correlations and the full array of posterior correlations

Examples

# \donttest{
simdat <- sim_mvgam()
mod <- mvgam(y ~ s(season, bs = 'cc',
                  k = 6),
            trend_model = AR(),
            use_lv = TRUE,
            n_lv = 2,
            data = simdat$data_train,
            burnin = 300,
            samples = 300,
            chains = 2,
            silent = 2)
#> Warning in 'C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model_0f5e123adc47a9208b1c8eaa9e58906b.stan', line 23, column 31: Found
#>     int division:
#>       n_lv * (n_lv - 1) / 2
#>     Values will be rounded towards zero. If rounding is not desired you can
#>     write the division as
#>       n_lv * (n_lv - 1) / 2.0
#>     If rounding is intended please use the integer division operator %/%.
#> Warning in 'C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c633b5a5b.stan', line 23, column 33: Found
#>     int division:
#>       n_lv * (n_lv - 1) / 2
#>     Values will be rounded towards zero. If rounding is not desired you can
#>     write the division as
#>       n_lv * (n_lv - 1) / 2.0
#>     If rounding is intended please use the integer division operator %/%.

lvcors <- lv_correlations(mod)
names(lvcors)
#> [1] "mean_correlations"      "posterior_correlations"
lapply(lvcors, class)
#> $mean_correlations
#> [1] "matrix" "array" 
#> 
#> $posterior_correlations
#> [1] "list"
#> 
# }
================================================ FILE: docs/reference/mcmc_plot.mvgam.html ================================================ MCMC plots of mvgam parameters, as implemented in bayesplot — mcmc_plot.mvgam • mvgam

Convenient way to call MCMC plotting functions implemented in the bayesplot package for mvgam models

Usage

# S3 method for class 'mvgam'
mcmc_plot(
  object,
  type = "intervals",
  variable = NULL,
  regex = FALSE,
  use_alias = TRUE,
  ...
)

Arguments

object

An R object, typically of class mvgam

type

The type of the plot. Supported types are (as names) hist, dens, hist_by_chain, dens_overlay, violin, intervals, areas, areas_ridges, combo, acf, acf_bar, trace, trace_highlight, scatter, hex, pairs, rhat, rhat_hist, neff, neff_hist and nuts_energy. For an overview of the various plot types see MCMC-overview.

variable

Names of the variables (parameters) to plot, as given by a character vector or a regular expression (if regex = TRUE). By default, a hopefully not too large selection of variables is plotted.

regex

Logical; Indicates whether variable should be treated as regular expressions. Defaults to FALSE.

use_alias

Logical. If more informative names for parameters are available (i.e. for beta coefficients b or for smoothing parameters rho), replace the uninformative names with the more informative alias. Defaults to TRUE

...

Additional arguments passed to the plotting functions. See MCMC-overview for more details.

Value

A ggplot object that can be further customized using the ggplot2 package.

See also

mvgam_draws for an overview of some of the shortcut strings that can be used for argument variable

Examples

# \donttest{
simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             noncentred = TRUE,
             data = simdat$data_train,
             chains = 2,
             silent = 2)
mcmc_plot(mod)

mcmc_plot(mod, type = 'neff_hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

mcmc_plot(mod, variable = 'betas', type = 'areas')

mcmc_plot(mod, variable = 'trend_params', type = 'combo')

# }
================================================ FILE: docs/reference/model.frame.mvgam.html ================================================ Extract model.frame from a fitted mvgam object — model.frame.mvgam • mvgam

Extract model.frame from a fitted mvgam object

Usage

# S3 method for class 'mvgam'
model.frame(formula, trend_effects = FALSE, ...)

# S3 method for class 'mvgam_prefit'
model.frame(formula, trend_effects = FALSE, ...)

Arguments

formula

a model formula or terms object or an R object.

trend_effects

logical, return the model.frame from the observation model (if FALSE) or from the underlying process model (if TRUE)

...

Ignored

Value

A matrix containing the fitted model frame
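
A brief usage sketch (assuming mod is a fitted mvgam that included a trend_formula):

head(model.frame(mod))                        # observation model frame
head(model.frame(mod, trend_effects = TRUE))  # process (trend) model frame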

Author

Nicholas J Clark

================================================ FILE: docs/reference/monotonic.html ================================================ Monotonic splines in mvgam models — monotonic • mvgam

Uses constructors from package splines2 to build monotonically increasing or decreasing splines. Details also in Wang & Yan (2021).

Usage

# S3 method for class 'moi.smooth.spec'
smooth.construct(object, data, knots)

# S3 method for class 'mod.smooth.spec'
smooth.construct(object, data, knots)

# S3 method for class 'moi.smooth'
Predict.matrix(object, data)

# S3 method for class 'mod.smooth'
Predict.matrix(object, data)

Arguments

object

A smooth specification object, usually generated by a term s(x, bs = "moi", ...) or s(x, bs = "mod", ...)

data

a list containing just the data (including any by variable) required by this term, with names corresponding to object$term (and object$by). The by variable is the last element.

knots

a list containing any knots supplied for basis setup — in same order and with same names as data. Can be NULL. See details for further information.

Value

An object of class "moi.smooth" or "mod.smooth". In addition to the usual elements of a smooth class documented under smooth.construct, this object will contain a slot called boundary that defines the endpoints beyond which the spline will begin extrapolating (extrapolation is flat due to the first order penalty placed on the smooth function)

Details

The constructor is not normally called directly, but is rather used internally by mvgam. If knots are not supplied, the knots of the spline are placed evenly throughout the covariate values to which the term refers: for example, if fitting 101 data points with an 11-knot spline of x, there would be a knot at every 10th (ordered) x value. The spline is an implementation of the closed-form I-spline basis based on the recursion formula given by Ramsay (1988), in which the basis coefficients must be constrained to be either non-negative (for monotonically increasing functions) or non-positive (for monotonically decreasing functions)

Take note that when using either monotonic basis, the number of basis functions k must be supplied as an even integer due to the manner in which monotonic basis functions are constructed
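
The constraint is easiest to see by inspecting the I-spline basis from splines2 directly; a minimal sketch (the df value here is illustrative only):

library(splines2)

# Each I-spline basis function is itself monotonically increasing, so any
# non-negative combination of the columns is also monotonically increasing
x <- seq(0, 1, length.out = 101)
B <- iSpline(x, df = 10, intercept = TRUE)
matplot(x, B, type = 'l', lty = 1,
        xlab = 'x', ylab = 'I-spline basis functions')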

Note

This constructor will result in a valid smooth if used in a call to gam or bam; however, the resulting functions will not be guaranteed to be monotonic because constraints on the basis coefficients will not be enforced

References

Wang, Wenjie, and Jun Yan. "Shape-Restricted Regression Splines with R Package splines2." Journal of Data Science 19.3 (2021).

Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science, 3(4), 425–441.

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate data from a monotonically increasing function
set.seed(123123)
x <- runif(80) * 4 - 1
x <- sort(x)
f <- exp(4 * x) / (1 + exp(4 * x))
y <- f + rnorm(80) * 0.1
plot(x, y)


# A standard TPRS smooth doesn't capture monotonicity
library(mgcv)
#> Warning: package ‘mgcv’ was built under R version 4.3.3
#> Loading required package: nlme
#> Warning: package ‘nlme’ was built under R version 4.3.3
#> This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.
#> 
#> Attaching package: ‘mgcv’
#> The following objects are masked from ‘package:mvgam’:
#> 
#>     betar, nb
mod_data <- data.frame(y = y, x = x)
mod <- gam(y ~ s(x, k = 16),
           data = mod_data,
           family = gaussian())

library(marginaleffects)
#> Warning: package ‘marginaleffects’ was built under R version 4.3.3
#> Please cite the software developers who make your work possible.
#> One package:             citation("package_name")
#> All project packages:    softbib::softbib()
plot_predictions(mod,
                 by = 'x',
                 newdata = data.frame(x = seq(min(x) - 0.5,
                                              max(x) + 0.5,
                                              length.out = 100)),
                 points = 0.5)


# Using the 'moi' basis in mvgam rectifies this
mod_data$time <- 1:NROW(mod_data)
mod2 <- mvgam(y ~ s(x, bs = 'moi', k = 18),
             data = mod_data,
             family = gaussian(),
             chains = 2,
             silent = 2)

plot_predictions(mod2,
                 by = 'x',
                 newdata = data.frame(x = seq(min(x) - 0.5,
                                              max(x) + 0.5,
                                              length.out = 100)),
                 points = 0.5)


plot(mod2, type = 'smooth', realisations = TRUE)


# 'by' terms that produce a different smooth for each level of the 'by'
# factor are also allowed
set.seed(123123)
x <- runif(80) * 4 - 1
x <- sort(x)

# Two different monotonic smooths, one for each factor level
f <- exp(4 * x) / (1 + exp(4 * x))
f2 <- exp(3.5 * x) / (1 + exp(3 * x))
fac <- c(rep('a', 80), rep('b', 80))
y <- c(f + rnorm(80) * 0.1,
       f2 + rnorm(80) * 0.2)
plot(x, y[1:80])

plot(x, y[81:160])


# Gather all data into a data.frame, including the factor 'by' variable
mod_data <- data.frame(y, x, fac = as.factor(fac))
mod_data$time <- 1:NROW(mod_data)

# Fit a model with different smooths per factor level
mod <- mvgam(y ~ s(x, bs = 'moi', by = fac, k = 8),
             data = mod_data,
             family = gaussian(),
             chains = 2,
             silent = 2)

# Visualise the different monotonic functions
plot_predictions(mod, condition = c('x', 'fac', 'fac'),
                 points = 0.5)

plot(mod, type = 'smooth', realisations = TRUE)


# First derivatives (on the link scale) should never be
# negative for either factor level
(derivs <- slopes(mod, variables = 'x',
                 by = c('x', 'fac'),
                 type = 'link'))
#> 
#>       x fac Estimate  2.5 % 97.5 %
#>  -0.987   a    0.338 0.1067  0.856
#>  -0.987   b    0.202 0.0298  0.690
#>  -0.841   a    0.520 0.3206  0.787
#>  -0.841   b    0.301 0.1358  0.584
#>  -0.796   a    0.557 0.3633  0.795
#> --- 150 rows omitted. See ?print.marginaleffects --- 
#>   2.853   b    3.029 1.2324  4.742
#>   2.870   a    0.227 0.0251  1.010
#>   2.870   b    3.165 1.0940  5.140
#>   2.879   a    0.229 0.0240  1.054
#>   2.879   b    3.236 0.9982  5.321
#> Term: x
#> Type:  link 
#> Comparison: dY/dX
#> 
all(derivs$estimate > 0)
#> [1] TRUE
# }
================================================ FILE: docs/reference/mvgam-class.html ================================================ Fitted mvgam object description — mvgam-class • mvgam

A fitted mvgam object returned by function mvgam. Run methods(class = "mvgam") to see an overview of available methods.

Details

A mvgam object contains the following elements:

  • call the original observation model formula

  • trend_call If a trend_formula was supplied, the original trend model formula is returned. Otherwise NULL

  • family character description of the observation distribution

  • trend_model character description of the latent trend model

  • trend_map data.frame describing the mapping of trend states to observations, if supplied in the original model. Otherwise NULL

  • drift Logical specifying whether a drift term was used in the trend model

  • priors If the model priors were updated from their defaults, the prior dataframe will be returned. Otherwise NULL

  • model_output The MCMC object returned by the fitting engine. If the model was fitted using Stan, this will be an object of class stanfit (see stanfit-class for details). If JAGS was used as the backend, this will be an object of class runjags (see runjags-class for details)

  • model_file The character string model file used to describe the model in either Stan or JAGS syntax

  • model_data If return_model_data was set to TRUE when fitting the model, the list object containing all data objects needed to condition the model is returned. Each item in the list is described in detail at the top of the model_file. Otherwise NULL

  • inits If return_model_data was set to TRUE when fitting the model, the initial value functions used to initialise the MCMC chains will be returned. Otherwise NULL

  • monitor_pars The parameters that were monitored during MCMC sampling are returned as a character vector

  • sp_names A character vector specifying the names for each smoothing parameter

  • mgcv_model An object of class gam containing the mgcv version of the observation model. This object is used for generating the linear predictor matrix when making predictions for new data. The coefficients in this model object will contain the posterior median coefficients from the GAM linear predictor, but these are only used if generating plots of smooth functions that mvgam currently cannot handle (such as plots for three-dimensional smooths). This model therefore should not be used for inference. See gamObject for details

  • trend_mgcv_model If a trend_formula was supplied, an object of class gam containing the mgcv version of the trend model. Otherwise NULL

  • ytimes The matrix object used in model fitting for indexing which series and timepoints were observed in each row of the supplied data. Used internally by some downstream plotting and prediction functions

  • resids A named list object containing posterior draws of Dunn-Smyth randomized quantile residuals

  • use_lv Logical flag indicating whether latent dynamic factors were used in the model

  • n_lv If use_lv == TRUE, the number of latent dynamic factors used in the model

  • upper_bounds If bounds were supplied in the original model fit, they will be returned. Otherwise NULL

  • obs_data The original data object (either a list or dataframe) supplied in model fitting.

  • test_data If test data were supplied (as argument newdata in the original model), it will be returned. Otherwise NULL

  • fit_engine Character describing the fit engine, either as stan or jags

  • backend Character describing the backend used for modelling, either as rstan, cmdstanr or rjags

  • algorithm Character describing the algorithm used for finding the posterior, either as sampling, laplace, pathfinder, meanfield or fullrank

  • max_treedepth If the model was fitted using Stan, the value supplied for the maximum treedepth tuning parameter is returned (see stan for details). Otherwise NULL

  • adapt_delta If the model was fitted using Stan, the value supplied for the adapt_delta tuning parameter is returned (see stan for details). Otherwise NULL
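
A short sketch of inspecting a few of these elements (assuming mod is any fitted mvgam object):

class(mod$model_output)          # stanfit (Stan backend) or runjags (JAGS backend)
cat(mod$model_file, sep = '\n')  # the model code passed to the backend
str(mod$obs_data)                # the original data used for fitting
mod$monitor_pars                 # parameters monitored during MCMC sampling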

Author

Nicholas J Clark

================================================ FILE: docs/reference/mvgam-package.html ================================================ mvgam: Multivariate (Dynamic) Generalized Additive Models — mvgam-package • mvgam

Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2022) doi:10.1111/2041-210X.13974.

Author

Maintainer: Nicholas J Clark nicholas.j.clark1214@gmail.com (ORCID)

Other contributors:

  • Sarah Heaps (ORCID) (VARMA parameterisations) [contributor]

  • Scott Pease (ORCID) (broom enhancements) [contributor]

  • Matthijs Hollanders (ORCID) (ggplot visualizations) [contributor]

================================================ FILE: docs/reference/mvgam.html ================================================ Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series — mvgam • mvgam

This function estimates the posterior distribution for Generalised Additive Models (GAMs) that can include smooth spline functions, specified in the GAM formula, as well as latent temporal processes, specified by trend_model. Further modelling options include State-Space representations to allow covariates and dynamic processes to occur on the latent 'State' level while also capturing observation-level effects. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. In addition, model fits can easily be assessed and compared with posterior predictive checks, forecast comparisons and leave-one-out / leave-future-out cross-validation.

Usage

mvgam(
  formula,
  trend_formula,
  knots,
  trend_knots,
  trend_model = "None",
  noncentred = FALSE,
  family = poisson(),
  share_obs_params = FALSE,
  data,
  newdata,
  use_lv = FALSE,
  n_lv,
  trend_map,
  priors,
  run_model = TRUE,
  prior_simulation = FALSE,
  residuals = TRUE,
  return_model_data = FALSE,
  backend = getOption("brms.backend", "cmdstanr"),
  algorithm = getOption("brms.algorithm", "sampling"),
  control = list(max_treedepth = 10, adapt_delta = 0.8),
  chains = 4,
  burnin = 500,
  samples = 500,
  thin = 1,
  parallel = TRUE,
  threads = 1,
  save_all_pars = FALSE,
  silent = 1,
  autoformat = TRUE,
  refit = FALSE,
  lfo = FALSE,
  ...
)

Arguments

formula

A formula object specifying the GAM observation model formula. These are exactly like the formula for a GLM except that smooth terms, s(), te(), ti(), t2(), as well as time-varying dynamic() terms, nonparametric gp() terms and offsets using offset(), can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these). In nmix() family models, the formula is used to set up a linear predictor for the detection probability. Details of the formula syntax used by mvgam can be found in mvgam_formulae

trend_formula

An optional formula object specifying the GAM process model formula. If supplied, a linear predictor will be modelled for the latent trends to capture process model evolution separately from the observation model. Should not have a response variable specified on the left-hand side of the formula (i.e. a valid option would be ~ season + s(year)). Also note that you should not use the identifier series in this formula to specify effects that vary across time series. Instead you should use trend. This will ensure that models in which a trend_map is supplied will still work consistently (i.e. by allowing effects to vary across process models, even when some time series share the same underlying process model). This feature is only currently available for RW(), AR() and VAR() trend models. In nmix() family models, the trend_formula is used to set up a linear predictor for the underlying latent abundance. Be aware that it can be very challenging to simultaneously estimate intercept parameters for both the observation model (captured by formula) and the process model (captured by trend_formula). Users are recommended to drop one of these using the - 1 convention in the formula right hand side (see the sketch below).
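A minimal sketch of these conventions (dat stands for any valid mvgam data object; the smooth terms are illustrative only):

# Process-model effects vary by 'trend' rather than 'series', and the
# observation-model intercept is dropped to aid identifiability
mvgam(y ~ -1,
      trend_formula = ~ s(season, bs = 'cc', by = trend),
      trend_model = AR(),
      data = dat)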

knots

An optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). Different terms can use different numbers of knots, unless they share a covariate

trend_knots

As for knots above, this is an optional list of knot values for smooth functions within the trend_formula

trend_model

character or function specifying the time series dynamics for the latent trend. Options are:

  • 'None' (no latent trend component; i.e. the GAM component is all that contributes to the linear predictor, and the observation process is the only source of error; similar to what is estimated by gam)

  • 'ZMVN' or ZMVN() (Zero-Mean Multivariate Normal; only available in Stan)

  • 'RW' or RW()

  • 'AR1' or AR(p = 1)

  • 'AR2' or AR(p = 2)

  • 'AR3' or AR(p = 3)

  • 'CAR1' or CAR(p = 1)

  • 'VAR1' or VAR() (only available in Stan)

  • 'PWlogistic', 'PWlinear' or PW() (only available in Stan)

  • 'GP' or GP() (Gaussian Process with squared exponential kernel; only available in Stan)

For all trend types apart from ZMVN(), GP(), CAR() and PW(), moving average and/or correlated process error terms can also be estimated (for example, RW(cor = TRUE) will set up a multivariate Random Walk if n_series > 1). It is also possible for many multivariate trends to estimate hierarchical correlations if the data are structured among levels of a relevant grouping factor. See mvgam_trends for more details and see ZMVN for an example.
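A brief sketch of these multivariate options (simulated data for illustration; run_model = FALSE avoids sampling):

# A correlated multivariate Random Walk across all series
mod_cor <- mvgam(
  y ~ s(season, bs = "cc", k = 6),
  trend_model = RW(cor = TRUE),
  data = sim_mvgam(n_series = 3)$data_train,
  run_model = FALSE
)

# Hierarchical correlations could hypothetically be requested with, e.g.,
# trend_model = AR(p = 1, gr = region, subgr = species), where 'region' and
# 'species' are grouping factors in the data (no 'series' column needed)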

noncentred

logical Use the non-centred parameterisation for autoregressive trend models? Setting to TRUE will reparameterise the model to avoid possible degeneracies that can show up when estimating the latent dynamic random effects. For some models this can produce large gains in efficiency, meaning that fewer burnin and sampling iterations are required for posterior exploration. But for other models, where the data are highly informative about the latent dynamic processes, it can actually lead to worse performance. Only available for certain trend models (i.e. RW(), AR() or CAR(), or for trend_model = 'None' when using a trend_formula). Not yet available for moving average or correlated error models

family

family specifying the exponential observation family for the series. Currently supported families are:

  • gaussian() for real-valued data

  • betar() for proportional data on (0,1)

  • lognormal() for non-negative real-valued data

  • student_t() for real-valued data

  • Gamma() for non-negative real-valued data

  • bernoulli() for binary data

  • poisson() for count data

  • nb() for overdispersed count data

  • binomial() for count data with imperfect detection when the number of trials is known; note that the cbind() function must be used to bind the discrete observations and the discrete number of trials (see the sketch following this list)

  • beta_binomial() as for binomial() but allows for overdispersion

  • nmix() for count data with imperfect detection when the number of trials is unknown and should be modeled via a State-Space N-Mixture model. The latent states are Poisson, capturing the 'true' latent abundance, while the observation process is Binomial to account for imperfect detection. See mvgam_families for an example of how to use this family

Default is poisson(). See mvgam_families for more details
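A self-contained sketch of the cbind() convention for binomial observations (the data values here are simulated purely for illustration):

# Binomial observations with known trial sizes bound via cbind()
binom_dat <- data.frame(
  successes = rbinom(60, size = 20, prob = 0.4),
  trials = rep(20, 60),
  season = rep(1:20, times = 3),
  time = rep(1:20, times = 3),
  series = factor(rep(paste0("series_", 1:3), each = 20))
)
mod_bin <- mvgam(
  cbind(successes, trials) ~ s(season, bs = "cc", k = 6),
  family = binomial(),
  trend_model = AR(),
  data = binom_dat,
  run_model = FALSE
)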

share_obs_params

logical. If TRUE and the family has additional family-specific observation parameters (e.g. variance components in student_t() or gaussian(), or dispersion parameters in nb() or betar()), these parameters will be shared across all outcome variables. This is handy if you have multiple outcomes (time series in most mvgam models) that you believe share some properties, such as being from the same species over different spatial units. Default is FALSE.

data

A dataframe or list containing the model response variable and covariates required by the GAM formula and optional trend_formula. Most models should include columns:

  • series (a factor index of the series IDs; the number of levels should be identical to the number of unique series labels (i.e. n_series = length(levels(data$series))))

  • time (numeric or integer index of the time point for each observation). For most dynamic trend types available in mvgam (see argument trend_model), time should be measured in discrete, regularly spaced intervals (i.e. c(1, 2, 3, ...)). However you can use irregularly spaced intervals if using trend_model = CAR(1), though note that any temporal intervals that are exactly 0 will be adjusted to a very small number (1e-12) to prevent sampling errors. See an example of CAR() trends in CAR

Note however that there are special cases where these identifiers are not needed. For example, models with hierarchical temporal correlation processes (e.g. AR(gr = region, subgr = species)) should NOT include a series identifier, as this will be constructed internally (see mvgam_trends and AR for details). mvgam can also fit models that do not include a time variable if there are no temporal dynamic structures included (i.e. trend_model = 'None' or trend_model = ZMVN()). data should also include any other variables to be included in the linear predictor of formula
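A minimal sketch of a suitably structured data object (values simulated for illustration only):

# One row per series x time combination, with a factor 'series'
# and a regularly spaced integer 'time'
toy_dat <- data.frame(
  y = rpois(3 * 20, lambda = 5),
  series = factor(rep(paste0("series_", 1:3), each = 20)),
  time = rep(1:20, times = 3)
)
str(toy_dat)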

newdata

Optional dataframe or list of test data containing the same variables as in data. If included, the observations in variable y will be set to NA when fitting the model so that posterior simulations can be obtained

use_lv

logical. If TRUE, use dynamic factors to estimate series' latent trends in a reduced dimension format. Only available for RW(), AR() and GP() trend models. Defaults to FALSE

n_lv

integer the number of latent dynamic factors to use if use_lv == TRUE. Cannot be > n_series. Defaults arbitrarily to min(2, floor(n_series / 2))
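For example (a sketch using simulated data; run_model = FALSE so nothing is sampled):

# Model six series using two shared dynamic factors
lv_dat <- sim_mvgam(n_series = 6)
mod_lv <- mvgam(
  y ~ s(season, bs = "cc", k = 6),
  trend_model = AR(),
  use_lv = TRUE,
  n_lv = 2,
  data = lv_dat$data_train,
  run_model = FALSE
)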

trend_map

Optional data.frame specifying which series should depend on which latent trends. Useful for allowing multiple series to depend on the same latent trend process, but with different observation processes. If supplied, a latent factor model is set up by setting use_lv = TRUE and using the mapping to set up the shared trends. Needs to have column names series and trend, with integer values in the trend column to state which trend each series should depend on. The series column should have a single unique entry for each series in the data (names should perfectly match factor levels of the series variable in data). Note that if this is supplied, the intercept parameter in the process model will NOT be automatically suppressed. Not yet supported for models in which the latent factors evolve in continuous time (CAR()). See examples for details

priors

An optional data.frame with prior definitions or, preferentially, a vector containing objects of class brmsprior (see prior for details). See get_mvgam_priors and the 'Details' section below for more information on changing default prior distributions
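A sketch of the intended workflow (the parameter class shown is illustrative; inspect the get_mvgam_priors output for the classes actually available in your model):

# First view the default priors that can be changed
pr_dat <- sim_mvgam()
get_mvgam_priors(
  y ~ s(season, bs = "cc", k = 6),
  data = pr_dat$data_train,
  trend_model = AR()
)

# Then supply a brmsprior object to override one of them, e.g. a
# tighter prior on the trend standard deviations
mod_pr <- mvgam(
  y ~ s(season, bs = "cc", k = 6),
  trend_model = AR(),
  data = pr_dat$data_train,
  priors = brms::prior(exponential(2), class = sigma),
  run_model = FALSE
)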

run_model

logical. If FALSE, the model is not fitted but instead the function will return the model file and the data / initial values that are needed to fit the model outside of mvgam

prior_simulation

logical. If TRUE, no observations are fed to the model, and instead simulations from prior distributions are returned

residuals

Logical indicating whether to compute series-level randomized quantile residuals and include them as part of the returned object. Defaults to TRUE, but you can set to FALSE to save computational time and reduce the size of the returned object (users can always add residuals to an object of class mvgam using add_residuals)

return_model_data

logical. If TRUE, the list of data that is needed to fit the model is returned, along with the initial values for smooth and AR parameters, once the model is fitted. This will be helpful if users wish to modify the model file to add other stochastic elements that are not currently available in mvgam. Default is FALSE to reduce the size of the returned object, unless run_model == FALSE

backend

Character string naming the package to use as the backend for fitting the Stan model. Options are "cmdstanr" (the default) or "rstan". Can be set globally for the current R session via the "brms.backend" option (see options). Details on the rstan and cmdstanr packages are available at https://mc-stan.org/rstan/ and https://mc-stan.org/cmdstanr/, respectively

algorithm

Character string naming the estimation approach to use. Options are "sampling" for MCMC (the default), "meanfield" for variational inference with factorized normal distributions, "fullrank" for variational inference with a multivariate normal distribution, "laplace" for a Laplace approximation (only available when using cmdstanr as the backend) or "pathfinder" for the pathfinder algorithm (only currently available when using cmdstanr as the backend). Can be set globally for the current R session via the "brms.algorithm" option (see options). Limited testing suggests that "meanfield" performs best out of the non-MCMC approximations for dynamic GAMs, possibly because of the difficulties estimating covariances among the many spline parameters and latent trend parameters. But rigorous testing has not been carried out

control

A named list for controlling the sampler's behaviour. Valid elements include max_treedepth, adapt_delta and init

chains

integer specifying the number of parallel chains for the model. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')

burnin

integer specifying the number of warmup iterations of the Markov chain to run to tune sampling algorithms. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')

samples

integer specifying the number of post-warmup iterations of the Markov chain to run for sampling the posterior distribution

thin

Thinning interval for monitors. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')

parallel

logical specifying whether multiple cores should be used for generating MCMC simulations in parallel. If TRUE, the number of cores to use will be min(c(chains, parallel::detectCores() - 1))

threads

integer Experimental option to use multithreading for within-chain parallelisation in Stan. We recommend its use only if you are experienced with Stan's reduce_sum function and have a slow-running model that cannot be sped up by any other means. Currently works for all families apart from nmix() and only when using cmdstanr as the backend

save_all_pars

Logical flag to indicate if draws from all variables defined in Stan's parameters block should be saved (default is FALSE).

silent

Verbosity level between 0 and 2. If 1 (the default), most of the informational messages of compiler and sampler are suppressed. If 2, even more messages are suppressed. The actual sampling progress is still printed. Set refresh = 0 to turn this off as well. If using backend = "rstan" you can also set open_progress = FALSE to prevent opening additional progress bars.

autoformat

Logical. Use the stanc parser to automatically format the Stan code and check for deprecations. Turning this off is only useful for development purposes, so leave as TRUE

refit

Logical indicating whether this is a refit, called using update.mvgam. Users should leave as FALSE

lfo

Logical indicating whether this is part of a call to lfo_cv.mvgam. Returns a lighter version of the model with no residuals and fewer monitored parameters to speed up post-processing. But other downstream functions will not work properly, so users should always leave this set as FALSE

...

Further arguments passed to Stan. For backend = "rstan" the arguments are passed to sampling or vb. For backend = "cmdstanr" the arguments are passed to the cmdstanr::sample, cmdstanr::variational, cmdstanr::laplace or cmdstanr::pathfinder method

Value

A list object of class mvgam containing model output, the text representation of the model file, the mgcv model output (for easily generating simulations at unsampled covariate values), Dunn-Smyth residuals for each series and key information needed for other functions in the package. See mvgam-class for details. Use methods(class = "mvgam") for an overview on available methods.

Details

Dynamic GAMs are useful when we wish to predict future values from time series that show temporal dependence but we do not want to rely on extrapolating from a smooth term (which can sometimes lead to unpredictable and unrealistic behaviours). In addition, smooths can often try to wiggle excessively to capture any autocorrelation that is present in a time series, which exacerbates the problem of forecasting ahead. As GAMs are very naturally viewed through a Bayesian lens, and we often must model time series that show complex distributional features and missing data, parameters for mvgam models are estimated in a Bayesian framework using Markov Chain Monte Carlo by default. A general overview is provided in the primary vignettes: vignette("mvgam_overview") and vignette("data_in_mvgam"). For a full list of available vignettes see vignette(package = "mvgam")

Formula syntax: Details of the formula syntax used by mvgam can be found in mvgam_formulae. Note that it is possible to supply an empty formula where there are no predictors or intercepts in the observation model (i.e. y ~ 0 or y ~ -1). In this case, an intercept-only observation model will be set up but the intercept coefficient will be fixed at zero. This can be handy if you wish to fit pure State-Space models where the variation in the dynamic trend controls the average expectation, and/or where intercepts are non-identifiable (as in piecewise trends, see examples below)
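A sketch of such a pure State-Space specification (simulated data for illustration; run_model = FALSE so nothing is sampled):

# The observation model has no intercept; the latent trend and its
# process-level smooth control the average expectation
mod_ss <- mvgam(
  formula = y ~ -1,
  trend_formula = ~ s(season, bs = "cc", k = 6),
  trend_model = RW(),
  data = sim_mvgam()$data_train,
  run_model = FALSE
)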

Families and link functions: Details of families supported by mvgam can be found in mvgam_families.

Trend models: Details of latent error process models supported by mvgam can be found in mvgam_trends.

Priors: Default priors for intercepts and any scale parameters are generated using the same practice as brms. Prior distributions for most important model parameters can be altered by the user to inspect model sensitivities to given priors (see get_mvgam_priors for details). Note that latent trends are estimated on the link scale, so choose priors accordingly. More control over the model specification can be accomplished by first using mvgam as a baseline and then editing the returned model: the model file can be edited and run outside of mvgam by setting run_model = FALSE, and this is encouraged for complex modelling tasks. Note that no priors are formally checked to ensure they use the correct syntax, so it is up to the user to ensure they are correct

Random effects: For any smooth terms using the random effect basis (smooth.construct.re.smooth.spec), a non-centred parameterisation is automatically employed to avoid degeneracies that are common in hierarchical models. Note however that centred versions may perform better for series that are particularly informative, so as with any foray into Bayesian modelling, it is worth building an understanding of the model's assumptions and limitations by following a principled workflow. Also note that models are parameterised using drop.unused.levels = FALSE in jagam to ensure predictions can be made for all levels of the supplied factor variable

Observation level parameters: When more than one series is included in data and an observation family that contains more than one parameter is used, additional observation family parameters (i.e. phi for nb() or sigma for gaussian()) are by default estimated independently for each series. But if you wish for the series to share the same observation parameters, set share_obs_params = TRUE
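For example (a sketch with simulated Gaussian data, assuming sim_mvgam's family argument as documented):

# Estimate a single shared observation sd rather than one per series
gaus_dat <- sim_mvgam(family = gaussian())
mod_shared <- mvgam(
  y ~ s(season, bs = "cc", k = 6),
  family = gaussian(),
  share_obs_params = TRUE,
  trend_model = RW(),
  data = gaus_dat$data_train,
  run_model = FALSE
)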

Residuals: For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics. If the fitted model is appropriate then Dunn-Smyth residuals will be standard normal in distribution and no autocorrelation will be evident. When a particular observation is missing, the residual is calculated by comparing independent draws from the model's posterior distribution
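A sketch of how these residuals might be inspected outside of the built-in plots (assumes mod1 is a fitted mvgam object, as in the Examples below):

# Posterior median Dunn-Smyth residuals, stacked across all series
resids <- residuals(mod1)[, "Estimate"]

# Under a well-specified model these should look standard normal
qqnorm(resids)
qqline(resids)

# And show little autocorrelation (subset rows by series first for a
# per-series check); na.pass retains missing observations
acf(resids, na.action = na.pass)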

Using Stan: mvgam is primarily designed to use Hamiltonian Monte Carlo for parameter estimation via the software Stan (using either the cmdstanr or rstan interface). There are great advantages to using Stan over Gibbs / Metropolis-Hastings samplers, including the option to estimate nonlinear effects via Hilbert space approximate Gaussian Processes, the availability of a variety of inference algorithms (i.e. variational inference, Laplace approximation, etc.) and capabilities to enforce stationarity for complex Vector Autoregressions. Because of the many advantages of Stan over JAGS, further development of the package will only target Stan. This includes the planned addition of more response distributions, plans to handle zero-inflation, and plans to incorporate a greater variety of trend models. Users are strongly encouraged to opt for Stan over JAGS in any future workflows

How to start?: The mvgam cheatsheet is a good starting place if you are just learning to use the package. It gives an overview of the package's key functions and objects, as well as providing a reasonable workflow that new users can follow. In general, it is recommended to begin with the primary vignettes (vignette("mvgam_overview") and vignette("data_in_mvgam")) before fitting models to your own data

References

Nicholas J Clark & Konstans Wells (2023). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. 14:3, 771-784.

Nicholas J Clark, SK Morgan Ernest, Henry Senyondo, Juniper Simonis, Ethan P White, Glenda M Yenni, KANK Karunarathna (2025). Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ. 13:e18929 https://doi.org/10.7717/peerj.18929

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate a collection of three time series that have shared seasonal dynamics
# and independent AR1 trends, with a Poisson observation process
set.seed(0)
dat <- sim_mvgam(
  T = 80,
  n_series = 3,
  mu = 2,
  trend_model = AR(p = 1),
  prop_missing = 0.1,
  prop_trend = 0.6
)

# Plot key summary statistics for a single series
plot_mvgam_series(data = dat$data_train, series = 1)
#> Warning: Removed 5 rows containing non-finite outside the scale range (`stat_bin()`).


# Plot all series together
plot_mvgam_series(data = dat$data_train, series = "all")


# Formulate a model using Stan where series share a cyclic smooth for
# seasonality and each series has an independent AR1 temporal process.
# Note that 'noncentred = TRUE' will likely give performance gains.
# Set run_model = FALSE to inspect the returned objects
mod1 <- mvgam(
  formula = y ~ s(season, bs = "cc", k = 6),
  data = dat$data_train,
  trend_model = AR(),
  family = poisson(),
  noncentred = TRUE,
  run_model = FALSE
)

# View the model code in Stan language
stancode(mod1)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[4, 4] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // latent trend AR1 terms
#>   vector<lower=-1, upper=1>[n_series] ar1;
#>   
#>   // latent trend variance parameters
#>   vector<lower=0>[n_series] sigma;
#>   
#>   // raw latent trends
#>   matrix[n, n_series] trend_raw;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   
#>   // latent trends
#>   matrix[n, n_series] trend;
#>   trend = trend_raw .* rep_matrix(sigma', rows(trend_raw));
#>   for (s in 1 : n_series) {
#>     trend[2 : n, s] += ar1[s] * trend[1 : (n - 1), s];
#>   }
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 1.9, 2.5);
#>   
#>   // prior for s(season)...
#>   b_raw[2 : 5] ~ multi_normal_prec(zero[2 : 5], S1[1 : 4, 1 : 4] * lambda[1]);
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for latent trend variance parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   to_vector(trend_raw) ~ std_normal();
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_series] tau;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   for (s in 1 : n_series) {
#>     tau[s] = pow(sigma[s], -2.0);
#>   }
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }
#> 
#> 

# View the data objects needed to fit the model in Stan
sdata1 <- standata(mod1)
str(sdata1)
#> List of 18
#>  $ y           : num [1:60, 1:3] 4 5 7 39 51 26 6 6 4 2 ...
#>  $ n           : int 60
#>  $ X           : num [1:180, 1:5] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:5] "X.Intercept." "V2" "V3" "V4" ...
#>  $ S1          : num [1:4, 1:4] 1.244 -0.397 0.384 0.619 -0.397 ...
#>  $ zero        : num [1:5] 0 0 0 0 0
#>  $ p_coefs     : Named num 0
#>   ..- attr(*, "names")= chr "(Intercept)"
#>  $ p_taus      : num 0.853
#>  $ ytimes      : int [1:60, 1:3] 1 4 7 10 13 16 19 22 25 28 ...
#>  $ n_series    : int 3
#>  $ sp          : Named num 0.368
#>   ..- attr(*, "names")= chr "s(season)"
#>  $ y_observed  : num [1:60, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ total_obs   : int 180
#>  $ num_basis   : int 5
#>  $ n_sp        : num 1
#>  $ n_nonmissing: int 164
#>  $ obs_ind     : int [1:164] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ flat_ys     : num [1:164] 4 5 7 39 51 26 6 6 4 2 ...
#>  $ flat_xs     : num [1:164, 1:5] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:5] "X.Intercept." "V2" "V3" "V4" ...
#>  - attr(*, "trend_model")= chr "AR1"

# Now fit the model
mod1 <- mvgam(
  formula = y ~ s(season, bs = "cc", k = 6),
  data = dat$data_train,
  trend_model = AR(),
  family = poisson(),
  noncentred = TRUE,
  chains = 2,
  silent = 2
)
#> (long C++ compiler warnings from the TBB / Stan Math headers omitted)

# Extract the model summary
summary(mod1)
#> GAM formula:
#> y ~ s(season, bs = "cc", k = 6)
#> <environment: 0x000001b088c85638>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR()
#> 
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 60 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>               2.5%   50% 97.5% Rhat n_eff
#> (Intercept)  1.900  2.00  2.10 1.00   794
#> s(season).1  0.065  0.31  0.55 1.00   468
#> s(season).2  0.580  0.83  1.10 1.01   356
#> s(season).3 -0.058  0.18  0.39 1.00   555
#> s(season).4 -0.680 -0.43 -0.17 1.00   659
#> 
#> Approximate significance of GAM smooths:
#>            edf Ref.df Chi.sq p-value    
#> s(season) 3.79      4   36.5  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Latent trend parameter AR estimates:
#>           2.5%   50% 97.5% Rhat n_eff
#> ar1[1]    0.30  0.73 0.980 1.01   282
#> ar1[2]   -0.95 -0.43 0.057 1.01   122
#> ar1[3]    0.18  0.69 0.980 1.00   296
#> sigma[1]  0.42  0.56 0.760 1.01   323
#> sigma[2]  0.33  0.49 0.690 1.01   229
#> sigma[3]  0.38  0.51 0.720 1.01   354
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 26 10:06:24 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod1) to get started describing this model

# Plot the estimated historical trend and forecast for one series
plot(mod1, type = "trend", series = 1)

plot(mod1, type = "forecast", series = 1)


# Residual diagnostics
plot(mod1, type = "residuals", series = 1)

resids <- residuals(mod1)
str(resids)
#>  num [1:180, 1:4] -0.144 NaN -0.143 0.284 -0.819 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"

# Fitted values and residuals can also be added to training data
augment(mod1)
#> # A tibble: 180 × 14
#>        y season  year series    time .observed .fitted .fit.variability
#>    <int>  <int> <int> <fct>    <int>     <int>   <dbl>            <dbl>
#>  1     4      1     1 series_1     1         4    4.64             1.52
#>  2    NA      1     1 series_2     1        NA    7.04             4.29
#>  3     4      1     1 series_3     1         4    4.62             1.60
#>  4     5      2     1 series_1     2         5    4.73             1.70
#>  5     2      2     1 series_2     2         2    3.79             1.44
#>  6    NA      2     1 series_3     2        NA    5.32             3.15
#>  7     7      3     1 series_1     3         7    8.56             2.53
#>  8    12      3     1 series_2     3        12   11.5              3.08
#>  9     4      3     1 series_3     3         4    5.35             1.93
#> 10    39      4     1 series_1     4        39   36.3              5.93
#> # ℹ 170 more rows
#> # ℹ 6 more variables: .fit.cred.low <dbl>, .fit.cred.high <dbl>, .resid <dbl>,
#> #   .resid.variability <dbl>, .resid.cred.low <dbl>, .resid.cred.high <dbl>

# Compute the forecast using covariate information in data_test
fc <- forecast(mod1, newdata = dat$data_test)
str(fc)
#> List of 16
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc", k = 6)
#>   .. ..- attr(*, ".Environment")=<environment: 0x000001b088c85638> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ family_pars       : NULL
#>  $ trend_model       :List of 7
#>   ..$ trend_model: chr "AR1"
#>   ..$ ma         : logi FALSE
#>   ..$ cor        : logi FALSE
#>   ..$ unit       : chr "time"
#>   ..$ gr         : chr "NA"
#>   ..$ subgr      : chr "series"
#>   ..$ label      : language AR()
#>   ..- attr(*, "class")= chr "mvgam_trend"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : Factor w/ 3 levels "series_1","series_2",..: 1 2 3
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:60] 4 5 7 39 51 26 6 6 4 2 ...
#>   ..$ series_2: int [1:60] NA 2 12 16 6 31 9 15 5 3 ...
#>   ..$ series_3: int [1:60] 4 NA 4 NA NA 16 7 7 3 NA ...
#>  $ train_times       : int [1:60] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations :List of 3
#>   ..$ series_1: int [1:20] 1 NA NA 13 18 20 16 6 NA 4 ...
#>   ..$ series_2: int [1:20] 4 36 8 6 7 NA NA 1 6 4 ...
#>   ..$ series_3: int [1:20] 6 8 5 5 19 14 1 1 7 0 ...
#>  $ test_times        : int [1:20] 61 62 63 64 65 66 67 68 69 70 ...
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:1000, 1:60] 4 1 4 4 4 3 5 3 6 8 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:60] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>   ..$ series_2: num [1:1000, 1:60] 1 9 4 2 4 1 14 3 4 12 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:60] "ypred[1,2]" "ypred[2,2]" "ypred[3,2]" "ypred[4,2]" ...
#>   ..$ series_3: num [1:1000, 1:60] 10 4 4 6 6 3 5 6 1 7 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:60] "ypred[1,3]" "ypred[2,3]" "ypred[3,3]" "ypred[4,3]" ...
#>  $ forecasts         :List of 3
#>   ..$ series_1: int [1:1000, 1:20] 2 1 3 2 17 2 0 1 4 7 ...
#>   ..$ series_2: int [1:1000, 1:20] 2 5 6 1 2 6 7 4 7 9 ...
#>   ..$ series_3: int [1:1000, 1:20] 8 7 6 16 10 18 3 10 2 2 ...
#>  - attr(*, "class")= chr "mvgam_forecast"
plot(fc)
#> Out of sample DRPS:
#> 57.089579
#> Warning: Removed 8 rows containing missing values or values outside the scale range
#> (`geom_point()`).


# Plot the estimated seasonal smooth function
plot(mod1, type = "smooths")


# Plot estimated first derivatives of the smooth
plot(mod1, type = "smooths", derivatives = TRUE)


# Plot partial residuals of the smooth
plot(mod1, type = "smooths", residuals = TRUE)


# Plot posterior realisations for the smooth
plot(mod1, type = "smooths", realisations = TRUE)


# Plot conditional response predictions using marginaleffects
conditional_effects(mod1)

plot_predictions(mod1, condition = "season", points = 0.5)
#> Warning: Removed 16 rows containing missing values or values outside the scale range
#> (`geom_point()`).


# Generate posterior predictive checks using bayesplot
pp_check(mod1)
#> Using 10 posterior draws for ppc type 'dens_overlay' by default.
#> Warning: NA responses are not shown in 'pp_check'.


# Extract observation model beta coefficient draws as a data.frame
beta_draws_df <- as.data.frame(mod1, variable = "betas")
head(beta_draws_df)
#>   (Intercept) s(season).1 s(season).2 s(season).3 s(season).4
#> 1     1.85730    0.335897    0.924472    0.106442   -0.378858
#> 2     1.98551    0.336287    0.750726    0.207395   -0.388888
#> 3     1.97766    0.339708    0.822352    0.267692   -0.366780
#> 4     1.97217    0.248420    0.680271    0.323205   -0.188949
#> 5     2.05115    0.216980    0.598978    0.259691   -0.228897
#> 6     2.02619    0.312091    0.772603    0.303855   -0.262500
str(beta_draws_df)
#> 'data.frame':	1000 obs. of  5 variables:
#>  $ (Intercept): num  1.86 1.99 1.98 1.97 2.05 ...
#>  $ s(season).1: num  0.336 0.336 0.34 0.248 0.217 ...
#>  $ s(season).2: num  0.924 0.751 0.822 0.68 0.599 ...
#>  $ s(season).3: num  0.106 0.207 0.268 0.323 0.26 ...
#>  $ s(season).4: num  -0.379 -0.389 -0.367 -0.189 -0.229 ...

# Investigate model fit
mc.cores.def <- getOption("mc.cores")
options(mc.cores = 1)
loo(mod1)
#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
#> 
#> Computed from 1000 by 164 log-likelihood matrix.
#> 
#>          Estimate   SE
#> elpd_loo   -459.2  8.8
#> p_loo        89.3  4.9
#> looic       918.4 17.7
#> ------
#> MCSE of elpd_loo is NA.
#> MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.9]).
#> 
#> Pareto k diagnostic values:
#>                           Count Pct.    Min. ESS
#> (-Inf, 0.67]   (good)     68    41.5%   75      
#>    (0.67, 1]   (bad)      82    50.0%   <NA>    
#>     (1, Inf)   (very bad) 14     8.5%   <NA>    
#> See help('pareto-k-diagnostic') for details.
options(mc.cores = mc.cores.def)


# Example of supplying a trend_map so that some series can share
# latent trend processes
sim <- sim_mvgam(n_series = 3)
mod_data <- sim$data_train

# Here, we specify only two latent trends; series 1 and 2 share a trend,
# while series 3 has its own unique latent trend
trend_map <- data.frame(
  series = unique(mod_data$series),
  trend = c(1, 1, 2)
)

# Fit the model using AR1 trends
mod <- mvgam(
  formula = y ~ s(season, bs = "cc", k = 6),
  trend_map = trend_map,
  trend_model = AR(),
  data = mod_data,
  return_model_data = TRUE,
  chains = 2,
  silent = 2
)
#> (long C++ compiler warnings from the TBB / Stan Math headers omitted)

# The mapping matrix is now supplied as data to the model in the 'Z' element
mod$model_data$Z
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    1    0
#> [3,]    0    1
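
# For reference, a mapping like this one is supplied through mvgam's
# 'trend_map' argument as a data.frame pairing each series with a latent
# trend. A minimal sketch (series names assumed to follow the sim_mvgam()
# convention used above):
trend_map <- data.frame(
  series = factor(c("series_1", "series_2", "series_3")),
  trend = c(1, 1, 2)
)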
code(mod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_lv; // number of dynamic factors
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   matrix[n_series, n_lv] Z; // matrix mapping series to latent trends
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[4, 4] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> transformed data {
#>   
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // latent factor SD terms
#>   vector<lower=0>[n_lv] sigma;
#>   
#>   // latent factor AR1 terms
#>   vector<lower=-1, upper=1>[n_lv] ar1;
#>   
#>   // dynamic factors
#>   matrix[n, n_lv] LV;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // trends and dynamic factor loading matrix
#>   matrix[n, n_series] trend;
#>   
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#>   
#>   // derived latent trends
#>   for (i in 1 : n) {
#>     for (s in 1 : n_series) {
#>       trend[i, s] = dot_product(Z[s,  : ], LV[i,  : ]);
#>     }
#>   }
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 0, 2.5);
#>   
#>   // prior for s(season)...
#>   b_raw[2 : 5] ~ multi_normal_prec(zero[2 : 5], S1[1 : 4, 1 : 4] * lambda[1]);
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for factor SD parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   
#>   // dynamic factor estimates
#>   LV[1, 1 : n_lv] ~ normal(0, sigma);
#>   for (j in 1 : n_lv) {
#>     LV[2 : n, j] ~ normal(ar1[j] * LV[1 : (n - 1), j], sigma[j]);
#>   }
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_lv] penalty;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   penalty = 1.0 / (sigma .* sigma);
#>   
#>   matrix[n_series, n_lv] lv_coefs = Z;
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }
#> 
#> 
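
# In the transformed parameters block above, each trend value is the dot
# product of a row of Z with a row of LV; in matrix terms this is simply
# (a sketch, not runnable outside the Stan model):
# trend <- LV %*% t(Z)  # [n x n_lv] %*% [n_lv x n_series] = [n x n_series]
# so series sharing a row of Z (series 1 and 2 here) share a latent trend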

# The first two series share an identical latent trend; the third is different
plot(mod, type = "trend", series = 1)

plot(mod, type = "trend", series = 2)

plot(mod, type = "trend", series = 3)



# Example of how to use dynamic coefficients
# Simulate a time-varying coefficient for the effect of temperature
set.seed(123)
N <- 200
beta_temp <- numeric(N)
beta_temp[1] <- 0.4
# Random walk with a small negative drift
for (i in 2:N) {
  beta_temp[i] <- rnorm(1, mean = beta_temp[i - 1] - 0.0025, sd = 0.05)
}
plot(beta_temp)


# Simulate a covariate called 'temp'
temp <- rnorm(N, sd = 1)

# Simulate some noisy Gaussian observations
out <- rnorm(N,
  mean = 4 + beta_temp * temp,
  sd = 0.5
)

# Gather necessary data into a data.frame; split into training / testing
data <- data.frame(out, temp, time = seq_along(temp))
data_train <- data[1:180, ]
data_test <- data[181:200, ]

# Fit the model using the dynamic() formula helper
mod <- mvgam(
  formula =
    out ~ dynamic(temp,
      scale = FALSE,
      k = 40
    ),
  family = gaussian(),
  data = data_train,
  newdata = data_test,
  chains = 2,
  silent = 2
)
#> Warning: gp effects in mvgam cannot yet handle autogrouping
#> resetting all instances of 'gr = TRUE' to 'gr = FALSE'
#> This warning is displayed once per session.
#> (Stan C++ compilation output omitted: repeated TBB 'template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]' warnings from the bundled tbb_2020.3 headers)

# Inspect the model summary, forecast and time-varying coefficient distribution
summary(mod)
#> GAM formula:
#> out ~ gp(time, by = temp, c = 5/4, k = 40, scale = FALSE)
#> <environment: 0x000001b088c85638>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 200 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.44 0.49  0.55    1  1291
#> 
#> GAM coefficient (beta) estimates:
#>                    2.5%      50% 97.5% Rhat n_eff
#> (Intercept)       4.000  4.0e+00 4.100 1.00  1466
#> gp(time):temp.1   0.900  3.2e+00 6.200 1.00   761
#> gp(time):temp.2  -2.700  1.4e+00 5.600 1.00   649
#> gp(time):temp.3  -5.300 -1.6e+00 3.400 1.00   739
#> gp(time):temp.4  -5.200 -1.2e+00 1.900 1.00   759
#> gp(time):temp.5  -2.300  4.8e-01 3.500 1.00   937
#> gp(time):temp.6  -2.200  1.8e-01 3.500 1.00   871
#> gp(time):temp.7  -3.100 -3.6e-01 2.100 1.00  1152
#> gp(time):temp.8  -1.800  2.2e-01 2.700 1.00   951
#> gp(time):temp.9  -1.400  2.8e-01 2.500 1.00   969
#> gp(time):temp.10 -2.400 -3.3e-01 1.200 1.00   814
#> gp(time):temp.11 -2.100 -6.9e-02 1.200 1.00   967
#> gp(time):temp.12 -0.960  1.5e-01 2.200 1.00   898
#> gp(time):temp.13 -1.400  1.3e-09 1.300 1.00  1036
#> gp(time):temp.14 -1.600 -8.0e-03 0.680 1.00  1112
#> gp(time):temp.15 -0.960  4.7e-11 1.200 1.00   793
#> gp(time):temp.16 -1.000  1.1e-06 1.000 1.00  1006
#> gp(time):temp.17 -0.710  5.7e-08 0.860 1.00   932
#> gp(time):temp.18 -0.840 -3.5e-14 0.870 1.00   697
#> gp(time):temp.19 -1.100 -3.2e-07 0.410 1.00  1005
#> gp(time):temp.20 -0.760  3.6e-20 0.580 1.00  1316
#> gp(time):temp.21 -0.320  1.7e-06 0.950 1.00   754
#> gp(time):temp.22 -0.670 -3.0e-14 0.340 1.01   533
#> gp(time):temp.23 -0.700 -7.1e-13 0.310 1.00   725
#> gp(time):temp.24 -0.270  1.2e-12 0.650 1.01   723
#> gp(time):temp.25 -0.340 -4.2e-27 0.300 1.00   922
#> gp(time):temp.26 -0.830 -2.1e-21 0.150 1.00   509
#> gp(time):temp.27 -0.240  1.6e-25 0.390 1.00   869
#> gp(time):temp.28 -0.190  6.4e-21 0.500 1.00   809
#> gp(time):temp.29 -0.240 -1.6e-50 0.270 1.00  1141
#> gp(time):temp.30 -0.230 -4.1e-41 0.220 1.00   847
#> gp(time):temp.31 -0.190  1.9e-57 0.290 1.00  1254
#> gp(time):temp.32 -0.160 -8.5e-55 0.160 1.00   987
#> gp(time):temp.33 -0.150 -3.2e-91 0.130 1.00  1181
#> gp(time):temp.34 -0.150 -1.9e-62 0.120 1.00   867
#> gp(time):temp.35 -0.095 -1.4e-84 0.160 1.00   779
#> gp(time):temp.36 -0.230 -1.9e-78 0.061 1.00   745
#> gp(time):temp.37 -0.240  4.9e-77 0.037 1.00   383
#> gp(time):temp.38 -0.051  4.6e-81 0.160 1.00   290
#> gp(time):temp.39 -0.045  1.7e-81 0.097 1.00   685
#> gp(time):temp.40 -0.083 -5.3e-90 0.041 1.00   574
#> 
#> GAM gp term marginal deviation (alpha) and length scale (rho) estimates:
#>                     2.5%   50%  97.5% Rhat n_eff
#> alpha_gp(time):temp 0.17  0.33   0.73 1.00   360
#> rho_gp(time):temp   9.10 30.00 100.00 1.01   148
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 39 of 1000 iterations ended with a divergence (3.9%)
#>  *Try running with larger adapt_delta to remove the divergences
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 26 10:07:50 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model
plot(mod, type = "smooths")

fc <- forecast(mod, newdata = data_test)
plot(fc)
#> Out of sample CRPS:
#> 6.32777364953055


# Propagating the smooth term shows how the coefficient is expected to evolve
plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 180, lty = "dashed", lwd = 2)
points(beta_temp, pch = 16)
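
# Note from the summary above that dynamic() expands to an equivalent gp()
# term, so the same model could be specified directly (a sketch using the
# identical settings reported by summary(mod)):
mod_gp <- mvgam(
  formula = out ~ gp(time, by = temp, c = 5 / 4, k = 40, scale = FALSE),
  family = gaussian(),
  data = data_train,
  newdata = data_test,
  chains = 2,
  silent = 2
)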



# Example showing how to incorporate an offset; simulate some count data
# with different means per series
set.seed(100)
dat <- sim_mvgam(
  prop_trend = 0, mu = c(0, 2, 2),
  seasonality = "hierarchical"
)

# Add offset terms to the training and testing data
dat$data_train$offset <- 0.5 * as.numeric(dat$data_train$series)
dat$data_test$offset <- 0.5 * as.numeric(dat$data_test$series)
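
# Under the log link used by the default Poisson observation model, the
# offset enters the linear predictor additively, so expected counts scale
# multiplicatively: all else being equal, series_3 (offset 1.5) has
# exp(1.5 - 0.5) times the expected count of series_1 (offset 0.5)
exp(1.5 - 0.5)
#> [1] 2.718282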

# Fit a model that includes the offset in the linear predictor as well as
# hierarchical seasonal smooths
mod <- mvgam(
  formula = y ~ offset(offset) +
    s(series, bs = "re") +
    s(season, bs = "cc") +
    s(season, by = series, m = 1, k = 5),
  data = dat$data_train,
  chains = 2,
  silent = 2
)
#> (Stan C++ compilation output omitted: repeated TBB 'template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]' warnings from the bundled tbb_2020.3 headers)

# Inspect the model file to see the modification to the linear predictor (eta)
code(mod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   vector[total_obs] off_set; // offset vector
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[8, 8] S1; // mgcv smooth penalty matrix S1
#>   matrix[4, 4] S2; // mgcv smooth penalty matrix S2
#>   matrix[4, 4] S3; // mgcv smooth penalty matrix S3
#>   matrix[4, 4] S4; // mgcv smooth penalty matrix S4
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 21] = b_raw[1 : 21];
#>   b[22 : 24] = mu_raw[1] + b_raw[22 : 24] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ student_t(3, 0, 2.5);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ std_normal();
#>   
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 1.6, 2.5);
#>   
#>   // prior for s(season)...
#>   b_raw[2 : 9] ~ multi_normal_prec(zero[2 : 9], S1[1 : 8, 1 : 8] * lambda[1]);
#>   
#>   // prior for s(season):seriesseries_1...
#>   b_raw[10 : 13] ~ multi_normal_prec(zero[10 : 13],
#>                                      S2[1 : 4, 1 : 4] * lambda[2]);
#>   
#>   // prior for s(season):seriesseries_2...
#>   b_raw[14 : 17] ~ multi_normal_prec(zero[14 : 17],
#>                                      S3[1 : 4, 1 : 4] * lambda[3]);
#>   
#>   // prior for s(season):seriesseries_3...
#>   b_raw[18 : 21] ~ multi_normal_prec(zero[18 : 21],
#>                                      S4[1 : 4, 1 : 4] * lambda[4]);
#>   
#>   // prior (non-centred) for s(series)...
#>   b_raw[22 : 24] ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   {
#>     // likelihood functions
#>     flat_ys ~ poisson_log_glm(flat_xs, off_set[obs_ind], b);
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   
#>   // posterior predictions
#>   eta = X * b + off_set;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }
#> 
#> 

# Forecasts for the first two series will differ in magnitude
fc <- forecast(mod, newdata = dat$data_test)
layout(matrix(1:2, ncol = 2))
plot(fc, series = 1, ylim = c(0, 75))
#> Out of sample DRPS:
#> 26.74677

plot(fc, series = 2, ylim = c(0, 75))
#> Out of sample DRPS:
#> 101.419478

layout(1)

# Changing the offset for the testing data should lead to changes in
# the forecast
dat$data_test$offset <- dat$data_test$offset - 2
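# Subtracting 2 on the log scale shrinks the forecast expectations by a
# factor of exp(-2)
exp(-2)
#> [1] 0.1353353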
fc <- forecast(mod, newdata = dat$data_test)
plot(fc)
#> Out of sample DRPS:
#> 41.478396


# Relative Risks can be computed by fixing the offset to the same value
# for each series
dat$data_test$offset <- rep(1, NROW(dat$data_test))
preds_rr <- predict(mod,
  type = "link", newdata = dat$data_test,
  summary = FALSE
)
series1_inds <- which(dat$data_test$series == "series_1")
series2_inds <- which(dat$data_test$series == "series_2")

# Relative Risks are now more comparable among series
layout(matrix(1:2, ncol = 2))
plot(preds_rr[1, series1_inds],
  type = "l", col = "grey75",
  ylim = range(preds_rr),
  ylab = "Series1 Relative Risk", xlab = "Time"
)
for (i in 2:50) {
  lines(preds_rr[i, series1_inds], col = "grey75")
}

plot(preds_rr[1, series2_inds],
  type = "l", col = "darkred",
  ylim = range(preds_rr),
  ylab = "Series2 Relative Risk", xlab = "Time"
)
for (i in 2:50) {
  lines(preds_rr[i, series2_inds], col = "darkred")
}

layout(1)
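
# The ratio of expected counts between series can also be summarised directly
# from the link-scale draws, since exponentiating a difference of linear
# predictors gives a risk ratio (a sketch; 'rr_draws' is a hypothetical name):
rr_draws <- exp(
  rowMeans(preds_rr[, series2_inds]) - rowMeans(preds_rr[, series1_inds])
)
quantile(rr_draws, probs = c(0.025, 0.5, 0.975))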


# Example showcasing how cbind() is needed for Binomial observations
# Simulate two time series of Binomial trials
trials <- sample(c(20:25), 50, replace = TRUE)
x <- rnorm(50)
detprob1 <- plogis(-0.5 + 0.9 * x)
detprob2 <- plogis(-0.1 - 0.7 * x)
dat <- rbind(
  data.frame(
    y = rbinom(n = 50, size = trials, prob = detprob1),
    time = 1:50,
    series = "series1",
    x = x,
    ntrials = trials
  ),
  data.frame(
    y = rbinom(n = 50, size = trials, prob = detprob2),
    time = 1:50,
    series = "series2",
    x = x,
    ntrials = trials
  )
)
dat <- dplyr::mutate(dat, series = as.factor(series))
dat <- dplyr::arrange(dat, time, series)
plot_mvgam_series(data = dat, series = "all")


# Fit a model using the binomial() family; must specify observations
# and number of trials in the cbind() wrapper
mod <- mvgam(
  formula =
    cbind(y, ntrials) ~ series + s(x, by = series),
  family = binomial(),
  data = dat,
  chains = 2,
  silent = 2
)
#> Warning: Binomial and Beta-binomial families require cbind(n_successes, n_trials)
#> in the formula left-hand side. Do not use cbind(n_successes, n_failures)!
#> This warning is displayed once per session.
#> (Stan C++ compilation output omitted: repeated TBB 'template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]' warnings from the bundled tbb_2020.3 headers)
summary(mod)
#> GAM formula:
#> cbind(y, ntrials) ~ series + s(x, by = series)
#> <environment: 0x000001b088c85638>
#> 
#> Family:
#> binomial
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 2 
#> 
#> N timepoints:
#> 50 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>                        2.5%      50%  97.5% Rhat n_eff
#> (Intercept)          -0.430 -0.31000 -0.170 1.00   535
#> seriesseries2        -0.076  0.10000  0.280 1.01   490
#> s(x):seriesseries1.1 -0.100  0.05900  0.760 1.03    50
#> s(x):seriesseries1.2 -0.320 -0.00250  0.370 1.00   330
#> s(x):seriesseries1.3 -0.250 -0.01000  0.100 1.00   223
#> s(x):seriesseries1.4 -0.190  0.00190  0.220 1.00   268
#> s(x):seriesseries1.5 -0.110 -0.00270  0.050 1.01   246
#> s(x):seriesseries1.6 -0.150  0.00440  0.210 1.00   323
#> s(x):seriesseries1.7 -0.130 -0.00470  0.089 1.00   381
#> s(x):seriesseries1.8 -0.820 -0.00910  0.690 1.00   315
#> s(x):seriesseries1.9  0.034  0.86000  1.100 1.03    56
#> s(x):seriesseries2.1 -0.330 -0.02300  0.130 1.02   103
#> s(x):seriesseries2.2 -0.230 -0.01100  0.130 1.01   292
#> s(x):seriesseries2.3 -0.080 -0.00220  0.077 1.00   530
#> s(x):seriesseries2.4 -0.120 -0.00140  0.100 1.00   558
#> s(x):seriesseries2.5 -0.037  0.00033  0.039 1.00   352
#> s(x):seriesseries2.6 -0.083  0.00360  0.120 1.00   613
#> s(x):seriesseries2.7 -0.056 -0.00110  0.054 1.00   714
#> s(x):seriesseries2.8 -0.480 -0.01300  0.370 1.00   548
#> s(x):seriesseries2.9 -0.840 -0.63000 -0.260 1.01   126
#> 
#> Approximate significance of GAM smooths:
#>                      edf Ref.df Chi.sq p-value    
#> s(x):seriesseries1 3.242      9   48.5  <2e-16 ***
#> s(x):seriesseries2 0.985      9   23.3    0.35    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 26 10:09:03 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model
pp_check(mod,
  type = "bars_grouped",
  group = "series", ndraws = 50
)

pp_check(mod,
  type = "ecdf_overlay_grouped",
  group = "series", ndraws = 50
)

conditional_effects(mod, type = "link")


# }
================================================ FILE: docs/reference/mvgam_diagnostics.html ================================================ Extract diagnostic quantities of mvgam models — mvgam_diagnostics • mvgam

Extract quantities that can be used to diagnose sampling behavior of the algorithms applied by Stan at the back-end of mvgam.

Usage

# S3 method for class 'mvgam'
nuts_params(object, pars = NULL, ...)

# S3 method for class 'mvgam'
log_posterior(object, ...)

# S3 method for class 'mvgam'
rhat(x, pars = NULL, ...)

# S3 method for class 'mvgam'
neff_ratio(object, pars = NULL, ...)

Arguments

object, x

A mvgam or jsdgam object.

pars

An optional character vector of parameter names. For nuts_params these will be NUTS sampler parameter names rather than model parameters. If pars is omitted all parameters are included.

...

Arguments passed to individual methods.

Value

The exact form of the output depends on the method.

Details

For more details see bayesplot-extractors.

Examples

# \donttest{
simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1')
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
            trend_model = AR(),
            noncentred = TRUE,
            data = simdat$data_train,
            chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> [chain iteration progress omitted]
#> Chain 1 finished in 0.3 seconds.
#> Chain 2 finished in 0.3 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.3 seconds.
#> Total execution time: 0.5 seconds.
#> 
np <- nuts_params(mod)
head(np)
#>   Chain Iteration     Parameter    Value
#> 1     1         1 accept_stat__ 1.000000
#> 2     1         2 accept_stat__ 0.992268
#> 3     1         3 accept_stat__ 0.777018
#> 4     1         4 accept_stat__ 0.989895
#> 5     1         5 accept_stat__ 0.854329
#> 6     1         6 accept_stat__ 0.928752

# extract the number of divergence transitions
sum(subset(np, Parameter == "divergent__")$Value)
#> [1] 0

head(neff_ratio(mod))
#>  mus[1,1]  mus[2,1]  mus[3,1]  mus[4,1]  mus[5,1]  mus[6,1] 
#> 0.7872905 0.6691465 0.7611481 0.5512319 0.7750102 0.7717447 
# }
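
The rhat and log_posterior extractors listed under Usage work the same way; a minimal sketch, reusing the mod object fitted above:

# split-Rhat values for the model parameters
head(rhat(mod))

# log-posterior values per iteration; these can be passed as the 'lp'
# argument to bayesplot functions such as mcmc_nuts_divergence()
lp <- log_posterior(mod)
head(lp)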
================================================ FILE: docs/reference/mvgam_draws.html ================================================ Extract posterior draws from fitted mvgam objects — mvgam_draws • mvgam

Extract posterior draws in conventional formats as data.frames, matrices, or arrays.

Usage

# S3 method for class 'mvgam'
as.data.frame(
  x,
  row.names = NULL,
  optional = TRUE,
  variable = "betas",
  use_alias = TRUE,
  regex = FALSE,
  ...
)

# S3 method for class 'mvgam'
as.matrix(x, variable = "betas", regex = FALSE, use_alias = TRUE, ...)

# S3 method for class 'mvgam'
as.array(x, variable = "betas", regex = FALSE, use_alias = TRUE, ...)

# S3 method for class 'mvgam'
as_draws(
  x,
  variable = NULL,
  regex = FALSE,
  inc_warmup = FALSE,
  use_alias = TRUE,
  ...
)

# S3 method for class 'mvgam'
as_draws_matrix(
  x,
  variable = NULL,
  regex = FALSE,
  inc_warmup = FALSE,
  use_alias = TRUE,
  ...
)

# S3 method for class 'mvgam'
as_draws_df(
  x,
  variable = NULL,
  regex = FALSE,
  inc_warmup = FALSE,
  use_alias = TRUE,
  ...
)

# S3 method for class 'mvgam'
as_draws_array(
  x,
  variable = NULL,
  regex = FALSE,
  inc_warmup = FALSE,
  use_alias = TRUE,
  ...
)

# S3 method for class 'mvgam'
as_draws_list(
  x,
  variable = NULL,
  regex = FALSE,
  inc_warmup = FALSE,
  use_alias = TRUE,
  ...
)

# S3 method for class 'mvgam'
as_draws_rvars(x, variable = NULL, regex = FALSE, inc_warmup = FALSE, ...)

Arguments

x

list object of class mvgam

row.names

Ignored

optional

Ignored

variable

A character specifying which parameters to extract. Can be one of the following options:

  • obs_params (other parameters specific to the observation model, such as overdispersions for negative binomial models or observation error SD for Gaussian / Student-t models)

  • betas (beta coefficients from the GAM observation model linear predictor; default)

  • smooth_params (smoothing parameters from the GAM observation model)

  • linpreds (estimated linear predictors on whatever link scale was used in the model)

  • trend_params (parameters governing the trend dynamics, such as AR parameters, trend SD parameters or Gaussian Process parameters)

  • trend_betas (beta coefficients from the GAM latent process model linear predictor; only available if a trend_formula was supplied in the original model)

  • trend_smooth_params (process model GAM smoothing parameters; only available if a trend_formula was supplied in the original model)

  • trend_linpreds (process model linear predictors on the identity scale; only available if a trend_formula was supplied in the original model)

Alternatively, a character vector naming the variables to extract.

use_alias

Logical. If more informative names for parameters are available (i.e. for beta coefficients b or for smoothing parameters rho), replace the uninformative names with the more informative alias. Defaults to TRUE

regex

Logical. If not using one of the prespecified options for extractions, should variable be treated as a (vector of) regular expressions? Any variable in x matching at least one of the regular expressions will be selected. Defaults to FALSE.

...

Ignored

inc_warmup

Should warmup draws be included? Defaults to FALSE.

Value

A data.frame, matrix, or array containing the posterior draws.

Examples

# \donttest{
sim <- sim_mvgam(family = Gamma())
mod1 <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = 'AR1',
             data = sim$data_train,
             family = Gamma(),
             chains = 2,
             silent = 2)
#> Warning: Supplying trend_model as a character string is deprecated
#> Please use the dedicated functions (i.e. RW() or ZMVN()) instead
#> This warning is displayed once per session.
#> [Stan compiler notes omitted]
beta_draws_df <- as.data.frame(mod1, variable = 'betas')
head(beta_draws_df)
#>   (Intercept) s(season).1 s(season).2 s(season).3 s(season).4 s(season).5 s(season).6 s(season).7 s(season).8
#> 1    0.319726   -0.972376   -1.148710   -0.734276    0.462118     1.59223    0.934666    0.399375    0.176647
#> 2    0.180193   -1.616300   -1.036090   -0.768414    0.655170     1.20669    0.844760    0.243960    0.868440
#> 3    0.304901   -1.073380   -1.322890   -0.503359    0.964529     1.30952    1.059640    0.516082    0.115212
#> 4    0.268791   -1.224270   -0.946018   -0.636852    0.379731     1.22659    0.803573    0.351586    0.719370
#> 5    0.245721   -0.968587   -1.256310   -0.893110    0.591861     1.64159    1.097970    0.255019    0.607658
#> 6    0.135226   -1.155560   -1.011880   -0.336560    0.623057     1.02631    0.844620    0.643441    0.583991
str(beta_draws_df)
#> 'data.frame':	1000 obs. of  9 variables:
#>  $ (Intercept): num  0.32 0.18 0.305 0.269 0.246 ...
#>  $ s(season).1: num  -0.972 -1.616 -1.073 -1.224 -0.969 ...
#>  $ s(season).2: num  -1.149 -1.036 -1.323 -0.946 -1.256 ...
#>  $ s(season).3: num  -0.734 -0.768 -0.503 -0.637 -0.893 ...
#>  $ s(season).4: num  0.462 0.655 0.965 0.38 0.592 ...
#>  $ s(season).5: num  1.59 1.21 1.31 1.23 1.64 ...
#>  $ s(season).6: num  0.935 0.845 1.06 0.804 1.098 ...
#>  $ s(season).7: num  0.399 0.244 0.516 0.352 0.255 ...
#>  $ s(season).8: num  0.177 0.868 0.115 0.719 0.608 ...

beta_draws_mat <- as.matrix(mod1, variable = 'betas')
head(beta_draws_mat)
#>     variable
#> draw (Intercept) s(season).1 s(season).2 s(season).3 s(season).4 s(season).5 s(season).6 s(season).7
#>    1    0.319726   -0.972376   -1.148710   -0.734276    0.462118     1.59223    0.934666    0.399375
#>    2    0.180193   -1.616300   -1.036090   -0.768414    0.655170     1.20669    0.844760    0.243960
#>    3    0.304901   -1.073380   -1.322890   -0.503359    0.964529     1.30952    1.059640    0.516082
#>    4    0.268791   -1.224270   -0.946018   -0.636852    0.379731     1.22659    0.803573    0.351586
#>    5    0.245721   -0.968587   -1.256310   -0.893110    0.591861     1.64159    1.097970    0.255019
#>    6    0.135226   -1.155560   -1.011880   -0.336560    0.623057     1.02631    0.844620    0.643441
#>     variable
#> draw s(season).8
#>    1    0.176647
#>    2    0.868440
#>    3    0.115212
#>    4    0.719370
#>    5    0.607658
#>    6    0.583991
str(beta_draws_mat)
#>  num [1:1000, 1:9] 0.32 0.18 0.305 0.269 0.246 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ draw    : chr [1:1000] "1" "2" "3" "4" ...
#>   ..$ variable: chr [1:9] "(Intercept)" "s(season).1" "s(season).2" "s(season).3" ...
#>  - attr(*, "nchains")= int 2

shape_pars <- as.matrix(mod1, variable = 'shape', regex = TRUE)
head(shape_pars)
#>     variable
#> draw shape[1] shape[2] shape[3]
#>    1 1.365670 0.973384 1.237680
#>    2 0.869383 1.512070 0.948521
#>    3 0.790533 1.020250 1.035100
#>    4 0.957937 1.139440 1.354040
#>    5 0.981846 0.952409 1.087680
#>    6 0.967652 1.438820 1.650290
# }
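
The posterior-package conversions listed under Usage share the same interface; a minimal sketch using the documented 'trend_params' option, reusing mod1 from above:

# extract draws of the parameters governing the latent trend
# (e.g. AR coefficients and process SDs) as a tidy draws_df
trend_draws <- as_draws_df(mod1, variable = 'trend_params')
head(trend_draws)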
================================================ FILE: docs/reference/mvgam_families.html ================================================ Supported mvgam families — mvgam_families • mvgam

Supported mvgam families

Usage

tweedie(link = "log")

student_t(link = "identity")

betar(...)

nb(...)

lognormal(...)

student(...)

bernoulli(...)

beta_binomial(...)

nmix(link = "log")

Arguments

link

a specification for the family link function. At present these cannot be changed

...

Arguments to be passed to the mgcv version of the associated functions

Value

Objects of class family

Details

mvgam currently supports the following standard observation families:

  • gaussian with identity link, for real-valued data

  • poisson with log-link, for count data

  • Gamma with log-link, for non-negative real-valued data

  • binomial with logit-link, for count data when the number of trials is known (and must be supplied)

In addition, the following extended families from the mgcv and brms packages are supported:

  • betar with logit-link, for proportional data on (0,1)

  • nb with log-link, for count data

  • lognormal with identity-link, for non-negative real-valued data

  • bernoulli with logit-link, for binary data

  • beta_binomial with logit-link, as for binomial() but allows for overdispersion

Finally, mvgam supports the three extended families described here:

  • tweedie with log-link, for count data (power parameter p fixed at 1.5)

  • student_t() (or student) with identity-link, for real-valued data

  • nmix for count data with imperfect detection modeled via a State-Space N-Mixture model. The latent states are Poisson (with log link), capturing the 'true' latent abundance, while the observation process is Binomial to account for imperfect detection. The observation formula in these models is used to set up a linear predictor for the detection probability (with logit link). See the example below for a more detailed worked explanation of the nmix() family

Only poisson(), nb(), and tweedie() are available if using JAGS. All families, apart from tweedie(), are supported if using Stan.

Note that currently it is not possible to change the default link functions in mvgam, so any call to change these will be silently ignored
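
The constructors above return standard family objects whose components can be inspected directly; a minimal sketch:

# family constructors return objects of class 'family'
fam <- student_t()
fam$family
fam$link   # identity; link functions cannot currently be changed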

Author

Nicholas J Clark

Examples

# \donttest{
# Example showing how to set up N-mixture models
set.seed(999)
# Simulate observations for species 1, which shows a declining trend and 0.7 detection probability
data.frame(site = 1,
          # five replicates per year; six years
          replicate = rep(1:5, 6),
          time = sort(rep(1:6, 5)),
          species = 'sp_1',
          # true abundance declines nonlinearly
          truth = c(rep(28, 5),
                    rep(26, 5),
                    rep(23, 5),
                    rep(16, 5),
                    rep(14, 5),
                    rep(14, 5)),
          # observations are taken with detection prob = 0.7
          obs = c(rbinom(5, 28, 0.7),
                  rbinom(5, 26, 0.7),
                  rbinom(5, 23, 0.7),
                  rbinom(5, 16, 0.7),
                  rbinom(5, 14, 0.7),
                  rbinom(5, 14, 0.7))) %>%
 # add 'series' information, which is an identifier of site, replicate and species
 dplyr::mutate(series = paste0('site_', site,
                               '_', species,
                               '_rep_', replicate),
               time = as.numeric(time),
               # add a 'cap' variable that defines the maximum latent N to
               # marginalize over when estimating latent abundance; in other words
               # how large do we realistically think the true abundance could be?
               cap = 80) %>%
 dplyr::select(- replicate) -> testdat

# Now add another species that has a different temporal trend and a smaller
# detection probability (0.45 for this species)
testdat = testdat %>%
 dplyr::bind_rows(data.frame(site = 1,
                             replicate = rep(1:5, 6),
                             time = sort(rep(1:6, 5)),
                             species = 'sp_2',
                             truth = c(rep(4, 5),
                                       rep(7, 5),
                                       rep(15, 5),
                                       rep(16, 5),
                                       rep(19, 5),
                                       rep(18, 5)),
                             obs = c(rbinom(5, 4, 0.45),
                                     rbinom(5, 7, 0.45),
                                     rbinom(5, 15, 0.45),
                                     rbinom(5, 16, 0.45),
                                     rbinom(5, 19, 0.45),
                                     rbinom(5, 18, 0.45))) %>%
                    dplyr::mutate(series = paste0('site_', site,
                                                  '_', species,
                                                  '_rep_', replicate),
                                  time = as.numeric(time),
                                  cap = 50) %>%
                    dplyr::select(-replicate))

# series identifiers
testdat$species <- factor(testdat$species,
                          levels = unique(testdat$species))
testdat$series <- factor(testdat$series,
                         levels = unique(testdat$series))

# The trend_map to state how replicates are structured
testdat %>%
# each unique combination of site*species is a separate process
dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>%
 dplyr::select(trend, series) %>%
 dplyr::distinct() -> trend_map
trend_map
#>    trend            series
#> 1      1 site_1_sp_1_rep_1
#> 2      1 site_1_sp_1_rep_2
#> 3      1 site_1_sp_1_rep_3
#> 4      1 site_1_sp_1_rep_4
#> 5      1 site_1_sp_1_rep_5
#> 6      2 site_1_sp_2_rep_1
#> 7      2 site_1_sp_2_rep_2
#> 8      2 site_1_sp_2_rep_3
#> 9      2 site_1_sp_2_rep_4
#> 10     2 site_1_sp_2_rep_5

# Fit a model
mod <- mvgam(
            # the observation formula sets up linear predictors for
            # detection probability on the logit scale
            formula = obs ~ species - 1,

            # the trend_formula sets up the linear predictors for
            # the latent abundance processes on the log scale
            trend_formula = ~ s(time, by = trend, k = 4) + species,

            # the trend_map takes care of the mapping
            trend_map = trend_map,

            # nmix() family and data
            family = nmix(),
            data = testdat,

            # priors can be set in the usual way
            priors = c(prior(std_normal(), class = b),
                       prior(normal(1, 1.5), class = Intercept_trend)),
            chains = 2)
#> Your model may benefit from using "noncentred = TRUE"
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> [chain iteration progress omitted]
#> Chain 1 finished in 4.2 seconds.
#> Chain 2 finished in 4.7 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 4.4 seconds.
#> Total execution time: 5.1 seconds.
#> 

# The usual diagnostics
summary(mod)
#> GAM observation formula:
#> obs ~ species - 1
#> <environment: 0x00000283526ecb60>
#> 
#> GAM process formula:
#> ~s(time, by = trend, k = 4) + species
#> <environment: 0x00000283526ecb60>
#> 
#> Family:
#> nmix
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N process models:
#> 2 
#> 
#> N series:
#> 10 
#> 
#> N timepoints:
#> 6 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM observation model coefficient (beta) estimates:
#>              2.5%     50% 97.5% Rhat n_eff
#> speciessp_1 -0.19  0.7300  1.40 1.01   411
#> speciessp_2 -1.20 -0.0064  0.89 1.00   398
#> 
#> GAM process model coefficient (beta) estimates:
#>                               2.5%     50%  97.5% Rhat n_eff
#> (Intercept)_trend            2.700  3.0000  3.400 1.01   394
#> speciessp_2_trend           -1.200 -0.6100  0.150 1.00   329
#> s(time):trendtrend1.1_trend -0.064  0.0160  0.180 1.01   375
#> s(time):trendtrend1.2_trend -0.230  0.0035  0.280 1.00   685
#> s(time):trendtrend1.3_trend -0.450 -0.2400 -0.042 1.00   680
#> s(time):trendtrend2.1_trend -0.160 -0.0110  0.076 1.00   456
#> s(time):trendtrend2.2_trend -0.170  0.0230  0.350 1.00   475
#> s(time):trendtrend2.3_trend  0.059  0.3400  0.630 1.00   664
#> 
#> Approximate significance of GAM process smooths:
#>                        edf Ref.df Chi.sq p-value
#> s(time):seriestrend1 0.809      3   4.22    0.57
#> s(time):seriestrend2 0.999      3   3.00    0.54
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Tue Feb 18 10:33:03 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model

# Plotting conditional effects
library(ggplot2); library(marginaleffects)
plot_predictions(mod, condition = 'species',
                 type = 'detection') +
     ylab('Pr(detection)') +
     ylim(c(0, 1)) +
     theme_classic() +
     theme(legend.position = 'none')


# Example showcasing how cbind() is needed for Binomial observations
# Simulate two time series of Binomial trials
trials <- sample(c(20:25), 50, replace = TRUE)
x <- rnorm(50)
detprob1 <- plogis(-0.5 + 0.9*x)
detprob2 <- plogis(-0.1 -0.7*x)
dat <- rbind(data.frame(y = rbinom(n = 50, size = trials, prob = detprob1),
                        time = 1:50,
                        series = 'series1',
                        x = x,
                        ntrials = trials),
             data.frame(y = rbinom(n = 50, size = trials, prob = detprob2),
                        time = 1:50,
                        series = 'series2',
                        x = x,
                        ntrials = trials))
dat <- dplyr::mutate(dat, series = as.factor(series))
dat <- dplyr::arrange(dat, time, series)

# Fit a model using the binomial() family; must specify observations
# and number of trials in the cbind() wrapper
mod <- mvgam(cbind(y, ntrials) ~ series + s(x, by = series),
             family = binomial(),
             data = dat)
#> Warning: Binomial and Beta-binomial families require cbind(n_successes, n_trials)
#> in the formula left-hand side. Do not use cbind(n_successes, n_failures)!
#> This warning is displayed once per session.
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 4 parallel chains...
#> 
#> [chain iteration progress omitted]
#> Chain 4 finished in 3.3 seconds.
#> Chain 1 finished in 4.4 seconds.
#> Chain 2 finished in 4.2 seconds.
#> Chain 3 finished in 4.2 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 4.0 seconds.
#> Total execution time: 4.8 seconds.
#> 
summary(mod)
#> GAM formula:
#> cbind(y, ntrials) ~ series + s(x, by = series)
#> <environment: 0x00000283526ecb60>
#> 
#> Family:
#> binomial
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 2 
#> 
#> N timepoints:
#> 50 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>                        2.5%     50%  97.5% Rhat n_eff
#> (Intercept)          -0.510 -0.3900 -0.270 1.00  1118
#> seriesseries2         0.095  0.2700  0.430 1.00  1261
#> s(x):seriesseries1.1 -0.110  0.0590  0.490 1.03   280
#> s(x):seriesseries1.2 -0.180  0.0110  0.250 1.01   928
#> s(x):seriesseries1.3 -0.130 -0.0100  0.047 1.02   446
#> s(x):seriesseries1.4 -0.140 -0.0085  0.096 1.00   919
#> s(x):seriesseries1.5 -0.077 -0.0053  0.044 1.01   658
#> s(x):seriesseries1.6 -0.072  0.0096  0.120 1.01   765
#> s(x):seriesseries1.7 -0.087 -0.0078  0.047 1.01   737
#> s(x):seriesseries1.8 -0.600  0.0850  1.100 1.01   749
#> s(x):seriesseries1.9  0.160  0.7300  1.000 1.03   257
#> s(x):seriesseries2.1 -0.440 -0.0610  0.120 1.01   432
#> s(x):seriesseries2.2 -0.150  0.0240  0.320 1.01   209
#> s(x):seriesseries2.3 -0.051  0.0053  0.120 1.01   624
#> s(x):seriesseries2.4 -0.180 -0.0160  0.078 1.01   223
#> s(x):seriesseries2.5 -0.073 -0.0048  0.049 1.00   522
#> s(x):seriesseries2.6 -0.065  0.0140  0.130 1.01   253
#> s(x):seriesseries2.7 -0.092 -0.0098  0.051 1.01   247
#> s(x):seriesseries2.8 -0.540  0.1500  1.200 1.01   221
#> s(x):seriesseries2.9 -0.840 -0.5600 -0.068 1.01   436
#> 
#> Approximate significance of GAM smooths:
#>                      edf Ref.df Chi.sq p-value    
#> s(x):seriesseries1 0.992      9   35.8    0.57    
#> s(x):seriesseries2 2.147      9   31.5 1.8e-05 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Tue Feb 18 10:33:51 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model
# }
================================================ FILE: docs/reference/mvgam_fevd-class.html ================================================ mvgam_fevd object description — mvgam_fevd-class • mvgam

A mvgam_fevd object returned by function fevd. Run methods(class = "mvgam_fevd") to see an overview of available methods.

Details

A mvgam_fevd object contains a list of posterior forecast error variance decompositions, each stored as its own list
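
A hedged sketch of typical use, assuming a fitted VAR-type mvgam model mod:

# compute posterior forecast error variance decompositions
# from the fitted model, then plot the median contributions
fevds <- fevd(mod)
plot(fevds)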

Author

Nicholas J Clark

================================================ FILE: docs/reference/mvgam_forecast-class.html ================================================ mvgam_forecast object description — mvgam_forecast-class • mvgam

A mvgam_forecast object returned by function hindcast or forecast. Run methods(class = "mvgam_forecast") to see an overview of available methods.

Details

A mvgam_forecast object contains the following elements:

  • call the original observation model formula

  • trend_call If a trend_formula was supplied, the original trend model formula is returned. Otherwise NULL

  • family character description of the observation distribution

  • family_pars list containing draws of family-specific parameters (i.e. shape, scale or overdispersion parameters). Only returned if type = link. Otherwise NULL

  • trend_model character description of the latent trend model

  • drift Logical specifying whether a drift term was used in the trend model

  • use_lv Logical flag indicating whether latent dynamic factors were used in the model

  • fit_engine Character describing the fit engine, either as stan or jags

  • type The type of predictions included (either link, response or trend)

  • series_names Names of the time series, taken from levels(data$series) in the original model fit

  • train_observations A list of training observation vectors of length n_series

  • train_times A vector of the unique training times

  • test_observations If the forecast function was used, a list of test observation vectors of length n_series. Otherwise NULL

  • test_times If the forecast function was used, a vector of the unique validation (testing) times. Otherwise NULL

  • hindcasts A list of posterior hindcast distributions of length n_series.

  • forecasts If the forecast function was used, a list of posterior forecast distributions of length n_series. Otherwise NULL
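
A hedged sketch of how these elements are typically accessed, assuming a fitted model mod and out-of-sample data data_test:

fc <- forecast(mod, newdata = data_test)
fc$type                   # prediction type ('link', 'response' or 'trend')
str(fc$hindcasts[[1]])    # posterior hindcast draws for the first series
str(fc$forecasts[[1]])    # posterior forecast draws for the first series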

Author

Nicholas J Clark

================================================ FILE: docs/reference/mvgam_formulae.html ================================================ Details of formula specifications in mvgam models — mvgam_formulae • mvgam

Details of formula specifications in mvgam models

Details

mvgam will accept an observation model formula and an optional process model formula (via the argument trend_formula). Neither of these formulae can be specified as lists, contrary to the accepted behaviour in some mgcv or brms models.

Note that it is possible to supply an empty formula where there are no predictors or intercepts in the observation model (i.e. y ~ 0 or y ~ -1). In this case, an intercept-only observation model will be set up but the intercept coefficient will be fixed at zero. This can be handy if you wish to fit pure State-Space models where the variation in the dynamic trend controls the average expectation, and/or where intercepts are non-identifiable.

The formulae supplied to mvgam and jsdgam are exactly like those supplied to glm except that smooth terms, s, te, ti and t2, time-varying effects using dynamic, monotonically increasing (using s(x, bs = 'moi')) or decreasing splines (using s(x, bs = 'mod'); see smooth.construct.moi.smooth.spec for details), as well as Gaussian Process functions using gp and offsets using offset can be added to the right hand side (and . is not supported in mvgam formulae).

Further details on specifying different kinds of smooth functions, and how to control their behaviours by modifying their potential complexities and / or how the penalties behave, can be found in the extensive documentation for the mgcv package.
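
A few hedged sketches of the formula types described above (dat is a hypothetical data frame with columns y, x, time and series):

# intercept-only observation model with the intercept fixed at zero,
# leaving the latent trend to control the average expectation
mvgam(y ~ 0, trend_model = RW(), data = dat)

# other supported right-hand side terms, shown schematically:
# s(x, bs = 'moi')   # monotonically increasing smooth of x
# dynamic(x)         # time-varying effect of x
# gp(time)           # Gaussian Process function of time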

Author

Nicholas J Clark

================================================ FILE: docs/reference/mvgam_irf-class.html ================================================ mvgam_irf object description — mvgam_irf-class • mvgam

A mvgam_irf object returned by function irf. Run methods(class = "mvgam_irf") to see an overview of available methods.

Details

A mvgam_irf object contains a list of posterior IRFs, each stored as its own list
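
A hedged sketch of typical use, assuming a fitted VAR-type mvgam model mod (the h and series arguments are assumptions here):

# compute posterior impulse response functions over a 12-step horizon,
# then plot the responses to a shock in the first series
irfs <- irf(mod, h = 12)
plot(irfs, series = 1)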

Author

Nicholas J Clark

================================================ FILE: docs/reference/mvgam_marginaleffects.html ================================================ Helper functions for marginaleffects calculations in mvgam models — mvgam_marginaleffects • mvgam

Helper functions for marginaleffects calculations in mvgam models

Functions needed for working with marginaleffects

Functions needed for getting data / objects with insight

Usage

# S3 method for class 'mvgam'
get_coef(model, trend_effects = FALSE, ...)

# S3 method for class 'mvgam'
set_coef(model, coefs, trend_effects = FALSE, ...)

# S3 method for class 'mvgam'
get_vcov(model, vcov = NULL, ...)

# S3 method for class 'mvgam'
get_predict(model, newdata, type = "response", process_error = FALSE, ...)

# S3 method for class 'mvgam'
get_data(x, source = "environment", verbose = TRUE, ...)

# S3 method for class 'mvgam_prefit'
get_data(x, source = "environment", verbose = TRUE, ...)

# S3 method for class 'mvgam'
find_predictors(
  x,
  effects = c("fixed", "random", "all"),
  component = c("all", "conditional", "zi", "zero_inflated", "dispersion", "instruments",
    "correlation", "smooth_terms"),
  flatten = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for class 'mvgam_prefit'
find_predictors(
  x,
  effects = c("fixed", "random", "all"),
  component = c("all", "conditional", "zi", "zero_inflated", "dispersion", "instruments",
    "correlation", "smooth_terms"),
  flatten = FALSE,
  verbose = TRUE,
  ...
)

Arguments

model

Model object

trend_effects

logical, extract from the process model component (only applicable if a trend_formula was specified in the model)

...

Additional arguments are passed to the predict() method supplied by the modeling package. These arguments are particularly useful for mixed-effects or Bayesian models (see the online vignettes on the marginaleffects website). Available arguments can vary from model to model, depending on the range of supported arguments by each modeling package. See the "Model-Specific Arguments" section of the ?slopes documentation for a non-exhaustive list of available arguments.

coefs

vector of coefficients to insert in the model object

vcov

Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:

  • FALSE: Do not compute standard errors. This can speed up computation considerably.

  • TRUE: Unit-level standard errors using the default vcov(model) variance-covariance matrix.

  • String which indicates the kind of uncertainty estimates to return.

    • Heteroskedasticity-consistent: "HC", "HC0", "HC1", "HC2", "HC3", "HC4", "HC4m", "HC5". See ?sandwich::vcovHC

    • Heteroskedasticity and autocorrelation consistent: "HAC"

    • Mixed-Models degrees of freedom: "satterthwaite", "kenward-roger"

    • Other: "NeweyWest", "KernHAC", "OPG". See the sandwich package documentation.

  • One-sided formula which indicates the name of cluster variables (e.g., ~unit_id). This formula is passed to the cluster argument of the sandwich::vcovCL function.

  • Square covariance matrix

  • Function which returns a covariance matrix (e.g., stats::vcov(model))

newdata

Grid of predictor values at which we evaluate the slopes.

  • Warning: Please avoid modifying your dataset between fitting the model and calling a marginaleffects function. This can sometimes lead to unexpected results.

  • NULL (default): Unit-level slopes for each observed value in the dataset (empirical distribution). The dataset is retrieved using insight::get_data(), which tries to extract data from the environment. This may produce unexpected results if the original data frame has been altered since fitting the model.

  • datagrid() call to specify a custom grid of regressors. For example:

    • newdata = datagrid(cyl = c(4, 6)): cyl variable equal to 4 and 6 and other regressors fixed at their means or modes.

    • See the Examples section and the datagrid() documentation.

  • subset() call with a single argument to select a subset of the dataset used to fit the model, ex: newdata = subset(treatment == 1)

  • dplyr::filter() call with a single argument to select a subset of the dataset used to fit the model, ex: newdata = filter(treatment == 1)

  • string:

    • "mean": Slopes evaluated when each predictor is held at its mean or mode.

    • "median": Slopes evaluated when each predictor is held at its median or mode.

    • "balanced": Slopes evaluated on a balanced grid with every combination of categories and numeric variables held at their means.

    • "tukey": Slopes evaluated at Tukey's 5 numbers.

    • "grid": Slopes evaluated on a grid of representative numbers (Tukey's 5 numbers and unique values of categorical predictors).

type

string indicates the type (scale) of the predictions used to compute contrasts or slopes. This can differ based on the model type, but will typically be a string such as: "response", "link", "probs", or "zero". When an unsupported string is entered, the model-specific list of acceptable values is returned in an error message. When type is NULL, the first entry in the error message is used by default.

process_error

logical. If TRUE, uncertainty in the latent process (or trend) model is incorporated in predictions

x

A fitted model.

source

String indicating where data should be recovered from. If source = "environment" (default), data is recovered from the environment (e.g. if the data is in the workspace). This option is usually the fastest way of getting data and ensures that the original variables used for model fitting are returned. Note that the current data is always recovered from the environment, so if the data was modified after model fitting (e.g., variables were recoded or rows filtered), the returned data may no longer equal the model data. If source = "frame" (or "mf"), the data is taken from the model frame; any transformed variables are back-transformed where possible. This option returns the data even if it is not available in the environment, though in certain edge cases back-transforming to the original data may fail. If source = "environment" fails to recover the data, extraction falls back to the model frame; conversely, if source = "frame" and data cannot be extracted from the model frame, it is recovered from the environment. Both options return only observations that have no missing data in the variables used for model fitting.

verbose

Toggle messages and warnings.

effects

Should model data for fixed effects ("fixed"), random effects ("random") or both ("all") be returned? Only applies to mixed or gee models.

component

Which type of parameters to return: parameters for the conditional model, the zero-inflated part of the model, the dispersion term, the instrumental variables, or marginal effects. Applies to models with zero-inflated and/or dispersion formulas, to models with instrumental variables (so-called fixed-effects regressions), or to models with marginal effects (from mfx). See details in the Model Components section. May be abbreviated. Note that the conditional component also refers to the count or mean component; names may differ depending on the modeling package. There are three convenient shortcuts (not applicable to all model classes):

  • component = "all" returns all possible parameters.

  • If component = "location", location parameters such as conditional, zero_inflated, smooth_terms, or instruments are returned (everything that is a fixed or random effect, depending on the effects argument, but no auxiliary parameters).

  • For component = "distributional" (or "auxiliary"), components like sigma, dispersion, beta or precision (and other auxiliary parameters) are returned.

flatten

Logical, if TRUE, the values are returned as character vector, not as list. Duplicated values are removed.

Value

Objects suitable for internal 'marginaleffects' functions to proceed. See marginaleffects::get_coef(), marginaleffects::set_coef(), marginaleffects::get_vcov(), marginaleffects::get_predict(), insight::get_data() and insight::find_predictors() for details
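
With these methods registered, the usual marginaleffects verbs apply directly to fitted mvgam objects; a minimal sketch (mod and the predictor x are assumptions here):

library(marginaleffects)

# average slopes of x on the response scale
avg_slopes(mod, variables = 'x', type = 'response')

# conditional predictions over the range of x on the link scale
plot_predictions(mod, condition = 'x', type = 'link')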

Author

Nicholas J Clark

================================================ FILE: docs/reference/mvgam_trends.html ================================================ Supported latent trend models in mvgam — mvgam_trends • mvgam

Supported latent trend models in mvgam

Details

mvgam currently supports the following dynamic trend models:

  • None (no latent trend component; i.e. the GAM component is all that contributes to the linear predictor, and the observation process is the only source of error; similar to what is estimated by gam)

  • ZMVN() (zero-mean correlated errors, useful for modelling time series where no autoregressive terms are needed or for modelling data that are not sampled as time series)

  • RW()

  • AR(p = 1, 2, or 3)

  • CAR(p = 1)(continuous time autoregressive trends; only available in Stan)

  • VAR()(only available in Stan)

  • PW() (piecewise linear or logistic trends; only available in Stan)

  • GP() (Gaussian Process with squared exponential kernel; only available in Stan)

For most dynamic trend types available in mvgam (see argument trend_model), time should be measured in discrete, regularly spaced intervals (i.e. c(1, 2, 3, ...)). However, you can use irregularly spaced intervals when using trend_model = CAR(1), though note that any temporal intervals that are exactly 0 will be adjusted to a very small number (1e-12) to prevent sampling errors. For all autoregressive trend types apart from CAR(), moving average and/or correlated process error terms can also be estimated (for example, RW(cor = TRUE) will set up a multivariate Random Walk if data contains >1 series). Hierarchical process error correlations can also be handled if the data contain relevant observation units that are nested into relevant grouping and subgrouping levels (i.e. using AR(gr = region, subgr = species))

Note that only RW, AR1, AR2 and AR3 are available if using JAGS. All trend models are supported if using Stan. Dynamic factor models can be used in which the latent factors evolve as either RW, AR1-3, VAR or GP. For VAR models (i.e. VAR and VARcor models), users can either fix the trend error covariances to be 0 (using VAR) or estimate them and potentially allow for contemporaneously correlated errors using VARcor. For all VAR models, stationarity of the latent process is enforced through the prior using the parameterisation given by Heaps (2022). Stationarity is not enforced when using AR1, AR2 or AR3 models, though this can be changed by the user by specifying lower and upper bounds on autoregressive parameters using functionality in get_mvgam_priors and the priors argument in mvgam. Piecewise trends follow the formulation in the popular prophet package produced by Facebook, where users can allow for changepoints to control the potential flexibility of the trend. See Taylor and Letham (2018) for details
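
Hedged sketches of the trend specifications described above (dat is a hypothetical data frame with y, time and series columns):

mvgam(y ~ 1, trend_model = RW(cor = TRUE), data = dat)  # correlated multivariate RW
mvgam(y ~ 1, trend_model = AR(p = 2), data = dat)       # AR(2) dynamics
mvgam(y ~ 1, trend_model = CAR(p = 1), data = dat)      # continuous-time AR for irregular sampling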

References

Sarah E. Heaps (2022) Enforcing stationarity through the prior in Vector Autoregressions. Journal of Computational and Graphical Statistics. 32:1, 1-10.

Sean J. Taylor and Benjamin Letham (2018) Forecasting at scale. The American Statistician 72.1, 37-45.

See also

RW, AR, CAR, VAR, PW, GP, ZMVN

================================================ FILE: docs/reference/pairs.mvgam.html ================================================ Create a matrix of output plots from a mvgam object — pairs.mvgam • mvgam

A pairs method that is customized for MCMC output.

Usage

# S3 method for class 'mvgam'
pairs(x, variable = NULL, regex = FALSE, use_alias = TRUE, ...)

Arguments

x

An object of class mvgam or jsdgam

variable

Names of the variables (parameters) to plot, as given by a character vector or a regular expression (if regex = TRUE). By default, a hopefully not too large selection of variables is plotted.

regex

Logical; Indicates whether variable should be treated as regular expressions. Defaults to FALSE.

use_alias

Logical. If more informative names for parameters are available (i.e. for beta coefficients b or for smoothing parameters rho), replace the uninformative names with the more informative alias. Defaults to TRUE

...

Further arguments to be passed to mcmc_pairs.

Value

Plottable objects whose classes depend on the arguments supplied. See mcmc_pairs for details.

Details

For a detailed description see mcmc_pairs.

Examples

# \donttest{
simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1')
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(),
             noncentred = TRUE,
             data = simdat$data_train,
             chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> [Stan compiler notes omitted]
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> [chain iteration progress omitted]
#> Chain 1 finished in 0.3 seconds.
#> Chain 2 finished in 0.3 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.3 seconds.
#> Total execution time: 0.4 seconds.
#> 
pairs(mod)

pairs(mod, variable = c('ar1', 'sigma'), regex = TRUE)

# }

================================================ FILE: docs/reference/pfilter_mvgam_fc.html ================================================ Forecast from a particle filtered mvgam object — pfilter_mvgam_fc • mvgam

This function generates a forecast from a set of particles, each capturing a unique proposal about the current state of the system modelled in the mvgam object. The covariate and timepoint information from data_test is used to generate the GAM component forecast, while the trends are run forward in time according to their state space dynamics. The forecast is a weighted ensemble, with weights determined by each particle's proposal likelihood prior to the most recent assimilation step

Usage

pfilter_mvgam_fc(
  file_path = "pfilter",
  n_cores = 2,
  newdata,
  data_test,
  plot_legend = TRUE,
  legend_position = "topleft",
  ylim,
  return_forecasts = FALSE
)

Arguments

file_path

character string specifying the file path where the particles have been saved

n_cores

integer specifying number of cores for generating particle forecasts in parallel

newdata

A dataframe or list of test data containing at least 'series' and 'time', in addition to any other variables included in the linear predictor of formula

data_test

Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows

plot_legend

logical stating whether to include a legend to highlight which observations were used for calibration and which were assimilated by the particle filter

legend_position

The legend location, specified as a single keyword from the list "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center". This places the legend on the inside of the plot frame at the given location.

ylim

Optional vector of y-axis limits (min, max). The same limits will be used for all plots

return_forecasts

logical. If TRUE, the returned list object will contain plots of forecasts as well as the forecast objects (each as a matrix of dimension n_particles x horizon)

Value

A named list containing functions that call base R plots of each series' forecast. Optionally the actual forecasts are returned within the list as a separate list of matrices

================================================ FILE: docs/reference/pfilter_mvgam_init.html ================================================ Initiate particles for online filtering from a fitted mvgam object — pfilter_mvgam_init • mvgam

This function generates a set of particles, each capturing a unique proposal about the current state of the system. The next observation in data_assim is assimilated and particles are weighted by their proposal's multivariate composite likelihood to update the model's forecast distribution

Usage

pfilter_mvgam_init(
  object,
  newdata,
  data_assim,
  n_particles = 1000,
  file_path = "pfilter",
  n_cores = 2
)

Arguments

object

list object returned from mvgam

newdata

A dataframe or list of test data containing at least one more observation per series (beyond the last observation seen by the model in object) to be assimilated by the particle filter. Should at least contain 'series' and 'time' for the one-step ahead horizon, in addition to any other variables included in the linear predictor of object

data_assim

Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows

n_particles

integer specifying the number of unique particles to generate for tracking the latent system state

file_path

character string specifying the file path for saving the initiated particles

n_cores

integer specifying number of cores for generating particle forecasts in parallel

Value

A list object of length = n_particles containing information on parameters and current state estimates for each particle is generated and saved, along with other important information from the original model, to an .rda object in file_path
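
A hedged sketch of initiating the filter, using the defaults shown under Usage (mod and next_obs are assumptions here):

# generate 1000 particles from the fitted model and save them to the
# 'pfilter' directory, assimilating the next observation(s) in next_obs
pfilter_mvgam_init(object = mod,
                   newdata = next_obs,
                   n_particles = 1000,
                   file_path = 'pfilter',
                   n_cores = 2)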

================================================ FILE: docs/reference/pfilter_mvgam_online.html ================================================ Automatic online particle filtering for assimilating new observations into a fitted mvgam model — pfilter_mvgam_online • mvgam

This function operates sequentially on new observations in data_assim to update the posterior forecast distribution. It is a wrapper that calls pfilter_mvgam_smooth. In each iteration, the next observation is assimilated and particles are weighted by their proposal's multivariate composite likelihood

Usage

pfilter_mvgam_online(
  newdata,
  data_assim,
  file_path = "pfilter",
  threshold = 0.5,
  use_resampling = FALSE,
  kernel_lambda = 0.25,
  n_cores = 1
)

Arguments

newdata

A dataframe or list of test data containing at least one more observation per series (beyond the last observation seen by the model when initialising particles with pfilter_mvgam_init or in previous calls to pfilter_mvgam_online). Should at least contain 'series' and 'time' for the one-step ahead horizon, in addition to any other variables included in the linear predictor of object

data_assim

Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows

file_path

character string specifying the file path for locating the particles

threshold

proportional numeric specifying the Effective Sample Size limit under which resampling of particles will be triggered (calculated as ESS / n_particles) if use_resampling == TRUE. Should be between 0 and 1

use_resampling

logical specifying whether resampling should be used when ESS falls below the specified threshold. Default for this option is FALSE, relying instead on kernel smoothing only to maintain particle diversity

kernel_lambda

proportional numeric specifying the strength of kernel smoothing to use when pulling low weight particles toward the high likelihood state space. Should be between 0 and 1

n_cores

integer specifying number of cores for generating particle forecasts in parallel

Value

A list object of length n_particles, containing information on parameters and current state estimates for each particle, is generated and saved (along with other important information from the original model) to an .rda object in file_path
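
A minimal sketch of assimilating further observations, assuming particles were already saved to 'pfilter' by pfilter_mvgam_init (the timepoints and resampling settings shown are illustrative):

pfilter_mvgam_online(newdata = test_data[test_data$time %in% 77:80, ],
                     file_path = 'pfilter',
                     threshold = 0.5,
                     use_resampling = TRUE,
                     kernel_lambda = 0.25)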

================================================ FILE: docs/reference/pfilter_mvgam_smooth.html ================================================ Assimilate new observations into a fitted mvgam model using resampling and kernel smoothing — pfilter_mvgam_smooth • mvgam

This function operates on a new observation in next_assim to update the posterior forecast distribution. The next observation is assimilated and particle weights are updated in light of their most recent multivariate composite likelihood. Low weight particles are smoothed towards the high weight state space using importance sampling, and options are given for using resampling of high weight particles when Effective Sample Size falls below a user-specified threshold

Usage

pfilter_mvgam_smooth(
  particles,
  mgcv_model,
  next_assim,
  threshold = 0.25,
  n_cores = 1,
  use_resampling = FALSE,
  kernel_lambda = 0.5
)

Arguments

particles

A list of particles that have been run up to one observation prior to the observation in next_assim

mgcv_model

A gam model returned through a call to mvgam

next_assim

A dataframe of test data containing at least one more observation per series (beyond the last observation seen by the model when initialising particles with pfilter_mvgam_init or in previous calls to pfilter_mvgam_online). Should at least contain 'series' and 'time' for the one-step ahead horizon, in addition to any other variables included in the linear predictor of object

threshold

proportional numeric specifying the Effective Sample Size limit under which resampling of particles will be triggered (calculated as ESS / n_particles) if use_resampling == TRUE. Should be between 0 and 1

n_cores

integer specifying number of cores for generating particle forecasts in parallel

use_resampling

logical specifying whether resampling should be used when ESS falls below the specified threshold. Note that resampling can result in loss of the original model's diversity of GAM beta coefficients, which may have undesirable consequences for the forecast distribution. If use_resampling is TRUE, some effort is made to remedy this by assigning randomly sampled draws of GAM beta coefficients from the original model's distribution to each particle. This does not however guarantee there will be no loss of diversity, especially if successive resampling takes place. Default for this option is therefore FALSE

kernel_lambda

proportional numeric specifying the strength of smoothing to use when pulling low weight particles toward the high likelihood state space. Should be between 0 and 1

Value

A list object of length = n_particles containing information on parameters and current state estimates for each particle

================================================ FILE: docs/reference/piecewise_trends.html ================================================ Specify piecewise linear or logistic trends in mvgam models — PW • mvgam

Set up piecewise linear or logistic trend models in mvgam. These functions do not evaluate their arguments – they exist purely to help set up a model with particular piecewise trend models.

Usage

PW(
  n_changepoints = 10,
  changepoint_range = 0.8,
  changepoint_scale = 0.05,
  growth = "linear"
)

Arguments

n_changepoints

A non-negative integer specifying the number of potential changepoints. Potential changepoints are selected uniformly from the first changepoint_range proportion of timepoints in data. Default is 10

changepoint_range

Proportion of history in data in which trend changepoints will be estimated. Defaults to 0.8 for the first 80%.

changepoint_scale

Parameter modulating the flexibility of the automatic changepoint selection by altering the scale parameter of a Laplace distribution. The resulting prior will be double_exponential(0, changepoint_scale). Large values will allow many changepoints and a more flexible trend, while small values will allow few changepoints. Default is 0.05.

growth

Character string specifying either 'linear' or 'logistic' growth of the trend. If 'logistic', a variable labelled cap MUST be in data to specify the maximum saturation point for the trend (see details and examples in mvgam for more information). Default is 'linear'.

Value

An object of class mvgam_trend, which contains a list of arguments to be interpreted by the parsing functions in mvgam

Details

Offsets and intercepts: For each of these trend models, an offset parameter is included in the trend estimation process. This parameter will be incredibly difficult to identify if you also include an intercept in the observation formula. For that reason, it is highly recommended that you drop the intercept from the formula (i.e. y ~ x + 0 or y ~ x - 1, where x are your optional predictor terms).

Logistic growth and the cap variable: When forecasting growth, there is often some maximum achievable point that a time series can reach. For example, total market size, total population size or carrying capacity in population dynamics. It can be advantageous for the forecast to saturate at or near this point so that predictions are more sensible. This function allows you to make forecasts using a logistic growth trend model, with a specified carrying capacity. Note that this capacity does not need to be static over time, it can vary with each series x timepoint combination if necessary. But you must supply a cap value for each observation in the data when using growth = 'logistic'. For observation families that use a non-identity link function, the cap value will be internally transformed to the link scale (i.e. your specified cap will be log transformed if you are using a poisson() or nb() family). It is therefore important that you specify the cap values on the scale of your outcome. Note also that no missing values are allowed in cap.
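
For example, a hypothetical sketch of supplying a time-varying capacity (the values and the step change at timepoint 50 are purely illustrative):

dat <- data.frame(y = rpois(100, lambda = 10),
                  time = 1:100,
                  series = as.factor('series_1'))
# Carrying capacity is assumed to increase partway through the series;
# every observation must have a non-missing cap value
dat$cap <- ifelse(dat$time < 50, 35, 45)
# mod <- mvgam(y ~ 0, trend_model = PW(growth = 'logistic'),
#              family = poisson(), data = dat)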

References

Taylor, Sean J., and Benjamin Letham. "Forecasting at scale." The American Statistician 72.1 (2018): 37-45.

Examples

# \donttest{
# Example of logistic growth with possible changepoints
# Simple logistic growth model
dNt = function(r, N, k){
   r * N * (k - N)
}

# Iterate growth through time
Nt = function(r, N, t, k) {
  for (i in 1:(t - 1)) {
    # population at next time step is current population + growth,
    # but we introduce several 'shocks' as changepoints
    if (i %in% c(5, 15, 25, 41, 45, 60, 80)) {
      N[i + 1] <- max(1, N[i] + dNt(r + runif(1, -0.1, 0.1),
                                    N[i], k))
    } else {
      N[i + 1] <- max(1, N[i] + dNt(r, N[i], k))
    }
  }
  N
}

# Simulate expected values
set.seed(11)
expected <- Nt(0.004, 2, 100, 30)
plot(expected, xlab = 'Time')


# Take Poisson draws
y <- rpois(100, expected)
plot(y, xlab = 'Time')


# Assemble data into dataframe and model. We set a
# fixed carrying capacity of 35 for this example, but note that
# this value is not required to be fixed at each timepoint
mod_data <- data.frame(y = y,
                       time = 1:100,
                       cap = 35,
                       series = as.factor('series_1'))
plot_mvgam_series(data = mod_data)


# The intercept is nonidentifiable when using piecewise
# trends because the trend functions have their own offset
# parameters 'm'; it is recommended to always drop intercepts
# when using these trend models
mod <- mvgam(y ~ 0,
             trend_model = PW(growth = 'logistic'),
             family = poisson(),
             data = mod_data,
             chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c1ac83c6.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 5.1 seconds.
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 5.9 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 5.5 seconds.
#> Total execution time: 6.0 seconds.
#> 
summary(mod)
#> GAM formula:
#> y ~ 1
#> <environment: 0x000001d627ea1ee8>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> PW(growth = "logistic")
#> 
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 100 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    0   0     0  NaN   NaN
#> 
#> Latent trend growth rate estimates:
#>             2.5%   50%  97.5% Rhat n_eff
#> k_trend[1] -0.27 -0.15 -0.073    1   355
#> 
#> Latent trend offset estimates:
#>            2.5%  50% 97.5% Rhat n_eff
#> m_trend[1]  -15 -4.1 -0.33    1   465
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 98 of 1000 iterations saturated the maximum tree depth of 10 (9.8%)
#>  *Run with max_treedepth set to a larger value to avoid saturation
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 19 11:54:14 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model

# Plot the posterior hindcast
plot(mod, type = 'forecast')


# View the changepoints with ggplot2 utilities
library(ggplot2)
mcmc_plot(mod, variable = 'delta_trend',
          regex = TRUE) +
scale_y_discrete(labels = mod$trend_model$changepoints) +
labs(y = 'Potential changepoint',
     x = 'Rate change')
#> Scale for y is already present.
#> Adding another scale for y, which will replace the existing scale.

# }
================================================ FILE: docs/reference/pipe.html ================================================ Pipe operator — %>% • mvgam

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling rhs(lhs).
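
A simple usage example (any mvgam function that returns an object can be piped):

library(mvgam)
sim_mvgam(T = 60) %>% str(max.level = 1)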

================================================ FILE: docs/reference/plot.mvgam.html ================================================ Default plots for mvgam models — plot.mvgam • mvgam

This function takes a fitted mvgam object and produces plots of smooth functions, forecasts, trends and uncertainty components

Usage

# S3 method for class 'mvgam'
plot(
  x,
  type = "residuals",
  series = 1,
  residuals = FALSE,
  newdata,
  data_test,
  trend_effects = FALSE,
  ...
)

Arguments

x

list object returned from mvgam. See mvgam()

type

character specifying which type of plot to return. Options are: series, residuals, smooths, re (random effect smooths), pterms (parametric effects), forecast, trend, uncertainty, factors

series

integer specifying which series in the set is to be plotted. This is ignored if type == 're'

residuals

logical. If TRUE and type = 'smooths', posterior quantiles of partial residuals are added to plots of 1-D smooths as a series of ribbon rectangles. Partial residuals for a smooth term are the median Dunn-Smyth residuals that would be obtained by dropping the term concerned from the model, while leaving all other estimates fixed (i.e. the estimates for the term plus the original median Dunn-Smyth residuals). Note that because mvgam works with Dunn-Smyth residuals and not working residuals, which are used by mgcv, the magnitudes of partial residuals will be different to what you would expect from plot.gam. Interpretation is similar though, as these partial residuals should be evenly scattered around the smooth function if the function is well estimated

newdata

Optional dataframe or list of test data containing at least 'series' and 'time' in addition to any other variables included in the linear predictor of the original formula. This argument is optional when plotting out of sample forecast period observations (when type = forecast) and required when plotting uncertainty components (type = uncertainty).

data_test

Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows

trend_effects

logical. If TRUE and a trend_formula was used in model fitting, terms from the trend (i.e. process) model will be plotted

...

Additional arguments for each individual plotting function.

Value

A base R plot or set of plots

Details

These plots are useful for getting an overview of the fitted model and its estimated random effects or smooth functions, but the individual plotting functions and the functions from the marginaleffects and gratia packages offer far more customisation.

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate some time series
dat <- sim_mvgam(T = 80, n_series = 3)

# Fit a basic model
mod <- mvgam(y ~ s(season, bs = 'cc') + s(series, bs = 're'),
            data = dat$data_train,
            trend_model = RW(),
            chains = 2,
            silent = 2)
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c38222f99.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory

# Plot predictions and residuals for each series
plot(mod, type = 'forecast', series = 1)

plot(mod, type = 'forecast', series = 2)

plot(mod, type = 'forecast', series = 3)

plot(mod, type = 'residuals', series = 1)

plot(mod, type = 'residuals', series = 2)

plot(mod, type = 'residuals', series = 3)


# Plot model effects
plot(mod, type = 'smooths')

plot(mod, type = 're')


# More flexible plots with 'marginaleffects' utilities
library(marginaleffects)
plot_predictions(mod, condition = 'season', type = 'link')

plot_predictions(mod,
                condition = c('season', 'series', 'series'),
                type = 'link')

plot_predictions(mod, condition = 'series', type = 'link')


# When using a State-Space model with predictors on the process
# model, set trend_effects = TRUE to visualise process effects
mod <- mvgam(y ~ -1,
            trend_formula = ~ s(season, bs = 'cc'),
            data = dat$data_train,
            trend_model = RW(),
            chains = 2,
            silent = 2)
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c3b362b14.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
plot(mod, type = 'smooths', trend_effects = TRUE)


# But marginaleffects functions work without any modification
plot_predictions(mod, condition = 'season', type = 'link')


# }
================================================ FILE: docs/reference/plot.mvgam_fevd.html ================================================ Plot forecast error variance decompositions from an mvgam_fevd object — plot.mvgam_fevd • mvgam

This function takes an mvgam_fevd object and produces a plot of the posterior median contributions to forecast variance for each series in the fitted Vector Autoregression

Usage

# S3 method for class 'mvgam_fevd'
plot(x, ...)

Arguments

x

list object of class mvgam_fevd. See fevd()

...

ignored

Value

A ggplot object, which can be further customized using the ggplot2 package

Author

Nicholas J Clark
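
A brief sketch, assuming mod is an mvgam fit with a Vector Autoregressive trend model (the forecast horizon argument h used here is an assumption; see fevd() for the documented interface):

fd <- fevd(mod, h = 12)
plot(fd)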

================================================ FILE: docs/reference/plot.mvgam_irf.html ================================================ Plot impulse responses from an mvgam_irf object — plot.mvgam_irf • mvgam

This function takes an mvgam_irf object and produces plots of Impulse Response Functions

Usage

# S3 method for class 'mvgam_irf'
plot(x, series = 1, ...)

Arguments

x

list object of class mvgam_irf. See irf()

series

integer specifying which process series should be given the shock

...

ignored

Value

A ggplot object showing the expected response of each latent time series to a shock of the focal series

Author

Nicholas J Clark
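
A brief sketch, assuming mod uses a Vector Autoregressive trend model (the horizon argument h used here is an assumption; see irf() for the documented interface):

ir <- irf(mod, h = 12)
# Show responses of all latent series to a shock in the first series
plot(ir, series = 1)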

================================================ FILE: docs/reference/plot.mvgam_lfo.html ================================================ Plot Pareto-k and ELPD values from a mvgam_lfo object — plot.mvgam_lfo • mvgam

This function takes an object of class mvgam_lfo and creates several informative diagnostic plots

Usage

# S3 method for class 'mvgam_lfo'
plot(x, ...)

Arguments

x

An object of class mvgam_lfo

...

Ignored

Value

A ggplot object of Pareto-k and ELPD values over the evaluation timepoints. For the Pareto-k plot, a dashed red line indicates the specified threshold chosen for triggering model refits. For the ELPD plot, a dashed red line indicates the bottom 10% quantile of ELPD values. Points below this threshold may represent outliers that were more difficult to forecast
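
A brief sketch, assuming mod is a fitted mvgam object (the min_t argument used here is an assumption; see lfo_cv() for the documented interface):

lfo <- lfo_cv(mod, min_t = 30)
plot(lfo)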

================================================ FILE: docs/reference/plot_effects.mvgam.html ================================================ Effect plot as implemented in marginaleffects — plot_effects.mvgam • mvgam

Convenient way to call marginal or conditional effect plotting functions implemented in the marginaleffects package

Usage

plot_effects(object, ...)

# S3 method for class 'mvgam'
plot_effects(
  object,
  condition = NULL,
  by = NULL,
  newdata = NULL,
  type = NULL,
  conf_level = 0.95,
  wts = NULL,
  transform = NULL,
  points = 0,
  rug = FALSE,
  ...
)

Arguments

...

Additional arguments are passed to the predict() method supplied by the modeling package. These arguments are particularly useful for mixed-effects or bayesian models (see the online vignettes on the marginaleffects website). Available arguments can vary from model to model, depending on the range of supported arguments by each modeling package. See the "Model-Specific Arguments" section of the ?marginaleffects documentation for a non-exhaustive list of available arguments.

condition

Conditional predictions

  • Character vector (max length 3): Names of the predictors to display.

  • Named list (max length 3): List names correspond to predictors. List elements can be:

    • Numeric vector

    • Function which returns a numeric vector or a set of unique categorical values

    • Shortcut strings for common reference values: "minmax", "quartile", "threenum"

  • 1: x-axis. 2: color/shape. 3: facets.

  • Numeric variables in positions 2 and 3 are summarized by Tukey's five numbers (see ?stats::fivenum)

by

Marginal predictions

  • Character vector (max length 3): Names of the categorical predictors to marginalize across.

  • 1: x-axis. 2: color. 3: facets.

newdata

When newdata is NULL, the grid is determined by the condition argument. When newdata is not NULL, the argument behaves in the same way as in the predictions() function.

type

string indicating the type (scale) of the predictions used to compute contrasts or slopes. This can differ based on the model type, but will typically be a string such as: "response", "link", "probs", or "zero". When an unsupported string is entered, the model-specific list of acceptable values is returned in an error message. When type is NULL, the default value is used. This default is the first model-related row in the marginaleffects:::type_dictionary dataframe.

conf_level

numeric value between 0 and 1. Confidence level to use to build a confidence interval.

wts

string or numeric: weights to use when computing average contrasts or slopes. These weights only affect the averaging in avg_*() or with the by argument, and not the unit-level estimates themselves. Internally, estimates and weights are passed to the weighted.mean() function.

  • string: column name of the weights variable in newdata. When supplying a column name to wts, it is recommended to supply the original data (including the weights variable) explicitly to newdata.

  • numeric: vector of length equal to the number of rows in the original data or in newdata (if supplied).

transform

A function applied to unit-level adjusted predictions and confidence intervals just before the function returns results. For bayesian models, this function is applied to individual draws from the posterior distribution, before computing summaries.

points

Number between 0 and 1 which controls the transparency of raw data points. 0 (default) does not display any points.

rug

TRUE displays tick marks on the axes to mark the distribution of raw data.

Value

A ggplot object that can be further customized using the ggplot2 package
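
For example, a minimal sketch assuming mod is a fitted mvgam object whose formula includes a 'season' covariate:

plot_effects(mod, condition = 'season', type = 'link')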

================================================ FILE: docs/reference/plot_mvgam_factors.html ================================================ Latent factor summaries for a fitted mvgam object — plot_mvgam_factors • mvgam

This function takes a fitted mvgam object and returns plots and summary statistics for the latent dynamic factors

Usage

plot_mvgam_factors(object, plot = TRUE)

Arguments

object

list object returned from mvgam. See mvgam()

plot

logical specifying whether factors should be plotted

Value

A dataframe of factor contributions and, optionally, a series of base R plots

Details

If the model in object was estimated using dynamic factors, it is possible that not all factors contributed to the estimated trends. This is due to the regularisation penalty that acts independently on each factor's Gaussian precision, which will squeeze un-needed factors to a white noise process (effectively dropping that factor from the model). In this function, each factor is tested against a null hypothesis of white noise by calculating the sum of the factor's 2nd derivatives. A factor that has a larger contribution will have a larger sum due to the weaker penalty on the factor's precision. If plot == TRUE, the factors are also plotted.

Author

Nicholas J Clark

Examples

# \donttest{
simdat <- sim_mvgam()
mod <- mvgam(y ~ s(season, bs = 'cc',
                  k = 6),
            trend_model = AR(),
            use_lv = TRUE,
            n_lv = 2,
            data = simdat$data_train,
            chains = 2)
#> Warning in 'C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model_0f5e123adc47a9208b1c8eaa9e58906b.stan', line 23, column 31: Found
#>     int division:
#>       n_lv * (n_lv - 1) / 2
#>     Values will be rounded towards zero. If rounding is not desired you can
#>     write the division as
#>       n_lv * (n_lv - 1) / 2.0
#>     If rounding is intended please use the integer division operator %/%.
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 1.5 seconds.
#> Chain 2 finished in 1.6 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 1.5 seconds.
#> Total execution time: 1.7 seconds.
#> 
plot_mvgam_factors(mod)

#>         Contribution
#> Factor1    0.2470528
#> Factor2    0.7529472
# }
================================================ FILE: docs/reference/plot_mvgam_forecasts.html ================================================ Plot posterior forecast predictions from mvgam models — plot_mvgam_forecasts • mvgam

Plot posterior forecast predictions from mvgam models

Usage

plot_mvgam_fc(
  object,
  series = 1,
  newdata,
  data_test,
  realisations = FALSE,
  n_realisations = 15,
  hide_xlabels = FALSE,
  xlab,
  ylab,
  ylim,
  n_cores = 1,
  return_forecasts = FALSE,
  return_score = FALSE,
  ...
)

# S3 method for class 'mvgam_forecast'
plot(
  x,
  series = 1,
  realisations = FALSE,
  n_realisations = 15,
  xlab,
  ylab,
  ylim,
  ...
)

Arguments

object

list object of class mvgam. See mvgam()

series

integer specifying which series in the set is to be plotted

newdata

Optional dataframe or list of test data containing at least 'series' and 'time' in addition to any other variables included in the linear predictor of the original formula. If included, the covariate information in newdata will be used to generate forecasts from the fitted model equations. If this same newdata was originally included in the call to mvgam, then forecasts have already been produced by the generative model and these will simply be extracted and plotted. However if no newdata was supplied to the original model call, an assumption is made that the newdata supplied here comes sequentially after the data supplied as data in the original model (i.e. we assume there is no time gap between the last observation of series 1 in data and the first observation for series 1 in newdata). If newdata contains observations in column y, these observations will be used to compute a Discrete Rank Probability Score for the forecast distribution

data_test

Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows

realisations

logical. If TRUE, forecast realisations are shown as a spaghetti plot, making it easier to visualise the diversity of possible forecasts. If FALSE, the default, empirical quantiles of the forecast distribution are shown

n_realisations

integer specifying the number of posterior realisations to plot, if realisations = TRUE. Ignored otherwise

hide_xlabels

logical. If TRUE, no xlabels are printed to allow the user to add custom labels using axis from base R

xlab

label for x axis.

ylab

label for y axis.

ylim

Optional vector of y-axis limits (min, max)

n_cores

integer specifying number of cores for generating forecasts in parallel

return_forecasts

logical. If TRUE, the function will plot the forecast as well as returning the forecast object (as a matrix of dimension n_samples x horizon)

return_score

logical. If TRUE and out of sample test data is provided as newdata, a probabilistic score will be calculated and returned. The score used will depend on the observation family from the fitted model. Discrete families (poisson, negative binomial, tweedie) use the Discrete Rank Probability Score. Other families use the Continuous Rank Probability Score. The value returned is the sum of all scores within the out of sample forecast horizon

...

further par graphical parameters.

x

Object of class mvgam_forecast

Value

A base R graphics plot (for plot_mvgam_fc) or a ggplot object (for plot.mvgam_forecast) and an optional list containing the forecast distribution and the out of sample probabilistic forecast score

Details

plot_mvgam_fc generates posterior predictions from an object of class mvgam, calculates posterior empirical quantiles and plots them against the observed data. If realisations = FALSE, the returned plot shows 90, 60, 40 and 20 percent posterior quantiles (as ribbons of increasingly darker shades of red) as well as the posterior median (as a dark red line). If realisations = TRUE, a set of n_realisations posterior draws is shown instead. This function produces an older style base R plot, as opposed to plot.mvgam_forecast

plot.mvgam_forecast takes an object of class mvgam_forecast, in which forecasts have already been computed, and plots the resulting forecast distribution as a ggplot object. This function is therefore more versatile and is recommended over the older and clunkier plot_mvgam_fc version

If realisations = FALSE, these posterior quantiles are plotted along with the true observed data that was used to train the model. Otherwise, a spaghetti plot is returned to show possible forecast paths.

Examples

# \donttest{
simdat <- sim_mvgam(n_series = 3, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
            trend_model = AR(),
            noncentred = TRUE,
            data = simdat$data_train,
            chains = 2,
            silent = 2)

# Hindcasts on response scale
hc <- hindcast(mod)
str(hc)
#> List of 15
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc", k = 6)
#>   .. ..- attr(*, ".Environment")=<environment: 0x0000027337a0d7d0> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ trend_model       :List of 7
#>   ..$ trend_model: chr "AR1"
#>   ..$ ma         : logi FALSE
#>   ..$ cor        : logi FALSE
#>   ..$ unit       : chr "time"
#>   ..$ gr         : chr "NA"
#>   ..$ subgr      : chr "series"
#>   ..$ label      : language AR()
#>   ..- attr(*, "class")= chr "mvgam_trend"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : chr [1:3] "series_1" "series_2" "series_3"
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:75] 0 1 1 0 5 1 3 4 5 4 ...
#>   ..$ series_2: int [1:75] 0 0 0 0 0 2 3 1 2 0 ...
#>   ..$ series_3: int [1:75] 1 0 0 0 3 1 0 1 3 3 ...
#>  $ train_times       : int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations : NULL
#>  $ test_times        : NULL
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:1000, 1:75] 0 1 0 0 0 0 0 0 1 0 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>   ..$ series_2: num [1:1000, 1:75] 0 0 2 0 0 0 0 0 1 1 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,2]" "ypred[2,2]" "ypred[3,2]" "ypred[4,2]" ...
#>   ..$ series_3: num [1:1000, 1:75] 0 1 0 3 0 1 0 0 0 0 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,3]" "ypred[2,3]" "ypred[3,3]" "ypred[4,3]" ...
#>  $ forecasts         : NULL
#>  - attr(*, "class")= chr "mvgam_forecast"
plot(hc, series = 1)
#> No non-missing values in test_observations; cannot calculate forecast score

plot(hc, series = 2)
#> No non-missing values in test_observations; cannot calculate forecast score

plot(hc, series = 3)
#> No non-missing values in test_observations; cannot calculate forecast score


# Forecasts on response scale
fc <- forecast(mod, newdata = simdat$data_test)
str(fc)
#> List of 16
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc", k = 6)
#>   .. ..- attr(*, ".Environment")=<environment: 0x0000027337a0d7d0> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ family_pars       : NULL
#>  $ trend_model       :List of 7
#>   ..$ trend_model: chr "AR1"
#>   ..$ ma         : logi FALSE
#>   ..$ cor        : logi FALSE
#>   ..$ unit       : chr "time"
#>   ..$ gr         : chr "NA"
#>   ..$ subgr      : chr "series"
#>   ..$ label      : language AR()
#>   ..- attr(*, "class")= chr "mvgam_trend"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : Factor w/ 3 levels "series_1","series_2",..: 1 2 3
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:75] 0 1 1 0 5 1 3 4 5 4 ...
#>   ..$ series_2: int [1:75] 0 0 0 0 0 2 3 1 2 0 ...
#>   ..$ series_3: int [1:75] 1 0 0 0 3 1 0 1 3 3 ...
#>  $ train_times       : int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations :List of 3
#>   ..$ series_1: int [1:25] 1 0 4 2 4 4 3 5 2 2 ...
#>   ..$ series_2: int [1:25] 1 1 0 0 0 5 1 1 0 0 ...
#>   ..$ series_3: int [1:25] 0 0 1 0 3 3 0 3 0 0 ...
#>  $ test_times        : int [1:25] 76 77 78 79 80 81 82 83 84 85 ...
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:1000, 1:75] 0 1 0 0 0 0 0 0 1 0 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>   ..$ series_2: num [1:1000, 1:75] 0 0 2 0 0 0 0 0 1 1 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,2]" "ypred[2,2]" "ypred[3,2]" "ypred[4,2]" ...
#>   ..$ series_3: num [1:1000, 1:75] 0 1 0 3 0 1 0 0 0 0 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,3]" "ypred[2,3]" "ypred[3,3]" "ypred[4,3]" ...
#>  $ forecasts         :List of 3
#>   ..$ series_1: int [1:1000, 1:25] 0 0 1 0 0 1 0 1 1 0 ...
#>   ..$ series_2: int [1:1000, 1:25] 1 0 0 1 1 1 1 0 2 0 ...
#>   ..$ series_3: int [1:1000, 1:25] 1 0 0 0 3 0 0 0 0 0 ...
#>  - attr(*, "class")= chr "mvgam_forecast"
plot(fc, series = 1)
#> Out of sample DRPS:
#> 27.997178

plot(fc, series = 2)
#> Out of sample DRPS:
#> 12.082925

plot(fc, series = 3)
#> Out of sample DRPS:
#> 13.097777


# Forecasts as expectations
fc <- forecast(mod, newdata = simdat$data_test, type = 'expected')
plot(fc, series = 1)

plot(fc, series = 2)

plot(fc, series = 3)


# Dynamic trend extrapolations
fc <- forecast(mod, newdata = simdat$data_test, type = 'trend')
plot(fc, series = 1)

plot(fc, series = 2)

plot(fc, series = 3)

# }
================================================ FILE: docs/reference/plot_mvgam_pterms.html ================================================ Plot parametric term partial effects for mvgam models — plot_mvgam_pterms • mvgam

This function plots posterior empirical quantiles for partial effects of parametric terms

Usage

plot_mvgam_pterms(object, trend_effects = FALSE)

Arguments

object

list object of class mvgam. See mvgam()

trend_effects

logical. If TRUE and a trend_formula was used in model fitting, terms from the trend (i.e. process) model will be plotted

Value

A base R graphics plot

Details

Posterior empirical quantiles of each parametric term's partial effect estimates (on the link scale) are calculated and visualised as ribbon plots. These effects can be interpreted as the partial effect that a parametric term contributes when all other terms in the model have been set to 0
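
A minimal sketch, assuming mod is a fitted mvgam object with at least one parametric (non-smooth) term in its formula:

plot_mvgam_pterms(mod)
# Parametric terms in a trend_formula, if any, can be plotted with
plot_mvgam_pterms(mod, trend_effects = TRUE)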

================================================ FILE: docs/reference/plot_mvgam_randomeffects.html ================================================ Plot random effect terms from mvgam models — plot_mvgam_randomeffects • mvgam

This function plots posterior empirical quantiles for random effect smooths (bs = re)

Usage

plot_mvgam_randomeffects(object, trend_effects = FALSE)

Arguments

object

list object of class mvgam. See mvgam()

trend_effects

logical. If TRUE and a trend_formula was used in model fitting, terms from the trend (i.e. process) model will be plotted

Value

A base R graphics plot

Details

Posterior empirical quantiles of random effect coefficient estimates (on the link scale) are calculated and visualised as ribbon plots. Labels for coefficients are taken from the levels of the original factor variable that was used to specify the smooth in the model's formula
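
A minimal sketch, assuming mod includes a random effect smooth such as s(series, bs = 're') in its formula:

plot_mvgam_randomeffects(mod)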

================================================ FILE: docs/reference/plot_mvgam_resids.html ================================================ Residual diagnostics for a fitted mvgam object — plot_mvgam_resids • mvgam

This function takes a fitted mvgam object and returns various residual diagnostic plots

Usage

plot_mvgam_resids(object, series = 1, n_draws = 100L, n_points = 1000L)

Arguments

object

list object returned from mvgam. See mvgam()

series

integer specifying which series in the set is to be plotted

n_draws

integer specifying the number of posterior residual draws to use for calculating uncertainty in the "ACF" and "pACF" frames. Default is 100

n_points

integer specifying the maximum number of points to show in the "Resids vs Fitted" and "Normal Q-Q Plot" frames. Default is 1000

Value

A series of facetted ggplot objects

Details

A total of four ggplot plots are generated to examine posterior Dunn-Smyth residuals for the specified series. Plots include a residuals vs fitted values plot, a Q-Q plot, and two plots to check for any remaining temporal autocorrelation in the residuals. Note, all plots only report statistics from a sample of up to 100 posterior draws (to save computational time), so uncertainty in these relationships may not be adequately represented.

Author

Nicholas J Clark and Matthijs Hollanders

Examples

# \donttest{
simdat <- sim_mvgam(n_series = 3, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
            trend_model = AR(),
            noncentred = TRUE,
            data = simdat$data_train,
            chains = 2,
            silent = 2)

# Plot Dunn Smyth residuals for some series
plot_mvgam_resids(mod)

plot_mvgam_resids(mod, series = 2)

# }
================================================ FILE: docs/reference/plot_mvgam_series.html ================================================ Plot observed time series used for mvgam modelling — plot_mvgam_series • mvgam

This function takes either a fitted mvgam object or a data.frame object and produces plots of observed time series, ACF, CDF and histograms for exploratory data analysis

Usage

plot_mvgam_series(
  object,
  data,
  newdata,
  y = "y",
  lines = TRUE,
  series = 1,
  n_bins,
  log_scale = FALSE
)

Arguments

object

Optional list object returned from mvgam. Either object or data must be supplied.

data

Optional data.frame or list of training data containing at least 'series' and 'time'. Use this argument if training data have been gathered in the correct format for mvgam modelling but no model has yet been fitted.

newdata

Optional data.frame or list of test data containing at least 'series' and 'time' for the forecast horizon, in addition to any other variables included in the linear predictor of formula. If included, the observed values in the test data are compared to the model's forecast distribution for exploring biases in model predictions.

y

Character. What is the name of the outcome variable in the supplied data? Defaults to 'y'

lines

Logical. If TRUE, line plots are used for visualizing time series. If FALSE, points are used.

series

Either an integer specifying which series in the set is to be plotted or the string 'all', which plots all series available in the supplied data

n_bins

integer specifying the number of bins to use for binning observed values when plotting the histogram. Default is to use the number of bins returned by a call to hist in base R

log_scale

logical. If series == 'all', this flag is used to control whether the time series plot is shown on the log scale (using log(Y + 1)). This can be useful when visualizing many series that may have different observed ranges. Default is FALSE

Value

A set of ggplot objects. If series is an integer, the plots will show observed time series, autocorrelation and cumulative distribution functions, and a histogram for the series. If series == 'all', a set of observed time series plots is returned in which all series are shown on each plot but only a single focal series is highlighted, with all remaining series shown as faint gray lines.

Author

Nicholas J Clark and Matthijs Hollanders

Examples

# Simulate and plot series with observations bounded at 0 and 1 (Beta responses)
sim_data <- sim_mvgam(family = betar(),
                     trend_model = RW(), prop_trend = 0.6)
plot_mvgam_series(data = sim_data$data_train, series = 'all')

plot_mvgam_series(data = sim_data$data_train,
                 newdata = sim_data$data_test, series = 1)


# Now simulate series with overdispersed discrete observations
sim_data <- sim_mvgam(family = nb(), trend_model = RW(),
                     prop_trend = 0.6, phi = 10)
plot_mvgam_series(data = sim_data$data_train, series = 'all')

================================================ FILE: docs/reference/plot_mvgam_smooth.html ================================================ Plot smooth terms from mvgam models — plot_mvgam_smooth • mvgam

This function plots posterior empirical quantiles for a series-specific smooth term

Usage

plot_mvgam_smooth(
  object,
  trend_effects = FALSE,
  series = 1,
  smooth,
  residuals = FALSE,
  n_resid_bins = 25,
  realisations = FALSE,
  n_realisations = 15,
  derivatives = FALSE,
  newdata
)

Arguments

object

list object of class mvgam. See mvgam()

trend_effects

logical. If TRUE and a trend_formula was used in model fitting, terms from the trend (i.e. process) model will be plotted

series

integer specifying which series in the set is to be plotted

smooth

either a character or integer specifying which smooth term to be plotted

residuals

logical. If TRUE then posterior quantiles of partial residuals are added to plots of 1-D smooths as a series of ribbon rectangles. Partial residuals for a smooth term are the median Dunn-Smyth residuals that would be obtained by dropping the term concerned from the model, while leaving all other estimates fixed (i.e. the estimates for the term plus the original median Dunn-Smyth residuals). Note that because mvgam works with Dunn-Smyth residuals and not working residuals, which are used by mgcv, the magnitudes of partial residuals will be different to what you would expect from plot.gam. Interpretation is similar though, as these partial residuals should be evenly scattered around the smooth function if the function is well estimated

n_resid_bins

integer specifying the number of bins to group the covariate into when plotting partial residuals. Setting this argument too high can make for messy plots that are difficult to interpret, while setting it too low will likely mask some potentially useful patterns in the partial residuals. Default is 25

realisations

logical. If TRUE, posterior realisations are shown as a spaghetti plot, making it easier to visualise the diversity of possible functions. If FALSE, the default, empirical quantiles of the posterior distribution are shown

n_realisations

integer specifying the number of posterior realisations to plot, if realisations = TRUE. Ignored otherwise

derivatives

logical. If TRUE, an additional plot will be returned to show the estimated 1st derivative for the specified smooth (Note, this only works for univariate smooths)

newdata

Optional dataframe for predicting the smooth, containing at least 'series' in addition to any other variables included in the linear predictor of the original model's formula. Note that this currently is only supported for plotting univariate smooths

Value

A base R graphics plot

Details

Smooth functions are shown as empirical quantiles (or spaghetti plots) of posterior partial expectations across a sequence of values between the variable's min and max, while zeroing out effects of all other variables. At present, only univariate and bivariate smooth plots are allowed, though note that bivariate smooths rely on default behaviour from plot.gam. plot_mvgam_smooth generates posterior predictions from an object of class mvgam, calculates posterior empirical quantiles and plots them. If realisations = FALSE, the returned plot shows 90, 60, 40 and 20 percent posterior quantiles (as ribbons of increasingly darker shades of red) as well as the posterior median (as a dark red line). If realisations = TRUE, a set of n_realisations posterior draws is shown instead. For more nuanced visualisation, supply newdata just as you would when predicting from a gam model or use the more flexible conditional_effects.mvgam. Alternatively, if you prefer to use partial effect plots in the style of gratia, and if you have the gratia package installed, you can use draw.mvgam. See gratia_mvgam_enhancements for details.

Author

Nicholas J Clark
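
A minimal sketch, assuming mod is a fitted mvgam object with a univariate smooth of a 'season' covariate:

plot_mvgam_smooth(mod, smooth = 'season')
# Overlay partial residuals, or show the estimated first derivative
plot_mvgam_smooth(mod, smooth = 'season', residuals = TRUE)
plot_mvgam_smooth(mod, smooth = 'season', derivatives = TRUE)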

================================================ FILE: docs/reference/plot_mvgam_trend.html ================================================ Plot latent trend predictions from mvgam models — plot_mvgam_trend • mvgam

Plot latent trend predictions from mvgam models

Usage

plot_mvgam_trend(
  object,
  series = 1,
  newdata,
  data_test,
  realisations = FALSE,
  n_realisations = 15,
  n_cores = 1,
  derivatives = FALSE,
  hide_xlabels = FALSE,
  xlab,
  ylab,
  ...
)

Arguments

object

list object returned from mvgam. See mvgam()

series

integer specifying which series in the set is to be plotted

newdata

Optional dataframe or list of test data containing at least 'series' and 'time' in addition to any other variables included in the linear predictor of the original formula.

data_test

Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows

realisations

logical. If TRUE, posterior trend realisations are shown as a spaghetti plot, making it easier to visualise the diversity of possible trend paths. If FALSE, the default, empirical quantiles of the posterior distribution are shown

n_realisations

integer specifying the number of posterior realisations to plot, if realisations = TRUE. Ignored otherwise

n_cores

integer specifying number of cores for generating trend forecasts in parallel

derivatives

logical. If TRUE, an additional plot will be returned to show the estimated 1st derivative for the estimated trend

hide_xlabels

logical. If TRUE, no xlabels are printed to allow the user to add custom labels using axis from base R. Ignored if derivatives = TRUE

xlab

label for x axis.

ylab

label for y axis.

...

further par graphical parameters.

Value

A base R graphics plot

Examples

# \donttest{
simdat <- sim_mvgam(n_series = 3, trend_model = 'AR1')
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
            trend_model = AR(),
            noncentred = TRUE,
            data = simdat$data_train,
            chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 finished in 1.0 seconds.
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 1.1 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 1.0 seconds.
#> Total execution time: 1.2 seconds.
#> 

# Plot estimated trends for some series
plot_mvgam_trend(mod)

plot_mvgam_trend(mod, series = 2)


# Extrapolate trends forward in time and plot on response scale
plot_mvgam_trend(mod, newdata = simdat$data_test)

plot_mvgam_trend(mod, newdata = simdat$data_test, series = 2)


# But it is recommended to compute extrapolations for all series
# first and then plot
trend_fc <- forecast(mod, newdata = simdat$data_test)
plot(trend_fc, series = 1)
#> Out of sample DRPS:
#> 29.244167

plot(trend_fc, series = 2)
#> Out of sample DRPS:
#> 29.555438

# }
================================================ FILE: docs/reference/plot_mvgam_uncertainty.html ================================================ Plot forecast uncertainty contributions from mvgam models — plot_mvgam_uncertainty • mvgam

Plot forecast uncertainty contributions from mvgam models

Usage

plot_mvgam_uncertainty(
  object,
  series = 1,
  newdata,
  data_test,
  legend_position = "topleft",
  hide_xlabels = FALSE
)

Arguments

object

list object returned from mvgam. See mvgam()

series

integer specifying which series in the set is to be plotted

newdata

A dataframe or list containing at least 'series' and 'time' for the forecast horizon, in addition to any other variables included in the linear predictor of formula

data_test

Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows

legend_position

The legend location, specified as a single keyword from the list: "none", "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center". This places the legend on the inside of the plot frame at the given location (if it is not "none").

hide_xlabels

logical. If TRUE, no xlabels are printed to allow the user to add custom labels using axis from base R

Value

A base R graphics plot

Details

The basic idea of this function is to compute forecasts by ignoring one of the two primary components in a correlated residual model (i.e. by either ignoring the linear predictor effects or by ignoring the residual dynamics). Some caution is required however, as this function was designed early in the mvgam development cycle and there are now many types of models that it cannot handle very well. For example, models with shared latent states, or any type of State-Space models that include terms in the trend_formula, will either fail or give nonsensical results. Improvements are in the works to provide a more general way to decompose forecast uncertainties, so please check back at a later date.
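
For models that this function can handle, a minimal sketch (assuming mod is a simple mvgam fit without a trend_formula and test_data covers the forecast horizon with 'series' and 'time' columns):

plot_mvgam_uncertainty(mod, series = 1, newdata = test_data)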

================================================ FILE: docs/reference/portal_data.html ================================================ Portal Project rodent capture survey data — portal_data • mvgam

A dataset containing time series of total captures (across all control plots) for select rodent species from the Portal Project

Usage

portal_data

Format

A dataframe containing the following fields:

moon

time of sampling in lunar cycles

DM

Total captures of species Dipodomys merriami

DO

Total captures of species Dipodomys ordii

PP

Total captures of species Chaetodipus penicillatus

OT

Total captures of species Onychomys torridus

year

Sampling year

month

Sampling month

mintemp

Monthly mean minimum temperature

precipitation

Monthly mean precipitation

ndvi

Monthly mean Normalised Difference Vegetation Index
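
As a brief illustration (not part of the original reference page), the dataset can be loaded and inspected directly; the code below assumes only the fields documented above:

# Load the data shipped with mvgam and inspect the documented fields
data(portal_data, package = 'mvgam')
str(portal_data)

# Plot captures of Dipodomys merriami (DM) through time
plot(DM ~ moon, data = portal_data, type = 'l',
     xlab = 'Time (lunar cycles)', ylab = 'DM captures')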

================================================ FILE: docs/reference/posterior_epred.mvgam.html ================================================ Draws from the expected value of the posterior predictive distribution for mvgam objects — posterior_epred.mvgam • mvgam

Compute posterior draws of the expected value of the posterior predictive distribution (i.e. the conditional expectation). Can be performed for the data used to fit the model (posterior predictive checks) or for new data. By definition, these predictions have smaller variance than the posterior predictions performed by the posterior_predict.mvgam method. This is because only the uncertainty in the expected value of the posterior predictive distribution is incorporated in the draws computed by posterior_epred while the residual error is ignored there. However, the estimated means of both methods averaged across draws should be very similar.

Usage

# S3 method for class 'mvgam'
posterior_epred(
  object,
  newdata,
  data_test,
  ndraws = NULL,
  process_error = TRUE,
  ...
)

Arguments

object

list object of class mvgam or jsdgam. See mvgam()

newdata

Optional dataframe or list of test data containing the same variables that were included in the original data used to fit the model. If not supplied, predictions are generated for the original observations used for the model fit.

data_test

Deprecated. Still works in place of newdata, but users are encouraged to use newdata instead for more seamless integration into R workflows.

ndraws

Positive integer indicating how many posterior draws should be used. If NULL (the default) all draws are used.

process_error

Logical. If TRUE and newdata is supplied, expected uncertainty in the process model is accounted for by using draws from any latent trend SD parameters. If FALSE, uncertainty in the latent trend component is ignored when calculating predictions. If no newdata is supplied, draws from the fitted model's posterior predictive distribution will be used (which will always include uncertainty in any latent trend components)

...

Ignored

Value

A matrix of dimension n_samples x n_obs, where n_samples is the number of posterior samples from the fitted object and n_obs is the number of observations in newdata

Details

Note that for all types of predictions for models that did not include a trend_formula, uncertainty in the dynamic trend component can be ignored by setting process_error = FALSE. However, if a trend_formula was supplied in the model, predictions for this component cannot be ignored. If process_error = TRUE, trend predictions will ignore autocorrelation coefficients or GP length scale coefficients, ultimately assuming the process is stationary. This method is similar to the types of posterior predictions returned from brms models when using autocorrelated error predictions for newdata. This function is therefore more suited to posterior simulation from the GAM components of an mvgam model, while the forecasting functions plot_mvgam_fc and forecast.mvgam are better suited to generating h-step ahead forecasts that respect the temporal dynamics of estimated latent trends.

Examples

# \donttest{
# Simulate some data and fit a model
simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc'),
            trend_model = AR(),
            noncentred = TRUE,
            data = simdat$data_train,
            chains = 2,
            silent = 2)

# Compute posterior expectations
expectations <- posterior_epred(mod)
str(expectations)
#>  num [1:1000, 1:75] 2.85 2.41 2.5 2.5 2.55 ...
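
# Because expectations exclude observation-level uncertainty, their
# draw-wise means should closely match those of full posterior
# predictions (an illustrative comparison, not in the original page)
predictions <- posterior_predict(mod)
plot(colMeans(expectations), colMeans(predictions),
     xlab = 'Mean expectations', ylab = 'Mean predictions')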
# }
================================================ FILE: docs/reference/posterior_linpred.mvgam.html ================================================ Posterior draws of the linear predictor for mvgam objects — posterior_linpred.mvgam • mvgam

Compute posterior draws of the linear predictor, that is draws before applying any link functions or other transformations. Can be performed for the data used to fit the model (posterior predictive checks) or for new data.

Usage

# S3 method for class 'mvgam'
posterior_linpred(
  object,
  transform = FALSE,
  newdata,
  ndraws = NULL,
  data_test,
  process_error = TRUE,
  ...
)

Arguments

object

list object of class mvgam or jsdgam. See mvgam()

transform

Logical; if FALSE (the default), draws of the linear predictor are returned. If TRUE, draws of the transformed linear predictor, i.e. the conditional expectation, are returned.

newdata

Optional dataframe or list of test data containing the same variables that were included in the original data used to fit the model. If not supplied, predictions are generated for the original observations used for the model fit.

ndraws

Positive integer indicating how many posterior draws should be used. If NULL (the default) all draws are used.

data_test

Deprecated. Still works in place of newdata, but users are encouraged to use newdata instead for more seamless integration into R workflows.

process_error

Logical. If TRUE and newdata is supplied, expected uncertainty in the process model is accounted for by using draws from any latent trend SD parameters. If FALSE, uncertainty in the latent trend component is ignored when calculating predictions. If no newdata is supplied, draws from the fitted model's posterior predictive distribution will be used (which will always include uncertainty in any latent trend components)

...

Ignored

Value

A matrix of dimension n_samples x n_obs, where n_samples is the number of posterior samples from the fitted object and n_obs is the number of observations in newdata

Details

Note that for all types of predictions for models that did not include a trend_formula, uncertainty in the dynamic trend component can be ignored by setting process_error = FALSE. However, if a trend_formula was supplied in the model, predictions for this component cannot be ignored. If process_error = TRUE, trend predictions will ignore autocorrelation coefficients or GP length scale coefficients, ultimately assuming the process is stationary. This method is similar to the types of posterior predictions returned from brms models when using autocorrelated error predictions for newdata. This function is therefore more suited to posterior simulation from the GAM components of an mvgam model, while the forecasting functions plot_mvgam_fc and forecast.mvgam are better suited to generating h-step ahead forecasts that respect the temporal dynamics of estimated latent trends.

Examples

# \donttest{
# Simulate some data and fit a model
simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(),
             noncentred = TRUE,
             data = simdat$data_train,
             chains = 2,
             silent = 2)
#> (Stan C++ compilation messages omitted)

# Extract linear predictor values
linpreds <- posterior_linpred(mod)
str(linpreds)
#>  num [1:1000, 1:75] -0.7353 -0.5959 -0.8928 -0.0926 -2.269 ...
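
# Setting transform = TRUE instead returns draws of the conditional
# expectation, i.e. the inverse-link transformed linear predictor
# (shown for illustration)
expectations <- posterior_linpred(mod, transform = TRUE)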
# }
================================================ FILE: docs/reference/posterior_predict.mvgam.html ================================================ Draws from the posterior predictive distribution for mvgam objects — posterior_predict.mvgam • mvgam

Compute posterior draws of the posterior predictive distribution. Can be performed for the data used to fit the model (posterior predictive checks) or for new data. By definition, these draws have higher variance than draws of the expected value of the posterior predictive distribution computed by posterior_epred.mvgam. This is because the residual error is incorporated in posterior_predict. However, the estimated means of both methods averaged across draws should be very similar.

Usage

# S3 method for class 'mvgam'
posterior_predict(
  object,
  newdata,
  data_test,
  ndraws = NULL,
  process_error = TRUE,
  ...
)

Arguments

object

list object of class mvgam or jsdgam. See mvgam()

newdata

Optional dataframe or list of test data containing the same variables that were included in the original data used to fit the model. If not supplied, predictions are generated for the original observations used for the model fit.

data_test

Deprecated. Still works in place of newdata, but users are encouraged to use newdata instead for more seamless integration into R workflows.

ndraws

Positive integer indicating how many posterior draws should be used. If NULL (the default) all draws are used.

process_error

Logical. If TRUE and newdata is supplied, expected uncertainty in the process model is accounted for by using draws from any latent trend SD parameters. If FALSE, uncertainty in the latent trend component is ignored when calculating predictions. If no newdata is supplied, draws from the fitted model's posterior predictive distribution will be used (which will always include uncertainty in any latent trend components)

...

Ignored

Value

A matrix of dimension n_samples x n_obs, where n_samples is the number of posterior samples from the fitted object and n_obs is the number of observations in newdata

Details

Note that for all types of predictions for models that did not include a trend_formula, uncertainty in the dynamic trend component can be ignored by setting process_error = FALSE. However, if a trend_formula was supplied in the model, predictions for this component cannot be ignored. If process_error = TRUE, trend predictions will ignore autocorrelation coefficients or GP length scale coefficients, ultimately assuming the process is stationary. This method is similar to the types of posterior predictions returned from brms models when using autocorrelated error predictions for newdata. This function is therefore more suited to posterior simulation from the GAM components of an mvgam model, while the forecasting functions plot_mvgam_fc and forecast.mvgam are better suited to generating h-step ahead forecasts that respect the temporal dynamics of estimated latent trends.

Examples

# \donttest{
# Simulate some data and fit a model
simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc'),
            trend_model = AR(),
            data = simdat$data_train,
            chains = 2,
            silent = 2)
#> (Stan C++ compilation messages omitted)

# Compute posterior predictions
predictions <- posterior_predict(mod)
str(predictions)
#>  int [1:1000, 1:75] 1 0 1 1 2 0 1 2 0 0 ...
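
# A subset of posterior draws can be used to speed up computation
# (an illustrative call using the 'ndraws' argument)
predictions <- posterior_predict(mod, ndraws = 100)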
# }
================================================ FILE: docs/reference/pp_check.mvgam.html ================================================ Posterior Predictive Checks for mvgam models — pp_check.mvgam • mvgam

Perform unconditional posterior predictive checks with the help of the bayesplot package.

Usage

# S3 method for class 'mvgam'
pp_check(
  object,
  type,
  ndraws = NULL,
  prefix = c("ppc", "ppd"),
  group = NULL,
  x = NULL,
  newdata = NULL,
  ...
)

Arguments

object

An object of class mvgam.

type

Type of the ppc plot as given by a character string. See PPC for an overview of currently supported types. You may also use an invalid type (e.g. type = "xyz") to get a list of supported types in the resulting error message.

ndraws

Positive integer indicating how many posterior draws should be used. If NULL (the default), a suitable number of draws is chosen automatically depending on the plot type.

prefix

The prefix of the bayesplot function to be applied. Either "ppc" (posterior predictive check; the default) or "ppd" (posterior predictive distribution), the latter being the same as the former except that the observed data is not shown for "ppd".

group

Optional name of a factor variable in the model by which to stratify the ppc plot. This argument is required for ppc *_grouped types and ignored otherwise.

x

Optional name of a variable in the model. Only used for ppc types having an x argument and ignored otherwise.

newdata

Optional dataframe or list of test data containing the variables included in the linear predictor of formula. If not supplied, predictions are generated for the original observations used for the model fit. Ignored if using one of the residual plots (i.e. 'resid_hist')

...

Further arguments passed to predict.mvgam as well as to the PPC function specified in type.

Value

A ggplot object that can be further customized using the ggplot2 package.

Details

Unlike the conditional posterior checks provided by ppc, this function computes unconditional posterior predictive checks (i.e. it generates predictions for simulated data without considering the true observations associated with those data). For a detailed explanation of each of the ppc functions, see the PPC documentation of the bayesplot package.

See also

Author

Nicholas J Clark

Examples

# \donttest{
simdat <- sim_mvgam(seasonality = "hierarchical")
mod <- mvgam(
  y ~ series +
    s(season, bs = "cc", k = 6) +
    s(season, series, bs = "fs", k = 4),
  data = simdat$data_train,
  chains = 2,
  silent = 2
)
#> Warning: model has repeated 1-d smooths of same variable.
#> (Stan C++ compilation messages omitted)

# Use pp_check(mod, type = "xyz") for a list of available plot types

# Default is a density overlay for all observations
pp_check(mod)
#> Using 10 posterior draws for ppc type 'dens_overlay' by default.


# Rootograms particularly useful for count data
pp_check(mod, type = "rootogram")
#> Using all posterior draws for ppc type 'rootogram' by default.


# Grouping plots by series is useful
pp_check(mod,
  type = "bars_grouped",
  group = "series", ndraws = 50
)

pp_check(mod,
  type = "ecdf_overlay_grouped",
  group = "series", ndraws = 50
)

pp_check(mod,
  type = "stat_freqpoly_grouped",
  group = "series", ndraws = 50
)
#> Note: in most cases the default test statistic 'mean' is too weak to detect anything of interest.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


# Several types can be used to plot distributions of randomized
# quantile residuals
pp_check(
  object = mod,
  x = "season",
  type = "resid_ribbon"
)
#> Using 10 posterior draws for ppc type 'resid_ribbon' by default.

pp_check(
  object = mod,
  x = "season",
  group = "series",
  type = "resid_ribbon_grouped"
)
#> Using 10 posterior draws for ppc type 'resid_ribbon_grouped' by default.

pp_check(mod,
  ndraws = 5,
  type = "resid_hist_grouped",
  group = "series"
)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


# Custom functions accepted
pp_check(mod, type = "stat", stat = function(x) mean(x == 0))
#> Using all posterior draws for ppc type 'stat' by default.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

pp_check(mod,
  type = "stat_grouped",
  stat = function(x) mean(x == 0),
  group = "series"
)
#> Using all posterior draws for ppc type 'stat_grouped' by default.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


# Some functions accept covariates to set the x-axes
pp_check(mod,
  x = "season",
  type = "ribbon_grouped",
  prob = 0.5,
  prob_outer = 0.8,
  group = "series"
)
#> Using all posterior draws for ppc type 'ribbon_grouped' by default.


# Many plots can be made without the observed data
pp_check(mod, prefix = "ppd")
#> Using 10 posterior draws for ppc type 'dens_overlay' by default.
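
# Checks can also be computed for out-of-sample data by supplying
# 'newdata' (an illustrative call; sim_mvgam() also returns data_test)
pp_check(mod, newdata = simdat$data_test)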

# }

================================================ FILE: docs/reference/ppc.mvgam.html ================================================ Plot conditional posterior predictive checks from mvgam models — ppc.mvgam • mvgam

Plot conditional posterior predictive checks from mvgam models

Usage

ppc(object, ...)

# S3 method for class 'mvgam'
ppc(
  object,
  newdata,
  data_test,
  series = 1,
  type = "hist",
  n_bins,
  legend_position,
  xlab,
  ylab,
  ...
)

Arguments

object

list object returned from mvgam. See mvgam()

...

further graphical parameters passed to par.

newdata

Optional dataframe or list of test data containing at least 'series' and 'time' for the forecast horizon, in addition to any other variables included in the linear predictor of formula. If included, the observed values in the test data are compared to the model's forecast distribution for exploring biases in model predictions. Note this is only useful if the same newdata was also included when fitting the original model.

data_test

Deprecated. Still works in place of newdata, but users are encouraged to use newdata instead for more seamless integration into R workflows.

series

integer specifying which series in the set is to be plotted

type

character specifying the type of posterior predictive check to calculate and plot. Valid options are: 'rootogram', 'mean', 'hist', 'density', 'prop_zero', 'pit' and 'cdf'

n_bins

integer specifying the number of bins to use for binning observed values when plotting a rootogram or histogram. Default is 50 bins for a rootogram, which means that if there are >50 unique observed values, bins will be used to prevent overplotting and facilitate interpretation. Default for a histogram is to use the number of bins returned by a call to hist in base R

legend_position

The legend location, specified as a single keyword from the list "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center". This places the legend on the inside of the plot frame at the given location. Alternatively, use "none" to hide the legend.

xlab

label for x axis.

ylab

label for y axis.

Value

A base R graphics plot showing either a posterior rootogram (for type == 'rootogram'), the predicted vs observed mean for the series (for type == 'mean'), predicted vs observed proportion of zeroes for the series (for type == 'prop_zero'), predicted vs observed histogram for the series (for type == 'hist'), kernel density or empirical CDF estimates for posterior predictions (for type == 'density' or type == 'cdf') or a Probability Integral Transform histogram (for type == 'pit').

Details

Conditional posterior predictions are drawn from the fitted mvgam and compared against the empirical distribution of the observed data for a specified series to help evaluate the model's ability to generate unbiased predictions. For all plots apart from type = 'rootogram', posterior predictions can also be compared to out of sample observations as long as these observations were included as 'data_test' in the original model fit and supplied here. Rootograms are currently only plotted using the 'hanging' style.
Note that the predictions used for these plots are conditional on the observed data, i.e. they are those predictions that have been generated directly within the mvgam() model. They can be misleading if the model included flexible dynamic trend components. For a broader range of posterior checks that are created using unconditional "new data" predictions, see pp_check.mvgam

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate some smooth effects and fit a model
set.seed(0)
dat <- mgcv::gamSim(1, n = 200, scale = 2)
#> Gu & Wahba 4 term additive model
mod <- mvgam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
  data = dat,
  family = gaussian(),
  chains = 2,
  silent = 2
)
#> (Stan C++ compilation messages omitted)

# Posterior checks
ppc(mod, type = "hist")

ppc(mod, type = "density")

ppc(mod, type = "cdf")


# Many more options are available with pp_check()
pp_check(mod)
#> Using 10 posterior draws for ppc type 'dens_overlay' by default.

pp_check(mod, type = "ecdf_overlay")
#> Using 10 posterior draws for ppc type 'ecdf_overlay' by default.

pp_check(mod, type = "freqpoly")
#> Using 10 posterior draws for ppc type 'freqpoly' by default.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# }
================================================ FILE: docs/reference/predict.mvgam.html ================================================ Predict from a fitted mvgam model — predict.mvgam • mvgam

Predict from a fitted mvgam model

Usage

# S3 method for class 'mvgam'
predict(
  object,
  newdata,
  data_test,
  type = "link",
  process_error = FALSE,
  summary = TRUE,
  robust = FALSE,
  probs = c(0.025, 0.975),
  ...
)

Arguments

object

list object of class mvgam or jsdgam. See mvgam()

newdata

Optional dataframe or list of test data containing the same variables that were included in the original data used to fit the model. If not supplied, predictions are generated for the original observations used for the model fit.

data_test

Deprecated. Still works in place of newdata, but users are encouraged to use newdata instead for more seamless integration into R workflows.

type

When this has the value link (default) the linear predictor is calculated on the link scale. If expected is used, predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. When response is used, the predictions take uncertainty in the observation process into account to return predictions on the outcome scale. When variance is used, the variance of the response with respect to the mean (mean-variance relationship) is returned. When type = "terms", each component of the linear predictor is returned separately in the form of a list (possibly with standard errors, if summary = TRUE): this includes parametric model components, followed by each smooth component, but excludes any offset and any intercept. Two special cases are also allowed: type latent_N will return the estimated latent abundances from an N-mixture distribution, while type detection will return the estimated detection probability from an N-mixture distribution

process_error

Logical. If TRUE and a dynamic trend model was fit, expected uncertainty in the process model is accounted for by using draws from the latent trend SD parameters. If FALSE, uncertainty in the latent trend component is ignored when calculating predictions

summary

Should summary statistics be returned instead of the raw values? Default is TRUE.

robust

If FALSE (the default) the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If TRUE, the median and the median absolute deviation (MAD) are applied instead. Only used if summary is TRUE.

probs

The percentiles to be computed by the quantile function. Only used if summary is TRUE.

...

Ignored

Value

Predicted values on the appropriate scale. If summary = FALSE and type != "terms", the output is a matrix of dimension n_draw x n_observations containing predicted values for each posterior draw in object.

If summary = TRUE and type != "terms", the output is an n_observations x E matrix. The number of summary statistics E is equal to 2 + length(probs): the Estimate column contains point estimates (either mean or median, depending on argument robust), while the Est.Error column contains uncertainty estimates (either standard deviation or median absolute deviation, depending on argument robust). The remaining columns starting with Q contain quantile estimates as specified via argument probs.

If type = "terms" and summary = FALSE, the output is a named list containing a separate slot for each effect, with the effects returned as matrices of dimension n_draw x 1. If summary = TRUE, the output resembles that from predict.gam when using the call predict.gam(object, type = "terms", se.fit = TRUE), where mean contributions from each effect are returned in matrix form while standard errors (representing the interval: (max(probs) - min(probs)) / 2) are returned in a separate matrix

Details

Note that for all types of predictions for models that did not include a trend_formula, uncertainty in the dynamic trend component can be ignored by setting process_error = FALSE. However, if a trend_formula was supplied in the model, predictions for this component cannot be ignored. If process_error = TRUE, trend predictions will ignore autocorrelation coefficients or GP length scale coefficients, ultimately assuming the process is stationary. This method is similar to the types of posterior predictions returned from brms models when using autocorrelated error predictions for newdata. This function is therefore more suited to posterior simulation from the GAM components of an mvgam model, while the forecasting functions plot_mvgam_fc and forecast.mvgam are better suited to generating h-step ahead forecasts that respect the temporal dynamics of estimated latent trends.

Examples

# \donttest{
# Simulate 4 time series with hierarchical seasonality
# and independent AR1 dynamic processes
set.seed(111)
simdat <- sim_mvgam(seasonality = 'hierarchical',
                   trend_model = 'AR1',
                   family = gaussian())

# Fit a model with shared seasonality
mod1 <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             data = simdat$data_train,
             family = gaussian(),
             trend_model = AR(),
             noncentred = TRUE,
             chains = 2,
             silent = 2)

# Generate predictions against observed data
preds <- predict(mod1,
                summary = TRUE)
head(preds)
#>          Estimate Est.Error       Q2.5       Q97.5
#> [1,] -0.004792439 0.1100971 -0.2554910  0.20130427
#> [2,] -0.004792439 0.1100971 -0.2554910  0.20130427
#> [3,] -0.004792439 0.1100971 -0.2554910  0.20130427
#> [4,] -0.217605056 0.1047161 -0.4358171 -0.03374743
#> [5,] -0.217605056 0.1047161 -0.4358171 -0.03374743
#> [6,] -0.217605056 0.1047161 -0.4358171 -0.03374743

# Generate predictions against test data
preds <- predict(mod1,
                newdata = simdat$data_test,
                summary = TRUE)
head(preds)
#>        Estimate Est.Error        Q2.5      Q97.5
#> [1,] -0.1711986 0.1085784 -0.38794662 0.02847842
#> [2,] -0.1711986 0.1085784 -0.38794662 0.02847842
#> [3,] -0.1711986 0.1085784 -0.38794662 0.02847842
#> [4,]  0.2894259 0.1312018  0.03487911 0.54980798
#> [5,]  0.2894259 0.1312018  0.03487911 0.54980798
#> [6,]  0.2894259 0.1312018  0.03487911 0.54980798
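
# Predictions of the response expectation, which ignore uncertainty in
# the observation process, use type = 'expected' (shown for illustration)
preds <- predict(mod1,
                newdata = simdat$data_test,
                type = 'expected',
                summary = TRUE)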
# }
================================================ FILE: docs/reference/print.mvgam.html ================================================ Summary for a fitted mvgam object — print.mvgam • mvgam

This function takes a fitted mvgam or jsdgam object and prints a quick summary

Usage

# S3 method for class 'mvgam'
print(x, ...)

Arguments

x

list object returned from mvgam

...

Ignored

Value

A list is printed on-screen

Details

A brief summary of the model's call is printed

Author

Nicholas J Clark

================================================ FILE: docs/reference/reexports.html ================================================ Objects exported from other packages — reexports • mvgam
================================================ FILE: docs/reference/residual_cor.jsdgam.html ================================================ Extract residual correlations based on latent factors from a fitted jsdgam — residual_cor.jsdgam • mvgam

Compute residual correlation estimates from Joint Species Distribution jsdgam models using latent factor loadings

Usage

residual_cor(object, ...)

# S3 method for class 'jsdgam'
residual_cor(
  object,
  summary = TRUE,
  robust = FALSE,
  probs = c(0.025, 0.975),
  ...
)

Arguments

object

list object of class mvgam resulting from a call to jsdgam()

...

ignored

summary

Should summary statistics be returned instead of the raw values? Default is TRUE.

robust

If FALSE (the default) the mean is used as a measure of central tendency. If TRUE, the median is used instead. Only used if summary is TRUE

probs

The percentiles to be computed by the quantile function. Only used if summary is TRUE.

Value

If summary = TRUE, a list with the following components:

cor, cor_lower, cor_upper

A set of \(p \times p\) correlation matrices, containing either the posterior median or mean estimate, plus the lower and upper limits of the credible intervals defined by probs

sig_cor

A \(p \times p\) correlation matrix containing only those correlations whose credible interval does not contain zero. All other correlations are set to zero

prec, prec_lower, prec_upper

A set of \(p \times p\) precision matrices, containing either the posterior median or mean estimate, plus the lower and upper limits of the credible intervals defined by probs

sig_prec

A \(p \times p\) precision matrix containing only those precisions whose credible interval does not contain zero. All other precisions are set to zero

cov

A \(p \times p\) posterior median or mean covariance matrix

trace

The median/mean point estimator of the trace (sum of the diagonal elements) of the residual covariance matrix cov

If summary = FALSE, this function returns a list containing the following components:

all_cormat

A \(n_{draws} \times p \times p\) array of posterior residual correlation matrix draws

all_covmat

A \(n_{draws} \times p \times p\) array of posterior residual covariance matrix draws

all_presmat

A \(n_{draws} \times p \times p\) array of posterior residual precision matrix draws

all_trace

A \(n_{draws}\) vector of posterior covariance trace draws

Details

Hui (2016) provides an excellent description of the quantities that this function calculates, so this passage is heavily paraphrased from his associated boral package.

In Joint Species Distribution Models, the residual covariance matrix is calculated from the matrix of latent factor loadings \(\Theta\), where the residual covariance matrix \(\Sigma = \Theta\Theta'\). A strong residual covariance/correlation between two species can be interpreted as evidence of species interaction (e.g., facilitation or competition), missing covariates, or any additional species correlation not accounted for by the shared environmental responses captured in formula.

The residual precision matrix (also known as the partial correlation matrix; Ovaskainen et al., 2016) is defined as the inverse of the residual correlation matrix. The precision matrix is often used to identify direct or causal relationships between two species: for example, two species can have a zero precision but still be correlated, which can be interpreted as saying that the two species are not directly associated but are correlated through other species. In other words, they are conditionally independent given the other species. It is important to note that the precision matrix does not exhibit the exact same properties as the correlation matrix, e.g., the diagonal elements are not equal to 1. Nevertheless, relatively larger values of precision may imply stronger direct relationships between two species.
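
A sketch of how these matrices might be inspected, assuming fit is an object returned by jsdgam():

# Extract summarised residual correlations and precisions
rescor <- residual_cor(fit, summary = TRUE)

# Correlations whose credible intervals exclude zero
rescor$sig_cor

# Precisions whose credible intervals exclude zero; relatively large
# absolute values may indicate stronger direct associations
rescor$sig_prec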

In addition to the residual correlation and precision matrices, the median or mean point estimate of the trace of the residual covariance matrix is returned, \(\sum\limits_{j=1}^p [\Theta\Theta']_{jj}\). Often used in other areas of multivariate statistics, the trace may be interpreted as the amount of covariation explained by the latent factors. One situation where the trace may be useful is when comparing a pure latent factor model (where no terms are supplied to formula) against a model with latent factors and some additional predictors in formula: the proportional difference in trace between these two models may be interpreted as the proportion of covariation between species explained by the predictors in formula. Of course, the trace itself is random due to the MCMC sampling, and so it is not always guaranteed to produce sensible answers.
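
This trace comparison could be sketched as follows, where fit0 and fit1 are hypothetical pure latent factor and predictor-augmented models fitted to the same data:

# Proportional difference in trace, interpretable as the proportion of
# between-species covariation explained by the predictors in 'formula'
trace0 <- residual_cor(fit0, summary = TRUE)$trace
trace1 <- residual_cor(fit1, summary = TRUE)$trace
(trace0 - trace1) / trace0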

References

Francis KC Hui (2016). BORAL - Bayesian ordination and regression analysis of multivariate abundance data in R. Methods in Ecology and Evolution. 7, 744-750.

Otso Ovaskainen et al. (2016). Using latent variable models to identify large networks of species-to-species associations at different spatial scales. Methods in Ecology and Evolution, 7, 549-555.

See also

Author

Nicholas J Clark

================================================ FILE: docs/reference/residuals.mvgam.html ================================================ Posterior draws of residuals from mvgam models — residuals.mvgam • mvgam

This method extracts posterior draws of Dunn-Smyth (randomized quantile) residuals in the order in which the data were supplied to the model. It includes additional arguments for obtaining summaries of the computed residuals.

Usage

# S3 method for class 'mvgam'
residuals(object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ...)

Arguments

object

An object of class mvgam

summary

Should summary statistics be returned instead of the raw values? Default is TRUE.

robust

If FALSE (the default) the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If TRUE, the median and the median absolute deviation (MAD) are applied instead. Only used if summary is TRUE.

probs

The percentiles to be computed by the quantile function. Only used if summary is TRUE.

...

ignored

Value

An array of randomized quantile residual values. If summary = FALSE the output resembles that of posterior_epred.mvgam and predict.mvgam.

If summary = TRUE the output is an n_observations x E matrix. The number of summary statistics E is equal to 2 + length(probs): The Estimate column contains point estimates (either mean or median depending on argument robust), while the Est.Error column contains uncertainty estimates (either standard deviation or median absolute deviation depending on argument robust). The remaining columns starting with Q contain quantile estimates as specified via argument probs.
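
For instance, supplying probs = c(0.1, 0.9) (a sketch assuming mod is any fitted mvgam object) yields four summary columns:

# Columns become Estimate, Est.Error, Q10 and Q90
resids <- residuals(mod, summary = TRUE, probs = c(0.1, 0.9))
colnames(resids)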

Details

This method returns Dunn-Smyth (randomized quantile) residuals. Any observations that were missing (i.e. NA) in the original data will have missing values in the residuals.

See also

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate some data and fit a model
simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(),
             noncentred = TRUE,
             data = simdat$data_train,
             chains = 2,
             silent = 2)

# Extract posterior residuals
resids <- residuals(mod)
str(resids)
#>  num [1:75, 1:4] 0.7284 0.228 0.7504 -0.579 -0.0936 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"

# Or add them directly to the observed data, along with fitted values
augment(mod, robust = FALSE, probs = c(0.25, 0.75))
#> # A tibble: 75 × 14
#>        y season  year series    time .observed .fitted .fit.variability .fit.cred.low .fit.cred.high  .resid
#>    <int>  <int> <int> <fct>    <int>     <int>   <dbl>            <dbl>         <dbl>          <dbl>   <dbl>
#>  1     1      1     1 series_1     1         1   0.542            0.257         0.373          0.645  0.728 
#>  2     1      2     1 series_1     2         1   0.962            0.382         0.703          1.16   0.228 
#>  3     3      3     1 series_1     3         3   2.06             0.745         1.58           2.42   0.750 
#>  4     1      4     1 series_1     4         1   1.97             0.703         1.49           2.40  -0.579 
#>  5     1      5     1 series_1     5         1   1.32             0.497         0.965          1.62  -0.0936
#>  6     0      6     1 series_1     6         0   0.673            0.298         0.464          0.835 -0.736 
#>  7     0      7     1 series_1     7         0   0.510            0.240         0.346          0.620 -0.645 
#>  8     1      8     1 series_1     8         1   0.518            0.265         0.332          0.636  0.776 
#>  9     0      9     1 series_1     9         0   0.418            0.204         0.279          0.514 -0.583 
#> 10     0     10     1 series_1    10         0   0.381            0.187         0.253          0.487 -0.482 
#> # ℹ 65 more rows
#> # ℹ 3 more variables: .resid.variability <dbl>, .resid.cred.low <dbl>, .resid.cred.high <dbl>
# }
================================================ FILE: docs/reference/score.mvgam_forecast.html ================================================ Compute probabilistic forecast scores for mvgam models — score.mvgam_forecast • mvgam

Compute probabilistic forecast scores for mvgam models

Usage

# S3 method for class 'mvgam_forecast'
score(
  object,
  score = "crps",
  log = FALSE,
  weights,
  interval_width = 0.9,
  n_cores = 1,
  ...
)

score(object, ...)

Arguments

object

mvgam_forecast object. See forecast.mvgam().

score

character specifying the type of proper scoring rule to use for evaluation. Options are: sis (i.e. the Scaled Interval Score), energy, variogram, elpd (i.e. the Expected log pointwise Predictive Density), drps (i.e. the Discrete Rank Probability Score), crps (the Continuous Rank Probability Score) or brier (the latter of which is only applicable for bernoulli models). Note that when choosing elpd, the supplied object must have forecasts on the link scale so that expectations can be calculated prior to scoring. If choosing brier, the object must have forecasts on the expectation scale (i.e. probability predictions). For all other scores, forecasts should be supplied on the response scale (i.e. posterior predictions).

log

logical. Should the forecasts and truths be logged prior to scoring? This is often appropriate for comparing performance of models when series vary in their observation ranges. Ignored if score = 'brier'

weights

optional vector of weights (where length(weights) == n_series) for weighting pairwise correlations when evaluating the variogram score for multivariate forecasts. Useful for down-weighting series that have larger magnitude observations or that are of less interest when forecasting. Ignored if score != 'variogram'

interval_width

proportional value on [0.05,0.95] defining the forecast interval for calculating coverage and, if score = 'sis', for calculating the interval score. Ignored if score = 'brier'

n_cores

integer specifying number of cores for calculating scores in parallel

...

Ignored

Value

a list containing scores and interval coverages per forecast horizon. If score %in% c('drps', 'crps', 'elpd', 'brier'), the list will also contain the sum of all series-level scores per horizon. If score %in% c('energy', 'variogram'), no series-level scores are computed and the only score returned will be for all series jointly. For all scores apart from elpd and brier, the in_interval column in each series-level slot is a binary indicator of whether or not the true value was within the forecast's corresponding posterior empirical quantiles. Intervals are not calculated when using elpd because forecasts will only contain the linear predictors.
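
A sketch of how these per-horizon scores might be aggregated to compare competing models, where fc1 and fc2 are hypothetical mvgam_forecast objects built from the same test data:

# Lower summed CRPS indicates better probabilistic forecasts
sc1 <- score(fc1, score = 'crps')
sc2 <- score(fc2, score = 'crps')
sum(sc1$all_series$score, na.rm = TRUE)
sum(sc2$all_series$score, na.rm = TRUE)

# Empirical interval coverage for one series (proportion of true values
# falling inside the 90% forecast interval)
mean(sc1$series_1$in_interval, na.rm = TRUE)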

Examples

# \donttest{
# Simulate observations for three count-valued time series
data <- sim_mvgam()
# Fit a dynamic model using 'newdata' to automatically produce forecasts
mod <- mvgam(y ~ 1,
            trend_model = RW(),
            data = data$data_train,
            newdata = data$data_test,
            chains = 2)
#> Your model may benefit from using "noncentred = TRUE"
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 3.2 seconds.
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 3.8 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 3.5 seconds.
#> Total execution time: 3.8 seconds.
#> 

# Extract forecasts into a 'mvgam_forecast' object
fc <- forecast(mod)
plot(fc)
#> Out of sample DRPS:
#> 10.713994


# Compute Discrete Rank Probability Scores and 0.90 interval coverages
fc_scores <- score(fc, score = 'drps')
str(fc_scores)
#> List of 4
#>  $ series_1  :'data.frame':	25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 0.267 0.285 0.955 0.266 0.309 ...
#>   ..$ in_interval   : num [1:25] 1 1 1 1 1 1 1 0 1 1 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "drps" "drps" "drps" "drps" ...
#>  $ series_2  :'data.frame':	25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 0.0824 0.5192 1.2494 0.189 0.2144 ...
#>   ..$ in_interval   : num [1:25] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "drps" "drps" "drps" "drps" ...
#>  $ series_3  :'data.frame':	25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 0.268 0.287 1.905 0.274 0.287 ...
#>   ..$ in_interval   : num [1:25] 1 1 0 1 1 1 1 1 1 1 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "drps" "drps" "drps" "drps" ...
#>  $ all_series:'data.frame':	25 obs. of  3 variables:
#>   ..$ score       : num [1:25] 0.617 1.092 4.109 0.729 0.81 ...
#>   ..$ eval_horizon: int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type  : chr [1:25] "sum_drps" "sum_drps" "sum_drps" "sum_drps" ...

# An example using binary data
data <- sim_mvgam(family = bernoulli())
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
            trend_model = AR(),
            data = data$data_train,
            newdata = data$data_test,
            family = bernoulli(),
            chains = 2)
#> Your model may benefit from using "noncentred = TRUE"
#> Compiling Stan program using cmdstanr
#> 
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c3b9c6685.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 1.9 seconds.
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 2.2 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 2.1 seconds.
#> Total execution time: 2.3 seconds.
#> 

# Extract forecasts on the expectation (probability) scale
fc <- forecast(mod, type = 'expected')
plot(fc)
#> Out of sample Brier:
#> 10.7769847272021


# Compute Brier scores
fc_scores <- score(fc, score = 'brier')
str(fc_scores)
#> List of 4
#>  $ series_1  :'data.frame':	25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 0.51 0.452 0.36 0.333 0.479 ...
#>   ..$ in_interval   : num [1:25] NA NA NA NA NA NA NA NA NA NA ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "brier" "brier" "brier" "brier" ...
#>  $ series_2  :'data.frame':	25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 0.323 0.301 0.362 0.209 0.193 ...
#>   ..$ in_interval   : num [1:25] NA NA NA NA NA NA NA NA NA NA ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "brier" "brier" "brier" "brier" ...
#>  $ series_3  :'data.frame':	25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 0.31 0.474 0.374 0.423 0.318 ...
#>   ..$ in_interval   : num [1:25] NA NA NA NA NA NA NA NA NA NA ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "brier" "brier" "brier" "brier" ...
#>  $ all_series:'data.frame':	25 obs. of  3 variables:
#>   ..$ score       : num [1:25] 1.143 1.227 1.095 0.965 0.99 ...
#>   ..$ eval_horizon: int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type  : chr [1:25] "sum_brier" "sum_brier" "sum_brier" "sum_brier" ...
# }
================================================ FILE: docs/reference/series_to_mvgam.html ================================================ Convert timeseries object to format necessary for mvgam models — series_to_mvgam • mvgam

This function converts univariate or multivariate time series (xts or ts objects) to the format necessary for mvgam

Usage

series_to_mvgam(series, freq, train_prop = 0.85)

Arguments

series

xts or ts object to be converted to mvgam format

freq

integer. The seasonal frequency of the series

train_prop

numeric stating the proportion of data to use for training. Should be between 0.25 and 0.95

Value

A list object containing outputs needed for mvgam, including 'data_train' and 'data_test'

Examples

# A ts object example
data("sunspots")
series <- cbind(sunspots, sunspots)
colnames(series) <- c('blood', 'bone')
head(series)
#>      blood bone
#> [1,]  58.0 58.0
#> [2,]  62.6 62.6
#> [3,]  70.0 70.0
#> [4,]  55.7 55.7
#> [5,]  85.0 85.0
#> [6,]  83.5 83.5
series_to_mvgam(series, frequency(series), 0.85)
#> $data_train
#>         y season year                date series time
#> 1    58.0      1 1749 1749-01-01 00:00:00  blood    1
#> 2    58.0      1 1749 1749-01-01 00:00:00   bone    1
#> 3    62.6      2 1749 1749-01-31 10:00:00  blood    2
#> 4    62.6      2 1749 1749-01-31 10:00:00   bone    2
#> 5    70.0      3 1749 1749-03-02 20:00:01  blood    3
#> 6    70.0      3 1749 1749-03-02 20:00:01   bone    3
#> 7    55.7      4 1749 1749-04-02 06:00:00  blood    4
#> 8    55.7      4 1749 1749-04-02 06:00:00   bone    4
#> 9    85.0      5 1749 1749-05-02 16:00:00  blood    5
#> 10   85.0      5 1749 1749-05-02 16:00:00   bone    5
#> 11   83.5      6 1749 1749-06-02 02:00:01  blood    6
#> 12   83.5      6 1749 1749-06-02 02:00:01   bone    6
#> 13   94.8      7 1749 1749-07-02 12:00:00  blood    7
#> 14   94.8      7 1749 1749-07-02 12:00:00   bone    7
#> 15   66.3      8 1749 1749-08-01 22:00:00  blood    8
#> 16   66.3      8 1749 1749-08-01 22:00:00   bone    8
#> 17   75.9      9 1749 1749-09-01 08:00:01  blood    9
#> 18   75.9      9 1749 1749-09-01 08:00:01   bone    9
#> 19   75.5     10 1749 1749-10-01 18:00:00  blood   10
#> 20   75.5     10 1749 1749-10-01 18:00:00   bone   10
#> 21  158.6     11 1749 1749-11-01 04:00:00  blood   11
#> 22  158.6     11 1749 1749-11-01 04:00:00   bone   11
#> 23   85.2     12 1749 1749-12-01 14:00:01  blood   12
#> 24   85.2     12 1749 1749-12-01 14:00:01   bone   12
#> 25   73.3      1 1750 1750-01-01 00:00:00  blood   13
#> 26   73.3      1 1750 1750-01-01 00:00:00   bone   13
#> 27   75.9      2 1750 1750-01-31 10:00:00  blood   14
#> 28   75.9      2 1750 1750-01-31 10:00:00   bone   14
#> 29   89.2      3 1750 1750-03-02 20:00:01  blood   15
#> 30   89.2      3 1750 1750-03-02 20:00:01   bone   15
#> 31   88.3      4 1750 1750-04-02 06:00:00  blood   16
#> 32   88.3      4 1750 1750-04-02 06:00:00   bone   16
#> 33   90.0      5 1750 1750-05-02 16:00:00  blood   17
#> 34   90.0      5 1750 1750-05-02 16:00:00   bone   17
#> 35  100.0      6 1750 1750-06-02 02:00:01  blood   18
#> 36  100.0      6 1750 1750-06-02 02:00:01   bone   18
#> 37   85.4      7 1750 1750-07-02 12:00:00  blood   19
#> 38   85.4      7 1750 1750-07-02 12:00:00   bone   19
#> 39  103.0      8 1750 1750-08-01 22:00:00  blood   20
#> 40  103.0      8 1750 1750-08-01 22:00:00   bone   20
#> 41   91.2      9 1750 1750-09-01 08:00:01  blood   21
#> 42   91.2      9 1750 1750-09-01 08:00:01   bone   21
#> 43   65.7     10 1750 1750-10-01 18:00:00  blood   22
#> 44   65.7     10 1750 1750-10-01 18:00:00   bone   22
#> 45   63.3     11 1750 1750-11-01 04:00:00  blood   23
#> 46   63.3     11 1750 1750-11-01 04:00:00   bone   23
#> 47   75.4     12 1750 1750-12-01 14:00:01  blood   24
#> 48   75.4     12 1750 1750-12-01 14:00:01   bone   24
#> 49   70.0      1 1751 1751-01-01 00:00:00  blood   25
#> 50   70.0      1 1751 1751-01-01 00:00:00   bone   25
#> 51   43.5      2 1751 1751-01-31 10:00:00  blood   26
#> 52   43.5      2 1751 1751-01-31 10:00:00   bone   26
#> 53   45.3      3 1751 1751-03-02 20:00:01  blood   27
#> 54   45.3      3 1751 1751-03-02 20:00:01   bone   27
#> 55   56.4      4 1751 1751-04-02 06:00:00  blood   28
#> 56   56.4      4 1751 1751-04-02 06:00:00   bone   28
#> 57   60.7      5 1751 1751-05-02 16:00:00  blood   29
#> 58   60.7      5 1751 1751-05-02 16:00:00   bone   29
#> 59   50.7      6 1751 1751-06-02 02:00:01  blood   30
#> 60   50.7      6 1751 1751-06-02 02:00:01   bone   30
#> 61   66.3      7 1751 1751-07-02 12:00:00  blood   31
#> 62   66.3      7 1751 1751-07-02 12:00:00   bone   31
#> 63   59.8      8 1751 1751-08-01 22:00:00  blood   32
#> 64   59.8      8 1751 1751-08-01 22:00:00   bone   32
#> 65   23.5      9 1751 1751-09-01 08:00:01  blood   33
#> 66   23.5      9 1751 1751-09-01 08:00:01   bone   33
#> 67   23.2     10 1751 1751-10-01 18:00:00  blood   34
#> 68   23.2     10 1751 1751-10-01 18:00:00   bone   34
#> 69   28.5     11 1751 1751-11-01 04:00:00  blood   35
#> 70   28.5     11 1751 1751-11-01 04:00:00   bone   35
#> 71   44.0     12 1751 1751-12-01 14:00:01  blood   36
#> 72   44.0     12 1751 1751-12-01 14:00:01   bone   36
#> 73   35.0      1 1752 1752-01-01 00:00:00  blood   37
#> 74   35.0      1 1752 1752-01-01 00:00:00   bone   37
#> 75   50.0      2 1752 1752-01-31 12:00:00  blood   38
#> 76   50.0      2 1752 1752-01-31 12:00:00   bone   38
#> 77   71.0      3 1752 1752-03-02 00:00:01  blood   39
#> 78   71.0      3 1752 1752-03-02 00:00:01   bone   39
#> 79   59.3      4 1752 1752-04-01 12:00:00  blood   40
#> 80   59.3      4 1752 1752-04-01 12:00:00   bone   40
#> 81   59.7      5 1752 1752-05-02 00:00:00  blood   41
#> 82   59.7      5 1752 1752-05-02 00:00:00   bone   41
#> 83   39.6      6 1752 1752-06-01 12:00:01  blood   42
#> 84   39.6      6 1752 1752-06-01 12:00:01   bone   42
#> 85   78.4      7 1752 1752-07-02 00:00:00  blood   43
#> 86   78.4      7 1752 1752-07-02 00:00:00   bone   43
#> 87   29.3      8 1752 1752-08-01 12:00:00  blood   44
#> 88   29.3      8 1752 1752-08-01 12:00:00   bone   44
#> 89   27.1      9 1752 1752-09-01 00:00:01  blood   45
#> 90   27.1      9 1752 1752-09-01 00:00:01   bone   45
#> 91   46.6     10 1752 1752-10-01 12:00:00  blood   46
#> 92   46.6     10 1752 1752-10-01 12:00:00   bone   46
#> 93   37.6     11 1752 1752-11-01 00:00:00  blood   47
#> 94   37.6     11 1752 1752-11-01 00:00:00   bone   47
#> 95   40.0     12 1752 1752-12-01 12:00:01  blood   48
#> 96   40.0     12 1752 1752-12-01 12:00:01   bone   48
#> 97   44.0      1 1753 1753-01-01 00:00:00  blood   49
#> 98   44.0      1 1753 1753-01-01 00:00:00   bone   49
#> 99   32.0      2 1753 1753-01-31 10:00:00  blood   50
#> 100  32.0      2 1753 1753-01-31 10:00:00   bone   50
#> 101  45.7      3 1753 1753-03-02 20:00:01  blood   51
#> 102  45.7      3 1753 1753-03-02 20:00:01   bone   51
#> 103  38.0      4 1753 1753-04-02 06:00:00  blood   52
#> 104  38.0      4 1753 1753-04-02 06:00:00   bone   52
#> 105  36.0      5 1753 1753-05-02 16:00:00  blood   53
#> 106  36.0      5 1753 1753-05-02 16:00:00   bone   53
#> 107  31.7      6 1753 1753-06-02 02:00:01  blood   54
#> 108  31.7      6 1753 1753-06-02 02:00:01   bone   54
#> 109  22.2      7 1753 1753-07-02 12:00:00  blood   55
#> 110  22.2      7 1753 1753-07-02 12:00:00   bone   55
#> 111  39.0      8 1753 1753-08-01 22:00:00  blood   56
#> 112  39.0      8 1753 1753-08-01 22:00:00   bone   56
#> 113  28.0      9 1753 1753-09-01 08:00:01  blood   57
#> 114  28.0      9 1753 1753-09-01 08:00:01   bone   57
#> 115  25.0     10 1753 1753-10-01 18:00:00  blood   58
#> 116  25.0     10 1753 1753-10-01 18:00:00   bone   58
#> 117  20.0     11 1753 1753-11-01 04:00:00  blood   59
#> 118  20.0     11 1753 1753-11-01 04:00:00   bone   59
#> 119   6.7     12 1753 1753-12-01 14:00:01  blood   60
#> 120   6.7     12 1753 1753-12-01 14:00:01   bone   60
#> 121   0.0      1 1754 1754-01-01 00:00:00  blood   61
#> 122   0.0      1 1754 1754-01-01 00:00:00   bone   61
#> 123   3.0      2 1754 1754-01-31 10:00:00  blood   62
#> 124   3.0      2 1754 1754-01-31 10:00:00   bone   62
#> 125   1.7      3 1754 1754-03-02 20:00:01  blood   63
#> 126   1.7      3 1754 1754-03-02 20:00:01   bone   63
#> 127  13.7      4 1754 1754-04-02 06:00:00  blood   64
#> 128  13.7      4 1754 1754-04-02 06:00:00   bone   64
#> 129  20.7      5 1754 1754-05-02 16:00:00  blood   65
#> 130  20.7      5 1754 1754-05-02 16:00:00   bone   65
#> 131  26.7      6 1754 1754-06-02 02:00:01  blood   66
#> 132  26.7      6 1754 1754-06-02 02:00:01   bone   66
#> 133  18.8      7 1754 1754-07-02 12:00:00  blood   67
#> 134  18.8      7 1754 1754-07-02 12:00:00   bone   67
#> 135  12.3      8 1754 1754-08-01 22:00:00  blood   68
#> 136  12.3      8 1754 1754-08-01 22:00:00   bone   68
#> 137   8.2      9 1754 1754-09-01 08:00:01  blood   69
#> 138   8.2      9 1754 1754-09-01 08:00:01   bone   69
#> 139  24.1     10 1754 1754-10-01 18:00:00  blood   70
#> 140  24.1     10 1754 1754-10-01 18:00:00   bone   70
#> 141  13.2     11 1754 1754-11-01 04:00:00  blood   71
#> 142  13.2     11 1754 1754-11-01 04:00:00   bone   71
#> 143   4.2     12 1754 1754-12-01 14:00:01  blood   72
#> 144   4.2     12 1754 1754-12-01 14:00:01   bone   72
#> 145  10.2      1 1755 1755-01-01 00:00:00  blood   73
#> 146  10.2      1 1755 1755-01-01 00:00:00   bone   73
#> 147  11.2      2 1755 1755-01-31 10:00:00  blood   74
#> 148  11.2      2 1755 1755-01-31 10:00:00   bone   74
#> 149   6.8      3 1755 1755-03-02 20:00:01  blood   75
#> 150   6.8      3 1755 1755-03-02 20:00:01   bone   75
#> 151   6.5      4 1755 1755-04-02 06:00:00  blood   76
#> 152   6.5      4 1755 1755-04-02 06:00:00   bone   76
#> 153   0.0      5 1755 1755-05-02 16:00:00  blood   77
#> 154   0.0      5 1755 1755-05-02 16:00:00   bone   77
#> 155   0.0      6 1755 1755-06-02 02:00:01  blood   78
#> 156   0.0      6 1755 1755-06-02 02:00:01   bone   78
#> 157   8.6      7 1755 1755-07-02 12:00:00  blood   79
#> 158   8.6      7 1755 1755-07-02 12:00:00   bone   79
#> 159   3.2      8 1755 1755-08-01 22:00:00  blood   80
#> 160   3.2      8 1755 1755-08-01 22:00:00   bone   80
#> 161  17.8      9 1755 1755-09-01 08:00:01  blood   81
#> 162  17.8      9 1755 1755-09-01 08:00:01   bone   81
#> 163  23.7     10 1755 1755-10-01 18:00:00  blood   82
#> 164  23.7     10 1755 1755-10-01 18:00:00   bone   82
#> 165   6.8     11 1755 1755-11-01 04:00:00  blood   83
#> 166   6.8     11 1755 1755-11-01 04:00:00   bone   83
#>  [ reached 'max' / getOption("max.print") -- omitted 4628 rows ]
#> 
#> $data_test
#>         y season year                date series time
#> 1   136.3     10 1948 1948-10-01 12:00:00  blood 2398
#> 2   136.3     10 1948 1948-10-01 12:00:00   bone 2398
#> 3    95.8     11 1948 1948-11-01 00:00:01  blood 2399
#> 4    95.8     11 1948 1948-11-01 00:00:01   bone 2399
#> 5   138.0     12 1948 1948-12-01 12:00:01  blood 2400
#> 6   138.0     12 1948 1948-12-01 12:00:01   bone 2400
#> 7   119.1      1 1949 1949-01-01 00:00:00  blood 2401
#> 8   119.1      1 1949 1949-01-01 00:00:00   bone 2401
#> 9   182.3      2 1949 1949-01-31 10:00:01  blood 2402
#> 10  182.3      2 1949 1949-01-31 10:00:01   bone 2402
#> 11  157.5      3 1949 1949-03-02 20:00:01  blood 2403
#> 12  157.5      3 1949 1949-03-02 20:00:01   bone 2403
#> 13  147.0      4 1949 1949-04-02 06:00:00  blood 2404
#> 14  147.0      4 1949 1949-04-02 06:00:00   bone 2404
#> 15  106.2      5 1949 1949-05-02 16:00:01  blood 2405
#> 16  106.2      5 1949 1949-05-02 16:00:01   bone 2405
#> 17  121.7      6 1949 1949-06-02 02:00:01  blood 2406
#> 18  121.7      6 1949 1949-06-02 02:00:01   bone 2406
#> 19  125.8      7 1949 1949-07-02 12:00:00  blood 2407
#> 20  125.8      7 1949 1949-07-02 12:00:00   bone 2407
#> 21  123.8      8 1949 1949-08-01 22:00:01  blood 2408
#> 22  123.8      8 1949 1949-08-01 22:00:01   bone 2408
#> 23  145.3      9 1949 1949-09-01 08:00:01  blood 2409
#> 24  145.3      9 1949 1949-09-01 08:00:01   bone 2409
#> 25  131.6     10 1949 1949-10-01 18:00:00  blood 2410
#> 26  131.6     10 1949 1949-10-01 18:00:00   bone 2410
#> 27  143.5     11 1949 1949-11-01 04:00:01  blood 2411
#> 28  143.5     11 1949 1949-11-01 04:00:01   bone 2411
#> 29  117.6     12 1949 1949-12-01 14:00:01  blood 2412
#> 30  117.6     12 1949 1949-12-01 14:00:01   bone 2412
#> 31  101.6      1 1950 1950-01-01 00:00:00  blood 2413
#> 32  101.6      1 1950 1950-01-01 00:00:00   bone 2413
#> 33   94.8      2 1950 1950-01-31 10:00:01  blood 2414
#> 34   94.8      2 1950 1950-01-31 10:00:01   bone 2414
#> 35  109.7      3 1950 1950-03-02 20:00:01  blood 2415
#> 36  109.7      3 1950 1950-03-02 20:00:01   bone 2415
#> 37  113.4      4 1950 1950-04-02 06:00:00  blood 2416
#> 38  113.4      4 1950 1950-04-02 06:00:00   bone 2416
#> 39  106.2      5 1950 1950-05-02 16:00:01  blood 2417
#> 40  106.2      5 1950 1950-05-02 16:00:01   bone 2417
#> 41   83.6      6 1950 1950-06-02 02:00:01  blood 2418
#> 42   83.6      6 1950 1950-06-02 02:00:01   bone 2418
#> 43   91.0      7 1950 1950-07-02 12:00:00  blood 2419
#> 44   91.0      7 1950 1950-07-02 12:00:00   bone 2419
#> 45   85.2      8 1950 1950-08-01 22:00:01  blood 2420
#> 46   85.2      8 1950 1950-08-01 22:00:01   bone 2420
#> 47   51.3      9 1950 1950-09-01 08:00:01  blood 2421
#> 48   51.3      9 1950 1950-09-01 08:00:01   bone 2421
#> 49   61.4     10 1950 1950-10-01 18:00:00  blood 2422
#> 50   61.4     10 1950 1950-10-01 18:00:00   bone 2422
#> 51   54.8     11 1950 1950-11-01 04:00:01  blood 2423
#> 52   54.8     11 1950 1950-11-01 04:00:01   bone 2423
#> 53   54.1     12 1950 1950-12-01 14:00:01  blood 2424
#> 54   54.1     12 1950 1950-12-01 14:00:01   bone 2424
#> 55   59.9      1 1951 1951-01-01 00:00:00  blood 2425
#> 56   59.9      1 1951 1951-01-01 00:00:00   bone 2425
#> 57   59.9      2 1951 1951-01-31 10:00:01  blood 2426
#> 58   59.9      2 1951 1951-01-31 10:00:01   bone 2426
#> 59   59.9      3 1951 1951-03-02 20:00:01  blood 2427
#> 60   59.9      3 1951 1951-03-02 20:00:01   bone 2427
#> 61   92.9      4 1951 1951-04-02 06:00:00  blood 2428
#> 62   92.9      4 1951 1951-04-02 06:00:00   bone 2428
#> 63  108.5      5 1951 1951-05-02 16:00:01  blood 2429
#> 64  108.5      5 1951 1951-05-02 16:00:01   bone 2429
#> 65  100.6      6 1951 1951-06-02 02:00:01  blood 2430
#> 66  100.6      6 1951 1951-06-02 02:00:01   bone 2430
#> 67   61.5      7 1951 1951-07-02 12:00:00  blood 2431
#> 68   61.5      7 1951 1951-07-02 12:00:00   bone 2431
#> 69   61.0      8 1951 1951-08-01 22:00:01  blood 2432
#> 70   61.0      8 1951 1951-08-01 22:00:01   bone 2432
#> 71   83.1      9 1951 1951-09-01 08:00:01  blood 2433
#> 72   83.1      9 1951 1951-09-01 08:00:01   bone 2433
#> 73   51.6     10 1951 1951-10-01 18:00:00  blood 2434
#> 74   51.6     10 1951 1951-10-01 18:00:00   bone 2434
#> 75   52.4     11 1951 1951-11-01 04:00:01  blood 2435
#> 76   52.4     11 1951 1951-11-01 04:00:01   bone 2435
#> 77   45.8     12 1951 1951-12-01 14:00:01  blood 2436
#> 78   45.8     12 1951 1951-12-01 14:00:01   bone 2436
#> 79   40.7      1 1952 1952-01-01 00:00:00  blood 2437
#> 80   40.7      1 1952 1952-01-01 00:00:00   bone 2437
#> 81   22.7      2 1952 1952-01-31 12:00:01  blood 2438
#> 82   22.7      2 1952 1952-01-31 12:00:01   bone 2438
#> 83   22.0      3 1952 1952-03-02 00:00:01  blood 2439
#> 84   22.0      3 1952 1952-03-02 00:00:01   bone 2439
#> 85   29.1      4 1952 1952-04-01 12:00:00  blood 2440
#> 86   29.1      4 1952 1952-04-01 12:00:00   bone 2440
#> 87   23.4      5 1952 1952-05-02 00:00:01  blood 2441
#> 88   23.4      5 1952 1952-05-02 00:00:01   bone 2441
#> 89   36.4      6 1952 1952-06-01 12:00:01  blood 2442
#> 90   36.4      6 1952 1952-06-01 12:00:01   bone 2442
#> 91   39.3      7 1952 1952-07-02 00:00:00  blood 2443
#> 92   39.3      7 1952 1952-07-02 00:00:00   bone 2443
#> 93   54.9      8 1952 1952-08-01 12:00:01  blood 2444
#> 94   54.9      8 1952 1952-08-01 12:00:01   bone 2444
#> 95   28.2      9 1952 1952-09-01 00:00:01  blood 2445
#> 96   28.2      9 1952 1952-09-01 00:00:01   bone 2445
#> 97   23.8     10 1952 1952-10-01 12:00:00  blood 2446
#> 98   23.8     10 1952 1952-10-01 12:00:00   bone 2446
#> 99   22.1     11 1952 1952-11-01 00:00:01  blood 2447
#> 100  22.1     11 1952 1952-11-01 00:00:01   bone 2447
#> 101  34.3     12 1952 1952-12-01 12:00:01  blood 2448
#> 102  34.3     12 1952 1952-12-01 12:00:01   bone 2448
#> 103  26.5      1 1953 1953-01-01 00:00:00  blood 2449
#> 104  26.5      1 1953 1953-01-01 00:00:00   bone 2449
#> 105   3.9      2 1953 1953-01-31 10:00:01  blood 2450
#> 106   3.9      2 1953 1953-01-31 10:00:01   bone 2450
#> 107  10.0      3 1953 1953-03-02 20:00:01  blood 2451
#> 108  10.0      3 1953 1953-03-02 20:00:01   bone 2451
#> 109  27.8      4 1953 1953-04-02 06:00:00  blood 2452
#> 110  27.8      4 1953 1953-04-02 06:00:00   bone 2452
#> 111  12.5      5 1953 1953-05-02 16:00:01  blood 2453
#> 112  12.5      5 1953 1953-05-02 16:00:01   bone 2453
#> 113  21.8      6 1953 1953-06-02 02:00:01  blood 2454
#> 114  21.8      6 1953 1953-06-02 02:00:01   bone 2454
#> 115   8.6      7 1953 1953-07-02 12:00:00  blood 2455
#> 116   8.6      7 1953 1953-07-02 12:00:00   bone 2455
#> 117  23.5      8 1953 1953-08-01 22:00:01  blood 2456
#> 118  23.5      8 1953 1953-08-01 22:00:01   bone 2456
#> 119  19.3      9 1953 1953-09-01 08:00:01  blood 2457
#> 120  19.3      9 1953 1953-09-01 08:00:01   bone 2457
#> 121   8.2     10 1953 1953-10-01 18:00:00  blood 2458
#> 122   8.2     10 1953 1953-10-01 18:00:00   bone 2458
#> 123   1.6     11 1953 1953-11-01 04:00:01  blood 2459
#> 124   1.6     11 1953 1953-11-01 04:00:01   bone 2459
#> 125   2.5     12 1953 1953-12-01 14:00:01  blood 2460
#> 126   2.5     12 1953 1953-12-01 14:00:01   bone 2460
#> 127   0.2      1 1954 1954-01-01 00:00:00  blood 2461
#> 128   0.2      1 1954 1954-01-01 00:00:00   bone 2461
#> 129   0.5      2 1954 1954-01-31 10:00:01  blood 2462
#> 130   0.5      2 1954 1954-01-31 10:00:01   bone 2462
#> 131  10.9      3 1954 1954-03-02 20:00:01  blood 2463
#> 132  10.9      3 1954 1954-03-02 20:00:01   bone 2463
#> 133   1.8      4 1954 1954-04-02 06:00:00  blood 2464
#> 134   1.8      4 1954 1954-04-02 06:00:00   bone 2464
#> 135   0.8      5 1954 1954-05-02 16:00:01  blood 2465
#> 136   0.8      5 1954 1954-05-02 16:00:01   bone 2465
#> 137   0.2      6 1954 1954-06-02 02:00:01  blood 2466
#> 138   0.2      6 1954 1954-06-02 02:00:01   bone 2466
#> 139   4.8      7 1954 1954-07-02 12:00:00  blood 2467
#> 140   4.8      7 1954 1954-07-02 12:00:00   bone 2467
#> 141   8.4      8 1954 1954-08-01 22:00:01  blood 2468
#> 142   8.4      8 1954 1954-08-01 22:00:01   bone 2468
#> 143   1.5      9 1954 1954-09-01 08:00:01  blood 2469
#> 144   1.5      9 1954 1954-09-01 08:00:01   bone 2469
#> 145   7.0     10 1954 1954-10-01 18:00:00  blood 2470
#> 146   7.0     10 1954 1954-10-01 18:00:00   bone 2470
#> 147   9.2     11 1954 1954-11-01 04:00:01  blood 2471
#> 148   9.2     11 1954 1954-11-01 04:00:01   bone 2471
#> 149   7.6     12 1954 1954-12-01 14:00:01  blood 2472
#> 150   7.6     12 1954 1954-12-01 14:00:01   bone 2472
#> 151  23.1      1 1955 1955-01-01 00:00:00  blood 2473
#> 152  23.1      1 1955 1955-01-01 00:00:00   bone 2473
#> 153  20.8      2 1955 1955-01-31 10:00:01  blood 2474
#> 154  20.8      2 1955 1955-01-31 10:00:01   bone 2474
#> 155   4.9      3 1955 1955-03-02 20:00:01  blood 2475
#> 156   4.9      3 1955 1955-03-02 20:00:01   bone 2475
#> 157  11.3      4 1955 1955-04-02 06:00:00  blood 2476
#> 158  11.3      4 1955 1955-04-02 06:00:00   bone 2476
#> 159  28.9      5 1955 1955-05-02 16:00:01  blood 2477
#> 160  28.9      5 1955 1955-05-02 16:00:01   bone 2477
#> 161  31.7      6 1955 1955-06-02 02:00:01  blood 2478
#> 162  31.7      6 1955 1955-06-02 02:00:01   bone 2478
#> 163  26.7      7 1955 1955-07-02 12:00:00  blood 2479
#> 164  26.7      7 1955 1955-07-02 12:00:00   bone 2479
#> 165  40.7      8 1955 1955-08-01 22:00:01  blood 2480
#> 166  40.7      8 1955 1955-08-01 22:00:01   bone 2480
#>  [ reached 'max' / getOption("max.print") -- omitted 680 rows ]
#> 

# An xts object example
library(xts)
#> Warning: package ‘xts’ was built under R version 4.3.3
#> Loading required package: zoo
#> Warning: package ‘zoo’ was built under R version 4.3.2
#> 
#> Attaching package: ‘zoo’
#> The following objects are masked from ‘package:base’:
#> 
#>     as.Date, as.Date.numeric
dates <- seq(as.Date("2001-05-01"), length=30, by="quarter")
data  <- cbind(c(gas = rpois(30, cumprod(1+rnorm(30, mean = 0.01, sd = 0.001)))),
c(oil = rpois(30, cumprod(1+rnorm(30, mean = 0.01, sd = 0.001)))))
series <- xts(x = data, order.by = dates)
colnames(series) <- c('gas', 'oil')
head(series)
#>            gas oil
#> 2001-05-01   1   1
#> 2001-08-01   1   0
#> 2001-11-01   2   0
#> 2002-02-01   2   1
#> 2002-05-01   0   1
#> 2002-08-01   1   1
series_to_mvgam(series, freq = 4, train_prop = 0.85)
#> $data_train
#>    y season year       date series time
#> 1  1      2 2001 2001-05-01    gas    1
#> 2  1      2 2001 2001-05-01    oil    1
#> 3  1      3 2001 2001-08-01    gas    2
#> 4  0      3 2001 2001-08-01    oil    2
#> 5  2      4 2001 2001-11-01    gas    3
#> 6  0      4 2001 2001-11-01    oil    3
#> 7  2      1 2002 2002-02-01    gas    4
#> 8  1      1 2002 2002-02-01    oil    4
#> 9  0      2 2002 2002-05-01    gas    5
#> 10 1      2 2002 2002-05-01    oil    5
#> 11 1      3 2002 2002-08-01    gas    6
#> 12 1      3 2002 2002-08-01    oil    6
#> 13 2      4 2002 2002-11-01    gas    7
#> 14 1      4 2002 2002-11-01    oil    7
#> 15 1      1 2003 2003-02-01    gas    8
#> 16 2      1 2003 2003-02-01    oil    8
#> 17 4      2 2003 2003-05-01    gas    9
#> 18 1      2 2003 2003-05-01    oil    9
#> 19 2      3 2003 2003-08-01    gas   10
#> 20 2      3 2003 2003-08-01    oil   10
#> 21 1      4 2003 2003-11-01    gas   11
#> 22 1      4 2003 2003-11-01    oil   11
#> 23 2      1 2004 2004-02-01    gas   12
#> 24 1      1 2004 2004-02-01    oil   12
#> 25 1      2 2004 2004-05-01    gas   13
#> 26 1      2 2004 2004-05-01    oil   13
#> 27 2      3 2004 2004-08-01    gas   14
#> 28 3      3 2004 2004-08-01    oil   14
#> 29 1      4 2004 2004-11-01    gas   15
#> 30 2      4 2004 2004-11-01    oil   15
#> 31 1      1 2005 2005-02-01    gas   16
#> 32 1      1 2005 2005-02-01    oil   16
#> 33 1      2 2005 2005-05-01    gas   17
#> 34 1      2 2005 2005-05-01    oil   17
#> 35 1      3 2005 2005-08-01    gas   18
#> 36 1      3 2005 2005-08-01    oil   18
#> 37 2      4 2005 2005-11-01    gas   19
#> 38 1      4 2005 2005-11-01    oil   19
#> 39 1      1 2006 2006-02-01    gas   20
#> 40 3      1 2006 2006-02-01    oil   20
#> 41 2      2 2006 2006-05-01    gas   21
#> 42 2      2 2006 2006-05-01    oil   21
#> 43 2      3 2006 2006-08-01    gas   22
#> 44 3      3 2006 2006-08-01    oil   22
#> 45 0      4 2006 2006-11-01    gas   23
#> 46 0      4 2006 2006-11-01    oil   23
#> 47 2      1 2007 2007-02-01    gas   24
#> 48 1      1 2007 2007-02-01    oil   24
#> 49 1      2 2007 2007-05-01    gas   25
#> 50 0      2 2007 2007-05-01    oil   25
#> 
#> $data_test
#>    y season year       date series time
#> 1  1      3 2007 2007-08-01    gas   26
#> 2  0      3 2007 2007-08-01    oil   26
#> 3  0      4 2007 2007-11-01    gas   27
#> 4  2      4 2007 2007-11-01    oil   27
#> 5  1      1 2008 2008-02-01    gas   28
#> 6  0      1 2008 2008-02-01    oil   28
#> 7  1      2 2008 2008-05-01    gas   29
#> 8  2      2 2008 2008-05-01    oil   29
#> 9  2      3 2008 2008-08-01    gas   30
#> 10 1      3 2008 2008-08-01    oil   30
#> 

================================================ FILE: docs/reference/sim_mvgam.html ================================================ Simulate a set of time series for modelling in mvgam — sim_mvgam • mvgam

This function simulates sets of time series data for fitting a multivariate GAM that includes shared seasonality and dependence on state-space latent dynamic factors. Random dependencies among series, i.e. correlations in their long-term trends, are included in the form of correlated loadings on the latent dynamic factors

Usage

sim_mvgam(
  T = 100,
  n_series = 3,
  seasonality = "shared",
  use_lv = FALSE,
  n_lv = 0,
  trend_model = RW(),
  drift = FALSE,
  prop_trend = 0.2,
  trend_rel,
  freq = 12,
  family = poisson(),
  phi,
  shape,
  sigma,
  nu,
  mu,
  prop_missing = 0,
  prop_train = 0.85
)

Arguments

T

integer. Number of observations (timepoints)

n_series

integer. Number of discrete time series

seasonality

character. Either shared, meaning that all series share the exact same seasonal pattern, or hierarchical, meaning that there is a global seasonality but each series' pattern can deviate slightly

use_lv

logical. If TRUE, use dynamic factors to estimate series' latent trends in a reduced dimension format. If FALSE, estimate independent latent trends for each series

n_lv

integer. Number of latent dynamic factors for generating the series' trends. Defaults to 0, meaning that dynamics are estimated independently for each series

trend_model

character specifying the time series dynamics for the latent trend. Options are:

  • None (no latent trend component; i.e. the GAM component is all that contributes to the linear predictor, and the observation process is the only source of error; similar to what is estimated by gam)

  • RW (random walk with possible drift)

  • AR1 (with possible drift)

  • AR2 (with possible drift)

  • AR3 (with possible drift)

  • VAR1 (contemporaneously uncorrelated VAR1)

  • VAR1cor (contemporaneously correlated VAR1)

  • GP (Gaussian Process with squared exponential kernel)

See mvgam_trends for more details

drift

logical, simulate a drift term for each trend

prop_trend

numeric. Relative importance of the trend for each series. Should be between 0 and 1

trend_rel

Deprecated. Use prop_trend instead

freq

integer. The seasonal frequency of the series

family

family specifying the exponential observation family for the series. Currently supported families are: nb(), poisson(), bernoulli(), tweedie(), gaussian(), betar(), lognormal(), student() and Gamma()

phi

vector of dispersion parameters for the series (i.e. size for nb() or phi for betar()). If length(phi) < n_series, the first element of phi will be replicated n_series times. Defaults to 5 for nb() and tweedie(); 10 for betar()

shape

vector of shape parameters for the series (i.e. shape for gamma()). If length(shape) < n_series, the first element of shape will be replicated n_series times. Defaults to 10

sigma

vector of scale parameters for the series (i.e. sd for gaussian() or student(), log(sd) for lognormal()). If length(sigma) < n_series, the first element of sigma will be replicated n_series times. Defaults to 0.5 for gaussian() and student(); 0.2 for lognormal()

nu

vector of degrees of freedom parameters for the series (i.e. nu for student()). If length(nu) < n_series, the first element of nu will be replicated n_series times. Defaults to 3

mu

vector of location parameters for the series. If length(mu) < n_series, the first element of mu will be replicated n_series times. Defaults to small random values between -0.5 and 0.5 on the link scale

prop_missing

numeric stating proportion of observations that are missing. Should be between 0 and 0.8, inclusive

prop_train

numeric stating the proportion of data to use for training. Should be between 0.2 and 1

Value

A list object containing outputs needed for mvgam, including 'data_train' and 'data_test', as well as some additional information about the simulated seasonality and trend dependencies

Examples

# Simulate series with observations bounded at 0 and 1 (Beta responses)
sim_data <- sim_mvgam(family = betar(), trend_model = RW(), prop_trend = 0.6)
plot_mvgam_series(data = sim_data$data_train, series = 'all')


# Now simulate series with overdispersed discrete observations
sim_data <- sim_mvgam(family = nb(), trend_model = RW(), prop_trend = 0.6, phi = 10)
plot_mvgam_series(data = sim_data$data_train, series = 'all')
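
# Finally, a sketch of the dynamic factor arguments described above, where
# correlated trends for four series arise from two latent dynamic factors
sim_data <- sim_mvgam(n_series = 4, use_lv = TRUE, n_lv = 2,
                      trend_model = RW(), prop_trend = 0.6)
plot_mvgam_series(data = sim_data$data_train, series = 'all')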

================================================ FILE: docs/reference/stability.mvgam.html ================================================ Calculate measures of latent VAR community stability — stability.mvgam • mvgam

Compute reactivity, return rates and contributions of interactions to stationary forecast variance from mvgam models with Vector Autoregressive dynamics

Usage

stability(object, ...)

# S3 method for class 'mvgam'
stability(object, ...)

Arguments

object

list object of class mvgam resulting from a call to mvgam() that used a Vector Autoregressive latent process model (either as VAR(cor = FALSE) or VAR(cor = TRUE))

...

ignored

Value

A data.frame containing posterior draws for each stability metric.

Details

These measures of stability can be used to assess how important inter-series dependencies are to the variability of a multivariate system and to ask how systems are expected to respond to environmental perturbations. Using the formula for a latent VAR(1): $$ \mu_t \sim \text{MVNormal}(A\mu_{t - 1}, \Sigma) $$ this function first calculates the long-term stationary forecast distribution of the system, which has mean \(\mu_{\infty}\) and variance \(\Sigma_{\infty}\), and then calculates the following quantities:

  • prop_int: Proportion of the volume of the stationary forecast distribution that is attributable to lagged interactions (i.e. how important are the autoregressive interaction coefficients in \(A\) for explaining the shape of the stationary forecast distribution?): $$ \det(A)^2 $$

  • prop_int_adj: Same as prop_int but scaled by the number of series \(p\) to facilitate direct comparisons among systems with different numbers of interacting variables: $$ \det(A)^{2/p} $$

  • prop_int_offdiag: Sensitivity of prop_int to inter-series interactions (i.e. how important are the off-diagonals of the autoregressive coefficient matrix \(A\) for shaping prop_int?), calculated as the relative magnitude of the off-diagonals in the partial derivative matrix: $$ [2~\det(A)~(A^{-1})^T] $$

  • prop_int_diag: Sensitivity of prop_int to intra-series interactions (i.e. how important are the diagonals of the autoregressive coefficient matrix \(A\) for shaping prop_int?), calculated as the relative magnitude of the diagonals in the partial derivative matrix: $$ [2~\det(A)~(A^{-1})^T] $$

  • prop_cov_offdiag: Sensitivity of \(\Sigma_{\infty}\) to inter-series error correlations (i.e. how important are off-diagonal covariances in \(\Sigma\) for shaping \(\Sigma_{\infty}\)?), calculated as the relative magnitude of the off-diagonals in the partial derivative matrix: $$ [2~\det(\Sigma_{\infty})~(\Sigma_{\infty}^{-1})^T] $$

  • prop_cov_diag: Sensitivity of \(\Sigma_{\infty}\) to error variances (i.e. how important are diagonal variances in \(\Sigma\) for shaping \(\Sigma_{\infty}\)?), calculated as the relative magnitude of the diagonals in the partial derivative matrix: $$ [2~\det(\Sigma_{\infty})~(\Sigma_{\infty}^{-1})^T] $$

  • reactivity: A measure of the degree to which the system moves away from a stable equilibrium following a perturbation. Values > 0 suggest the system is reactive, whereby a perturbation of the system in one period can be amplified in the next period. If \(\sigma_{max}(A)\) is the largest singular value of \(A\), then reactivity is defined as: $$ \log \sigma_{max}(A) $$

  • mean_return_rate: Asymptotic (long-term) return rate of the mean of the transition distribution to the stationary mean, calculated using the largest eigenvalue of the matrix \(A\): $$ \max(\lambda_{A}) $$ Lower values suggest greater stability

  • var_return_rate: Asymptotic (long-term) return rate of the variance of the transition distribution to the stationary variance: $$ \max(\lambda_{A \otimes A}) $$ Again, lower values suggest greater stability

Major advantages of using mvgam to compute these metrics are that well-calibrated uncertainties are available and that VAR processes are forced to be stationary. These properties make it simple and insightful to calculate and inspect aspects of both long-term and short-term stability. It is also possible to inspect potential interactions among the time series in a latent VAR process more directly. To do so, you can calculate and plot Generalized or Orthogonalized Impulse Response Functions using the irf function.
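
A sketch of this approach (using the mod object fitted in the Examples below; see ?irf for the full set of arguments):

# Compute and plot Generalized Impulse Response Functions
# over 12 forecast horizons
irfs <- irf(mod, h = 12)
plot(irfs, series = 1)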

References

AR Ives, B Dennis, KL Cottingham & SR Carpenter (2003). Estimating community stability and ecological interactions from time-series data. Ecological Monographs. 73, 301-330.

See also

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate some time series that follow a latent VAR(1) process
simdat <- sim_mvgam(family = gaussian(),
                    n_series = 4,
                    trend_model = VAR(cor = TRUE),
                    prop_trend = 1)
plot_mvgam_series(data = simdat$data_train, series = 'all')


# Fit a model that uses a latent VAR(1)
mod <- mvgam(y ~ -1,
             trend_formula = ~ 1,
             trend_model = VAR(cor = TRUE),
             family = gaussian(),
             data = simdat$data_train,
             chains = 2,
             silent = 2)
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c4c5d1635.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory

# Calculate stability metrics for this system
metrics <- stability(mod)

# Proportion of stationary forecast distribution
# attributable to lagged interactions
hist(metrics$prop_int,
     xlim = c(0, 1),
     xlab = 'Prop_int',
     main = '',
     col = '#B97C7C',
     border = 'white')


# Within this contribution of interactions, how important
# are inter-series interactions (offdiagonals of the A matrix) vs
# intra-series density dependence (diagonals of the A matrix)?
layout(matrix(1:2, nrow = 2))
hist(metrics$prop_int_offdiag,
     xlim = c(0, 1),
     xlab = '',
     main = 'Inter-series interactions',
     col = '#B97C7C',
     border = 'white')

hist(metrics$prop_int_diag,
     xlim = c(0, 1),
     xlab = 'Contribution to interaction effect',
     main = 'Intra-series interactions (density dependence)',
     col = 'darkblue',
     border = 'white')

layout(1)

# How important are inter-series error covariances
# (offdiagonals of the Sigma matrix) vs
# intra-series variances (diagonals of the Sigma matrix) for explaining
# the variance of the stationary forecast distribution?
layout(matrix(1:2, nrow = 2))
hist(metrics$prop_cov_offdiag,
     xlim = c(0, 1),
     xlab = '',
     main = 'Inter-series covariances',
     col = '#B97C7C',
     border = 'white')

hist(metrics$prop_cov_diag,
     xlim = c(0, 1),
     xlab = 'Contribution to forecast variance',
     main = 'Intra-series variances',
     col = 'darkblue',
     border = 'white')

layout(1)

# Reactivity, i.e. degree to which the system moves
# away from a stable equilibrium following a perturbation
# (values > 0 suggest a more reactive, less stable system)
hist(metrics$reactivity,
     main = '',
     xlab = 'Reactivity',
     col = '#B97C7C',
     border = 'white',
     xlim = c(-1*max(abs(metrics$reactivity)),
              max(abs(metrics$reactivity))))
abline(v = 0, lwd = 2.5)

# }
================================================ FILE: docs/reference/summary.mvgam.html ================================================ Summary for fitted mvgam models — summary.mvgam

These functions take a fitted mvgam or jsdgam object and return various useful summaries

Usage

# S3 method for class 'mvgam'
summary(object, include_betas = TRUE, smooth_test = TRUE, digits = 2, ...)

# S3 method for class 'mvgam_prefit'
summary(object, ...)

# S3 method for class 'mvgam'
coef(object, summarise = TRUE, ...)

Arguments

object

list object returned from mvgam

include_betas

Logical. Print a summary that includes posterior summaries of all linear predictor beta coefficients (including spline coefficients)? Defaults to TRUE but use FALSE for a more concise summary

smooth_test

Logical. Compute estimated degrees of freedom and approximate p-values for smooth terms? Defaults to TRUE, but users may wish to set to FALSE for complex models with many smooth or random effect terms

digits

The number of significant digits for printing out the summary; defaults to 2.

...

Ignored

summarise

logical. Summaries of coefficients will be returned if TRUE. Otherwise the full posterior distribution will be returned

Value

For summary.mvgam and summary.mvgam_prefit, a list is printed on-screen showing the summaries for the model

For coef.mvgam, either a matrix of posterior coefficient distributions (if summarise == FALSE) or a data.frame of coefficient summaries (if summarise == TRUE)

Details

summary.mvgam and summary.mvgam_prefit return brief summaries of the model's call, along with posterior intervals for some of the key parameters in the model. Note that some smooths have extra penalties on the null space, so summaries for the rho parameters may include more penalty terms than the number of smooths in the original model formula. Approximate p-values for smooth terms are also returned, with methods used for their calculation following those used for mgcv equivalents (see summary.gam for details). The Estimated Degrees of Freedom (edf) for smooth terms is computed using either edf.type = 1 for models with no trend component, or edf.type = 0 for models with trend components. These are described in the documentation for jagam. Experiments suggest these p-values tend to be more conservative than those that might be returned from an equivalent model fit with summary.gam using method = 'REML'

coef.mvgam returns either summaries or full posterior estimates for GAM component coefficients
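
As a minimal sketch of how these methods are typically combined (mod stands for any fitted mvgam object; the object name is illustrative):

summary(mod, include_betas = FALSE)        # concise summary without spline betas
summary(mod, smooth_test = FALSE)          # skip edf and approximate p-values
coef_summary <- coef(mod)                  # data.frame of coefficient summaries
coef_draws <- coef(mod, summarise = FALSE) # matrix of full posterior draws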

Author

Nicholas J Clark

================================================ FILE: docs/reference/ti.html ================================================ Defining smooths in mvgam formulae — ti

Functions used in the definition of smooth terms within model formulae. These functions do not evaluate a (spline) smooth themselves; they exist purely to help set up mvgam models that use spline-based smooths.

Usage

ti(...)

te(...)

Arguments

...

Arguments passed to mgcv::ti or mgcv::te

Details

The functions defined here are just simple wrappers of the respective functions of the mgcv package. When using them, please cite the appropriate references obtained via citation("mgcv").

See also

Examples

# \donttest{
# Simulate some data
dat <- mgcv::gamSim(1, n = 200, scale = 2)
#> Gu & Wahba 4 term additive model

# Fit univariate smooths for all predictors
fit1 <- mvgam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
              data = dat, chains = 2, family = gaussian())
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 5.5 seconds.
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 5.8 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 5.7 seconds.
#> Total execution time: 5.9 seconds.
#> 
summary(fit1)
#> GAM formula:
#> y ~ s(x0) + s(x1) + s(x2) + s(x3)
#> <environment: 0x0000013426d46318>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 200 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> Observation error parameter estimates:
#>              2.5% 50% 97.5% Rhat n_eff
#> sigma_obs[1]  1.9 2.1   2.4    1   957
#> 
#> GAM coefficient (beta) estimates:
#>               2.5%     50%  97.5% Rhat n_eff
#> (Intercept)  7.500  7.8000  8.100 1.00  1593
#> s(x0).1     -0.330  0.1200  0.640 1.00   588
#> s(x0).2     -1.300  0.3400  2.500 1.00   282
#> s(x0).3     -0.380  0.0820  0.590 1.00   376
#> s(x0).4     -1.500 -0.3200  0.520 1.01   245
#> s(x0).5     -0.240  0.0760  0.430 1.00   387
#> s(x0).6     -1.300 -0.3400  0.430 1.00   260
#> s(x0).7     -0.150  0.0058  0.170 1.00   498
#> s(x0).8     -0.330  1.7000  5.000 1.01   206
#> s(x0).9     -0.430 -0.0044  0.430 1.00   581
#> s(x1).1     -0.300 -0.0095  0.190 1.00   647
#> s(x1).2     -0.410 -0.0100  0.320 1.00   592
#> s(x1).3     -0.077  0.0088  0.160 1.00   445
#> s(x1).4     -0.300 -0.0140  0.150 1.00   457
#> s(x1).5     -0.045  0.0042  0.093 1.00   507
#> s(x1).6     -0.120  0.0160  0.250 1.00   452
#> s(x1).7     -0.058  0.0076  0.120 1.00   466
#> s(x1).8     -0.980 -0.0840  0.340 1.00   328
#> s(x1).9      1.400  1.9000  2.200 1.00   818
#> s(x2).1      3.500  5.2000  6.900 1.00   321
#> s(x2).2      2.600 11.0000 19.000 1.00   223
#> s(x2).3     -4.600 -3.0000 -1.200 1.00   375
#> s(x2).4     -6.400  0.2100  6.000 1.00   242
#> s(x2).5     -1.500  0.8000  3.100 1.00   334
#> s(x2).6     -7.900 -2.0000  4.500 1.00   258
#> s(x2).7      0.650  2.3000  4.100 1.00   798
#> s(x2).8     -1.500 12.0000 24.000 1.00   228
#> s(x2).9     -0.520  0.0330  0.900 1.00   292
#> s(x3).1     -0.140  0.0150  0.230 1.01   468
#> s(x3).2     -0.330  0.0012  0.270 1.01   194
#> s(x3).3     -0.077 -0.0027  0.062 1.00   433
#> s(x3).4     -0.240 -0.0066  0.150 1.01   153
#> s(x3).5     -0.091 -0.0049  0.067 1.01   244
#> s(x3).6     -0.120  0.0077  0.190 1.00   147
#> s(x3).7     -0.048  0.0051  0.086 1.01   195
#> s(x3).8     -0.710 -0.0500  0.350 1.00   119
#> s(x3).9     -0.320 -0.0660  0.200 1.00   996
#> 
#> Approximate significance of GAM smooths:
#>        edf Ref.df Chi.sq p-value    
#> s(x0) 3.76      9   39.6    0.07 .  
#> s(x1) 2.33      9  653.7  <2e-16 ***
#> s(x2) 7.71      9 1213.2  <2e-16 ***
#> s(x3) 1.16      9    1.7    1.00    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 12 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Aug 29 11:16:04 AM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
conditional_effects(fit1)





# Fit a more complicated smooth model
fit2 <- mvgam(y ~ te(x0, x1) + s(x2, by = x3),
              data = dat, chains = 2, family = gaussian())
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 3.9 seconds.
#> Chain 2 finished in 3.9 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 3.9 seconds.
#> Total execution time: 4.0 seconds.
#> 
summary(fit2)
#> GAM formula:
#> y ~ te(x0, x1) + s(x2, by = x3)
#> <environment: 0x0000013426d46318>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 200 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> Observation error parameter estimates:
#>              2.5% 50% 97.5% Rhat n_eff
#> sigma_obs[1]  2.4 2.7     3    1  1067
#> 
#> GAM coefficient (beta) estimates:
#>               2.5%    50%   97.5% Rhat n_eff
#> (Intercept)   7.50  8.200  8.9000    1   745
#> te(x0,x1).1  -3.20 -1.900 -0.7400    1   314
#> te(x0,x1).2  -2.10 -0.620  0.5900    1   406
#> te(x0,x1).3  -0.38  1.300  2.8000    1   582
#> te(x0,x1).4   0.63  2.700  4.7000    1   727
#> te(x0,x1).5  -3.50 -2.300 -1.0000    1   701
#> te(x0,x1).6  -1.20 -0.320  0.7300    1   404
#> te(x0,x1).7   0.19  1.100  2.1000    1   403
#> te(x0,x1).8   2.40  3.500  4.8000    1   393
#> te(x0,x1).9   3.10  4.400  5.6000    1   677
#> te(x0,x1).10 -3.70 -2.400 -1.0000    1   464
#> te(x0,x1).11 -1.20 -0.120  0.9900    1   286
#> te(x0,x1).12  0.45  1.500  2.5000    1   286
#> te(x0,x1).13  2.50  3.600  4.8000    1   316
#> te(x0,x1).14  3.50  4.700  6.2000    1   489
#> te(x0,x1).15 -4.40 -3.100 -1.9000    1   694
#> te(x0,x1).16 -1.80 -0.750  0.3400    1   475
#> te(x0,x1).17 -0.14  0.780  1.7000    1   497
#> te(x0,x1).18  2.30  3.400  4.6000    1   444
#> te(x0,x1).19  3.00  4.300  5.8000    1   519
#> te(x0,x1).20 -7.30 -4.800 -2.3000    1   715
#> te(x0,x1).21 -4.20 -2.700 -1.2000    1   491
#> te(x0,x1).22 -2.70 -1.200  0.0057    1   415
#> te(x0,x1).23 -0.58  1.100  2.7000    1   425
#> te(x0,x1).24 -0.21  2.400  4.7000    1   481
#> s(x2):x3.1   -1.40  2.100  5.3000    1   515
#> s(x2):x3.2    3.20  5.500  7.7000    1  1238
#> s(x2):x3.3   -0.10  3.400  6.7000    1   690
#> s(x2):x3.4   -4.10 -1.100  1.4000    1   690
#> s(x2):x3.5   -9.50 -5.300 -1.9000    1   621
#> s(x2):x3.6   -0.32  2.300  5.4000    1   971
#> s(x2):x3.7   -0.58  2.500  6.0000    1   882
#> s(x2):x3.8   -1.10  1.400  4.4000    1   887
#> s(x2):x3.9   -0.71  0.011  0.7800    1   531
#> s(x2):x3.10  -0.57  0.032  0.7900    1   482
#> 
#> Approximate significance of GAM smooths:
#>             edf Ref.df Chi.sq p-value    
#> te(x0,x1) 11.79     24    913  <2e-16 ***
#> s(x2):x3   7.74     10    640  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 12 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Thu Aug 29 11:16:53 AM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
conditional_effects(fit2)


# }
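
A further minimal sketch (using the same simulated dat) of the decomposition that ti() is designed for: separate main effects plus a pure interaction. The object name fit3 is illustrative:

# Decompose the joint effect of x0 and x1 into main effects
# and a pure interaction term
fit3 <- mvgam(y ~ s(x0) + s(x1) + ti(x0, x1),
              data = dat, chains = 2, family = gaussian())
summary(fit3)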

================================================ FILE: docs/reference/update.mvgam.html ================================================ Update an existing mvgam model object — update.mvgam

This function allows a previously fitted mvgam model to be updated

Usage

# S3 method for class 'mvgam'
update(
  object,
  formula,
  trend_formula,
  knots,
  trend_knots,
  trend_model,
  family,
  share_obs_params,
  data,
  newdata,
  trend_map,
  use_lv,
  n_lv,
  priors,
  chains,
  burnin,
  samples,
  threads,
  algorithm,
  lfo = FALSE,
  ...
)

# S3 method for class 'jsdgam'
update(
  object,
  formula,
  factor_formula,
  knots,
  factor_knots,
  data,
  newdata,
  n_lv,
  family,
  share_obs_params,
  priors,
  chains,
  burnin,
  samples,
  threads,
  algorithm,
  lfo = FALSE,
  ...
)

Arguments

object

list object returned from mvgam. See mvgam()

formula

Optional new formula object. Note, mvgam currently does not support dynamic formula updates such as removal of specific terms with - term. When updating, the entire formula needs to be supplied

trend_formula

An optional formula object specifying the GAM process model formula. If supplied, a linear predictor will be modelled for the latent trends to capture process model evolution separately from the observation model. Should not have a response variable specified on the left-hand side of the formula (i.e. a valid option would be ~ season + s(year)). Also note that you should not use the identifier series in this formula to specify effects that vary across time series. Instead you should use trend. This will ensure that models in which a trend_map is supplied will still work consistently (i.e. by allowing effects to vary across process models, even when some time series share the same underlying process model). This feature is only currently available for RW(), AR() and VAR() trend models. In nmix() family models, the trend_formula is used to set up a linear predictor for the underlying latent abundance. Be aware that it can be very challenging to simultaneously estimate intercept parameters for both the observation model (captured by formula) and the process model (captured by trend_formula). Users are recommended to drop one of these using the - 1 convention in the formula right hand side.

knots

An optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). Different terms can use different numbers of knots, unless they share a covariate

trend_knots

As for knots above, this is an optional list of knot values for smooth functions within the trend_formula

trend_model

character or function specifying the time series dynamics for the latent trend. Options are:

  • None (no latent trend component; i.e. the GAM component is all that contributes to the linear predictor, and the observation process is the only source of error; similarly to what is estimated by gam)

  • ZMVN or ZMVN() (Zero-Mean Multivariate Normal; only available in Stan)

  • 'RW' or RW()

  • 'AR1' or AR(p = 1)

  • 'AR2' or AR(p = 2)

  • 'AR3' or AR(p = 3)

  • 'CAR1' or CAR(p = 1)

  • 'VAR1' or VAR() (only available in Stan)

  • 'PWlogistic', 'PWlinear' or PW() (only available in Stan)

  • 'GP' or GP() (Gaussian Process with squared exponential kernel; only available in Stan)

For all trend types apart from ZMVN(), GP(), CAR() and PW(), moving average and/or correlated process error terms can also be estimated (for example, RW(cor = TRUE) will set up a multivariate Random Walk if n_series > 1). It is also possible for many multivariate trends to estimate hierarchical correlations if the data are structured among levels of a relevant grouping factor. See mvgam_trends for more details and see ZMVN for an example.
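
As a brief sketch, the constructors simply return trend specification objects, so the following are all valid trend_model values:

RW()             # univariate random walk(s)
AR(p = 2)        # autoregressive dynamics of order 2
RW(cor = TRUE)   # multivariate random walk with correlated process errors
VAR(cor = TRUE)  # latent VAR(1) with correlated process errors (Stan only)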

family

family specifying the exponential observation family for the series. Currently supported families are:

  • gaussian() for real-valued data

  • betar() for proportional data on (0,1)

  • lognormal() for non-negative real-valued data

  • student_t() for real-valued data

  • Gamma() for non-negative real-valued data

  • bernoulli() for binary data

  • poisson() for count data

  • nb() for overdispersed count data

  • binomial() for count data with imperfect detection when the number of trials is known; note that the cbind() function must be used to bind the discrete observations and the discrete number of trials

  • beta_binomial() as for binomial() but allows for overdispersion

  • nmix() for count data with imperfect detection when the number of trials is unknown and should be modeled via a State-Space N-Mixture model. The latent states are Poisson, capturing the 'true' latent abundance, while the observation process is Binomial to account for imperfect detection. See mvgam_families for an example of how to use this family

Default is poisson(). See mvgam_families for more details

share_obs_params

logical. If TRUE and the family has additional family-specific observation parameters (e.g. variance components in student_t() or gaussian(), or dispersion parameters in nb() or betar()), these parameters will be shared across all outcome variables. This is handy if you have multiple outcomes (time series in most mvgam models) that you believe share some properties, such as being from the same species over different spatial units. Default is FALSE.

data

A dataframe or list containing the model response variable and covariates required by the GAM formula and optional trend_formula. Most models should include columns:

  • series (a factor index of the series IDs; the number of levels should be identical to the number of unique series labels (i.e. n_series = length(levels(data$series))))

  • time (numeric or integer index of the time point for each observation). For most dynamic trend types available in mvgam (see argument trend_model), time should be measured in discrete, regularly spaced intervals (i.e. c(1, 2, 3, ...)). However you can use irregularly spaced intervals if using trend_model = CAR(1), though note that any temporal intervals that are exactly 0 will be adjusted to a very small number (1e-12) to prevent sampling errors. See an example of CAR() trends in CAR

Note however that there are special cases where these identifiers are not needed. For example, models with hierarchical temporal correlation processes (e.g. AR(gr = region, subgr = species)) should NOT include a series identifier, as this will be constructed internally (see mvgam_trends and AR for details). mvgam can also fit models that do not include a time variable if there are no temporal dynamic structures included (i.e. trend_model = 'None' or trend_model = ZMVN()). data should also include any other variables to be included in the linear predictor of formula
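
A minimal sketch of a valid data object for two series observed over six time points (column names follow the conventions above):

dat <- data.frame(
  y = rpois(12, lambda = 5),
  series = factor(rep(c('series_1', 'series_2'), each = 6)),
  time = rep(1:6, times = 2)
)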

newdata

Optional dataframe or list of test data containing the same variables as in data. If included, the observations in variable y will be set to NA when fitting the model so that posterior simulations can be obtained
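
For example (a sketch using this package's simulation helper), supplying the test set at fitting time means forecasts are drawn during sampling:

simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(),
             data = simdat$data_train,
             newdata = simdat$data_test,
             chains = 2)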

trend_map

Optional data.frame specifying which series should depend on which latent trends. Useful for allowing multiple series to depend on the same latent trend process, but with different observation processes. If supplied, a latent factor model is set up by setting use_lv = TRUE and using the mapping to set up the shared trends. Needs to have column names series and trend, with integer values in the trend column to state which trend each series should depend on. The series column should have a single unique entry for each series in the data (names should perfectly match factor levels of the series variable in data). Note that if this is supplied, the intercept parameter in the process model will NOT be automatically suppressed. Not yet supported for models in which the latent factors evolve in continuous time (CAR()). See examples for details
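
A minimal sketch of a trend_map in which the first two series share a single latent trend (series names must match the factor levels of the series variable in data):

trend_map <- data.frame(
  series = c('series_1', 'series_2', 'series_3'),
  trend = c(1, 1, 2)
)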

use_lv

logical. If TRUE, use dynamic factors to estimate series' latent trends in a reduced dimension format. Only available for RW(), AR() and GP() trend models. Defaults to FALSE

n_lv

integer the number of latent dynamic factors to use if use_lv == TRUE. Cannot be > n_series. Defaults arbitrarily to min(2, floor(n_series / 2))
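
A sketch of a dynamic factor model in which four series' trends are estimated with two latent factors:

simdat <- sim_mvgam(n_series = 4, trend_model = RW())
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = RW(),
             use_lv = TRUE,
             n_lv = 2,
             data = simdat$data_train,
             chains = 2)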

priors

An optional data.frame with prior definitions or, preferentially, a vector containing objects of class brmsprior (see prior for details). See get_mvgam_priors and 'Details' for more information on changing default prior distributions
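
A sketch of the intended workflow (the prior class shown, ar1, is illustrative; inspect the output of get_mvgam_priors to see which classes your model actually exposes):

simdat <- sim_mvgam(n_series = 1, trend_model = AR())

# Inspect the default priors that can be updated for this model
get_mvgam_priors(y ~ s(season, bs = 'cc'),
                 trend_model = AR(),
                 data = simdat$data_train)

# Supply a brmsprior object to change a default
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(),
             data = simdat$data_train,
             priors = brms::prior(normal(0, 0.5), class = ar1),
             chains = 2)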

chains

integer specifying the number of parallel chains for the model. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')

burnin

integer specifying the number of warmup iterations of the Markov chain to run to tune sampling algorithms. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')

samples

integer specifying the number of post-warmup iterations of the Markov chain to run for sampling the posterior distribution

threads

integer Experimental option to use multithreading for within-chain parallelisation in Stan. We recommend its use only if you are experienced with Stan's reduce_sum function and have a slow-running model that cannot be sped up by any other means. Currently works for all families apart from nmix(), and only when using Cmdstan as the backend
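
A sketch (Cmdstan backend assumed, per the note above):

simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(),
             data = simdat$data_train,
             chains = 2,
             threads = 2)  # two threads per chain via reduce_sum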

algorithm

Character string naming the estimation approach to use. Options are "sampling" for MCMC (the default), "meanfield" for variational inference with factorized normal distributions, "fullrank" for variational inference with a multivariate normal distribution, "laplace" for a Laplace approximation (only available when using cmdstanr as the backend) or "pathfinder" for the pathfinder algorithm (only currently available when using cmdstanr as the backend). Can be set globally for the current R session via the "brms.algorithm" option (see options). Limited testing suggests that "meanfield" performs best out of the non-MCMC approximations for dynamic GAMs, possibly because of the difficulties estimating covariances among the many spline parameters and latent trend parameters. But rigorous testing has not been carried out

lfo

Logical indicating whether this is part of a call to lfo_cv.mvgam. Returns a lighter version of the model with no residuals and fewer monitored parameters to speed up post-processing. But other downstream functions will not work properly, so users should always leave this set as FALSE

...

Other arguments to be passed to mvgam or jsdgam

factor_formula

Optional new formula object for the factor linear predictors

factor_knots

An optional list containing user specified knot values to be used for basis construction of any smooth terms in factor_formula. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). Different terms can use different numbers of knots, unless they share a covariate

Value

A list object of class mvgam containing model output, the text representation of the model file, the mgcv model output (for easily generating simulations at unsampled covariate values), Dunn-Smyth residuals for each outcome variable (i.e. each series) and key information needed for other functions in the package. See mvgam-class for details. Use methods(class = "mvgam") for an overview on available methods.

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate some data and fit a Poisson AR1 model
simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(),
             noncentred = TRUE,
             data = simdat$data_train,
             chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> [lengthy compiler warnings from the TBB and Stan math headers omitted]
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 finished in 0.3 seconds.
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 0.4 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.3 seconds.
#> Total execution time: 0.5 seconds.
#> 
summary(mod)
#> GAM formula:
#> y ~ s(season, bs = "cc")
#> <environment: 0x0000019b13fe8088>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR()
#> 
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>               2.5%    50%  97.5% Rhat n_eff
#> (Intercept) -0.900 -0.550 -0.220 1.01   563
#> s(season).1 -0.410  0.170  0.710 1.02   190
#> s(season).2 -0.045  0.540  1.100 1.00   675
#> s(season).3  0.280  0.810  1.400 1.00   721
#> s(season).4  0.570  1.100  1.700 1.01   398
#> s(season).5  0.190  0.730  1.300 1.01   519
#> s(season).6 -0.780 -0.095  0.560 1.00   548
#> s(season).7 -1.300 -0.550  0.093 1.00   260
#> s(season).8 -1.300 -0.690 -0.160 1.00   524
#> 
#> Approximate significance of GAM smooths:
#>            edf Ref.df Chi.sq p-value  
#> s(season) 2.62      8   33.4   0.049 *
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Latent trend parameter AR estimates:
#>           2.5%   50% 97.5% Rhat n_eff
#> ar1[1]   -0.92 0.045  0.88 1.00   623
#> sigma[1]  0.01 0.250  0.69 1.01   164
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 26 9:36:53 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model
conditional_effects(mod, type = 'link')


# Update to an AR2 model
updated_mod <- update(mod, trend_model = AR(p = 2),
                      noncentred = TRUE)
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 0.3 seconds.
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 0.4 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.4 seconds.
#> Total execution time: 0.5 seconds.
#> 
summary(updated_mod)
#> GAM formula:
#> y ~ s(season, bs = "cc")
#> <environment: 0x0000019b13fe8088>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR(p = 2)
#> 
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>               2.5%   50%  97.5% Rhat n_eff
#> (Intercept) -0.910 -0.53 -0.240    1   623
#> s(season).1 -0.330  0.18  0.720    1   895
#> s(season).2 -0.037  0.55  1.100    1   931
#> s(season).3  0.250  0.81  1.400    1   902
#> s(season).4  0.630  1.10  1.700    1   957
#> s(season).5  0.230  0.72  1.300    1  1065
#> s(season).6 -0.780 -0.11  0.490    1  1053
#> s(season).7 -1.300 -0.60  0.065    1   885
#> s(season).8 -1.300 -0.71 -0.150    1   966
#> 
#> Approximate significance of GAM smooths:
#>            edf Ref.df Chi.sq p-value   
#> s(season) 3.01      8   31.2  0.0036 **
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Latent trend AR parameter estimates:
#>             2.5%    50% 97.5% Rhat n_eff
#> ar1[1]   -0.9100 -0.014  0.90    1  1066
#> ar2[1]   -0.9000 -0.037  0.89    1  1032
#> sigma[1]  0.0086  0.180  0.58    1   370
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 26 9:37:24 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(updated_mod) to get started describing this model
conditional_effects(updated_mod, type = 'link')
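
# The same effects can also be drawn on the outcome scale by changing
# 'type' (a sketch; see ?conditional_effects.mvgam for the available options)
conditional_effects(updated_mod, type = 'response')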


# Now update to a Binomial AR(1) model by adding information on trials.
# This requires that the supplied data contain the 'trials' variable
simdat$data_train$trials <- max(simdat$data_train$y) + 15
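
# Optional sanity check (a sketch): the number of successes can never
# exceed the number of trials
stopifnot(all(simdat$data_train$y <= simdat$data_train$trials,
              na.rm = TRUE))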
updated_mod <- update(mod,
                      formula = cbind(y, trials) ~ s(season, bs = 'cc'),
                      noncentred = TRUE,
                      data = simdat$data_train,
                      family = binomial())
#> Compiling Stan program using cmdstanr
#> 
#> (C++ compiler output omitted: the same harmless '-Wtemplate-id-cdtor'
#> warnings and notes from the bundled TBB and Stan Math headers as noted
#> above)
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 finished in 0.3 seconds.
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 0.4 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.3 seconds.
#> Total execution time: 0.5 seconds.
#> 
summary(updated_mod)
#> GAM formula:
#> cbind(y, trials) ~ s(season, bs = "cc")
#> <environment: 0x0000019b13fe8088>
#> 
#> Family:
#> binomial
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> AR()
#> 
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>               2.5%   50%  97.5% Rhat n_eff
#> (Intercept) -3.800 -3.40 -3.100    1   555
#> s(season).1 -0.360  0.19  0.740    1   476
#> s(season).2  0.024  0.55  1.100    1   598
#> s(season).3  0.230  0.83  1.400    1   672
#> s(season).4  0.530  1.10  1.700    1   499
#> s(season).5  0.140  0.75  1.300    1   576
#> s(season).6 -0.780 -0.11  0.470    1   828
#> s(season).7 -1.300 -0.59  0.025    1   677
#> s(season).8 -1.400 -0.70 -0.094    1   520
#> 
#> Approximate significance of GAM smooths:
#>            edf Ref.df Chi.sq p-value  
#> s(season) 2.56      8   34.8   0.019 *
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Latent trend AR parameter estimates:
#>            2.5%   50% 97.5% Rhat n_eff
#> ar1[1]   -0.900 0.019  0.93    1   569
#> sigma[1]  0.012 0.250  0.73    1   269
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 26 9:37:54 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(updated_mod) to get started describing this model
conditional_effects(updated_mod, type = 'link')
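
# Any newdata used for prediction or forecasting from this model must
# likewise contain the 'trials' variable; a hypothetical sketch using
# the held-out test set returned by sim_mvgam():
simdat$data_test$trials <- max(simdat$data_train$y) + 15
fc_updated <- forecast(updated_mod, newdata = simdat$data_test)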

# }
================================================ FILE: docs/search.json ================================================ [{"path":"https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html","id":"required-long-data-format","dir":"Articles","previous_headings":"","what":"Required long data format","title":"Formatting data for use in mvgam","text":"Manipulating data ‘long’ format necessary modelling mvgam. ‘long’ format, mean series x time observation needs entry dataframe list object wish use data modelling. simple example can viewed simulating data using sim_mvgam function. See ?sim_mvgam details","code":"simdat <- sim_mvgam(n_series = 4, T = 24, prop_missing = 0.2) head(simdat$data_train, 16) ## y season year series time ## 1 0 1 1 series_1 1 ## 2 0 1 1 series_2 1 ## 3 1 1 1 series_3 1 ## 4 0 1 1 series_4 1 ## 5 NA 2 1 series_1 2 ## 6 2 2 1 series_2 2 ## 7 3 2 1 series_3 2 ## 8 1 2 1 series_4 2 ## 9 1 3 1 series_1 3 ## 10 1 3 1 series_2 3 ## 11 1 3 1 series_3 3 ## 12 5 3 1 series_4 3 ## 13 0 4 1 series_1 4 ## 14 3 4 1 series_2 4 ## 15 NA 4 1 series_3 4 ## 16 1 4 1 series_4 4"},{"path":"https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html","id":"series-as-a-factor-variable","dir":"Articles","previous_headings":"Required long data format","what":"series as a factor variable","title":"Formatting data for use in mvgam","text":"Notice four different time series simulated data, identified series-level indicator factor variable. important number levels matches number unique series data ensure indexing across series works properly underlying modelling functions. Several main workhorse functions package (including mvgam() get_mvgam_priors()) give error case, may worth checking anyway: Note can technically supply data series indicator, package assume using single time series. , better included confusion.","code":"class(simdat$data_train$series) ## [1] \"factor\" levels(simdat$data_train$series) ## [1] \"series_1\" \"series_2\" \"series_3\" \"series_4\" all(levels(simdat$data_train$series) %in% unique(simdat$data_train$series)) ## [1] TRUE"},{"path":"https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html","id":"a-single-outcome-variable","dir":"Articles","previous_headings":"Required long data format","what":"A single outcome variable","title":"Formatting data for use in mvgam","text":"may also notices spread numeric / integer-classed outcome variable different columns. Rather, single column outcome variable, labelled y simulated data (though outcome labelled y). another important requirement mvgam, shouldn’t unfamiliar R users frequently use modelling packages lme4, mgcv, brms many regression modelling packages . advantage format now easy specify effects vary among time series: Depending observation families plan use building models, may restrictions need satisfied within outcome variable. example, Beta regression can handle proportional data, values >= 1 <= 0 allowed. Likewise, Poisson regression can handle non-negative integers. regression functions R assume user knows issue warnings errors choose wrong distribution, often ends leading unhelpful error optimizer difficult interpret diagnose. mvgam attempt provide errors something simply allowed. 
example, can simulate data zero-centred Gaussian distribution (ensuring values < 1) attempt Beta regression mvgam using betar family: call gam using mgcv package leads model actually fits (though give unhelpful warning message): call mvgam gives us something useful: Please see ?mvgam_families information types responses package can handle restrictions","code":"summary(glm(y ~ series + time, data = simdat$data_train, family = poisson())) ## ## Call: ## glm(formula = y ~ series + time, family = poisson(), data = simdat$data_train) ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -1.115625 0.480029 -2.324 0.020121 * ## seriesseries_2 1.825006 0.481892 3.787 0.000152 *** ## seriesseries_3 1.886945 0.479838 3.932 8.41e-05 *** ## seriesseries_4 2.129507 0.473202 4.500 6.79e-06 *** ## time 0.001744 0.017862 0.098 0.922214 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for poisson family taken to be 1) ## ## Null deviance: 154.10 on 59 degrees of freedom ## Residual deviance: 118.11 on 55 degrees of freedom ## (12 observations deleted due to missingness) ## AIC: 233.48 ## ## Number of Fisher Scoring iterations: 5 summary(gam(y ~ series + s(time, by = series), data = simdat$data_train, family = poisson())) ## ## Family: poisson ## Link function: log ## ## Formula: ## y ~ series + s(time, by = series) ## ## Parametric coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -1.1233 0.4588 -2.448 0.01436 * ## seriesseries_2 1.6315 0.5050 3.231 0.00123 ** ## seriesseries_3 1.4089 0.5298 2.659 0.00783 ** ## seriesseries_4 1.9757 0.4923 4.013 5.99e-05 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Approximate significance of smooth terms: ## edf Ref.df Chi.sq p-value ## s(time):seriesseries_1 1.000 1.000 0.156 0.692804 ## s(time):seriesseries_2 3.658 4.514 13.424 0.013628 * ## s(time):seriesseries_3 4.043 4.970 23.958 0.000224 *** ## s(time):seriesseries_4 3.946 4.848 16.194 0.005763 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## R-sq.(adj) = 0.713 Deviance explained = 70% ## UBRE = 0.32659 Scale est. = 1 n = 60 gauss_dat <- data.frame(outcome = rnorm(10), series = factor('series1', levels = 'series1'), time = 1:10) gauss_dat ## outcome series time ## 1 -0.6486480 series1 1 ## 2 -0.8429284 series1 2 ## 3 -0.1077338 series1 3 ## 4 1.3225989 series1 4 ## 5 0.2371594 series1 5 ## 6 1.0564309 series1 6 ## 7 -0.4418067 series1 7 ## 8 -0.1915395 series1 8 ## 9 -0.2401515 series1 9 ## 10 0.4005033 series1 10 gam(outcome ~ time, family = betar(), data = gauss_dat) ## Warning in family$saturated.ll(y, prior.weights, theta): saturated likelihood ## may be inaccurate ## ## Family: Beta regression(0.09) ## Link function: logit ## ## Formula: ## outcome ~ time ## Total model degrees of freedom 2 ## ## REML score: -205.1333 mvgam(outcome ~ time, family = betar(), data = gauss_dat) ## Error: Values <= 0 not allowed for beta responses"},{"path":"https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html","id":"a-time-variable","dir":"Articles","previous_headings":"Required long data format","what":"A time variable","title":"Formatting data for use in mvgam","text":"requirement modelling mvgam numeric / integer-classed variable labelled time ensure modelling software knows arrange time series building models. setup still allows us formulate multivariate time series models. 
plan use autoregressive dynamic trend functions available mvgam (see ?mvgam_trends details available dynamic processes), need ensure time series entered fixed sampling interval (.e. time timesteps 1 2 time timesteps 2 3, etc…). note can missing observations () series. mvgam check , useful ensure missing timepoint x series combinations data. can generally simple dplyr call: Note models use dynamic components assume smaller values time older (.e. time = 1 came time = 2, etc…)","code":"# A function to ensure all timepoints within a sequence are identical all_times_avail = function(time, min_time, max_time){ identical(as.numeric(sort(time)), as.numeric(seq.int(from = min_time, to = max_time))) } # Get min and max times from the data min_time <- min(simdat$data_train$time) max_time <- max(simdat$data_train$time) # Check that all times are recorded for each series data.frame(series = simdat$data_train$series, time = simdat$data_train$time) %>% dplyr::group_by(series) %>% dplyr::summarise(all_there = all_times_avail(time, min_time, max_time)) -> checked_times if(any(checked_times$all_there == FALSE)){ warning(\"One or more series in is missing observations for one or more timepoints\") } else { cat('All series have observations at all timepoints :)') } ## All series have observations at all timepoints :)"},{"path":"https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html","id":"checking-data-with-get_mvgam_priors","dir":"Articles","previous_headings":"","what":"Checking data with get_mvgam_priors","title":"Formatting data for use in mvgam","text":"get_mvgam_priors function designed return information parameters model whose prior distributions can modified user. , perform series checks ensure data formatted properly. can therefore useful new users ensuring isn’t anything strange going data setup. example, can replicate steps taken (check factor levels timepoint x series combinations) single call get_mvgam_priors. first simulate data timepoints time variable included data: Next call get_mvgam_priors simply specifying intercept-model, enough trigger checks: error useful tells us problem . many ways fill missing timepoints, correct way left user. don’t covariates, pretty easy using expand.grid: Now call get_mvgam_priors, using filled data, work: function also pick misaligned factor levels series variable. 
can check simulating, time adding additional factor level included data: Another call get_mvgam_priors brings useful error: Following message’s advice tells us level series_2 series variable, observations series data: Re-assigning levels fixes issue:","code":"bad_times <- data.frame(time = seq(1, 16, by = 2), series = factor('series_1'), outcome = rnorm(8)) bad_times ## time series outcome ## 1 1 series_1 -0.4906341 ## 2 3 series_1 0.1574320 ## 3 5 series_1 2.6049355 ## 4 7 series_1 0.0328311 ## 5 9 series_1 1.2738171 ## 6 11 series_1 1.4794145 ## 7 13 series_1 0.3193854 ## 8 15 series_1 0.1712443 get_mvgam_priors(outcome ~ 1, data = bad_times, family = gaussian()) ## Error: One or more series in data is missing observations for one or more timepoints bad_times %>% dplyr::right_join(expand.grid(time = seq(min(bad_times$time), max(bad_times$time)), series = factor(unique(bad_times$series), levels = levels(bad_times$series)))) %>% dplyr::arrange(time) -> good_times ## Joining with `by = join_by(time, series)` good_times ## time series outcome ## 1 1 series_1 -0.4906341 ## 2 2 series_1 NA ## 3 3 series_1 0.1574320 ## 4 4 series_1 NA ## 5 5 series_1 2.6049355 ## 6 6 series_1 NA ## 7 7 series_1 0.0328311 ## 8 8 series_1 NA ## 9 9 series_1 1.2738171 ## 10 10 series_1 NA ## 11 11 series_1 1.4794145 ## 12 12 series_1 NA ## 13 13 series_1 0.3193854 ## 14 14 series_1 NA ## 15 15 series_1 0.1712443 get_mvgam_priors(outcome ~ 1, data = good_times, family = gaussian()) ## param_name param_length param_info ## 1 (Intercept) 1 (Intercept) ## 2 vector[n_series] sigma_obs; 1 observation error sd ## prior example_change ## 1 (Intercept) ~ student_t(3, 0.2, 2.5); (Intercept) ~ normal(0, 1); ## 2 sigma_obs ~ student_t(3, 0, 2.5); sigma_obs ~ normal(-0.49, 0.83); ## new_lowerbound new_upperbound ## 1 NA NA ## 2 NA NA bad_levels <- data.frame(time = 1:8, series = factor('series_1', levels = c('series_1', 'series_2')), outcome = rnorm(8)) levels(bad_levels$series) ## [1] \"series_1\" \"series_2\" get_mvgam_priors(outcome ~ 1, data = bad_levels, family = gaussian()) ## Error: Mismatch between factor levels of \"series\" and unique values of \"series\" ## Use ## `setdiff(levels(data$series), unique(data$series))` ## and ## `intersect(levels(data$series), unique(data$series))` ## for guidance setdiff(levels(bad_levels$series), unique(bad_levels$series)) ## [1] \"series_2\" bad_levels %>% dplyr::mutate(series = droplevels(series)) -> good_levels levels(good_levels$series) ## [1] \"series_1\" get_mvgam_priors(outcome ~ 1, data = good_levels, family = gaussian()) ## param_name param_length param_info ## 1 (Intercept) 1 (Intercept) ## 2 vector[n_series] sigma_obs; 1 observation error sd ## prior example_change ## 1 (Intercept) ~ student_t(3, -0.3, 2.5); (Intercept) ~ normal(0, 1); ## 2 sigma_obs ~ student_t(3, 0, 2.5); sigma_obs ~ normal(-0.42, 0.95); ## new_lowerbound new_upperbound ## 1 NA NA ## 2 NA NA"},{"path":"https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html","id":"covariates-with-no-nas","dir":"Articles","previous_headings":"Checking data with get_mvgam_priors","what":"Covariates with no NAs","title":"Formatting data for use in mvgam","text":"Covariates can used models just using mgcv (see ?formula.gam details formula syntax). although outcome variable can NAs, covariates . regression software silently drop raws model matrix NAs, helpful debugging. 
mvgam get_mvgam_priors functions run simple checks , hopefully return useful errors finds missing values: Just like mgcv package, mvgam can also accept data list object. useful want set linear functional predictors even distributed lag predictors. checks run mvgam still work data. change cov predictor matrix: call mvgam returns error:","code":"miss_dat <- data.frame(outcome = rnorm(10), cov = c(NA, rnorm(9)), series = factor('series1', levels = 'series1'), time = 1:10) miss_dat ## outcome cov series time ## 1 -0.15058064 NA series1 1 ## 2 0.03178451 0.7289831 series1 2 ## 3 1.00065562 0.9212862 series1 3 ## 4 -0.03331216 -0.5952961 series1 4 ## 5 -0.01451934 2.0583251 series1 5 ## 6 0.36674707 -0.6042471 series1 6 ## 7 0.67099411 1.2095204 series1 7 ## 8 -1.41582781 -0.6412307 series1 8 ## 9 -0.65945889 1.2445165 series1 9 ## 10 1.11464265 0.3423697 series1 10 get_mvgam_priors(outcome ~ cov, data = miss_dat, family = gaussian()) ## Error: Missing values found in data predictors: ## Error in na.fail.default(structure(list(outcome = c(-0.150580643557141, : missing values in object miss_dat <- list(outcome = rnorm(10), series = factor('series1', levels = 'series1'), time = 1:10) miss_dat$cov <- matrix(rnorm(50), ncol = 5, nrow = 10) miss_dat$cov[2,3] <- NA get_mvgam_priors(outcome ~ cov, data = miss_dat, family = gaussian()) ## Error: Missing values found in data predictors: ## Error in na.fail.default(structure(list(outcome = c(-0.971128623892835, : missing values in object"},{"path":"https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html","id":"plotting-with-plot_mvgam_series","dir":"Articles","previous_headings":"","what":"Plotting with plot_mvgam_series","title":"Formatting data for use in mvgam","text":"Plotting data useful way ensure everything looks ok, ’ve gone throug checks factor levels timepoint x series combinations. plot_mvgam_series function take supplied data plot either series line plots (choose series = '') set plots describe distribution single time series. example, plot time series data, highlight single series plot, can use: can look closely distribution first time series: split data training testing folds (.e. forecast evaluation), can include test data plots:","code":"plot_mvgam_series(data = simdat$data_train, y = 'y', series = 'all') plot_mvgam_series(data = simdat$data_train, y = 'y', series = 1) plot_mvgam_series(data = simdat$data_train, newdata = simdat$data_test, y = 'y', series = 1)"},{"path":"https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html","id":"example-with-neon-tick-data","dir":"Articles","previous_headings":"","what":"Example with NEON tick data","title":"Formatting data for use in mvgam","text":"give one example data can reformatted mvgam modelling, use observations National Ecological Observatory Network (NEON) tick drag cloth samples. Ixodes scapularis widespread tick species capable transmitting diversity parasites animals humans, many zoonotic. Due medical ecological importance tick species, common goal understand factors influence abundances. NEON field team carries standardised long-term monitoring tick abundances well important indicators ecological change. Nymphal abundance . scapularis routinely recorded across NEON plots using field sampling method called drag cloth sampling, common method sampling ticks landscape. Field researchers sample ticks dragging large cloth behind terrain suspected harboring ticks, usually working grid-like pattern. 
sites sampled since 2014, resulting rich dataset nymph abundance time series. tick time series show strong seasonality incorporate many challenging features associated ecological data including overdispersion, high proportions missingness irregular sampling time, making useful exploring utility dynamic GAMs. begin loading NEON tick data years 2014 - 2021, downloaded NEON prepared described Clark & Wells 2022. can read bit data using call ?all_neon_tick_data exercise, use epiWeek variable index seasonality, work observations sampling plots (labelled plotID column): Now can select target species want (. scapularis), filter correct plot IDs convert epiWeek variable character numeric: Now tricky part: need fill missing observations NAs. tick data sparse field observers go sample possible epiWeek. many particular weeks observations included data. can use expand.grid take care : Create series variable needed mvgam modelling: Now create time variable, needs track Year epiWeek unique series. n function dplyr often useful generating time index grouped dataframes: Check factor levels series: looks good, rigorous check using get_mvgam_priors: can also set model mvgam use run_model = FALSE ensure necessary steps creating modelling code objects run. recommended use cmdstanr backend possible, auto-formatting options available package useful checking package-generated Stan code inefficiencies can fixed lead sampling performance improvements: call runs without issue, resulting object now contains model code data objects needed initiate sampling:","code":"data(\"all_neon_tick_data\") str(dplyr::ungroup(all_neon_tick_data)) ## tibble [3,505 × 24] (S3: tbl_df/tbl/data.frame) ## $ Year : num [1:3505] 2015 2015 2015 2015 2015 ... ## $ epiWeek : chr [1:3505] \"37\" \"38\" \"39\" \"40\" ... ## $ yearWeek : chr [1:3505] \"201537\" \"201538\" \"201539\" \"201540\" ... ## $ plotID : chr [1:3505] \"BLAN_005\" \"BLAN_005\" \"BLAN_005\" \"BLAN_005\" ... ## $ siteID : chr [1:3505] \"BLAN\" \"BLAN\" \"BLAN\" \"BLAN\" ... ## $ nlcdClass : chr [1:3505] \"deciduousForest\" \"deciduousForest\" \"deciduousForest\" \"deciduousForest\" ... ## $ decimalLatitude : num [1:3505] 39.1 39.1 39.1 39.1 39.1 ... ## $ decimalLongitude : num [1:3505] -78 -78 -78 -78 -78 ... ## $ elevation : num [1:3505] 168 168 168 168 168 ... ## $ totalSampledArea : num [1:3505] 162 NA NA NA 162 NA NA NA NA 164 ... ## $ amblyomma_americanum: num [1:3505] NA NA NA NA NA NA NA NA NA NA ... ## $ ixodes_scapularis : num [1:3505] 2 NA NA NA 0 NA NA NA NA 0 ... ## $ time : Date[1:3505], format: \"2015-09-13\" \"2015-09-20\" ... ## $ RHMin_precent : num [1:3505] NA NA NA NA NA NA NA NA NA NA ... ## $ RHMin_variance : num [1:3505] NA NA NA NA NA NA NA NA NA NA ... ## $ RHMax_precent : num [1:3505] NA NA NA NA NA NA NA NA NA NA ... ## $ RHMax_variance : num [1:3505] NA NA NA NA NA NA NA NA NA NA ... ## $ airTempMin_degC : num [1:3505] NA NA NA NA NA NA NA NA NA NA ... ## $ airTempMin_variance : num [1:3505] NA NA NA NA NA NA NA NA NA NA ... ## $ airTempMax_degC : num [1:3505] NA NA NA NA NA NA NA NA NA NA ... ## $ airTempMax_variance : num [1:3505] NA NA NA NA NA NA NA NA NA NA ... ## $ soi : num [1:3505] -18.4 -17.9 -23.5 -28.4 -25.9 ... ## $ cum_sdd : num [1:3505] 173 173 173 173 173 ... ## $ cum_gdd : num [1:3505] 1129 1129 1129 1129 1129 ... 
plotIDs <- c('SCBI_013','SCBI_002', 'SERC_001','SERC_005', 'SERC_006','SERC_012', 'BLAN_012','BLAN_005') model_dat <- all_neon_tick_data %>% dplyr::ungroup() %>% dplyr::mutate(target = ixodes_scapularis) %>% dplyr::filter(plotID %in% plotIDs) %>% dplyr::select(Year, epiWeek, plotID, target) %>% dplyr::mutate(epiWeek = as.numeric(epiWeek)) model_dat %>% # Create all possible combos of plotID, Year and epiWeek; # missing outcomes will be filled in as NA dplyr::full_join(expand.grid(plotID = unique(model_dat$plotID), Year = unique(model_dat$Year), epiWeek = seq(1, 52))) %>% # left_join back to original data so plotID and siteID will # match up, in case you need the siteID for anything else later on dplyr::left_join(all_neon_tick_data %>% dplyr::select(siteID, plotID) %>% dplyr::distinct()) -> model_dat ## Joining with `by = join_by(Year, epiWeek, plotID)` ## Joining with `by = join_by(plotID)` model_dat %>% dplyr::mutate(series = plotID, y = target) %>% dplyr::mutate(siteID = factor(siteID), series = factor(series)) %>% dplyr::select(-target, -plotID) %>% dplyr::arrange(Year, epiWeek, series) -> model_dat model_dat %>% dplyr::ungroup() %>% dplyr::group_by(series) %>% dplyr::arrange(Year, epiWeek) %>% dplyr::mutate(time = seq(1, dplyr::n())) %>% dplyr::ungroup() -> model_dat levels(model_dat$series) ## [1] \"BLAN_005\" \"BLAN_012\" \"SCBI_002\" \"SCBI_013\" \"SERC_001\" \"SERC_005\" \"SERC_006\" ## [8] \"SERC_012\" get_mvgam_priors(y ~ 1, data = model_dat, family = poisson()) ## param_name param_length param_info prior ## 1 (Intercept) 1 (Intercept) (Intercept) ~ student_t(3, -2.3, 2.5); ## example_change new_lowerbound new_upperbound ## 1 (Intercept) ~ normal(0, 1); NA NA testmod <- mvgam(y ~ s(epiWeek, by = series, bs = 'cc') + s(series, bs = 're'), trend_model = 'AR1', data = model_dat, backend = 'cmdstanr', run_model = FALSE) str(testmod$model_data) ## List of 25 ## $ y : num [1:416, 1:8] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ... ## $ n : int 416 ## $ X : num [1:3328, 1:73] 1 1 1 1 1 1 1 1 1 1 ... ## ..- attr(*, \"dimnames\")=List of 2 ## .. ..$ : chr [1:3328] \"1\" \"2\" \"3\" \"4\" ... ## .. ..$ : chr [1:73] \"X.Intercept.\" \"V2\" \"V3\" \"V4\" ... ## $ S1 : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ... ## $ zero : num [1:73] 0 0 0 0 0 0 0 0 0 0 ... ## $ S2 : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ... ## $ S3 : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ... ## $ S4 : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ... ## $ S5 : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ... ## $ S6 : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ... ## $ S7 : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ... ## $ S8 : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ... ## $ p_coefs : Named num 0.747 ## ..- attr(*, \"names\")= chr \"(Intercept)\" ## $ p_taus : num 424 ## $ ytimes : int [1:416, 1:8] 1 9 17 25 33 41 49 57 65 73 ... ## $ n_series : int 8 ## $ sp : Named num [1:9] 5.81 0.965 4.058 2.407 13.975 ... ## ..- attr(*, \"names\")= chr [1:9] \"s(epiWeek):seriesBLAN_005\" \"s(epiWeek):seriesBLAN_012\" \"s(epiWeek):seriesSCBI_002\" \"s(epiWeek):seriesSCBI_013\" ... ## $ y_observed : num [1:416, 1:8] 0 0 0 0 0 0 0 0 0 0 ... ## $ total_obs : int 3328 ## $ num_basis : int 73 ## $ n_sp : num 9 ## $ n_nonmissing: int 400 ## $ obs_ind : int [1:400] 89 93 98 101 115 118 121 124 127 130 ... ## $ flat_ys : num [1:400] 2 0 0 0 0 0 0 25 36 14 ... ## $ flat_xs : num [1:400, 1:73] 1 1 1 1 1 1 1 1 1 1 ... ## ..- attr(*, \"dimnames\")=List of 2 ## .. ..$ : chr [1:400] \"705\" \"737\" \"777\" \"801\" ... ## .. 
..$ : chr [1:73] \"X.Intercept.\" \"V2\" \"V3\" \"V4\" ... ## - attr(*, \"trend_model\")= chr \"AR1\" code(testmod) ## // Stan model code generated by package mvgam ## data { ## int total_obs; // total number of observations ## int n; // number of timepoints per series ## int n_sp; // number of smoothing parameters ## int n_series; // number of series ## int num_basis; // total number of basis coefficients ## vector[num_basis] zero; // prior locations for basis coefficients ## matrix[total_obs, num_basis] X; // mgcv GAM design matrix ## array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) ## matrix[8, 8] S1; // mgcv smooth penalty matrix S1 ## matrix[8, 8] S2; // mgcv smooth penalty matrix S2 ## matrix[8, 8] S3; // mgcv smooth penalty matrix S3 ## matrix[8, 8] S4; // mgcv smooth penalty matrix S4 ## matrix[8, 8] S5; // mgcv smooth penalty matrix S5 ## matrix[8, 8] S6; // mgcv smooth penalty matrix S6 ## matrix[8, 8] S7; // mgcv smooth penalty matrix S7 ## matrix[8, 8] S8; // mgcv smooth penalty matrix S8 ## int n_nonmissing; // number of nonmissing observations ## array[n_nonmissing] int flat_ys; // flattened nonmissing observations ## matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations ## array[n_nonmissing] int obs_ind; // indices of nonmissing observations ## } ## parameters { ## // raw basis coefficients ## vector[num_basis] b_raw; ## ## // random effect variances ## vector[1] sigma_raw; ## ## // random effect means ## vector[1] mu_raw; ## ## // latent trend AR1 terms ## vector[n_series] ar1; ## ## // latent trend variance parameters ## vector[n_series] sigma; ## ## // latent trends ## matrix[n, n_series] trend; ## ## // smoothing parameters ## vector[n_sp] lambda; ## } ## transformed parameters { ## // basis coefficients ## vector[num_basis] b; ## b[1 : 65] = b_raw[1 : 65]; ## b[66 : 73] = mu_raw[1] + b_raw[66 : 73] * sigma_raw[1]; ## } ## model { ## // prior for random effect population variances ## sigma_raw ~ student_t(3, 0, 2.5); ## ## // prior for random effect population means ## mu_raw ~ std_normal(); ## ## // prior for (Intercept)... ## b_raw[1] ~ student_t(3, -2.3, 2.5); ## ## // prior for s(epiWeek):seriesBLAN_005... ## b_raw[2 : 9] ~ multi_normal_prec(zero[2 : 9], S1[1 : 8, 1 : 8] * lambda[1]); ## ## // prior for s(epiWeek):seriesBLAN_012... ## b_raw[10 : 17] ~ multi_normal_prec(zero[10 : 17], ## S2[1 : 8, 1 : 8] * lambda[2]); ## ## // prior for s(epiWeek):seriesSCBI_002... ## b_raw[18 : 25] ~ multi_normal_prec(zero[18 : 25], ## S3[1 : 8, 1 : 8] * lambda[3]); ## ## // prior for s(epiWeek):seriesSCBI_013... ## b_raw[26 : 33] ~ multi_normal_prec(zero[26 : 33], ## S4[1 : 8, 1 : 8] * lambda[4]); ## ## // prior for s(epiWeek):seriesSERC_001... ## b_raw[34 : 41] ~ multi_normal_prec(zero[34 : 41], ## S5[1 : 8, 1 : 8] * lambda[5]); ## ## // prior for s(epiWeek):seriesSERC_005... ## b_raw[42 : 49] ~ multi_normal_prec(zero[42 : 49], ## S6[1 : 8, 1 : 8] * lambda[6]); ## ## // prior for s(epiWeek):seriesSERC_006... ## b_raw[50 : 57] ~ multi_normal_prec(zero[50 : 57], ## S7[1 : 8, 1 : 8] * lambda[7]); ## ## // prior for s(epiWeek):seriesSERC_012... ## b_raw[58 : 65] ~ multi_normal_prec(zero[58 : 65], ## S8[1 : 8, 1 : 8] * lambda[8]); ## ## // prior (non-centred) for s(series)... 
## b_raw[66 : 73] ~ std_normal(); ## ## // priors for AR parameters ## ar1 ~ std_normal(); ## ## // priors for smoothing parameters ## lambda ~ normal(5, 30); ## ## // priors for latent trend variance parameters ## sigma ~ student_t(3, 0, 2.5); ## ## // trend estimates ## trend[1, 1 : n_series] ~ normal(0, sigma); ## for (s in 1 : n_series) { ## trend[2 : n, s] ~ normal(ar1[s] * trend[1 : (n - 1), s], sigma[s]); ## } ## { ## // likelihood functions ## vector[n_nonmissing] flat_trends; ## flat_trends = to_vector(trend)[obs_ind]; ## flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0, ## append_row(b, 1.0)); ## } ## } ## generated quantities { ## vector[total_obs] eta; ## matrix[n, n_series] mus; ## vector[n_sp] rho; ## vector[n_series] tau; ## array[n, n_series] int ypred; ## rho = log(lambda); ## for (s in 1 : n_series) { ## tau[s] = pow(sigma[s], -2.0); ## } ## ## // posterior predictions ## eta = X * b; ## for (s in 1 : n_series) { ## mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s]; ## ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]); ## } ## }"},{"path":"https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html","id":"simulating-discrete-time-series","dir":"Articles","previous_headings":"","what":"Simulating discrete time series","title":"Forecasting and forecast evaluation in mvgam","text":"begin simulating data show forecasts computed evaluated mvgam. sim_mvgam() function can used simulate series come variety response distributions well seasonal patterns /dynamic temporal patterns. simulate collection three time count-valued series. series share seasonal pattern different temporal dynamics. setting trend_model = 'GP' prop_trend = 0.75, generating time series smooth underlying temporal trends (evolving Gaussian Processes squared exponential kernel) moderate seasonal patterns. observations Poisson-distributed allow 10% observations missing. returned object list containing training testing data (sim_mvgam() automatically splits data folds us) together information data generating process used simulate data series case shared seasonal pattern, can visualise: resulting time series similar might encounter dealing count-valued data can take small counts: individual series, can plot training testing data, well specific features observed data:","code":"set.seed(2345) simdat <- sim_mvgam(T = 100, n_series = 3, trend_model = 'GP', prop_trend = 0.75, family = poisson(), prop_missing = 0.10) str(simdat) ## List of 6 ## $ data_train :'data.frame': 225 obs. of 5 variables: ## ..$ y : int [1:225] 0 1 3 0 0 0 1 0 3 1 ... ## ..$ season: int [1:225] 1 1 1 2 2 2 3 3 3 4 ... ## ..$ year : int [1:225] 1 1 1 1 1 1 1 1 1 1 ... ## ..$ series: Factor w/ 3 levels \"series_1\",\"series_2\",..: 1 2 3 1 2 3 1 2 3 1 ... ## ..$ time : int [1:225] 1 1 1 2 2 2 3 3 3 4 ... ## $ data_test :'data.frame': 75 obs. of 5 variables: ## ..$ y : int [1:75] 0 1 1 0 0 0 2 2 0 NA ... ## ..$ season: int [1:75] 4 4 4 5 5 5 6 6 6 7 ... ## ..$ year : int [1:75] 7 7 7 7 7 7 7 7 7 7 ... ## ..$ series: Factor w/ 3 levels \"series_1\",\"series_2\",..: 1 2 3 1 2 3 1 2 3 1 ... ## ..$ time : int [1:75] 76 76 76 77 77 77 78 78 78 79 ... ## $ true_corrs : num [1:3, 1:3] 1 0.465 -0.577 0.465 1 ... ## $ true_trends : num [1:100, 1:3] -1.45 -1.54 -1.61 -1.67 -1.73 ... ## $ global_seasonality: num [1:100] 0.0559 0.6249 1.3746 1.6805 0.5246 ... 
## $ trend_params :List of 2 ## ..$ alpha: num [1:3] 0.767 0.988 0.897 ## ..$ rho : num [1:3] 6.02 6.94 5.04 plot(simdat$global_seasonality[1:12], type = 'l', lwd = 2, ylab = 'Relative effect', xlab = 'Season', bty = 'l') plot_mvgam_series(data = simdat$data_train, series = 'all') plot_mvgam_series(data = simdat$data_train, newdata = simdat$data_test, series = 1) plot_mvgam_series(data = simdat$data_train, newdata = simdat$data_test, series = 2) plot_mvgam_series(data = simdat$data_train, newdata = simdat$data_test, series = 3)"},{"path":"https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html","id":"modelling-dynamics-with-splines","dir":"Articles","previous_headings":"Simulating discrete time series","what":"Modelling dynamics with splines","title":"Forecasting and forecast evaluation in mvgam","text":"first model fit uses shared cyclic spline capture repeated seasonality, well series-specific splines time capture long-term dynamics. allow temporal splines fairly complex can capture much temporal variation possible: model fits without issue: can plot partial effects splines see estimated highly nonlinear","code":"mod1 <- mvgam(y ~ s(season, bs = 'cc', k = 8) + s(time, by = series, bs = 'cr', k = 20), knots = list(season = c(0.5, 12.5)), trend_model = 'None', data = simdat$data_train) summary(mod1, include_betas = FALSE) ## GAM formula: ## y ~ s(season, bs = \"cc\", k = 8) + s(time, by = series, bs = \"cr\", ## k = 20) ## ## Family: ## poisson ## ## Link function: ## log ## ## Trend model: ## None ## ## N series: ## 3 ## ## N timepoints: ## 75 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## GAM coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## (Intercept) -0.41 -0.21 -0.039 1 813 ## ## Approximate significance of GAM observation smooths: ## edf Chi.sq p-value ## s(season) 3.77 9.48 0.01603 * ## s(time):seriesseries_1 6.50 13.64 0.09218 . ## s(time):seriesseries_2 9.49 256.09 0.00021 *** ## s(time):seriesseries_3 5.93 16.79 0.04680 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhat looks reasonable for all parameters ## 0 of 2000 iterations ended with a divergence (0%) ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 3:39:55 PM 2024. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) plot(mod1, type = 'smooths')"},{"path":"https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html","id":"modelling-dynamics-with-gps","dir":"Articles","previous_headings":"Simulating discrete time series","what":"Modelling dynamics with GPs","title":"Forecasting and forecast evaluation in mvgam","text":"showing produce evaluate forecasts, fit second model data two models can compared. model equivalent , except now use Gaussian Processes model series-specific dynamics. makes use gp() function brms, can fit Hilbert space approximate GPs. See ?brms::gp details. summary model now contains information GP parameters time series: can plot posteriors parameters, parameter matter, using bayesplot routines. 
First marginal deviation (\\(\\alpha\\)) parameters: now length scale (\\(\\rho\\)) parameters: can also plot nonlinear effects : can also plotted using marginaleffects utilities: estimates temporal trends fairly similar two models, see produce similar forecasts","code":"mod2 <- mvgam(y ~ s(season, bs = 'cc', k = 8) + gp(time, by = series, c = 5/4, k = 20, scale = FALSE), knots = list(season = c(0.5, 12.5)), trend_model = 'None', data = simdat$data_train) summary(mod2, include_betas = FALSE) ## GAM formula: ## y ~ s(season, bs = \"cc\", k = 8) + gp(time, by = series, c = 5/4, ## k = 20, scale = FALSE) ## ## Family: ## poisson ## ## Link function: ## log ## ## Trend model: ## None ## ## N series: ## 3 ## ## N timepoints: ## 75 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## GAM coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## (Intercept) -1.1 -0.52 0.31 1 768 ## ## GAM gp term marginal deviation (alpha) and length scale (rho) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## alpha_gp(time):seriesseries_1 0.21 0.8 2.1 1.01 763 ## alpha_gp(time):seriesseries_2 0.74 1.4 2.9 1.00 1028 ## alpha_gp(time):seriesseries_3 0.50 1.1 2.8 1.00 1026 ## rho_gp(time):seriesseries_1 1.20 5.1 23.0 1.00 681 ## rho_gp(time):seriesseries_2 2.20 10.0 17.0 1.00 644 ## rho_gp(time):seriesseries_3 1.50 8.8 23.0 1.00 819 ## ## Approximate significance of GAM observation smooths: ## edf Chi.sq p-value ## s(season) 6 25 0.00016 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhat looks reasonable for all parameters ## 4 of 2000 iterations ended with a divergence (0.2%) ## *Try running with larger adapt_delta to remove the divergences ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 3:40:58 PM 2024. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) mcmc_plot(mod2, variable = c('alpha_gp'), regex = TRUE, type = 'areas') mcmc_plot(mod2, variable = c('rho_gp'), regex = TRUE, type = 'areas') plot(mod2, type = 'smooths') require('ggplot2') plot_predictions(mod2, condition = c('time', 'series', 'series'), type = 'link') + theme(legend.position = 'none')"},{"path":"https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html","id":"forecasting-with-the-forecast-function","dir":"Articles","previous_headings":"","what":"Forecasting with the forecast() function","title":"Forecasting and forecast evaluation in mvgam","text":"Probabilistic forecasts can computed two main ways mvgam. first take model fit training data (two example models) produce temporal predictions posterior predictive distribution feeding newdata forecast() function. crucial newdata fed forecast() function follows sequentially data used fit model (internally checked package might headache data supplied specific time-order). calling forecast() function, option generate different kinds predictions (.e. predicting link scale, response scale produce expectations; see ?forecast.mvgam details). 
use default produce forecasts response scale, common way evaluate forecast distributions objects created class mvgam_forecast, contain information hindcast distributions, forecast distributions true observations series data: can plot forecasts series model using S3 plot method objects class: Clearly two models produce equivalent forecasts. come back scoring forecasts moment.","code":"fc_mod1 <- forecast(mod1, newdata = simdat$data_test) fc_mod2 <- forecast(mod2, newdata = simdat$data_test) str(fc_mod1) ## List of 16 ## $ call :Class 'formula' language y ~ s(season, bs = \"cc\", k = 8) + s(time, by = series, bs = \"cr\", k = 20) ## .. ..- attr(*, \".Environment\")= ## $ trend_call : NULL ## $ family : chr \"poisson\" ## $ family_pars : NULL ## $ trend_model : chr \"None\" ## $ drift : logi FALSE ## $ use_lv : logi FALSE ## $ fit_engine : chr \"stan\" ## $ type : chr \"response\" ## $ series_names : Factor w/ 3 levels \"series_1\",\"series_2\",..: 1 2 3 ## $ train_observations:List of 3 ## ..$ series_1: int [1:75] 0 0 1 1 0 0 0 0 0 0 ... ## ..$ series_2: int [1:75] 1 0 0 1 1 0 1 0 1 2 ... ## ..$ series_3: int [1:75] 3 0 3 NA 2 1 1 1 1 3 ... ## $ train_times : int [1:75] 1 2 3 4 5 6 7 8 9 10 ... ## $ test_observations :List of 3 ## ..$ series_1: int [1:25] 0 0 2 NA 0 2 2 1 1 1 ... ## ..$ series_2: int [1:25] 1 0 2 1 1 3 0 1 0 NA ... ## ..$ series_3: int [1:25] 1 0 0 1 0 0 1 0 1 0 ... ## $ test_times : int [1:25] 76 77 78 79 80 81 82 83 84 85 ... ## $ hindcasts :List of 3 ## ..$ series_1: num [1:2000, 1:75] 1 1 0 0 0 1 1 1 0 0 ... ## .. ..- attr(*, \"dimnames\")=List of 2 ## .. .. ..$ : NULL ## .. .. ..$ : chr [1:75] \"ypred[1,1]\" \"ypred[2,1]\" \"ypred[3,1]\" \"ypred[4,1]\" ... ## ..$ series_2: num [1:2000, 1:75] 0 0 0 0 0 0 0 1 0 0 ... ## .. ..- attr(*, \"dimnames\")=List of 2 ## .. .. ..$ : NULL ## .. .. ..$ : chr [1:75] \"ypred[1,2]\" \"ypred[2,2]\" \"ypred[3,2]\" \"ypred[4,2]\" ... ## ..$ series_3: num [1:2000, 1:75] 3 0 2 1 0 1 2 1 5 1 ... ## .. ..- attr(*, \"dimnames\")=List of 2 ## .. .. ..$ : NULL ## .. .. ..$ : chr [1:75] \"ypred[1,3]\" \"ypred[2,3]\" \"ypred[3,3]\" \"ypred[4,3]\" ... ## $ forecasts :List of 3 ## ..$ series_1: num [1:2000, 1:25] 1 3 2 1 0 0 1 1 0 0 ... ## ..$ series_2: num [1:2000, 1:25] 6 0 0 0 0 2 0 0 0 0 ... ## ..$ series_3: num [1:2000, 1:25] 0 1 1 3 3 1 3 2 4 2 ... ## - attr(*, \"class\")= chr \"mvgam_forecast\" plot(fc_mod1, series = 1) ## Out of sample CRPS: ## [1] 14.62964 plot(fc_mod2, series = 1) ## Out of sample DRPS: ## [1] 10.92516 plot(fc_mod1, series = 2) ## Out of sample CRPS: ## [1] 84201962708 plot(fc_mod2, series = 2) ## Out of sample DRPS: ## [1] 14.31168 plot(fc_mod1, series = 3) ## Out of sample CRPS: ## [1] 32.44136 plot(fc_mod2, series = 3) ## Out of sample DRPS: ## [1] 15.44332"},{"path":"https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html","id":"forecasting-with-newdata-in-mvgam","dir":"Articles","previous_headings":"","what":"Forecasting with newdata in mvgam()","title":"Forecasting and forecast evaluation in mvgam","text":"second way can produce forecasts mvgam feed testing data directly mvgam() function newdata. include testing data missing observations automatically predicted posterior predictive distribution using generated quantities block Stan. 
For example, we can refit mod2 so that it includes the testing data and produces automatic forecasts: Because this model already contains a forecast distribution, we do not need to feed newdata to the forecast() function: The forecasts are nearly identical to those calculated previously:","code":"mod2 <- mvgam(y ~ s(season, bs = 'cc', k = 8) + gp(time, by = series, c = 5/4, k = 20, scale = FALSE), knots = list(season = c(0.5, 12.5)), trend_model = 'None', data = simdat$data_train, newdata = simdat$data_test) fc_mod2 <- forecast(mod2) plot(fc_mod2, series = 1) ## Out of sample DRPS: ## [1] 10.85762"},{"path":"https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html","id":"scoring-forecast-distributions","dir":"Articles","previous_headings":"","what":"Scoring forecast distributions","title":"Forecasting and forecast evaluation in mvgam","text":"A primary purpose of the mvgam_forecast class is to readily allow forecast evaluations for each series in the data, using a variety of possible scoring functions. See ?mvgam::score.mvgam_forecast to view the types of scores that are available. A useful scoring metric is the Continuous Rank Probability Score (CRPS). A CRPS value is similar to what we might get if we calculated a weighted absolute error using the full forecast distribution. The returned list contains a data.frame for each series in the data that shows the CRPS score for each evaluation in the testing data, along with some other useful information about the fit of the forecast distribution. In particular, we are given a logical value (1s and 0s) telling us whether the true value was within a pre-specified credible interval (i.e. the coverage of the forecast distribution). The default interval width is 0.9, so we would hope that the values in the in_interval column take a 1 approximately 90% of the time. This value can be changed if you wish to compute different coverages, say using a 60% interval: We can also compare forecasts against out of sample observations using the Expected Log Predictive Density (ELPD; also known as the log score). The ELPD is a strictly proper scoring rule that can be applied to any distributional forecast, but to compute it we need predictions on the link scale rather than on the outcome scale. This is where it is advantageous to change the type of prediction we can get using the forecast() function: Finally, when we have multiple time series it may also make sense to use a multivariate proper scoring rule. mvgam offers two options: the Energy score and the Variogram score. The first penalizes forecast distributions that are less well calibrated against the truth, while the second penalizes forecasts that do not capture the observed true correlation structure. Which score you use depends on your goals, but both are easy to compute: The returned object still provides information on interval coverage for each individual series, but there is only a single score per horizon now (provided in the all_series slot): You can use your score(s) of choice to compare different models. For example, we can compute and plot the difference in CRPS scores for each series in the data. Here, a negative value means the Gaussian Process model (mod2) is better, while a positive value means the spline model (mod1) is better. The GP model consistently gives better forecasts, and the difference between scores grows quickly as the forecast horizon increases. This is not unexpected given the way that splines linearly extrapolate outside the range of the training data","code":"crps_mod1 <- score(fc_mod1, score = 'crps') str(crps_mod1) ## List of 4 ## $ series_1 :'data.frame': 25 obs. of 5 variables: ## ..$ score : num [1:25] 0.1938 0.1366 1.355 NA 0.0348 ... ## ..$ in_interval : num [1:25] 1 1 1 NA 1 1 1 1 1 1 ... ## ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ... ## ..$ eval_horizon : int [1:25] 1 2 3 4 5 6 7 8 9 10 ... ## ..$ score_type : chr [1:25] \"crps\" \"crps\" \"crps\" \"crps\" ... ## $ series_2 :'data.frame': 25 obs. of 5 variables: ## ..$ score : num [1:25] 0.379 0.306 0.941 0.5 0.573 ... ## ..$ in_interval : num [1:25] 1 1 1 1 1 1 1 1 1 NA ... ## ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ... ## ..$ eval_horizon : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ score_type : chr [1:25] \"crps\" \"crps\" \"crps\" \"crps\" ... ## $ series_3 :'data.frame': 25 obs. of 5 variables: ## ..$ score : num [1:25] 0.32 0.556 0.379 0.362 0.219 ... ## ..$ in_interval : num [1:25] 1 1 1 1 1 1 1 1 1 1 ... ## ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ... ## ..$ eval_horizon : int [1:25] 1 2 3 4 5 6 7 8 9 10 ... ## ..$ score_type : chr [1:25] \"crps\" \"crps\" \"crps\" \"crps\" ... ## $ all_series:'data.frame': 25 obs. of 3 variables: ## ..$ score : num [1:25] 0.892 0.999 2.675 NA 0.827 ... ## ..$ eval_horizon: int [1:25] 1 2 3 4 5 6 7 8 9 10 ... ## ..$ score_type : chr [1:25] \"sum_crps\" \"sum_crps\" \"sum_crps\" \"sum_crps\" ... crps_mod1$series_1 ## score in_interval interval_width eval_horizon score_type ## 1 0.19375525 1 0.9 1 crps ## 2 0.13663925 1 0.9 2 crps ## 3 1.35502175 1 0.9 3 crps ## 4 NA NA 0.9 4 crps ## 5 0.03482775 1 0.9 5 crps ## 6 1.55416700 1 0.9 6 crps ## 7 1.51028900 1 0.9 7 crps ## 8 0.62121225 1 0.9 8 crps ## 9 0.62630125 1 0.9 9 crps ## 10 0.59853100 1 0.9 10 crps ## 11 1.30998625 1 0.9 11 crps ## 12 2.04829775 1 0.9 12 crps ## 13 0.61251800 1 0.9 13 crps ## 14 0.14052300 1 0.9 14 crps ## 15 0.65110800 1 0.9 15 crps ## 16 0.07973125 1 0.9 16 crps ## 17 0.07675600 1 0.9 17 crps ## 18 0.09382375 1 0.9 18 crps ## 19 0.12356725 1 0.9 19 crps ## 20 NA NA 0.9 20 crps ## 21 0.20173600 1 0.9 21 crps ## 22 0.84066825 1 0.9 22 crps ## 23 NA NA 0.9 23 crps ## 24 1.06489225 1 0.9 24 crps ## 25 0.75528825 1 0.9 25 crps crps_mod1 <- score(fc_mod1, score = 'crps', interval_width = 0.6) crps_mod1$series_1 ## score in_interval interval_width eval_horizon score_type ## 1 0.19375525 1 0.6 1 crps ## 2 0.13663925 1 0.6 2 crps ## 3 1.35502175 0 0.6 3 crps ## 4 NA NA 0.6 4 crps ## 5 0.03482775 1 0.6 5 crps ## 6 1.55416700 0 0.6 6 crps ## 7 1.51028900 0 0.6 7 crps ## 8 0.62121225 1 0.6 8 crps ## 9 0.62630125 1 0.6 9 crps ## 10 0.59853100 1 0.6 10 crps ## 11 1.30998625 0 0.6 11 crps ## 12 2.04829775 0 0.6 12 crps ## 13 0.61251800 1 0.6 13 crps ## 14 0.14052300 1 0.6 14 crps ## 15 0.65110800 1 0.6 15 crps ## 16 0.07973125 1 0.6 16 crps ## 17 0.07675600 1 0.6 17 crps ## 18 0.09382375 1 0.6 18 crps ## 19 0.12356725 1 0.6 19 crps ## 20 NA NA 0.6 20 crps ## 21 0.20173600 1 0.6 21 crps ## 22 0.84066825 1 0.6 22 crps ## 23 NA NA 0.6 23 crps ## 24 1.06489225 1 0.6 24 crps ## 25 0.75528825 1 0.6 25 crps link_mod1 <- forecast(mod1, newdata = simdat$data_test, type = 'link') score(link_mod1, score = 'elpd')$series_1 ## score eval_horizon score_type ## 1 -0.5304414 1 elpd ## 2 -0.4298955 2 elpd ## 3 -2.9617583 3 elpd ## 4 NA 4 elpd ## 5 -0.2007644 5 elpd ## 6 -3.3781408 6 elpd ## 7 -3.2729088 7 elpd ## 8 -2.0363750 8 elpd ## 9 -2.0670612 9 elpd ## 10 -2.0844818 10 elpd ## 11 -3.0576463 11 elpd ## 12 -3.6291058 12 elpd ## 13 -2.1692669 13 elpd ## 14 -0.2960899 14 elpd ## 15 -2.3738851 15 elpd ## 16 -0.2160804 16 elpd ## 17 -0.2036782 17 elpd ## 18 -0.2115539 18 elpd ## 19 -0.2235072 19 elpd ## 20 NA 20 elpd ## 21 -0.2413680 21 elpd ## 22 -2.6791984 22 elpd ## 23 NA 23 elpd ## 24 -2.6851981 24 elpd ## 25 -0.2836901 25 elpd energy_mod2 <- score(fc_mod2, score = 'energy') str(energy_mod2) ## List of 4 ## $ series_1 :'data.frame': 25 obs. of 3 variables: ## ..$ in_interval : num [1:25] 1 1 1 NA 1 1 1 1 1 1 ... ## ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ... ## ..$ eval_horizon : int [1:25] 1 2 3 4 5 6 7 8 9 10 ... ## $ series_2 :'data.frame': 25 obs. 
of 3 variables: ## ..$ in_interval : num [1:25] 1 1 1 1 1 1 1 1 1 NA ... ## ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ... ## ..$ eval_horizon : int [1:25] 1 2 3 4 5 6 7 8 9 10 ... ## $ series_3 :'data.frame': 25 obs. of 3 variables: ## ..$ in_interval : num [1:25] 1 1 1 1 1 1 1 1 1 1 ... ## ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ... ## ..$ eval_horizon : int [1:25] 1 2 3 4 5 6 7 8 9 10 ... ## $ all_series:'data.frame': 25 obs. of 3 variables: ## ..$ score : num [1:25] 0.773 1.147 1.226 NA 0.458 ... ## ..$ eval_horizon: int [1:25] 1 2 3 4 5 6 7 8 9 10 ... ## ..$ score_type : chr [1:25] \"energy\" \"energy\" \"energy\" \"energy\" ... energy_mod2$all_series ## score eval_horizon score_type ## 1 0.7728517 1 energy ## 2 1.1469836 2 energy ## 3 1.2258781 3 energy ## 4 NA 4 energy ## 5 0.4577536 5 energy ## 6 1.8094487 6 energy ## 7 1.4887317 7 energy ## 8 0.7651593 8 energy ## 9 1.1180634 9 energy ## 10 NA 10 energy ## 11 1.5008324 11 energy ## 12 3.2142460 12 energy ## 13 1.6129732 13 energy ## 14 1.2704438 14 energy ## 15 1.1335958 15 energy ## 16 1.8717420 16 energy ## 17 NA 17 energy ## 18 0.7953392 18 energy ## 19 0.9919119 19 energy ## 20 NA 20 energy ## 21 1.2461964 21 energy ## 22 1.5170615 22 energy ## 23 NA 23 energy ## 24 2.3824552 24 energy ## 25 1.5314557 25 energy crps_mod1 <- score(fc_mod1, score = 'crps') crps_mod2 <- score(fc_mod2, score = 'crps') diff_scores <- crps_mod2$series_1$score - crps_mod1$series_1$score plot(diff_scores, pch = 16, cex = 1.25, col = 'darkred', ylim = c(-1*max(abs(diff_scores), na.rm = TRUE), max(abs(diff_scores), na.rm = TRUE)), bty = 'l', xlab = 'Forecast horizon', ylab = expression(CRPS[GP]~-~CRPS[spline])) abline(h = 0, lty = 'dashed', lwd = 2) gp_better <- length(which(diff_scores < 0)) title(main = paste0('GP better in ', gp_better, ' of 25 evaluations', '\\nMean difference = ', round(mean(diff_scores, na.rm = TRUE), 2))) diff_scores <- crps_mod2$series_2$score - crps_mod1$series_2$score plot(diff_scores, pch = 16, cex = 1.25, col = 'darkred', ylim = c(-1*max(abs(diff_scores), na.rm = TRUE), max(abs(diff_scores), na.rm = TRUE)), bty = 'l', xlab = 'Forecast horizon', ylab = expression(CRPS[GP]~-~CRPS[spline])) abline(h = 0, lty = 'dashed', lwd = 2) gp_better <- length(which(diff_scores < 0)) title(main = paste0('GP better in ', gp_better, ' of 25 evaluations', '\\nMean difference = ', round(mean(diff_scores, na.rm = TRUE), 2))) diff_scores <- crps_mod2$series_3$score - crps_mod1$series_3$score plot(diff_scores, pch = 16, cex = 1.25, col = 'darkred', ylim = c(-1*max(abs(diff_scores), na.rm = TRUE), max(abs(diff_scores), na.rm = TRUE)), bty = 'l', xlab = 'Forecast horizon', ylab = expression(CRPS[GP]~-~CRPS[spline])) abline(h = 0, lty = 'dashed', lwd = 2) gp_better <- length(which(diff_scores < 0)) title(main = paste0('GP better in ', gp_better, ' of 25 evaluations', '\\nMean difference = ', round(mean(diff_scores, na.rm = TRUE), 2)))"},{"path":"https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html","id":"further-reading","dir":"Articles","previous_headings":"","what":"Further reading","title":"Forecasting and forecast evaluation in mvgam","text":"following papers resources offer useful material Bayesian forecasting proper scoring rules: Hyndman, Rob J., George Athanasopoulos. Forecasting: principles practice. OTexts, 2018. Gneiting, Tilmann, Adrian E. Raftery. 
"Strictly proper scoring rules, prediction, and estimation." Journal of the American Statistical Association 102.477 (2007): 359-378. Simonis, Juniper L., Ethan P. White, and S.K. Morgan Ernest. "Evaluating probabilistic ecological forecasts." Ecology 102.8 (2021): e03431.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"dynamic-gams","dir":"Articles","previous_headings":"","what":"Dynamic GAMs","title":"Overview of the mvgam package","text":"mvgam was designed to propagate unobserved temporal processes that capture latent dynamics in the observed time series. This works in a state-space format, with the temporal trend evolving independently of the observation process. An introduction to the package and some worked examples are also shown in this seminar: Ecological Forecasting with Dynamic Generalized Additive Models. Briefly, assume \\(\\tilde{\\boldsymbol{y}}_{i,t}\\) is the conditional expectation of response variable \\(\\boldsymbol{i}\\) at time \\(\\boldsymbol{t}\\). Assuming \\(\\boldsymbol{y_i}\\) is drawn from an exponential distribution with an invertible link function, the linear predictor for a multivariate Dynamic GAM can be written as: \\[for~i~in~1:N_{series}~...\\] \\[for~t~in~1:N_{timepoints}~...\\] \\[g^{-1}(\\tilde{\\boldsymbol{y}}_{i,t})=\\alpha_{i}+\\sum\\limits_{j=1}^J\\boldsymbol{s}_{i,j,t}\\boldsymbol{x}_{j,t}+\\boldsymbol{z}_{i,t}\\,,\\] Here \\(\\alpha\\) are the unknown intercepts, the \\(\\boldsymbol{s}\\)'s are unknown smooth functions of covariates (the \\(\\boldsymbol{x}\\)'s), which can potentially vary among the response series, and \\(\\boldsymbol{z}\\) are dynamic latent processes. Each smooth function \\(\\boldsymbol{s_j}\\) is composed of basis expansions whose coefficients, which must be estimated, control the functional relationship between \\(\\boldsymbol{x}_{j}\\) and \\(g^{-1}(\\tilde{\\boldsymbol{y}})\\). The size of the basis expansion limits the smooth's potential complexity, with a larger set of basis functions allowing greater flexibility. Several advantages of GAMs are that they can model a diversity of response families, including discrete distributions (i.e. Poisson, Negative Binomial, Gamma) that accommodate common ecological features such as zero-inflation and overdispersion, and that they can be formulated to include hierarchical smoothing for multivariate responses. mvgam supports a number of different observation families, which are summarized below:","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"supported-observation-families","dir":"Articles","previous_headings":"","what":"Supported observation families","title":"Overview of the mvgam package","text":"For the supported observation families, any extra parameters that need to be estimated (i.e. the \\(\\sigma\\) in a Gaussian model or the \\(\\phi\\) in a Negative Binomial model) are estimated independently for each series. Note that default link functions cannot currently be changed in mvgam.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"supported-temporal-dynamic-processes","dir":"Articles","previous_headings":"","what":"Supported temporal dynamic processes","title":"Overview of the mvgam package","text":"The dynamic processes can take a wide variety of forms, some of which can be multivariate to allow the different time series to interact or be correlated. When using the mvgam() function, the user chooses between different process models with the trend_model argument (a minimal sketch of these choices is given directly below).
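A minimal sketch of how these choices are supplied (all model settings here are illustrative only; the RW(), AR() and VAR() constructors are described in the sections that follow):

# Sketch: pairing one observation formula with different process models
simdat  <- sim_mvgam(n_series = 3, T = 80)
mod_rw  <- mvgam(y ~ s(season, bs = 'cc'), trend_model = RW(), data = simdat$data_train)
mod_ar1 <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(p = 1), data = simdat$data_train)
mod_var <- mvgam(y ~ s(season, bs = 'cc'), trend_model = VAR(cor = TRUE), data = simdat$data_train)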
Each of the available process models is described in detail in the sections that follow.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"independent-random-walks","dir":"Articles","previous_headings":"Supported temporal dynamic processes","what":"Independent Random Walks","title":"Overview of the mvgam package","text":"Use trend_model = 'RW' or trend_model = RW() to set up a model in which each series in the data has independent latent temporal dynamics of the form: \\[\\begin{align*} z_{i,t} & \\sim \\text{Normal}(z_{i,t-1}, \\sigma_i) \\end{align*}\\] Process error parameters \\(\\sigma\\) are modeled independently for each series. If a moving average process is required, use trend_model = RW(ma = TRUE) to set up the following: \\[\\begin{align*} z_{i,t} & = z_{i,t-1} + \\theta_i * error_{i,t-1} + error_{i,t} \\\\ error_{i,t} & \\sim \\text{Normal}(0, \\sigma_i) \\end{align*}\\] Moving average coefficients \\(\\theta\\) are independently estimated for each series and are forced to be stationary by default \\((abs(\\theta)<1)\\). Only moving averages of order \\(q=1\\) are currently allowed.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"multivariate-random-walks","dir":"Articles","previous_headings":"Supported temporal dynamic processes","what":"Multivariate Random Walks","title":"Overview of the mvgam package","text":"If more than one series is included in the data \\((N_{series} > 1)\\), a multivariate Random Walk can be set up using trend_model = RW(cor = TRUE), resulting in the following: \\[\\begin{align*} z_{t} & \\sim \\text{MVNormal}(z_{t-1}, \\Sigma) \\end{align*}\\] Here the latent process estimate \\(z_t\\) now takes the form of a vector. The covariance matrix \\(\\Sigma\\) will capture contemporaneously correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances \\(\\sigma\\) and on the strength of correlations using Stan's lkj_corr_cholesky distribution. Moving average terms can also be included for multivariate random walks, in which case the moving average coefficients \\(\\theta\\) are parameterised as an \\(N_{series} * N_{series}\\) matrix","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"autoregressive-processes","dir":"Articles","previous_headings":"Supported temporal dynamic processes","what":"Autoregressive processes","title":"Overview of the mvgam package","text":"Autoregressive models up to \\(p=3\\), in which the autoregressive coefficients are estimated independently for each series, can be used by specifying trend_model = 'AR1', trend_model = 'AR2', trend_model = 'AR3', or trend_model = AR(p = 1, 2 or 3). For example, a univariate AR(1) model takes the form: \\[\\begin{align*} z_{i,t} & \\sim \\text{Normal}(ar1_i * z_{i,t-1}, \\sigma_i) \\end{align*}\\] As with the options for Random Walks, additional options are available for placing priors on the autoregressive coefficients. By default, the coefficients are not forced into stationarity, but users can impose this restriction by changing the upper and lower bounds on their priors. See ?get_mvgam_priors for details.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"vector-autoregressive-processes","dir":"Articles","previous_headings":"Supported temporal dynamic processes","what":"Vector Autoregressive processes","title":"Overview of the mvgam package","text":"A Vector Autoregression of order \\(p=1\\) can be specified if \\(N_{series} > 1\\) using trend_model = 'VAR1' or trend_model = VAR(). A VAR(1) model takes the form: \\[\\begin{align*} z_{t} & \\sim \\text{Normal}(A * z_{t-1}, \\Sigma) \\end{align*}\\] Here, \\(A\\) is an \\(N_{series} * N_{series}\\) matrix of autoregressive coefficients in which the diagonals capture lagged self-dependence (i.e. the
effect of a process at time \\(t\\) on its own estimate at time \\(t+1\\)), while the off-diagonals capture lagged cross-dependence (i.e. the effect of a process at time \\(t\\) on the process of another series at time \\(t+1\\)). By default, the covariance matrix \\(\\Sigma\\) will assume no process error covariance by fixing the off-diagonals to \\(0\\). To allow for correlated errors, use trend_model = 'VAR1cor' or trend_model = VAR(cor = TRUE). A moving average of order \\(q=1\\) can also be included using trend_model = VAR(ma = TRUE, cor = TRUE). Note that for all VAR models, stationarity of the process is enforced with a structured prior distribution that is described in detail in Heaps 2022: Heaps, Sarah E. "Enforcing stationarity through the prior in vector autoregressions." Journal of Computational and Graphical Statistics 32.1 (2023): 74-83.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"gaussian-processes","dir":"Articles","previous_headings":"Supported temporal dynamic processes","what":"Gaussian Processes","title":"Overview of the mvgam package","text":"The final option for modelling temporal dynamics is to use a Gaussian Process with a squared exponential kernel. These are set up independently for each series (there is currently no multivariate GP option), using trend_model = 'GP'. The dynamics for each latent process are modelled as: \\[\\begin{align*} z & \\sim \\text{MVNormal}(0, \\Sigma_{error}) \\\\ \\Sigma_{error}[t_i, t_j] & = \\alpha^2 * exp(-0.5 * ((|t_i - t_j| / \\rho))^2) \\end{align*}\\] The latent dynamic process evolves from a complex, high-dimensional Multivariate Normal distribution that depends on \\(\\rho\\) (often called the length scale parameter) to control how quickly the correlations between the model's errors decay as a function of time. For these models, the covariance decays exponentially fast with the squared distance (in time) between observations. The functions also depend on a parameter \\(\\alpha\\), which controls the marginal variability of the temporal function at all points; in other words, it controls how much the GP term contributes to the linear predictor. mvgam capitalizes on recent advances that allow GPs to be approximated using Hilbert space basis functions, which considerably speed up computation at little cost to accuracy or prediction performance.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"piecewise-logistic-and-linear-trends","dir":"Articles","previous_headings":"Supported temporal dynamic processes","what":"Piecewise logistic and linear trends","title":"Overview of the mvgam package","text":"Modeling growth for many types of time series is often similar to modeling population growth in natural ecosystems, where the series exhibits nonlinear growth that saturates at some particular carrying capacity. The logistic trend model available in {mvgam} allows for a time-varying capacity \\(C(t)\\) as well as a non-constant growth rate. Changes in the base growth rate \\(k\\) are incorporated by explicitly defining changepoints throughout the training period where the growth rate is allowed to vary. The changepoint vector \\(A\\) is represented as a vector of 1s and 0s, and the rate of growth at time \\(t\\) is represented as \\(k+A(t)^T\\delta\\). Potential changepoints are selected uniformly across the training period, and the number of changepoints, as well as the flexibility of the potential rate changes at these changepoints, can be controlled using trend_model = PW(). The full piecewise logistic growth model is then: \\[\\begin{align*} z_t & = \\frac{C_t}{1 + \\exp(-(k+A(t)^T\\delta)(t-(m+A(t)^T\\gamma)))} \\end{align*}\\] For time series that do not appear to exhibit saturating growth, a piece-wise constant rate of growth can often provide a useful trend model. The piecewise linear trend is defined as: \\[\\begin{align*} z_t & = (k+A(t)^T\\delta)t + (m+A(t)^T\\gamma) \\end{align*}\\] In both trend models, \\(m\\) is an offset parameter that controls the trend intercept. Because of this parameter, it is not recommended that you include an intercept in your observation formula because it will not be identifiable. A minimal sketch of the piecewise linear trend is shown below.
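To make the changepoint notation concrete, here is a minimal base-R sketch of the piecewise linear trend (all values are arbitrary illustrations; the continuity choice gamma_cp = -s_cp * delta follows Taylor and Letham's formulation):

# Sketch: piecewise linear trend z_t = (k + A(t)' delta) * t + (m + A(t)' gamma)
t_all    <- 1:100
s_cp     <- c(30, 60)                    # changepoint times (arbitrary)
delta    <- c(0.05, -0.08)               # rate adjustments at each changepoint
k        <- 0.1; m <- 0                  # base growth rate and offset
A        <- outer(t_all, s_cp, ">=") * 1 # indicator matrix: 1 once a changepoint is passed
gamma_cp <- -s_cp * delta                # offsets that keep the trend continuous
z        <- as.numeric((k + A %*% delta) * t_all + (m + A %*% gamma_cp))
plot(t_all, z, type = 'l', xlab = 'time', ylab = 'z')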
You can read a full description of the piecewise linear and logistic trends in the paper by Taylor and Letham: Sean J. Taylor and Benjamin Letham. "Forecasting at scale." The American Statistician 72.1 (2018): 37-45.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"regression-formulae","dir":"Articles","previous_headings":"","what":"Regression formulae","title":"Overview of the mvgam package","text":"mvgam supports an observation model regression formula, built off the mgcv package, as well as an optional process model regression formula. The formulae supplied are exactly like those supplied to glm(), except that smooth terms, s(), te(), ti() and t2(), time-varying effects using dynamic(), monotonically increasing (using s(x, bs = 'moi')) or decreasing splines (using s(x, bs = 'mod'); see ?smooth.construct.moi.smooth.spec for details), as well as Gaussian Process functions using gp(), can be added to the right hand side (i.e. these are supported in mvgam formulae). See ?mvgam_formulae for more guidance. For setting up State-Space models, an optional process model formula can also be used (see the State-Space model vignette and the shared latent states vignette for guidance on using trend formulae).","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"example-time-series-data","dir":"Articles","previous_headings":"","what":"Example time series data","title":"Overview of the mvgam package","text":"The 'portal_data' object contains time series of rodent captures from the Portal Project, a long-term monitoring study based near the town of Portal, Arizona. Researchers have been operating a standardized set of baited traps within 24 experimental plots at this site since the 1970's. Sampling follows the lunar monthly cycle, with observations occurring on average about 28 days apart. However, missing observations do occur due to difficulties accessing the site (weather events, COVID disruptions etc...). You can read about the full sampling protocol in this preprint by Ernest et al on Biorxiv. These data come pre-loaded with the mvgam package, and you can read a little about them in the help page using ?portal_data. When working with data, it is important to inspect how the data are structured, first using head: The glimpse function in dplyr is also useful for understanding how variables are structured. Here we will focus our analyses on the time series of captures for one specific rodent species, the Desert Pocket Mouse Chaetodipus penicillatus. This
species interesting goes kind “hibernation” colder months, leading low captures winter period","code":"data(\"portal_data\") head(portal_data) ## moon DM DO PP OT year month mintemp precipitation ndvi ## 1 329 10 6 0 2 2004 1 -9.710 37.8 1.465889 ## 2 330 14 8 1 0 2004 2 -5.924 8.7 1.558507 ## 3 331 9 1 2 1 2004 3 -0.220 43.5 1.337817 ## 4 332 NA NA NA NA 2004 4 1.931 23.9 1.658913 ## 5 333 15 8 10 1 2004 5 6.568 0.9 1.853656 ## 6 334 NA NA NA NA 2004 6 11.590 1.4 1.761330 dplyr::glimpse(portal_data) ## Rows: 199 ## Columns: 10 ## $ moon 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 3… ## $ DM 10, 14, 9, NA, 15, NA, NA, 9, 5, 8, NA, 14, 7, NA, NA, 9… ## $ DO 6, 8, 1, NA, 8, NA, NA, 3, 3, 4, NA, 3, 8, NA, NA, 3, NA… ## $ PP 0, 1, 2, NA, 10, NA, NA, 16, 18, 12, NA, 3, 2, NA, NA, 1… ## $ OT 2, 0, 1, NA, 1, NA, NA, 1, 0, 0, NA, 2, 1, NA, NA, 1, NA… ## $ year 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 20… ## $ month 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6,… ## $ mintemp -9.710, -5.924, -0.220, 1.931, 6.568, 11.590, 14.370, 16… ## $ precipitation 37.8, 8.7, 43.5, 23.9, 0.9, 1.4, 20.3, 91.0, 60.5, 25.2,… ## $ ndvi 1.4658889, 1.5585069, 1.3378172, 1.6589129, 1.8536561, 1…"},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"manipulating-data-for-modelling","dir":"Articles","previous_headings":"","what":"Manipulating data for modelling","title":"Overview of the mvgam package","text":"Manipulating data ‘long’ format necessary modelling mvgam. ‘long’ format, mean series x time observation needs entry dataframe list object wish use data modelling. simple example can viewed simulating data using sim_mvgam function. See ?sim_mvgam details Notice four different time series simulated data, spread outcome values different columns. Rather, single column outcome variable, labelled y simulated data. also must supply variable labelled time ensure modelling software knows arrange time series building models. setup still allows us formulate multivariate time series models, can see State-Space vignette. steps needed shape portal_data object correct form. First, create time variable, select column representing counts target species (PP), select appropriate variables can use predictors data now contain six variables:series, factor indexing time series observation belongs toyear, year samplingtime, indicator time step observation belongs tocount, response variable representing number captures species PP sampling observationmintemp, monthly average minimum temperature time stepndvi, monthly average Normalized Difference Vegetation Index time step Now check data structure can also summarize multiple variables, helpful search data ranges identify missing values NAs response variable count. Let’s visualize data heatmap get sense distributed (NAs shown red bars plot) observations generally thrown modelling packages . see work tutorials, mvgam keeps data predictions can automatically returned full dataset. time series descriptive features can plotted using plot_mvgam_series():","code":"data <- sim_mvgam(n_series = 4, T = 24) head(data$data_train, 12) ## y season year series time ## 1 1 1 1 series_1 1 ## 2 3 1 1 series_2 1 ## 3 2 1 1 series_3 1 ## 4 1 1 1 series_4 1 ## 5 0 2 1 series_1 2 ## 6 1 2 1 series_2 2 ## 7 1 2 1 series_3 2 ## 8 0 2 1 series_4 2 ## 9 0 3 1 series_1 3 ## 10 1 3 1 series_2 3 ## 11 0 3 1 series_3 3 ## 12 1 3 1 series_4 3 portal_data %>% # mvgam requires a 'time' variable be present in the data to index # the temporal observations. 
This is especially important when tracking # multiple time series. In the Portal data, the 'moon' variable indexes the # lunar monthly timestep of the trapping sessions dplyr::mutate(time = moon - (min(moon)) + 1) %>% # We can also provide a more informative name for the outcome variable, which # is counts of the 'PP' species (Chaetodipus penicillatus) across all control # plots dplyr::mutate(count = PP) %>% # The other requirement for mvgam is a 'series' variable, which needs to be a # factor variable to index which time series each row in the data belongs to. # Again, this is more useful when you have multiple time series in the data dplyr::mutate(series = as.factor('PP')) %>% # Select the variables of interest to keep in the model_data dplyr::select(series, year, time, count, mintemp, ndvi) -> model_data head(model_data) ## series year time count mintemp ndvi ## 1 PP 2004 1 0 -9.710 1.465889 ## 2 PP 2004 2 1 -5.924 1.558507 ## 3 PP 2004 3 2 -0.220 1.337817 ## 4 PP 2004 4 NA 1.931 1.658913 ## 5 PP 2004 5 10 6.568 1.853656 ## 6 PP 2004 6 NA 11.590 1.761330 dplyr::glimpse(model_data) ## Rows: 199 ## Columns: 6 ## $ series PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP… ## $ year 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 20… ## $ time 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,… ## $ count 0, 1, 2, NA, 10, NA, NA, 16, 18, 12, NA, 3, 2, NA, NA, 13, NA,… ## $ mintemp -9.710, -5.924, -0.220, 1.931, 6.568, 11.590, 14.370, 16.520, … ## $ ndvi 1.4658889, 1.5585069, 1.3378172, 1.6589129, 1.8536561, 1.76132… summary(model_data) ## series year time count mintemp ## PP:199 Min. :2004 Min. : 1.0 Min. : 0.00 Min. :-24.000 ## 1st Qu.:2008 1st Qu.: 50.5 1st Qu.: 2.50 1st Qu.: -3.884 ## Median :2012 Median :100.0 Median :12.00 Median : 2.130 ## Mean :2012 Mean :100.0 Mean :15.14 Mean : 3.504 ## 3rd Qu.:2016 3rd Qu.:149.5 3rd Qu.:24.00 3rd Qu.: 12.310 ## Max. :2020 Max. :199.0 Max. :65.00 Max. : 18.140 ## NA's :36 ## ndvi ## Min. :0.2817 ## 1st Qu.:1.0741 ## Median :1.3501 ## Mean :1.4709 ## 3rd Qu.:1.8178 ## Max. :3.9126 ## image(is.na(t(model_data %>% dplyr::arrange(dplyr::desc(time)))), axes = F, col = c('grey80', 'darkred')) axis(3, at = seq(0,1, len = NCOL(model_data)), labels = colnames(model_data)) plot_mvgam_series(data = model_data, series = 1, y = 'count')"},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"glms-with-temporal-random-effects","dir":"Articles","previous_headings":"","what":"GLMs with temporal random effects","title":"Overview of the mvgam package","text":"first task fit Generalized Linear Model (GLM) can adequately capture features count observations (integer data, lower bound zero, missing values) also attempting model temporal variation. almost ready fit first model, GLM Poisson observations, log link function random (hierarchical) intercepts year. allow us capture prior belief , although year unique, sampled population effects, years connected thus might contain valuable information one another. done capitalizing partial pooling properties hierarchical models. Hierarchical (also known random) effects offer many advantages modelling data grouping structures (.e. multiple species, locations, years etc…). ability incorporate time series models huge advantage traditional models ARIMA Exponential Smoothing. fit model, need convert year factor can use random effect basis mvgam. 
See ?smooth.terms ?smooth.construct.re.smooth.spec details re basis construction used mvgam mgcv Preview dataset ensure year now factor unique factor level year data now ready first mvgam model. syntax familiar users previously built models mgcv. refresher, see ?formula.gam examples ?gam. Random effects can specified using s wrapper re basis. Note can also suppress primary intercept using usual R formula syntax - 1. mvgam number possible observation families can used, see ?mvgam_families information. use Stan fitting engine, deploys Hamiltonian Monte Carlo (HMC) full Bayesian inference. default, 4 HMC chains run using warmup 500 iterations collecting 500 posterior samples chain. package also aim use Cmdstan backend possible, recommended users --date installation Cmdstan associated cmdstanr interface machines (note can set backend using backend argument: see ?mvgam details). Interested users consult Stan user’s guide information software enormous variety models can tackled HMC. model can described mathematically timepoint \\(t\\) follows: \\[\\begin{align*} \\boldsymbol{count}_t & \\sim \\text{Poisson}(\\lambda_t) \\\\ log(\\lambda_t) & = \\beta_{year[year_t]} \\\\ \\beta_{year} & \\sim \\text{Normal}(\\mu_{year}, \\sigma_{year}) \\end{align*}\\] \\(\\beta_{year}\\) effects drawn population distribution parameterized common mean \\((\\mu_{year})\\) variance \\((\\sigma_{year})\\). Priors model parameters can interrogated changed using similar functionality options available brms. example, default priors \\((\\mu_{year})\\) \\((\\sigma_{year})\\) can viewed using following code: See examples ?get_mvgam_priors find different ways priors can altered. model finished, first step inspect summary ensure major diagnostic warnings produced quickly summarise posterior distributions key parameters diagnostic messages bottom summary show HMC sampler encounter problems difficult posterior spaces. good sign. Posterior distributions model parameters can extracted way object class brmsfit can (see ?mvgam::mvgam_draws details). example, can extract coefficients related GAM linear predictor (.e. 
\\(\\beta\\)’s) data.frame using: model fitted mvgam, underlying Stan code can viewed using code function:","code":"model_data %>% # Create a 'year_fac' factor version of 'year' dplyr::mutate(year_fac = factor(year)) -> model_data dplyr::glimpse(model_data) ## Rows: 199 ## Columns: 7 ## $ series PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, P… ## $ year 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2… ## $ time 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18… ## $ count 0, 1, 2, NA, 10, NA, NA, 16, 18, 12, NA, 3, 2, NA, NA, 13, NA… ## $ mintemp -9.710, -5.924, -0.220, 1.931, 6.568, 11.590, 14.370, 16.520,… ## $ ndvi 1.4658889, 1.5585069, 1.3378172, 1.6589129, 1.8536561, 1.7613… ## $ year_fac 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2… levels(model_data$year_fac) ## [1] \"2004\" \"2005\" \"2006\" \"2007\" \"2008\" \"2009\" \"2010\" \"2011\" \"2012\" \"2013\" ## [11] \"2014\" \"2015\" \"2016\" \"2017\" \"2018\" \"2019\" \"2020\" model1 <- mvgam(count ~ s(year_fac, bs = 're') - 1, family = poisson(), data = model_data) get_mvgam_priors(count ~ s(year_fac, bs = 're') - 1, family = poisson(), data = model_data) ## param_name param_length param_info ## 1 vector[1] mu_raw; 1 s(year_fac) pop mean ## 2 vector[1] sigma_raw; 1 s(year_fac) pop sd ## prior example_change ## 1 mu_raw ~ std_normal(); mu_raw ~ normal(0.36, 0.77); ## 2 sigma_raw ~ student_t(3, 0, 2.5); sigma_raw ~ exponential(0.55); ## new_lowerbound new_upperbound ## 1 NA NA ## 2 NA NA summary(model1) ## GAM formula: ## count ~ s(year_fac, bs = \"re\") - 1 ## ## Family: ## poisson ## ## Link function: ## log ## ## Trend model: ## None ## ## N series: ## 1 ## ## N timepoints: ## 199 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## GAM coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## s(year_fac).1 1.80 2.1 2.3 1.00 2955 ## s(year_fac).2 2.50 2.7 2.8 1.00 3205 ## s(year_fac).3 3.00 3.1 3.2 1.00 2966 ## s(year_fac).4 3.10 3.3 3.4 1.00 2869 ## s(year_fac).5 1.90 2.1 2.3 1.00 3059 ## s(year_fac).6 1.50 1.8 2.0 1.00 2470 ## s(year_fac).7 1.80 2.0 2.3 1.00 2527 ## s(year_fac).8 2.80 3.0 3.1 1.00 3373 ## s(year_fac).9 3.10 3.3 3.4 1.00 2748 ## s(year_fac).10 2.60 2.8 2.9 1.00 2731 ## s(year_fac).11 3.00 3.1 3.2 1.00 3316 ## s(year_fac).12 3.10 3.2 3.3 1.00 2637 ## s(year_fac).13 2.00 2.2 2.5 1.00 2355 ## s(year_fac).14 2.50 2.6 2.8 1.00 3042 ## s(year_fac).15 1.90 2.2 2.4 1.00 2649 ## s(year_fac).16 1.90 2.1 2.3 1.00 3010 ## s(year_fac).17 -0.35 1.1 1.9 1.01 336 ## ## GAM group-level estimates: ## 2.5% 50% 97.5% Rhat n_eff ## mean(s(year_fac)) 2.00 2.40 2.8 1.05 83 ## sd(s(year_fac)) 0.45 0.67 1.2 1.03 174 ## ## Approximate significance of GAM observation smooths: ## edf Chi.sq p-value ## s(year_fac) 13.5 24694 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhats above 1.05 found for 1 parameters ## *Diagnose further to investigate why the chains have not mixed ## 0 of 2000 iterations ended with a divergence (0%) ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 3:42:16 PM 2024. 
## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) beta_post <- as.data.frame(model1, variable = 'betas') dplyr::glimpse(beta_post) ## Rows: 2,000 ## Columns: 17 ## $ `s(year_fac).1` 2.02205, 2.16924, 2.10487, 2.13665, 1.86345, 2.24392,… ## $ `s(year_fac).2` 2.73095, 2.71062, 2.75107, 2.65906, 2.80667, 2.54545,… ## $ `s(year_fac).3` 3.18207, 2.95422, 3.11397, 3.06990, 3.11219, 2.99228,… ## $ `s(year_fac).4` 3.21977, 3.13672, 3.16082, 3.15234, 3.26815, 3.32370,… ## $ `s(year_fac).5` 2.20712, 2.01768, 2.22813, 2.09890, 2.30518, 1.99217,… ## $ `s(year_fac).6` 1.81741, 1.73328, 1.86433, 1.70224, 1.84045, 1.74424,… ## $ `s(year_fac).7` 2.05394, 1.88082, 1.88431, 2.09144, 1.90318, 2.16052,… ## $ `s(year_fac).8` 2.97590, 2.85870, 2.99140, 2.96427, 3.05537, 2.89148,… ## $ `s(year_fac).9` 3.04001, 3.34879, 3.29032, 3.25937, 3.21418, 3.28780,… ## $ `s(year_fac).10` 2.71676, 2.74633, 2.85746, 2.89187, 2.55747, 2.94397,… ## $ `s(year_fac).11` 3.04156, 3.02516, 3.05404, 3.08662, 3.11559, 3.01671,… ## $ `s(year_fac).12` 3.16241, 3.20121, 3.16180, 3.24250, 3.21637, 3.09950,… ## $ `s(year_fac).13` 1.97840, 2.22098, 2.25286, 2.24841, 2.18589, 2.26861,… ## $ `s(year_fac).14` 2.42902, 2.59880, 2.59650, 2.74709, 2.54397, 2.69848,… ## $ `s(year_fac).15` 2.02345, 2.19903, 2.24971, 2.06536, 2.10172, 2.25418,… ## $ `s(year_fac).16` 2.06523, 2.00776, 2.01068, 1.98364, 1.91786, 2.20502,… ## $ `s(year_fac).17` 0.5891590, 0.6284760, 1.3589400, 1.0220400, -0.613518… code(model1) ## // Stan model code generated by package mvgam ## data { ## int total_obs; // total number of observations ## int n; // number of timepoints per series ## int n_series; // number of series ## int num_basis; // total number of basis coefficients ## matrix[total_obs, num_basis] X; // mgcv GAM design matrix ## array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) ## int n_nonmissing; // number of nonmissing observations ## array[n_nonmissing] int flat_ys; // flattened nonmissing observations ## matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations ## array[n_nonmissing] int obs_ind; // indices of nonmissing observations ## } ## parameters { ## // raw basis coefficients ## vector[num_basis] b_raw; ## ## // random effect variances ## vector[1] sigma_raw; ## ## // random effect means ## vector[1] mu_raw; ## } ## transformed parameters { ## // basis coefficients ## vector[num_basis] b; ## b[1 : 17] = mu_raw[1] + b_raw[1 : 17] * sigma_raw[1]; ## } ## model { ## // prior for random effect population variances ## sigma_raw ~ student_t(3, 0, 2.5); ## ## // prior for random effect population means ## mu_raw ~ std_normal(); ## ## // prior (non-centred) for s(year_fac)... 
## b_raw[1 : 17] ~ std_normal(); ## { ## // likelihood functions ## flat_ys ~ poisson_log_glm(flat_xs, 0.0, b); ## } ## } ## generated quantities { ## vector[total_obs] eta; ## matrix[n, n_series] mus; ## array[n, n_series] int ypred; ## ## // posterior predictions ## eta = X * b; ## for (s in 1 : n_series) { ## mus[1 : n, s] = eta[ytimes[1 : n, s]]; ## ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]); ## } ## }"},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"plotting-effects-and-residuals","dir":"Articles","previous_headings":"GLMs with temporal random effects","what":"Plotting effects and residuals","title":"Overview of the mvgam package","text":"Now interrogating model. can get sense variation yearly intercepts summary , easier understand using targeted plots. Plot posterior distributions temporal random effects using plot.mvgam type = 're'. See ?plot.mvgam details types plots can produced fitted mvgam objects","code":"plot(model1, type = 're')"},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"bayesplot-support","dir":"Articles","previous_headings":"GLMs with temporal random effects","what":"bayesplot support","title":"Overview of the mvgam package","text":"can also capitalize useful MCMC plotting functions bayesplot package visualize posterior distributions diagnostics (see ?mvgam::mcmc_plot.mvgam details): clearly variation yearly intercept estimates. translate time-varying predictions? understand , can plot posterior hindcasts model training period using plot.mvgam type = 'forecast' wish extract hindcasts downstream analyses, hindcast function can used. return list object class mvgam_forecast. hindcasts slot, matrix posterior retrodictions returned series data (one series example): can also extract hindcasts linear predictor scale, case log scale (Poisson GLM used log link function). Sometimes can useful asking targeted questions drivers variation: Objects class mvgam_forecast associated plot function well: plot can look bit confusing seems like linear interpolation end one year start next. just due way lines automatically connected base plots regression analysis, key question whether residuals show patterns can indicative un-modelled sources variation. GLMs, can use modified residual called Dunn-Smyth, randomized quantile, residual. Inspect Dunn-Smyth residuals model using plot.mvgam type = 'residuals'","code":"mcmc_plot(object = model1, variable = 'betas', type = 'areas') plot(model1, type = 'forecast') hc <- hindcast(model1) str(hc) ## List of 15 ## $ call :Class 'formula' language count ~ s(year_fac, bs = \"re\") - 1 ## .. ..- attr(*, \".Environment\")= ## $ trend_call : NULL ## $ family : chr \"poisson\" ## $ trend_model : chr \"None\" ## $ drift : logi FALSE ## $ use_lv : logi FALSE ## $ fit_engine : chr \"stan\" ## $ type : chr \"response\" ## $ series_names : chr \"PP\" ## $ train_observations:List of 1 ## ..$ PP: int [1:199] 0 1 2 NA 10 NA NA 16 18 12 ... ## $ train_times : num [1:199] 1 2 3 4 5 6 7 8 9 10 ... ## $ test_observations : NULL ## $ test_times : NULL ## $ hindcasts :List of 1 ## ..$ PP: num [1:2000, 1:199] 7 8 7 8 9 6 7 8 8 7 ... ## .. ..- attr(*, \"dimnames\")=List of 2 ## .. .. ..$ : NULL ## .. .. ..$ : chr [1:199] \"ypred[1,1]\" \"ypred[2,1]\" \"ypred[3,1]\" \"ypred[4,1]\" ... 
## $ forecasts : NULL ## - attr(*, \"class\")= chr \"mvgam_forecast\" hc <- hindcast(model1, type = 'link') range(hc$hindcasts$PP) ## [1] -1.57385 3.46274 plot(hc) plot(model1, type = 'residuals')"},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"automatic-forecasting-for-new-data","dir":"Articles","previous_headings":"","what":"Automatic forecasting for new data","title":"Overview of the mvgam package","text":"These temporal random effects do not have any sense of "time". Because of this, each yearly random intercept is not restricted in any way to be similar to the previous yearly intercept. This drawback becomes evident when we predict for a new year. To demonstrate, we can repeat the exercise above but this time split the data into training and testing sets before re-running the model. We can then supply the test set as newdata. For splitting, we will make use of the filter function from dplyr: Repeating the plots above gives insight into how the model's hierarchical prior formulation provides the structure needed to sample values for un-modelled years: We can also view the test data in the forecast plot to see that the predictions do not capture the temporal variation in the test set: As with the hindcast function, we can use the forecast function to automatically extract the posterior distributions for these predictions. This also returns an object of class mvgam_forecast, but now it will contain both the hindcasts and forecasts for each series in the data:","code":"model_data %>% dplyr::filter(time <= 160) -> data_train model_data %>% dplyr::filter(time > 160) -> data_test model1b <- mvgam(count ~ s(year_fac, bs = 're') - 1, family = poisson(), data = data_train, newdata = data_test) plot(model1b, type = 're') plot(model1b, type = 'forecast') ## Out of sample DRPS: ## [1] 183.3916 plot(model1b, type = 'forecast', newdata = data_test) ## Out of sample DRPS: ## [1] 183.3916 fc <- forecast(model1b) str(fc) ## List of 16 ## $ call :Class 'formula' language count ~ s(year_fac, bs = \"re\") - 1 ## .. ..- attr(*, \".Environment\")= ## $ trend_call : NULL ## $ family : chr \"poisson\" ## $ family_pars : NULL ## $ trend_model : chr \"None\" ## $ drift : logi FALSE ## $ use_lv : logi FALSE ## $ fit_engine : chr \"stan\" ## $ type : chr \"response\" ## $ series_names : Factor w/ 1 level \"PP\": 1 ## $ train_observations:List of 1 ## ..$ PP: int [1:160] 0 1 2 NA 10 NA NA 16 18 12 ... ## $ train_times : num [1:160] 1 2 3 4 5 6 7 8 9 10 ... ## $ test_observations :List of 1 ## ..$ PP: int [1:39] NA 0 0 10 3 14 18 NA 28 46 ... ## $ test_times : num [1:39] 161 162 163 164 165 166 167 168 169 170 ... ## $ hindcasts :List of 1 ## ..$ PP: num [1:2000, 1:160] 13 14 8 8 8 9 5 10 8 11 ... ## .. ..- attr(*, \"dimnames\")=List of 2 ## .. .. ..$ : NULL ## .. .. ..$ : chr [1:160] \"ypred[1,1]\" \"ypred[2,1]\" \"ypred[3,1]\" \"ypred[4,1]\" ... ## $ forecasts :List of 1 ## ..$ PP: num [1:2000, 1:39] 12 8 14 7 5 7 9 13 18 18 ... ## .. ..- attr(*, \"dimnames\")=List of 2 ## .. .. ..$ : NULL ## .. .. ..$ : chr [1:39] \"ypred[161,1]\" \"ypred[162,1]\" \"ypred[163,1]\" \"ypred[164,1]\" ... ## - attr(*, \"class\")= chr \"mvgam_forecast\""},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"adding-predictors-as-fixed-effects","dir":"Articles","previous_headings":"","what":"Adding predictors as “fixed” effects","title":"Overview of the mvgam package","text":"Most users of GLMs will know that we nearly always wish to include predictor variables that may explain some of the variation in our observations. Predictors are easily incorporated into GLMs / GAMs. To demonstrate
, update model including parametric (fixed) effect ndvi linear predictor: model can described mathematically follows: \\[\\begin{align*} \\boldsymbol{count}_t & \\sim \\text{Poisson}(\\lambda_t) \\\\ log(\\lambda_t) & = \\beta_{year[year_t]} + \\beta_{ndvi} * \\boldsymbol{ndvi}_t \\\\ \\beta_{year} & \\sim \\text{Normal}(\\mu_{year}, \\sigma_{year}) \\\\ \\beta_{ndvi} & \\sim \\text{Normal}(0, 1) \\end{align*}\\] \\(\\beta_{year}\\) effects now another predictor \\((\\beta_{ndvi})\\) applies ndvi value timepoint \\(t\\). Inspect summary model Rather printing summary time, can also quickly look posterior empirical quantiles fixed effect ndvi (linear predictor coefficients) using coef: Look estimated effect ndvi using plot.mvgam type = 'pterms' plot indicates positive linear effect ndvi log(counts). may easier visualise using histogram, especially parametric (linear) effects. can done first extracting posterior coefficients first example: posterior distribution effect ndvi stored ndvi column. quick histogram confirms inference log(counts) respond positively increases ndvi:","code":"model2 <- mvgam(count ~ s(year_fac, bs = 're') + ndvi - 1, family = poisson(), data = data_train, newdata = data_test) summary(model2) ## GAM formula: ## count ~ ndvi + s(year_fac, bs = \"re\") - 1 ## ## Family: ## poisson ## ## Link function: ## log ## ## Trend model: ## None ## ## N series: ## 1 ## ## N timepoints: ## 160 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## GAM coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## ndvi 0.33 0.39 0.46 1 1756 ## s(year_fac).1 1.10 1.40 1.70 1 2498 ## s(year_fac).2 1.80 2.00 2.20 1 1899 ## s(year_fac).3 2.20 2.40 2.60 1 2121 ## s(year_fac).4 2.30 2.50 2.70 1 1856 ## s(year_fac).5 1.20 1.40 1.60 1 2233 ## s(year_fac).6 1.00 1.30 1.50 1 2526 ## s(year_fac).7 1.10 1.40 1.70 1 2680 ## s(year_fac).8 2.10 2.30 2.50 1 2153 ## s(year_fac).9 2.70 2.90 3.00 1 1972 ## s(year_fac).10 2.00 2.20 2.40 1 2604 ## s(year_fac).11 2.30 2.40 2.60 1 1854 ## s(year_fac).12 2.50 2.70 2.80 1 1903 ## s(year_fac).13 1.40 1.60 1.80 1 2583 ## s(year_fac).14 0.65 1.90 3.20 1 1119 ## s(year_fac).15 0.64 2.00 3.20 1 1504 ## s(year_fac).16 0.62 1.90 3.20 1 1567 ## s(year_fac).17 0.67 1.90 3.20 1 1558 ## ## GAM group-level estimates: ## 2.5% 50% 97.5% Rhat n_eff ## mean(s(year_fac)) 1.60 2.00 2.30 1 324 ## sd(s(year_fac)) 0.41 0.59 0.97 1 566 ## ## Approximate significance of GAM observation smooths: ## edf Chi.sq p-value ## s(year_fac) 11.1 2810 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhat looks reasonable for all parameters ## 0 of 2000 iterations ended with a divergence (0%) ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 3:43:17 PM 2024. 
## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) coef(model2) ## 2.5% 50% 97.5% Rhat n_eff ## ndvi 0.3271847 0.3909765 0.4616437 1 1756 ## s(year_fac).1 1.1187235 1.3992000 1.6530420 1 2498 ## s(year_fac).2 1.7857993 1.9953950 2.1962138 1 1899 ## s(year_fac).3 2.1821677 2.3778000 2.5536588 1 2121 ## s(year_fac).4 2.3121757 2.5055300 2.6798547 1 1856 ## s(year_fac).5 1.1794423 1.4222500 1.6405437 1 2233 ## s(year_fac).6 1.0224750 1.2704150 1.5046850 1 2526 ## s(year_fac).7 1.1412090 1.4067450 1.6811802 1 2680 ## s(year_fac).8 2.0761087 2.2678950 2.4518065 1 2153 ## s(year_fac).9 2.7186492 2.8531300 2.9811495 1 1972 ## s(year_fac).10 1.9734253 2.1819550 2.3638582 1 2604 ## s(year_fac).11 2.2608057 2.4348950 2.5893710 1 1854 ## s(year_fac).12 2.5372990 2.6877500 2.8456120 1 1903 ## s(year_fac).13 1.3677138 1.6185450 1.8419897 1 2583 ## s(year_fac).14 0.6503578 1.9153950 3.1729187 1 1119 ## s(year_fac).15 0.6358526 1.9786850 3.2192620 1 1504 ## s(year_fac).16 0.6237738 1.9464300 3.2242947 1 1567 ## s(year_fac).17 0.6657770 1.9497650 3.2148592 1 1558 plot(model2, type = 'pterms') beta_post <- as.data.frame(model2, variable = 'betas') dplyr::glimpse(beta_post) ## Rows: 2,000 ## Columns: 18 ## $ ndvi 0.360717, 0.411538, 0.413833, 0.480704, 0.298674, 0.4… ## $ `s(year_fac).1` 1.49041, 1.43493, 1.24131, 1.33732, 1.56602, 1.23166,… ## $ `s(year_fac).2` 1.96377, 2.11210, 1.77387, 1.93891, 1.96497, 1.94449,… ## $ `s(year_fac).3` 2.52795, 2.23799, 2.27527, 2.16808, 2.61478, 2.21272,… ## $ `s(year_fac).4` 2.52199, 2.48918, 2.45488, 2.48689, 2.79409, 2.32144,… ## $ `s(year_fac).5` 1.52233, 1.39382, 1.39718, 1.29478, 1.61762, 1.35901,… ## $ `s(year_fac).6` 1.43621, 1.18094, 1.38381, 1.01295, 1.35752, 1.20577,… ## $ `s(year_fac).7` 1.55584, 1.43209, 1.36560, 1.26249, 1.50926, 1.29049,… ## $ `s(year_fac).8` 2.28523, 2.26367, 2.18857, 2.11143, 2.38672, 2.06916,… ## $ `s(year_fac).9` 2.99371, 2.77475, 2.91705, 2.68196, 2.93845, 2.74811,… ## $ `s(year_fac).10` 2.21681, 2.08268, 2.21817, 1.98852, 2.31722, 1.99311,… ## $ `s(year_fac).11` 2.56472, 2.29675, 2.42560, 2.31213, 2.58424, 2.13256,… ## $ `s(year_fac).12` 2.66356, 2.64247, 2.80074, 2.54992, 2.72354, 2.59844,… ## $ `s(year_fac).13` 1.61022, 1.66395, 1.51550, 1.44828, 1.73121, 1.49267,… ## $ `s(year_fac).14` 1.444900, 2.214580, 0.623583, 2.347250, 2.071320, 1.4… ## $ `s(year_fac).15` 2.69107, 1.49073, 2.14958, 2.48082, 1.90408, 1.67007,… ## $ `s(year_fac).16` 3.213370, 2.255890, 1.332510, 1.902480, 2.153700, 1.3… ## $ `s(year_fac).17` 2.515630, 1.667650, 2.299560, 1.601190, 2.895750, 1.5… hist(beta_post$ndvi, xlim = c(-1 * max(abs(beta_post$ndvi)), max(abs(beta_post$ndvi))), col = 'darkred', border = 'white', xlab = expression(beta[NDVI]), ylab = '', yaxt = 'n', main = '', lwd = 2) abline(v = 0, lwd = 2.5)"},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"marginaleffects-support","dir":"Articles","previous_headings":"Adding predictors as “fixed” effects","what":"marginaleffects support","title":"Overview of the mvgam package","text":"Given model used nonlinear link function (log link example), can still difficult fully understand relationship model estimating predictor response. Fortunately, marginaleffects package makes relatively straightforward. Objects class mvgam can used marginaleffects inspect contrasts, scenario-based predictions, conditional marginal effects, outcome scale. 
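For example (a minimal sketch; avg_comparisons(), predictions() and datagrid() are general marginaleffects functions, and their use with this particular model is illustrative only), we might request average contrasts and scenario-based predictions on the outcome scale:

# Sketch: average contrast of ndvi on the outcome scale (assumes model2 from above)
library(marginaleffects)
avg_comparisons(model2, variables = 'ndvi')
# Scenario-based predictions at chosen ndvi values
predictions(model2, newdata = datagrid(ndvi = c(1, 2, 3)))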
Here we use the plot_predictions function from marginaleffects to inspect the conditional effect of ndvi (use ?plot_predictions for guidance on how to modify these plots): Now it is easier to get a sense of the nonlinear but positive relationship estimated between ndvi and count. Like brms, mvgam has a simple conditional_effects function to make quick and informative plots for main effects. This will likely be your go-to function for quickly understanding patterns from fitted mvgam models","code":"plot_predictions(model2, condition = \"ndvi\", # include the observed count values # as points, and show rugs for the observed # ndvi and count values on the axes points = 0.5, rug = TRUE) plot(conditional_effects(model2), ask = FALSE)"},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"adding-predictors-as-smooths","dir":"Articles","previous_headings":"","what":"Adding predictors as smooths","title":"Overview of the mvgam package","text":"Smooth functions, using penalized splines, are a major feature of mvgam. Nonlinear splines are commonly viewed as variations of random effects in which the coefficients that control the shape of the spline are drawn from a joint, penalized distribution. This strategy is often used in ecological time series analysis to capture smooth temporal variation in the processes we seek to study. When we construct smoothing splines, the workhorse package mgcv will calculate a set of basis functions that will collectively control the shape and complexity of the resulting spline. It is often helpful to visualize these basis functions to get a better sense of how splines work (a small sketch using mgcv is shown at the end of this section). We'll create a set of 6 basis functions to represent possible variation in the effect of time on our outcome. In addition to constructing the basis functions, mgcv also creates a penalty matrix \\(S\\) that contains known coefficients which work to constrain the wiggliness of the resulting smooth function. When fitting a GAM to data, we must estimate the smoothing parameters (\\(\\lambda\\)) that penalize these matrices, resulting in constrained basis coefficients and smoother functions that are less likely to overfit the data. This is a key to fitting GAMs in a Bayesian framework, as we can jointly estimate the \\(\\lambda\\)'s using informative priors to prevent overfitting and expand the complexity of models we can tackle. To see this in practice, we can now fit a model that replaces the yearly random effects with a smooth function of time. We need a reasonably complex function (large k) to try to accommodate the temporal variation in our observations. Following some useful advice by Gavin Simpson, we use a b-spline basis for the temporal smooth. Because we no longer have intercepts for each year, we also retain the primary intercept term in this model (there is no -1 in the formula now): The model can be described mathematically as follows: \\[\\begin{align*} \\boldsymbol{count}_t & \\sim \\text{Poisson}(\\lambda_t) \\\\ log(\\lambda_t) & = f(\\boldsymbol{time})_t + \\beta_{ndvi} * \\boldsymbol{ndvi}_t \\\\ f(\\boldsymbol{time}) & = \\sum_{k=1}^{K}b * \\beta_{smooth} \\\\ \\beta_{smooth} & \\sim \\text{MVNormal}(0, (\\Omega * \\lambda)^{-1}) \\\\ \\beta_{ndvi} & \\sim \\text{Normal}(0, 1) \\end{align*}\\] Here the smooth function \\(f_{time}\\) is built by summing across a set of weighted basis functions. The basis functions \\((b)\\) are constructed using a thin plate regression basis in mgcv. The weights \\((\\beta_{smooth})\\) are drawn from a penalized multivariate normal distribution in which the precision matrix \\((\\Omega\\)) is multiplied by a smoothing penalty \\((\\lambda)\\). If \\(\\lambda\\) becomes large, this acts to squeeze the covariances among the weights \\((\\beta_{smooth})\\), leading to a less wiggly spline. Note that sometimes there are multiple smoothing penalties that contribute to the covariance matrix, but we are only showing one here for simplicity. View the summary: The summary now contains posterior estimates for the smoothing parameters as well as the basis coefficients for the nonlinear effect of time.
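As promised above, here is a minimal sketch of those basis functions and the penalty matrix, built with mgcv's smoothCon() (the k = 6 b-spline basis over a dummy time variable is illustrative only):

library(mgcv)
# Build (but do not fit) a b-spline smooth of time with 6 basis functions
sm <- smoothCon(s(time, bs = 'bs', k = 6),
                data = data.frame(time = 1:100))[[1]]
matplot(sm$X, type = 'l', xlab = 'time', ylab = 'basis value') # the basis functions
round(sm$S[[1]], 2) # the penalty matrix S that constrains wiggliness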
can visualize conditional time effect using plot function type = 'smooths': default plots shows posterior empirical quantiles, can also helpful view realizations underlying function (, line different potential curve drawn posterior possible curves):","code":"model3 <- mvgam(count ~ s(time, bs = 'bs', k = 15) + ndvi, family = poisson(), data = data_train, newdata = data_test) summary(model3) ## GAM formula: ## count ~ s(time, bs = \"bs\", k = 15) + ndvi ## ## Family: ## poisson ## ## Link function: ## log ## ## Trend model: ## None ## ## N series: ## 1 ## ## N timepoints: ## 160 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## GAM coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## (Intercept) 2.00 2.10 2.200 1.00 950 ## ndvi 0.26 0.33 0.400 1.00 943 ## s(time).1 -2.10 -1.00 0.048 1.00 409 ## s(time).2 0.44 1.30 2.300 1.01 282 ## s(time).3 -0.56 0.50 1.500 1.01 289 ## s(time).4 1.50 2.50 3.600 1.01 265 ## s(time).5 -1.20 -0.16 0.880 1.01 286 ## s(time).6 -0.59 0.40 1.500 1.01 250 ## s(time).7 -1.50 -0.48 0.560 1.01 285 ## s(time).8 0.54 1.50 2.600 1.01 282 ## s(time).9 1.10 2.10 3.200 1.01 253 ## s(time).10 -0.40 0.58 1.600 1.01 276 ## s(time).11 0.80 1.80 2.900 1.01 262 ## s(time).12 0.61 1.50 2.500 1.01 277 ## s(time).13 -1.20 -0.30 0.680 1.01 337 ## s(time).14 -7.50 -4.20 -1.000 1.01 334 ## ## Approximate significance of GAM observation smooths: ## edf Chi.sq p-value ## s(time) 9.8 774 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhat looks reasonable for all parameters ## 0 of 2000 iterations ended with a divergence (0%) ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 3:44:04 PM 2024. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) plot(model3, type = 'smooths') plot(model3, type = 'smooths', realisations = TRUE, n_realisations = 30)"},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"derivatives-of-smooths","dir":"Articles","previous_headings":"Adding predictors as smooths","what":"Derivatives of smooths","title":"Overview of the mvgam package","text":"useful question modelling using GAMs identify function changing rapidly. address , can plot estimated 1st derivatives spline: , values >0 indicate function increasing time point, values <0 indicate function declining. rapid declines appear happening around timepoints 50 toward end training period, example.","code":"plot(model3, type = 'smooths', derivatives = TRUE)"},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"conditional-effects","dir":"Articles","previous_headings":"Adding predictors as smooths","what":"Conditional effects","title":"Overview of the mvgam package","text":"Use conditional_effects useful plots outcome scale: link scale: Inspect underlying Stan code gain idea spline penalized: line // prior s(time)... shows spline basis coefficients drawn zero-centred multivariate normal distribution. 
precision matrix \\(S\\) penalized two different smoothing parameters (\\(\\lambda\\)’s) enforce smoothness reduce overfitting","code":"plot(conditional_effects(model3), ask = FALSE) plot(conditional_effects(model3, type = 'link'), ask = FALSE) code(model3) ## // Stan model code generated by package mvgam ## data { ## int total_obs; // total number of observations ## int n; // number of timepoints per series ## int n_sp; // number of smoothing parameters ## int n_series; // number of series ## int num_basis; // total number of basis coefficients ## vector[num_basis] zero; // prior locations for basis coefficients ## matrix[total_obs, num_basis] X; // mgcv GAM design matrix ## array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) ## matrix[14, 28] S1; // mgcv smooth penalty matrix S1 ## int n_nonmissing; // number of nonmissing observations ## array[n_nonmissing] int flat_ys; // flattened nonmissing observations ## matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations ## array[n_nonmissing] int obs_ind; // indices of nonmissing observations ## } ## parameters { ## // raw basis coefficients ## vector[num_basis] b_raw; ## ## // smoothing parameters ## vector[n_sp] lambda; ## } ## transformed parameters { ## // basis coefficients ## vector[num_basis] b; ## b[1 : num_basis] = b_raw[1 : num_basis]; ## } ## model { ## // prior for (Intercept)... ## b_raw[1] ~ student_t(3, 2.6, 2.5); ## ## // prior for ndvi... ## b_raw[2] ~ student_t(3, 0, 2); ## ## // prior for s(time)... ## b_raw[3 : 16] ~ multi_normal_prec(zero[3 : 16], ## S1[1 : 14, 1 : 14] * lambda[1] ## + S1[1 : 14, 15 : 28] * lambda[2]); ## ## // priors for smoothing parameters ## lambda ~ normal(5, 30); ## { ## // likelihood functions ## flat_ys ~ poisson_log_glm(flat_xs, 0.0, b); ## } ## } ## generated quantities { ## vector[total_obs] eta; ## matrix[n, n_series] mus; ## vector[n_sp] rho; ## array[n, n_series] int ypred; ## rho = log(lambda); ## ## // posterior predictions ## eta = X * b; ## for (s in 1 : n_series) { ## mus[1 : n, s] = eta[ytimes[1 : n, s]]; ## ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]); ## } ## }"},{"path":"https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html","id":"latent-dynamics-in-mvgam","dir":"Articles","previous_headings":"","what":"Latent dynamics in mvgam","title":"Overview of the mvgam package","text":"Forecasts model ideal: happening? forecasts driven almost entirely variation temporal spline, extrapolating linearly forever beyond edge training data. slight wiggles near end training set result wildly different forecasts. visualize , can plot extrapolated temporal functions --sample test set two models. extrapolated functions first model, 15 basis functions: model well. Clearly need somehow account strong temporal autocorrelation modelling data without using smooth function time. Now onto another prominent feature mvgam: ability include (possibly latent) autocorrelated residuals regression models. , use trend_model argument (see ?mvgam_trends details different dynamic trend models supported). model use separate sub-model latent residuals evolve AR1 process (.e. error current time point function error previous time point, plus stochastic noise). 
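The AR1 process described above is easy to simulate directly. A minimal base-R sketch (illustrative parameter values only) of \(z_t \sim \text{Normal}(ar1 * z_{t-1}, \sigma_{error})\):

set.seed(100)
ar1 <- 0.8
sigma_error <- 0.5

# start from z_1 = 0; each error is centred on the previous error,
# plus stochastic noise
z <- numeric(100)
for (t in 2:100) {
  z[t] <- rnorm(1, mean = ar1 * z[t - 1], sd = sigma_error)
}
plot(z, type = 'l', xlab = 'Time', ylab = 'Latent AR1 residual')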
also include smooth function ndvi model, rather parametric term used , showcase mvgam can include combinations smooths dynamic components: model can described mathematically follows: \\[\\begin{align*} \\boldsymbol{count}_t & \\sim \\text{Poisson}(\\lambda_t) \\\\ log(\\lambda_t) & = f(\\boldsymbol{ndvi})_t + z_t \\\\ z_t & \\sim \\text{Normal}(ar1 * z_{t-1}, \\sigma_{error}) \\\\ ar1 & \\sim \\text{Normal}(0, 1)[-1, 1] \\\\ \\sigma_{error} & \\sim \\text{Exponential}(2) \\\\ f(\\boldsymbol{ndvi}) & = \\sum_{k=1}^{K}b * \\beta_{smooth} \\\\ \\beta_{smooth} & \\sim \\text{MVNormal}(0, (\\Omega * \\lambda)^{-1}) \\end{align*}\\] term \\(z_t\\) captures autocorrelated latent residuals, modelled using AR1 process. can also notice model estimating autocorrelated errors full time period, even though time points missing observations. useful getting realistic estimates residual autocorrelation parameters. Summarise model see now returns posterior summaries latent AR1 process: View conditional smooths ndvi effect: View posterior hindcasts / forecasts compare sample test data trend evolving AR1 process, can also view: -sample model performance can interrogated using leave-one-cross-validation utilities loo package (higher value preferred metric): higher estimated log predictive density (ELPD) value dynamic model suggests provides better fit -sample data. Though obvious model provides better forecasts, can quantify forecast performance models 3 4 using forecast score functions. compare models based Discrete Ranked Probability Scores (lower value preferred metric) strongly negative value suggests score dynamic model (model 4) much smaller score model smooth function time (model 3)","code":"plot(model3, type = 'forecast', newdata = data_test) ## Out of sample DRPS: ## [1] 286.9079 plot_mvgam_smooth(model3, smooth = 's(time)', # feed newdata to the plot function to generate # predictions of the temporal smooth to the end of the # testing period newdata = data.frame(time = 1:max(data_test$time), ndvi = 0)) abline(v = max(data_train$time), lty = 'dashed', lwd = 2) model4 <- mvgam(count ~ s(ndvi, k = 6), family = poisson(), data = data_train, newdata = data_test, trend_model = 'AR1') summary(model4) ## GAM formula: ## count ~ s(ndvi, k = 6) ## ## Family: ## poisson ## ## Link function: ## log ## ## Trend model: ## AR1 ## ## N series: ## 1 ## ## N timepoints: ## 160 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## GAM coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## (Intercept) 1.000 1.900 2.600 1.04 59 ## s(ndvi).1 -0.074 0.011 0.170 1.02 378 ## s(ndvi).2 -0.120 0.015 0.360 1.01 241 ## s(ndvi).3 -0.053 -0.002 0.049 1.00 872 ## s(ndvi).4 -0.240 0.120 1.500 1.02 183 ## s(ndvi).5 -0.097 0.150 0.350 1.01 491 ## ## Approximate significance of GAM observation smooths: ## edf Chi.sq p-value ## s(ndvi) 1.22 80 0.072 . ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Latent trend parameter AR estimates: ## 2.5% 50% 97.5% Rhat n_eff ## ar1[1] 0.70 0.82 0.92 1.01 313 ## sigma[1] 0.67 0.80 0.96 1.01 476 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhat looks reasonable for all parameters ## 0 of 2000 iterations ended with a divergence (0%) ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 3:45:08 PM 2024. 
## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) plot_predictions(model4, condition = \"ndvi\", points = 0.5, rug = TRUE) plot(model4, type = 'forecast', newdata = data_test) ## Out of sample DRPS: ## [1] 150.4985 plot(model4, type = 'trend', newdata = data_test) loo_compare(model3, model4) ## Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details. ## Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details. ## elpd_diff se_diff ## model4 0.0 0.0 ## model3 -560.4 66.2 fc_mod3 <- forecast(model3) fc_mod4 <- forecast(model4) score_mod3 <- score(fc_mod3, score = 'drps') score_mod4 <- score(fc_mod4, score = 'drps') sum(score_mod4$PP$score, na.rm = TRUE) - sum(score_mod3$PP$score, na.rm = TRUE) ## [1] -136.4094"},{"path":"https://nicholasjclark.github.io/mvgam/articles/nmixtures.html","id":"n-mixture-models","dir":"Articles","previous_headings":"","what":"N-mixture models","title":"N-mixtures in mvgam","text":"The N-mixture model is a fairly recent addition to the ecological modeller’s toolkit that is designed to make inferences about variation in the abundance of species when observations are imperfect (Royle 2004). Briefly, assume \(\boldsymbol{Y_{i,r}}\) is the number of individuals recorded at site \(i\) during replicate sampling observation \(r\) (recorded as a non-negative integer). If multiple replicate surveys are done within a short enough period to satisfy the assumption that the population remained closed (i.e. no substantial change in true population size between replicate surveys), we can account for the fact that our observations aren’t perfect. This is done by assuming that the replicate observations are Binomial random variables parameterized by the true “latent” abundance \(N\) and a detection probability \(p\): \[\begin{align*} \boldsymbol{Y_{i,r}} & \sim \text{Binomial}(N_i, p_r) \\ N_{i} & \sim \text{Poisson}(\lambda_i) \end{align*}\] Using a set of linear predictors, we can estimate effects of covariates \(\boldsymbol{X}\) on the expected latent abundance (with a log link for \(\lambda\)) and, jointly, effects of possibly different covariates (call these \(\boldsymbol{Q}\)) on the detection probability (with a logit link for \(p\)): \[\begin{align*} log(\lambda) & = \beta \boldsymbol{X} \\ logit(p) & = \gamma \boldsymbol{Q}\end{align*}\] mvgam can handle this type of model because it is designed to propagate unobserved temporal processes that evolve independently of the observation process in a State-space format. This setup adapts well to N-mixture models, which can be thought of as State-space models in which the latent state is a discrete variable representing the “true” but unknown population size. This is convenient because we can incorporate any of the package’s diverse effect types (i.e. multidimensional splines, time-varying effects, monotonic effects, random effects etc…) into the linear predictors. All that is required for this to work is a marginalization trick that allows Stan’s sampling algorithms to handle discrete parameters (a minimal sketch of this marginalization follows this entry; see more about how this method of “integrating over” discrete parameters works in a nice blog post by Maxwell Joseph). The family nmix() is used to set up N-mixture models in mvgam, but we still need a little bit of data wrangling to ensure the data set is in the correct format (this is especially true when there is more than one replicate survey per time period). The most important aspects are: (1) the use of the series and trend_map arguments to ensure that replicate surveys are mapped to the correct latent abundance model and (2) the inclusion of a cap variable that defines the maximum possible integer value to marginalize over for each observation when estimating latent abundance.
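As referenced above, the marginalization works by summing the joint Poisson-Binomial likelihood over all feasible values of the discrete latent abundance. A minimal base-R sketch of that calculation (illustrative values only; this is not the package’s internal code, which is generated in Stan):

# marginal log-likelihood of replicate counts y at one site / timepoint,
# summing the discrete latent N out from max(y) up to a user-supplied cap
nmix_loglik <- function(y, lambda, p, cap) {
  Ns <- max(y):cap
  log(sum(dpois(Ns, lambda) *
            sapply(Ns, function(N) prod(dbinom(y, size = N, prob = p)))))
}

# example with five replicate counts and assumed lambda = 28, p = 0.7
nmix_loglik(c(20, 19, 23, 17, 18), lambda = 28, p = 0.7, cap = 100)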
The two examples below give a reasonable overview of what can be done.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/nmixtures.html","id":"example-1-a-two-species-system-with-nonlinear-trends","dir":"Articles","previous_headings":"","what":"Example 1: a two-species system with nonlinear trends","title":"N-mixtures in mvgam","text":"First we use a simple simulation in which multiple replicate observations are taken at each timepoint for two different species. The simulation produces observations at a single site over six years, with five replicate surveys per year. The species are simulated to have different nonlinear temporal trends and different detection probabilities. For now, detection probability is fixed (i.e. it does not change over time or in association with any covariates). Notice how we add the cap variable, which does not need to be static, to define the maximum possible value that we think the latent abundance could be for each timepoint. This simply needs to be large enough that we get a reasonable idea of which latent N values are most likely, without adding too much computational cost: The data format isn’t difficult to set up, but it does differ from the traditional multidimensional array setup that is commonly used for fitting N-mixture models in other software packages. Next we ensure that the species and series IDs are included as factor variables, as in this case we’d like to allow certain effects to vary by species. Preview the dataset to get an idea of how it is structured:","code":"set.seed(999) # Simulate observations for species 1, which shows a declining trend and 0.7 detection probability data.frame(site = 1, # five replicates per year; six years replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = 'sp_1', # true abundance declines nonlinearly truth = c(rep(28, 5), rep(26, 5), rep(23, 5), rep(16, 5), rep(14, 5), rep(14, 5)), # observations are taken with detection prob = 0.7 obs = c(rbinom(5, 28, 0.7), rbinom(5, 26, 0.7), rbinom(5, 23, 0.7), rbinom(5, 15, 0.7), rbinom(5, 14, 0.7), rbinom(5, 14, 0.7))) %>% # add 'series' information, which is an identifier of site, replicate and species dplyr::mutate(series = paste0('site_', site, '_', species, '_rep_', replicate), time = as.numeric(time), # add a 'cap' variable that defines the maximum latent N to # marginalize over when estimating latent abundance; in other words # how large do we realistically think the true abundance could be?
cap = 100) %>% dplyr::select(- replicate) -> testdat # Now add another species that has a different temporal trend and a smaller # detection probability (0.45 for this species) testdat = testdat %>% dplyr::bind_rows(data.frame(site = 1, replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = 'sp_2', truth = c(rep(4, 5), rep(7, 5), rep(15, 5), rep(16, 5), rep(19, 5), rep(18, 5)), obs = c(rbinom(5, 4, 0.45), rbinom(5, 7, 0.45), rbinom(5, 15, 0.45), rbinom(5, 16, 0.45), rbinom(5, 19, 0.45), rbinom(5, 18, 0.45))) %>% dplyr::mutate(series = paste0('site_', site, '_', species, '_rep_', replicate), time = as.numeric(time), cap = 50) %>% dplyr::select(-replicate)) testdat$species <- factor(testdat$species, levels = unique(testdat$species)) testdat$series <- factor(testdat$series, levels = unique(testdat$series)) dplyr::glimpse(testdat) ## Rows: 60 ## Columns: 7 ## $ site 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,… ## $ time 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5,… ## $ species sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp… ## $ truth 28, 28, 28, 28, 28, 26, 26, 26, 26, 26, 23, 23, 23, 23, 23, 16… ## $ obs 20, 19, 23, 17, 18, 21, 18, 21, 19, 18, 17, 16, 20, 11, 19, 9,… ## $ series site_1_sp_1_rep_1, site_1_sp_1_rep_2, site_1_sp_1_rep_3, site_… ## $ cap 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 10… head(testdat, 12) ## site time species truth obs series cap ## 1 1 1 sp_1 28 20 site_1_sp_1_rep_1 100 ## 2 1 1 sp_1 28 19 site_1_sp_1_rep_2 100 ## 3 1 1 sp_1 28 23 site_1_sp_1_rep_3 100 ## 4 1 1 sp_1 28 17 site_1_sp_1_rep_4 100 ## 5 1 1 sp_1 28 18 site_1_sp_1_rep_5 100 ## 6 1 2 sp_1 26 21 site_1_sp_1_rep_1 100 ## 7 1 2 sp_1 26 18 site_1_sp_1_rep_2 100 ## 8 1 2 sp_1 26 21 site_1_sp_1_rep_3 100 ## 9 1 2 sp_1 26 19 site_1_sp_1_rep_4 100 ## 10 1 2 sp_1 26 18 site_1_sp_1_rep_5 100 ## 11 1 3 sp_1 23 17 site_1_sp_1_rep_1 100 ## 12 1 3 sp_1 23 16 site_1_sp_1_rep_2 100"},{"path":"https://nicholasjclark.github.io/mvgam/articles/nmixtures.html","id":"setting-up-the-trend_map","dir":"Articles","previous_headings":"Example 1: a two-species system with nonlinear trends","what":"Setting up the trend_map","title":"N-mixtures in mvgam","text":"Finally, need set trend_map object. crucial allowing multiple observations linked latent process model (see information argument Shared latent states vignette. case, mapping operates species site state set replicate observations time point share exact latent abundance model: Notice replicates species 1 site 1 share process (.e. trend). ensure replicates Binomial draws latent N.","code":"testdat %>% # each unique combination of site*species is a separate process dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>% dplyr::select(trend, series) %>% dplyr::distinct() -> trend_map trend_map ## trend series ## 1 1 site_1_sp_1_rep_1 ## 2 1 site_1_sp_1_rep_2 ## 3 1 site_1_sp_1_rep_3 ## 4 1 site_1_sp_1_rep_4 ## 5 1 site_1_sp_1_rep_5 ## 6 2 site_1_sp_2_rep_1 ## 7 2 site_1_sp_2_rep_2 ## 8 2 site_1_sp_2_rep_3 ## 9 2 site_1_sp_2_rep_4 ## 10 2 site_1_sp_2_rep_5"},{"path":"https://nicholasjclark.github.io/mvgam/articles/nmixtures.html","id":"modelling-with-the-nmix-family","dir":"Articles","previous_headings":"Example 1: a two-species system with nonlinear trends","what":"Modelling with the nmix() family","title":"N-mixtures in mvgam","text":"Now ready fit model using mvgam(). model allow species different detection probabilities different temporal trends. 
use Cmdstan backend, default use Hamiltonian Monte Carlo full Bayesian inference View automatically-generated Stan code get sense marginalization latent N works summary model shows converged nicely loo() functionality works just mvgam models aid model comparison / selection Plot estimated smooths time species’ latent abundance process (log scale) marginaleffects support allows useful prediction-based interrogations different scales. Objects use family nmix() additional prediction scales can used (.e. link, response, detection latent_N). example, estimated detection probabilities per species, shows model -estimated detection probability species 2 (originally simulated 0.45): common goal N-mixture modelling estimate true latent abundance. model automatically generated predictions latent abundance conditional observations. can extract produce decent plots using small function Latent abundance plots vs simulated truths species shown . , red points show imperfect observations, black line shows true latent abundance, ribbons show credible intervals estimates: can see estimates species correctly captured true temporal variation abundance. However, also apparent low detection probabilities (like species 2) make difficult accurately estimate latent abundance. likely improve estimates additional information inform estimates detection probability, covariates reflect ability take accurate measurements","code":"mod <- mvgam( # the observation formula sets up linear predictors for # detection probability on the logit scale formula = obs ~ species - 1, # the trend_formula sets up the linear predictors for # the latent abundance processes on the log scale trend_formula = ~ s(time, by = trend, k = 4) + species, # the trend_map takes care of the mapping trend_map = trend_map, # nmix() family and data family = nmix(), data = testdat, # priors can be set in the usual way priors = c(prior(std_normal(), class = b), prior(normal(1, 1.5), class = Intercept_trend))) code(mod) ## // Stan model code generated by package mvgam ## functions { ## /* Functions to return the log probability of a Poisson Binomial Mixture */ ## ## /* see Bollen et al 2023 for details (https://doi.org/10.1002/ece3.10595)*/ ## real poisbin_lpmf(array[] int count, int k, array[] real lambda, ## array[] real p) { ## if (max(count) > k) { ## return negative_infinity(); ## } ## return poisson_log_lpmf(k | lambda) + binomial_logit_lpmf(count | k, p); ## } ## vector pb_logp(array[] int count, int max_k, array[] real lambda, ## array[] real p) { ## int c_max = max(count); ## if (max_k < c_max) { ## reject(\"cap variable max_k must be >= observed counts\"); ## } ## vector[max_k + 1] lp; ## for (k in 0 : (c_max - 1)) { ## lp[k + 1] = negative_infinity(); ## } ## for (k in c_max : max_k) { ## lp[k + 1] = poisbin_lpmf(count | k, lambda, p); ## } ## return lp; ## } ## real pb_lpmf(array[] int count, array[] int max_k, array[] real lambda, ## array[] real p) { ## // Take maximum of all supplied caps, in case they vary for some reason ## int max_k_max = max(max_k); ## vector[max_k_max + 1] lp; ## lp = pb_logp(count, max_k_max, lambda, p); ## return log_sum_exp(lp); ## } ## /* Functions to generate truncated Poisson variates */ ## array[] int nmix_rng(array[] int count, array[] int max_k, ## array[] real lambda, array[] real p) { ## // Take maximum of all supplied caps, in case they vary for some reason ## int max_k_max = max(max_k); ## vector[max_k_max + 1] lp; ## lp = pb_logp(count, max_k_max, lambda, p); ## return rep_array(categorical_rng(softmax(lp)) 
- 1, size(count)); ## } ## int trunc_pois_rng(int max_k, real lambda) { ## real p_ub = poisson_cdf(max_k | lambda); ## if (p_ub < 1e-9) { ## return max_k; ## } ## real u = uniform_rng(0, p_ub); ## int i = 0; ## int X = 0; ## real p = exp(-lambda); ## real F = p; ## while (1) { ## if (u < F) { ## X = i; ## break; ## } ## i = i + 1; ## p = lambda * p / i; ## F = F + p; ## } ## return X; ## } ## } ## data { ## int total_obs; // total number of observations ## int n; // number of timepoints per series ## int n_sp_trend; // number of trend smoothing parameters ## int n_lv; // number of dynamic factors ## int n_series; // number of series ## matrix[n_series, n_lv] Z; // matrix mapping series to latent states ## int num_basis; // total number of basis coefficients ## int num_basis_trend; // number of trend basis coefficients ## vector[num_basis_trend] zero_trend; // prior locations for trend basis coefficients ## matrix[total_obs, num_basis] X; // mgcv GAM design matrix ## matrix[n * n_lv, num_basis_trend] X_trend; // trend model design matrix ## array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) ## array[n, n_lv] int ytimes_trend; ## int n_nonmissing; // number of nonmissing observations ## array[total_obs] int cap; // upper limits of latent abundances ## array[total_obs] int ytimes_array; // sorted ytimes ## array[n, n_series] int ytimes_pred; // time-ordered matrix for prediction ## int K_groups; // number of unique replicated observations ## int K_reps; // maximum number of replicate observations ## array[K_groups] int K_starts; // col of K_inds where each group starts ## array[K_groups] int K_stops; // col of K_inds where each group ends ## array[K_groups, K_reps] int K_inds; // indices of replicated observations ## matrix[3, 6] S_trend1; // mgcv smooth penalty matrix S_trend1 ## matrix[3, 6] S_trend2; // mgcv smooth penalty matrix S_trend2 ## array[total_obs] int flat_ys; // flattened observations ## } ## transformed data { ## ## } ## parameters { ## // raw basis coefficients ## vector[num_basis] b_raw; ## vector[num_basis_trend] b_raw_trend; ## ## // smoothing parameters ## vector[n_sp_trend] lambda_trend; ## } ## transformed parameters { ## // detection probability ## vector[total_obs] p; ## ## // latent states ## matrix[n, n_lv] LV; ## ## // latent states and loading matrix ## vector[n * n_lv] trend_mus; ## matrix[n, n_series] trend; ## matrix[n_series, n_lv] lv_coefs; ## ## // basis coefficients ## vector[num_basis] b; ## vector[num_basis_trend] b_trend; ## ## // observation model basis coefficients ## b[1 : num_basis] = b_raw[1 : num_basis]; ## ## // process model basis coefficients ## b_trend[1 : num_basis_trend] = b_raw_trend[1 : num_basis_trend]; ## ## // detection probability ## p = X[ytimes_array, : ] * b; ## ## // latent process linear predictors ## trend_mus = X_trend * b_trend; ## for (j in 1 : n_lv) { ## LV[1 : n, j] = trend_mus[ytimes_trend[1 : n, j]]; ## } ## ## // derived latent states ## lv_coefs = Z; ## for (i in 1 : n) { ## for (s in 1 : n_series) { ## trend[i, s] = dot_product(lv_coefs[s, : ], LV[i, : ]); ## } ## } ## } ## model { ## // prior for speciessp_1... ## b_raw[1] ~ std_normal(); ## ## // prior for speciessp_2... ## b_raw[2] ~ std_normal(); ## ## // dynamic process models ## ## // prior for (Intercept)_trend... ## b_raw_trend[1] ~ normal(1, 1.5); ## ## // prior for speciessp_2_trend... ## b_raw_trend[2] ~ std_normal(); ## ## // prior for s(time):trendtrend1_trend... 
## b_raw_trend[3 : 5] ~ multi_normal_prec(zero_trend[3 : 5], ## S_trend1[1 : 3, 1 : 3] ## * lambda_trend[1] ## + S_trend1[1 : 3, 4 : 6] ## * lambda_trend[2]); ## ## // prior for s(time):trendtrend2_trend... ## b_raw_trend[6 : 8] ~ multi_normal_prec(zero_trend[6 : 8], ## S_trend2[1 : 3, 1 : 3] ## * lambda_trend[3] ## + S_trend2[1 : 3, 4 : 6] ## * lambda_trend[4]); ## lambda_trend ~ normal(5, 30); ## { ## // likelihood functions ## vector[total_obs] flat_trends; ## flat_trends = to_vector(trend); ## for (k in 1 : K_groups) { ## target += pb_lpmf(flat_ys[K_inds[k, K_starts[k] : K_stops[k]]] | cap[K_inds[k, K_starts[k] : K_stops[k]]], to_array_1d(flat_trends[K_inds[k, K_starts[k] : K_stops[k]]]), to_array_1d(p[K_inds[k, K_starts[k] : K_stops[k]]])); ## } ## } ## } ## generated quantities { ## vector[total_obs] eta; ## matrix[n, n_series] mus; ## vector[n_sp_trend] rho_trend; ## vector[n_lv] penalty; ## array[n, n_series] int ypred; ## array[n, n_series] int latent_ypred; ## array[total_obs] int latent_truncpred; ## vector[total_obs] flat_trends; ## vector[total_obs] detprob; ## detprob = inv_logit(p); ## penalty = rep_vector(1e12, n_lv); ## rho_trend = log(lambda_trend); ## ## // posterior predictions ## eta = X * b; ## { ## flat_trends = to_vector(trend); ## ## // prediction for all timepoints that ignore detection prob ## for (i in 1 : total_obs) { ## latent_truncpred[i] = trunc_pois_rng(cap[i], exp(flat_trends[i])); ## } ## ## // prediction for the nonmissing timepoints using actual obs ## for (k in 1 : K_groups) { ## latent_truncpred[K_inds[k, K_starts[k] : K_stops[k]]] = nmix_rng(flat_ys[K_inds[k, K_starts[k] : K_stops[k]]], ## cap[K_inds[k, K_starts[k] : K_stops[k]]], ## to_array_1d( ## flat_trends[K_inds[k, K_starts[k] : K_stops[k]]]), ## to_array_1d( ## p[K_inds[k, K_starts[k] : K_stops[k]]])); ## } ## for (s in 1 : n_series) { ## for (i in 1 : n) { ## // true latent abundance ## latent_ypred[i, s] = latent_truncpred[ytimes_pred[i, s]]; ## ## // observed abundance ## ypred[i, s] = binomial_rng(latent_ypred[i, s], ## detprob[ytimes_pred[i, s]]); ## ## // expected values ## mus[i, s] = detprob[ytimes[i, s]] * latent_ypred[i, s]; ## } ## } ## } ## } summary(mod) ## GAM observation formula: ## obs ~ species - 1 ## ## GAM process formula: ## ~s(time, by = trend, k = 4) + species ## ## Family: ## nmix ## ## Link function: ## log ## ## Trend model: ## None ## ## N process models: ## 2 ## ## N series: ## 10 ## ## N timepoints: ## 6 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## GAM observation model coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## speciessp_1 0.520 1.10 1.6 1 982 ## speciessp_2 0.031 0.71 1.2 1 1302 ## ## GAM process model coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## (Intercept)_trend 2.700 2.900 3.100 1 972 ## speciessp_2_trend -1.100 -0.820 -0.530 1 881 ## s(time):trendtrend1.1_trend -0.061 0.027 0.220 1 821 ## s(time):trendtrend1.2_trend -0.150 0.028 0.250 1 1523 ## s(time):trendtrend1.3_trend -0.410 -0.280 -0.094 1 1102 ## s(time):trendtrend2.1_trend -0.310 -0.021 0.092 1 481 ## s(time):trendtrend2.2_trend -0.110 0.110 0.750 1 481 ## s(time):trendtrend2.3_trend 0.170 0.410 0.630 1 917 ## ## Approximate significance of GAM process smooths: ## edf F p-value ## s(time):seriestrend1 0.596 0.26 0.0013 ** ## s(time):seriestrend2 0.881 0.41 0.0269 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhat looks reasonable for all parameters ## 0 of 2000 iterations ended with a divergence (0%) ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 3:46:52 PM 2024. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) loo(mod) ## Warning: Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details. ## ## Computed from 2000 by 60 log-likelihood matrix ## ## Estimate SE ## elpd_loo -140.0 3.2 ## p_loo 4.2 0.7 ## looic 280.0 6.3 ## ------ ## Monte Carlo SE of elpd_loo is 0.1. ## ## Pareto k diagnostic values: ## Count Pct. Min. n_eff ## (-Inf, 0.5] (good) 57 95.0% 386 ## (0.5, 0.7] (ok) 3 5.0% 567 ## (0.7, 1] (bad) 0 0.0% ## (1, Inf) (very bad) 0 0.0% ## ## All Pareto k estimates are ok (k < 0.7). ## See help('pareto-k-diagnostic') for details. plot(mod, type = 'smooths', trend_effects = TRUE) plot_predictions(mod, condition = 'species', type = 'detection') + ylab('Pr(detection)') + ylim(c(0, 1)) + theme_classic() + theme(legend.position = 'none') hc <- hindcast(mod, type = 'latent_N') # Function to plot latent abundance estimates vs truth plot_latentN = function(hindcasts, data, species = 'sp_1'){ all_series <- unique(data %>% dplyr::filter(species == !!species) %>% dplyr::pull(series)) # Grab the first replicate that represents this series # so we can get the true simulated values series <- as.numeric(all_series[1]) truths <- data %>% dplyr::arrange(time, series) %>% dplyr::filter(series == !!levels(data$series)[series]) %>% dplyr::pull(truth) # In case some replicates have missing observations, # pull out predictions for ALL replicates and average over them hcs <- do.call(rbind, lapply(all_series, function(x){ ind <- which(names(hindcasts$hindcasts) %in% as.character(x)) hindcasts$hindcasts[[ind]] })) # Calculate posterior empirical quantiles of predictions pred_quantiles <- data.frame(t(apply(hcs, 2, function(x) quantile(x, probs = c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95))))) pred_quantiles$time <- 1:NROW(pred_quantiles) pred_quantiles$truth <- truths # Grab observations data %>% dplyr::filter(series %in% all_series) %>% dplyr::select(time, obs) -> observations # Plot ggplot(pred_quantiles, aes(x = time, group = 1)) + geom_ribbon(aes(ymin = X5., ymax = X95.), fill = \"#DCBCBC\") + geom_ribbon(aes(ymin = X30., ymax = X70.), fill = \"#B97C7C\") + geom_line(aes(x = time, y = truth), colour = 'black', linewidth = 1) + geom_point(aes(x = time, y = truth), shape = 21, colour = 'white', fill = 'black', size = 2.5) + geom_jitter(data = observations, aes(x = time, y = obs), width = 0.06, shape = 21, fill = 'darkred', colour = 'white', size = 2.5) + labs(y = 'Latent abundance (N)', x = 'Time', title = species) } plot_latentN(hc, testdat, species = 'sp_1') plot_latentN(hc, testdat, species = 'sp_2')"},{"path":"https://nicholasjclark.github.io/mvgam/articles/nmixtures.html","id":"example-2-a-two-species-system-with-nonlinear-trends","dir":"Articles","previous_headings":"","what":"Example 2: a two-species system with nonlinear trends","title":"N-mixtures in mvgam","text":"Now another example larger dataset. use data Jeff Doser’s simulation example wonderful spAbundance package. 
The simulated data include one continuous site-level covariate, one factor site-level covariate and two continuous sample-level covariates. This example will allow us to examine how we can include possibly nonlinear effects in both the latent process and detection probability models. Download the data and grab the observations / covariate measurements for one species. Next we wrangle the data into the appropriate ‘long’ format, adding the indicators of time and series that are needed when working with mvgam. We also add a cap variable to represent the maximum latent N to marginalize over. The observation data include observations at 225 sites with three replicates per site, though some observations are missing. The final step of data preparation is, of course, the trend_map, which sets up the mapping between observation replicates and the latent abundance models. This is done in the same way as in the example above. Now we are ready to fit a model using mvgam(). Here we use penalized splines for the continuous covariate effects to detect possible nonlinear associations. We also showcase how mvgam can make use of the different approximation algorithms available in Stan by using the meanfield variational Bayes approximator (which reduces computation time substantially). Inspect the model summary, but don’t bother looking at estimates for the individual spline coefficients. Notice that we no longer receive information on convergence because we did not use MCMC sampling for this model. Again we can make use of marginaleffects support for interrogating the model through targeted predictions. First, we can inspect the estimated average detection probability. Next we will investigate the estimated effects of covariates on latent abundance using the conditional_effects() function and specifying type = 'link'; this will return plots on the expectation scale: the effect of the continuous covariate on expected latent abundance, and the effect of the factor covariate on expected latent abundance, estimated as a hierarchical random effect. Now we can investigate the estimated effects of covariates on detection probability using type = 'detection'. These covariate smooths were estimated to be somewhat nonlinear on the logit scale according to the model summary (based on their approximate significances), but inspecting conditional effects of each covariate on the probability scale is more intuitive and useful. More targeted predictions are also easy with marginaleffects support. For example, we can ask: how does detection probability change as we change both detection covariates?
model found support important covariate effects, course ’d want interrogate well model predicts think possible spatial effects capture unmodelled variation latent abundance.","code":"# Date link load(url('https://github.com/doserjef/spAbundance/raw/main/data/dataNMixSim.rda')) data.one.sp <- dataNMixSim # Pull out observations for one species data.one.sp$y <- data.one.sp$y[1, , ] # Abundance covariates that don't change across repeat sampling observations abund.cov <- dataNMixSim$abund.covs[, 1] abund.factor <- as.factor(dataNMixSim$abund.covs[, 2]) # Detection covariates that can change across repeat sampling observations # Note that `NA`s are not allowed for covariates in mvgam, so we randomly # impute them here det.cov <- dataNMixSim$det.covs$det.cov.1[,] det.cov[is.na(det.cov)] <- rnorm(length(which(is.na(det.cov)))) det.cov2 <- dataNMixSim$det.covs$det.cov.2 det.cov2[is.na(det.cov2)] <- rnorm(length(which(is.na(det.cov2)))) mod_data <- do.call(rbind, lapply(1:NROW(data.one.sp$y), function(x){ data.frame(y = data.one.sp$y[x,], abund_cov = abund.cov[x], abund_fac = abund.factor[x], det_cov = det.cov[x,], det_cov2 = det.cov2[x,], replicate = 1:NCOL(data.one.sp$y), site = paste0('site', x)) })) %>% dplyr::mutate(species = 'sp_1', series = as.factor(paste0(site, '_', species, '_', replicate))) %>% dplyr::mutate(site = factor(site, levels = unique(site)), species = factor(species, levels = unique(species)), time = 1, cap = max(data.one.sp$y, na.rm = TRUE) + 20) NROW(mod_data) ## [1] 675 dplyr::glimpse(mod_data) ## Rows: 675 ## Columns: 11 ## $ y 1, NA, NA, NA, 2, 2, NA, 1, NA, NA, 0, 1, 0, 0, 0, 0, NA, NA… ## $ abund_cov -0.3734384, -0.3734384, -0.3734384, 0.7064305, 0.7064305, 0.… ## $ abund_fac 3, 3, 3, 4, 4, 4, 9, 9, 9, 2, 2, 2, 3, 3, 3, 2, 2, 2, 1, 1, … ## $ det_cov -1.2827999, -0.6412036, 1.7083192, 0.7640157, 0.1954809, 0.9… ## $ det_cov2 2.030473137, 0.151511085, -0.439251153, -1.481393226, 1.0455… ## $ replicate 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, … ## $ site site1, site1, site1, site2, site2, site2, site3, site3, site… ## $ species sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, … ## $ series site1_sp_1_1, site1_sp_1_2, site1_sp_1_3, site2_sp_1_1, site… ## $ time 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, … ## $ cap 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, … head(mod_data) ## y abund_cov abund_fac det_cov det_cov2 replicate site species ## 1 1 -0.3734384 3 -1.2827999 2.0304731 1 site1 sp_1 ## 2 NA -0.3734384 3 -0.6412036 0.1515111 2 site1 sp_1 ## 3 NA -0.3734384 3 1.7083192 -0.4392512 3 site1 sp_1 ## 4 NA 0.7064305 4 0.7640157 -1.4813932 1 site2 sp_1 ## 5 2 0.7064305 4 0.1954809 1.0455536 2 site2 sp_1 ## 6 2 0.7064305 4 0.9673034 1.9197118 3 site2 sp_1 ## series time cap ## 1 site1_sp_1_1 1 33 ## 2 site1_sp_1_2 1 33 ## 3 site1_sp_1_3 1 33 ## 4 site2_sp_1_1 1 33 ## 5 site2_sp_1_2 1 33 ## 6 site2_sp_1_3 1 33 mod_data %>% # each unique combination of site*species is a separate process dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>% dplyr::select(trend, series) %>% dplyr::distinct() -> trend_map trend_map %>% dplyr::arrange(trend) %>% head(12) ## trend series ## 1 1 site100_sp_1_1 ## 2 1 site100_sp_1_2 ## 3 1 site100_sp_1_3 ## 4 2 site101_sp_1_1 ## 5 2 site101_sp_1_2 ## 6 2 site101_sp_1_3 ## 7 3 site102_sp_1_1 ## 8 3 site102_sp_1_2 ## 9 3 site102_sp_1_3 ## 10 4 site103_sp_1_1 ## 11 4 site103_sp_1_2 ## 12 4 site103_sp_1_3 mod <- mvgam( # effects of covariates on detection probability; # here we 
use penalized splines for both continuous covariates formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3), # effects of the covariates on latent abundance; # here we use a penalized spline for the continuous covariate and # hierarchical intercepts for the factor covariate trend_formula = ~ s(abund_cov, k = 3) + s(abund_fac, bs = 're'), # link multiple observations to each site trend_map = trend_map, # nmix() family and supplied data family = nmix(), data = mod_data, # standard normal priors on key regression parameters priors = c(prior(std_normal(), class = 'b'), prior(std_normal(), class = 'Intercept'), prior(std_normal(), class = 'Intercept_trend')), # use Stan's variational inference for quicker results algorithm = 'meanfield', samples = 1000) summary(mod, include_betas = FALSE) ## GAM observation formula: ## y ~ s(det_cov, k = 3) + s(det_cov2, k = 3) ## ## GAM process formula: ## ~s(abund_cov, k = 3) + s(abund_fac, bs = \"re\") ## ## Family: ## nmix ## ## Link function: ## log ## ## Trend model: ## None ## ## N process models: ## 225 ## ## N series: ## 675 ## ## N timepoints: ## 1 ## ## Status: ## Fitted using Stan ## 1 chains, each with iter = 1000; warmup = ; thin = 1 ## Total post-warmup draws = 1000 ## ## ## GAM observation model coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n.eff ## (Intercept) 0.35 0.75 1.2 NaN NaN ## ## Approximate significance of GAM observation smooths: ## edf Chi.sq p-value ## s(det_cov) 1.99 86.7 0.00086 *** ## s(det_cov2) 2.00 359.2 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## GAM process model coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n.eff ## (Intercept)_trend 0.91 1.2 1.4 NaN NaN ## ## GAM process model group-level estimates: ## 2.5% 50% 97.5% Rhat n.eff ## mean(s(abund_fac))_trend -1.70 -1.40 -1.20 NaN NaN ## sd(s(abund_fac))_trend 0.17 0.28 0.48 NaN NaN ## ## Approximate significance of GAM process smooths: ## edf F p-value ## s(abund_cov) 1.90 2.13 0.978 ## s(abund_fac) 8.87 1.28 0.039 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Posterior approximation used: no diagnostics to compute avg_predictions(mod, type = 'detection') ## ## Estimate 2.5 % 97.5 % ## 0.647 0.568 0.721 ## ## Columns: estimate, conf.low, conf.high ## Type: detection abund_plots <- plot(conditional_effects(mod, type = 'link', effects = c('abund_cov', 'abund_fac')), plot = FALSE) abund_plots[[1]] + ylab('Expected latent abundance') abund_plots[[2]] + ylab('Expected latent abundance') det_plots <- plot(conditional_effects(mod, type = 'detection', effects = c('det_cov', 'det_cov2')), plot = FALSE) det_plots[[1]] + ylab('Pr(detection)') det_plots[[2]] + ylab('Pr(detection)') fivenum_round = function(x)round(fivenum(x, na.rm = TRUE), 2) plot_predictions(mod, newdata = datagrid(det_cov = unique, det_cov2 = fivenum_round), by = c('det_cov', 'det_cov2'), type = 'detection') + theme_classic() + ylab('Pr(detection)')"},{"path":"https://nicholasjclark.github.io/mvgam/articles/nmixtures.html","id":"further-reading","dir":"Articles","previous_headings":"","what":"Further reading","title":"N-mixtures in mvgam","text":"following papers resources offer useful material N-mixture models ecological population dynamics investigations: Guélat, Jérôme, Kéry, Marc. “Effects Spatial Autocorrelation Imperfect Detection Species Distribution Models.” Methods Ecology Evolution 9 (2018): 1614–25. Kéry, Marc, Royle Andrew J. 
“Applied hierarchical modeling in ecology: Analysis of distribution, abundance and species richness in R and BUGS: Volume 2: Dynamic and advanced models”. London, UK: Academic Press (2020). Royle, Andrew J. “N‐mixture models for estimating population size from spatially replicated counts.” Biometrics 60.1 (2004): 108-115.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/articles/shared_states.html","id":"the-trend_map-argument","dir":"Articles","previous_headings":"","what":"The trend_map argument","title":"Shared latent states in mvgam","text":"The trend_map argument in the mvgam() function is an optional data.frame that can be used to specify which series should depend on which latent process models (called “trends” in mvgam). This can be particularly useful if we wish to force multiple observed time series to depend on the same latent trend process, but with different observation processes. If this argument is supplied, a latent factor model is set up by setting use_lv = TRUE and using the supplied trend_map to set up the shared trends. Users familiar with the MARSS family of packages will recognize this as a way of specifying the \(Z\) matrix. This data.frame needs to have column names series and trend, with integer values in the trend column to state which trend each series should depend on. The series column should have a single unique entry for each time series in the data (the names must perfectly match the factor levels of the series variable in the data). For example, if we were to simulate a collection of three integer-valued time series (using sim_mvgam), the following trend_map would force the first two series to share the same latent trend process: We can see that the factor levels in the trend_map match those in the data:","code":"set.seed(122) simdat <- sim_mvgam(trend_model = 'AR1', prop_trend = 0.6, mu = c(0, 1, 2), family = poisson()) trend_map <- data.frame(series = unique(simdat$data_train$series), trend = c(1, 1, 2)) trend_map ## series trend ## 1 series_1 1 ## 2 series_2 1 ## 3 series_3 2 all.equal(levels(trend_map$series), levels(simdat$data_train$series)) ## [1] TRUE"},{"path":"https://nicholasjclark.github.io/mvgam/articles/shared_states.html","id":"checking-trend_map-with-run_model-false","dir":"Articles","previous_headings":"The trend_map argument","what":"Checking trend_map with run_model = FALSE","title":"Shared latent states in mvgam","text":"Supplying this trend_map to the mvgam function for a simple model, but setting run_model = FALSE, allows us to inspect the constructed Stan code and the data objects that would be used to condition the model. Here we set up a model in which each series has a different observation process (a different intercept per series in this case), and the two latent dynamic process models evolve as independent AR1 processes that also contain a shared nonlinear smooth function to capture repeated seasonality. This model is complicated, but it shows how we can learn both shared and independent effects for collections of time series in the mvgam framework: Inspecting the Stan code shows how this model is a dynamic factor model in which the loadings are constructed to reflect the supplied trend_map: Notice the line that states “lv_coefs = Z;”. This uses the supplied \(Z\) matrix to construct the loading coefficients (a small sketch below shows the same construction by hand).
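For intuition, the loading matrix implied by a trend_map can be constructed by hand. A minimal sketch (assuming the trend_map object created above; compare the result with the fake_mod$model_data$Z output shown at the end of this section):

# one row per series, one column per latent trend; a 1 links each
# series to the trend it loads on
Z <- matrix(0, nrow = nrow(trend_map), ncol = max(trend_map$trend))
Z[cbind(seq_len(nrow(trend_map)), trend_map$trend)] <- 1
Z
##      [,1] [,2]
## [1,]    1    0
## [2,]    1    0
## [3,]    0    1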
supplied matrix now looks exactly like ’d use create similar model MARSS package:","code":"fake_mod <- mvgam(y ~ # observation model formula, which has a # different intercept per series series - 1, # process model formula, which has a shared seasonal smooth # (each latent process model shares the SAME smooth) trend_formula = ~ s(season, bs = 'cc', k = 6), # AR1 dynamics (each latent process model has DIFFERENT) # dynamics trend_model = 'AR1', # supplied trend_map trend_map = trend_map, # data and observation family family = poisson(), data = simdat$data_train, run_model = FALSE) code(fake_mod) ## // Stan model code generated by package mvgam ## data { ## int total_obs; // total number of observations ## int n; // number of timepoints per series ## int n_sp_trend; // number of trend smoothing parameters ## int n_lv; // number of dynamic factors ## int n_series; // number of series ## matrix[n_series, n_lv] Z; // matrix mapping series to latent states ## int num_basis; // total number of basis coefficients ## int num_basis_trend; // number of trend basis coefficients ## vector[num_basis_trend] zero_trend; // prior locations for trend basis coefficients ## matrix[total_obs, num_basis] X; // mgcv GAM design matrix ## matrix[n * n_lv, num_basis_trend] X_trend; // trend model design matrix ## array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) ## array[n, n_lv] int ytimes_trend; ## int n_nonmissing; // number of nonmissing observations ## matrix[4, 4] S_trend1; // mgcv smooth penalty matrix S_trend1 ## array[n_nonmissing] int flat_ys; // flattened nonmissing observations ## matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations ## array[n_nonmissing] int obs_ind; // indices of nonmissing observations ## } ## transformed data { ## ## } ## parameters { ## // raw basis coefficients ## vector[num_basis] b_raw; ## vector[num_basis_trend] b_raw_trend; ## ## // latent state SD terms ## vector[n_lv] sigma; ## ## // latent state AR1 terms ## vector[n_lv] ar1; ## ## // latent states ## matrix[n, n_lv] LV; ## ## // smoothing parameters ## vector[n_sp_trend] lambda_trend; ## } ## transformed parameters { ## // latent states and loading matrix ## vector[n * n_lv] trend_mus; ## matrix[n, n_series] trend; ## matrix[n_series, n_lv] lv_coefs; ## ## // basis coefficients ## vector[num_basis] b; ## vector[num_basis_trend] b_trend; ## ## // observation model basis coefficients ## b[1 : num_basis] = b_raw[1 : num_basis]; ## ## // process model basis coefficients ## b_trend[1 : num_basis_trend] = b_raw_trend[1 : num_basis_trend]; ## ## // latent process linear predictors ## trend_mus = X_trend * b_trend; ## ## // derived latent states ## lv_coefs = Z; ## for (i in 1 : n) { ## for (s in 1 : n_series) { ## trend[i, s] = dot_product(lv_coefs[s, : ], LV[i, : ]); ## } ## } ## } ## model { ## // prior for seriesseries_1... ## b_raw[1] ~ student_t(3, 0, 2); ## ## // prior for seriesseries_2... ## b_raw[2] ~ student_t(3, 0, 2); ## ## // prior for seriesseries_3... ## b_raw[3] ~ student_t(3, 0, 2); ## ## // priors for AR parameters ## ar1 ~ std_normal(); ## ## // priors for latent state SD parameters ## sigma ~ student_t(3, 0, 2.5); ## ## // dynamic process models ## ## // prior for s(season)_trend... 
## b_raw_trend[1 : 4] ~ multi_normal_prec(zero_trend[1 : 4], ## S_trend1[1 : 4, 1 : 4] ## * lambda_trend[1]); ## lambda_trend ~ normal(5, 30); ## for (j in 1 : n_lv) { ## LV[1, j] ~ normal(trend_mus[ytimes_trend[1, j]], sigma[j]); ## for (i in 2 : n) { ## LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]] ## + ar1[j] ## * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]), ## sigma[j]); ## } ## } ## { ## // likelihood functions ## vector[n_nonmissing] flat_trends; ## flat_trends = to_vector(trend)[obs_ind]; ## flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0, ## append_row(b, 1.0)); ## } ## } ## generated quantities { ## vector[total_obs] eta; ## matrix[n, n_series] mus; ## vector[n_sp_trend] rho_trend; ## vector[n_lv] penalty; ## array[n, n_series] int ypred; ## penalty = 1.0 / (sigma .* sigma); ## rho_trend = log(lambda_trend); ## ## // posterior predictions ## eta = X * b; ## for (s in 1 : n_series) { ## mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s]; ## ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]); ## } ## } fake_mod$model_data$Z ## [,1] [,2] ## [1,] 1 0 ## [2,] 1 0 ## [3,] 0 1"},{"path":"https://nicholasjclark.github.io/mvgam/articles/shared_states.html","id":"fitting-and-inspecting-the-model","dir":"Articles","previous_headings":"The trend_map argument","what":"Fitting and inspecting the model","title":"Shared latent states in mvgam","text":"Though model doesn’t perfectly match data-generating process (allowed series different underlying dynamics), can still fit show resulting inferences look like: summary model informative shows two latent process models estimated, even though three observed time series. model converges well Quick plots main effects can made using conditional_effects(): Even informative plots latent processes. 
series 1 2 share exact estimates, estimates series 3 different: However, forecasts series’ 1 2 differ different intercepts observation model","code":"full_mod <- mvgam(y ~ series - 1, trend_formula = ~ s(season, bs = 'cc', k = 6), trend_model = 'AR1', trend_map = trend_map, family = poisson(), data = simdat$data_train) summary(full_mod) ## GAM observation formula: ## y ~ series - 1 ## ## GAM process formula: ## ~s(season, bs = \"cc\", k = 6) ## ## Family: ## poisson ## ## Link function: ## log ## ## Trend model: ## AR1 ## ## N process models: ## 2 ## ## N series: ## 3 ## ## N timepoints: ## 75 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## GAM observation model coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## seriesseries_1 -0.15 0.088 0.31 1.00 1895 ## seriesseries_2 0.92 1.100 1.20 1.00 1267 ## seriesseries_3 1.90 2.100 2.30 1.02 256 ## ## Process model AR parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## ar1[1] -0.72 -0.420 -0.056 1 676 ## ar1[2] -0.28 -0.011 0.280 1 1433 ## ## Process error parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## sigma[1] 0.33 0.49 0.67 1 487 ## sigma[2] 0.59 0.73 0.91 1 948 ## ## GAM process model coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## s(season).1_trend -0.22 -0.011 0.20 1 1612 ## s(season).2_trend -0.27 -0.045 0.18 1 1745 ## s(season).3_trend -0.15 0.074 0.29 1 1347 ## s(season).4_trend -0.15 0.067 0.28 1 1561 ## ## Approximate significance of GAM process smooths: ## edf F p-value ## s(season) 1.52 0.1 0.91 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhat looks reasonable for all parameters ## 0 of 2000 iterations ended with a divergence (0%) ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 3:55:17 PM 2024. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) plot(conditional_effects(full_mod, type = 'link'), ask = FALSE) plot(full_mod, type = 'trend', series = 1) plot(full_mod, type = 'trend', series = 2) plot(full_mod, type = 'trend', series = 3) plot(full_mod, type = 'forecast', series = 1) plot(full_mod, type = 'forecast', series = 2) plot(full_mod, type = 'forecast', series = 3)"},{"path":"https://nicholasjclark.github.io/mvgam/articles/shared_states.html","id":"example-signal-detection","dir":"Articles","previous_headings":"","what":"Example: signal detection","title":"Shared latent states in mvgam","text":"Now explore complicated example. simulate true hidden signal trying track. signal depends nonlinearly covariate (called productivity, represents measure productive landscape ). signal also demonstrates fairly large amount temporal autocorrelation: Plot signal inspect ’s evolution time plot relationship signal productivity covariate: Next simulate three sensors trying track hidden signal. sensors different observation errors can depend nonlinearly second external covariate, called temperature example. 
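Written out, the sensor simulation below corresponds roughly to the following observation model (a sketch inferred from the simulation code, with sensor-specific intercepts \(\alpha_i\), sensor-specific smooth temperature effects \(f_i\) and sensor-specific observation error \(\sigma_i\)): \[\begin{align*} \boldsymbol{observed}_{i,t} & \sim \text{Normal}(\alpha_i + f_i(\boldsymbol{temperature})_t + \boldsymbol{signal}_t, \sigma_i) \end{align*}\] where \(\boldsymbol{signal}_t\) is the shared hidden signal that every sensor tracks.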
makes use gamSim Plot sensor observations now plot observed relationships three sensors temperature covariate","code":"set.seed(543210) # simulate a nonlinear relationship using the mgcv function gamSim signal_dat <- gamSim(n = 100, eg = 1, scale = 1) ## Gu & Wahba 4 term additive model # productivity is one of the variables in the simulated data productivity <- signal_dat$x2 # simulate the true signal, which already has a nonlinear relationship # with productivity; we will add in a fairly strong AR1 process to # contribute to the signal true_signal <- as.vector(scale(signal_dat$y) + arima.sim(100, model = list(ar = 0.8, sd = 0.1))) plot(true_signal, type = 'l', bty = 'l', lwd = 2, ylab = 'True signal', xlab = 'Time') plot(true_signal ~ productivity, pch = 16, bty = 'l', ylab = 'True signal', xlab = 'Productivity') set.seed(543210) sim_series = function(n_series = 3, true_signal){ temp_effects <- gamSim(n = 100, eg = 7, scale = 0.1) temperature <- temp_effects$y alphas <- rnorm(n_series, sd = 2) do.call(rbind, lapply(seq_len(n_series), function(series){ data.frame(observed = rnorm(length(true_signal), mean = alphas[series] + 1.5*as.vector(scale(temp_effects[, series + 1])) + true_signal, sd = runif(1, 1, 2)), series = paste0('sensor_', series), time = 1:length(true_signal), temperature = temperature, productivity = productivity, true_signal = true_signal) })) } model_dat <- sim_series(true_signal = true_signal) %>% dplyr::mutate(series = factor(series)) ## Gu & Wahba 4 term additive model, correlated predictors plot_mvgam_series(data = model_dat, y = 'observed', series = 'all') plot(observed ~ temperature, data = model_dat %>% dplyr::filter(series == 'sensor_1'), pch = 16, bty = 'l', ylab = 'Sensor 1', xlab = 'Temperature') plot(observed ~ temperature, data = model_dat %>% dplyr::filter(series == 'sensor_2'), pch = 16, bty = 'l', ylab = 'Sensor 2', xlab = 'Temperature') plot(observed ~ temperature, data = model_dat %>% dplyr::filter(series == 'sensor_3'), pch = 16, bty = 'l', ylab = 'Sensor 3', xlab = 'Temperature')"},{"path":"https://nicholasjclark.github.io/mvgam/articles/shared_states.html","id":"the-shared-signal-model","dir":"Articles","previous_headings":"Example: signal detection","what":"The shared signal model","title":"Shared latent states in mvgam","text":"Now can formulate fit model allows sensor’s observation error depend nonlinearly temperature allowing true signal depend nonlinearly productivity. fixing values trend column 1 trend_map, assuming observation sensors tracking latent signal. 
use informative priors two variance components (process error observation error), reflect prior belief observation error smaller overall true process error View reduced version model summary many spline coefficients model","code":"mod <- mvgam(formula = # formula for observations, allowing for different # intercepts and hierarchical smooth effects of temperature observed ~ series + s(temperature, k = 10) + s(series, temperature, bs = 'sz', k = 8), trend_formula = # formula for the latent signal, which can depend # nonlinearly on productivity ~ s(productivity, k = 8), trend_model = # in addition to productivity effects, the signal is # assumed to exhibit temporal autocorrelation 'AR1', trend_map = # trend_map forces all sensors to track the same # latent signal data.frame(series = unique(model_dat$series), trend = c(1, 1, 1)), # informative priors on process error # and observation error will help with convergence priors = c(prior(normal(2, 0.5), class = sigma), prior(normal(1, 0.5), class = sigma_obs)), # Gaussian observations family = gaussian(), data = model_dat) summary(mod, include_betas = FALSE) ## GAM observation formula: ## observed ~ series + s(temperature, k = 10) + s(series, temperature, ## bs = \"sz\", k = 8) ## ## GAM process formula: ## ~s(productivity, k = 8) ## ## Family: ## gaussian ## ## Link function: ## identity ## ## Trend model: ## AR1 ## ## N process models: ## 1 ## ## N series: ## 3 ## ## N timepoints: ## 100 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1100; warmup = 600; thin = 1 ## Total post-warmup draws = 2000 ## ## ## Observation error parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## sigma_obs[1] 1.6 1.9 2.2 1 1757 ## sigma_obs[2] 1.4 1.7 2.0 1 1090 ## sigma_obs[3] 1.3 1.5 1.8 1 1339 ## ## GAM observation model coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## (Intercept) 0.72 1.70 2.50 1.01 360 ## seriessensor_2 -2.10 -0.96 0.32 1.00 1068 ## seriessensor_3 -3.40 -2.00 -0.39 1.00 1154 ## ## Approximate significance of GAM observation smooths: ## edf F p-value ## s(temperature) 1.22 12.66 < 2e-16 *** ## s(series,temperature) 1.92 0.95 6.9e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Process model AR parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## ar1[1] 0.33 0.59 0.83 1 492 ## ## Process error parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## sigma[1] 0.72 1 1.3 1.01 392 ## ## Approximate significance of GAM process smooths: ## edf F p-value ## s(productivity) 3.6 8.31 5.1e-05 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhats above 1.05 found for 28 parameters ## *Diagnose further to investigate why the chains have not mixed ## 0 of 2000 iterations ended with a divergence (0%) ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 3:57:49 PM 2024. 
## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1)"},{"path":"https://nicholasjclark.github.io/mvgam/articles/shared_states.html","id":"inspecting-effects-on-both-process-and-observation-models","dir":"Articles","previous_headings":"Example: signal detection","what":"Inspecting effects on both process and observation models","title":"Shared latent states in mvgam","text":"Don’t pay too much attention to the approximate p-values of the smooth terms. The calculation of these values is incredibly sensitive to the estimates of the smoothing parameters, and we don’t tend to find them overly meaningful. What is meaningful, however, are prediction-based plots of the smooth functions. For example, here is the estimated response of the underlying signal to productivity: And here are the estimated relationships between the sensor observations and the temperature covariate: All main effects can be quickly plotted with conditional_effects: conditional_effects is simply a wrapper for the more flexible plot_predictions function from the marginaleffects package. We can get more useful plots of these effects using that function for further customisation: We have successfully estimated effects, some of which are nonlinear, that impact both the hidden process and the observations, all in a single joint model. But there can always be challenges with these models, particularly when estimating both process and observation error at the same time. For example, a pairs plot for the observation error of sensor 1 and the hidden process error shows some strong correlations that we might want to deal with by using a more structured prior: But we will leave the model as-is for this example","code":"plot(mod, type = 'smooths', trend_effects = TRUE) plot(mod, type = 'smooths') plot(conditional_effects(mod, type = 'link'), ask = FALSE) plot_predictions(mod, condition = c('temperature', 'series', 'series'), points = 0.5) + theme(legend.position = 'none') pairs(mod, variable = c('sigma[1]', 'sigma_obs[1]'))"},{"path":"https://nicholasjclark.github.io/mvgam/articles/shared_states.html","id":"recovering-the-hidden-signal","dir":"Articles","previous_headings":"Example: signal detection","what":"Recovering the hidden signal","title":"Shared latent states in mvgam","text":"A final key question is whether we can successfully recover the true hidden signal. The trend slot in the returned model parameters holds the estimates for this signal, which we can easily plot using the mvgam S3 method for plot. We can also overlay the true values of the hidden signal, which shows that the model has done a good job of recovering it:","code":"plot(mod, type = 'trend') # Overlay the true simulated signal points(true_signal, pch = 16, cex = 1, col = 'white') points(true_signal, pch = 16, cex = 0.8)"},{"path":"https://nicholasjclark.github.io/mvgam/articles/shared_states.html","id":"further-reading","dir":"Articles","previous_headings":"","what":"Further reading","title":"Shared latent states in mvgam","text":"The following papers and resources offer a lot of useful material about the types of State-Space models that can be applied in practice: Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. “MARSS: multivariate autoregressive state-space models for analyzing time-series data.” R Journal 4.1 (2012): 11. Ward, Eric J., et al. “Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico.” Journal of Applied Ecology 47.1 (2010): 47-56. Auger‐Méthé, Marie, et al.
,{"path":"https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html","id":"time-varying-effects","dir":"Articles","previous_headings":"","what":"Time-varying effects","title":"Time-varying effects in mvgam","text":"Dynamic fixed-effect coefficients (often referred to as dynamic linear models) can be readily incorporated into GAMs / DGAMs. In mvgam, the dynamic() formula wrapper offers a convenient interface to set these up. The plan is to incorporate a range of dynamic options (random walk, AR1 etc…) but for the moment only low-rank Gaussian Process (GP) smooths are allowed (making use of either the gp basis in mgcv or of Hilbert space approximate GPs). These are advantageous over splines or random walk effects for several reasons. First, GPs force the time-varying effect to be smooth. This often makes sense in reality, where we would not expect a regression coefficient to change rapidly from one time point to the next. Second, GPs provide information on the ‘global’ dynamics of the time-varying effect through their length-scale parameters. This means we can use them to provide accurate forecasts of how the effect is expected to change in the future, something we couldn’t do very well if we used splines to estimate the effect. The example below illustrates.","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html","id":"simulating-time-varying-effects","dir":"Articles","previous_headings":"Time-varying effects","what":"Simulating time-varying effects","title":"Time-varying effects in mvgam","text":"Simulate a time-varying coefficient using a squared exponential Gaussian Process function with length scale \\(\\rho\\)=10. We will do this using an internal function from mvgam (the sim_gp function): A plot of the time-varying coefficient shows that it changes smoothly through time: Next we need to simulate the values of the covariate, which we will call temp (to represent \\(temperature\\)). In this case we just use a standard normal distribution to simulate this covariate: Finally, simulate the outcome variable, which is a Gaussian observation process (with some observation error) over the time-varying effect of \\(temperature\\). Gather the data into a data.frame for fitting models, and split the data into training and testing folds. Plot the series","code":"set.seed(1111) N <- 200 beta_temp <- mvgam:::sim_gp(rnorm(1), alpha_gp = 0.75, rho_gp = 10, h = N) + 0.5 plot(beta_temp, type = 'l', lwd = 3, bty = 'l', xlab = 'Time', ylab = 'Coefficient', col = 'darkred') box(bty = 'l', lwd = 2) temp <- rnorm(N, sd = 1) out <- rnorm(N, mean = 4 + beta_temp * temp, sd = 0.25) time <- seq_along(temp) plot(out, type = 'l', lwd = 3, bty = 'l', xlab = 'Time', ylab = 'Outcome', col = 'darkred') box(bty = 'l', lwd = 2) data <- data.frame(out, temp, time) data_train <- data[1:190,] data_test <- data[191:200,] plot_mvgam_series(data = data_train, newdata = data_test, y = 'out')"}
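Because mvgam:::sim_gp is an internal (unexported) function, here is a self-contained sketch of the same idea for readers who want to reproduce it without relying on package internals. It draws one realisation from a squared exponential GP with marginal SD alpha and length scale rho; the function name sq_exp_gp and the use of MASS::mvrnorm are assumptions for illustration only:

# Simulate one draw from a squared exponential Gaussian Process
library(MASS)
sq_exp_gp <- function(N, alpha, rho) {
  dists <- as.matrix(dist(1:N))
  # squared exponential covariance kernel, with a small jitter on the
  # diagonal to keep the matrix positive definite
  K <- alpha^2 * exp(-0.5 * (dists / rho)^2)
  mvrnorm(1, mu = rep(0, N), Sigma = K + diag(1e-9, N))
}
set.seed(1111)
beta_sim <- sq_exp_gp(N = 200, alpha = 0.75, rho = 10) + 0.5
plot(beta_sim, type = 'l', xlab = 'Time', ylab = 'Coefficient')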
,{"path":"https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html","id":"the-dynamic-function","dir":"Articles","previous_headings":"Time-varying effects","what":"The dynamic() function","title":"Time-varying effects in mvgam","text":"Time-varying coefficients can be fairly easily set up using the s() or gp() wrapper functions in mvgam formulae, by fitting a nonlinear effect of time and using the covariate of interest as the numeric by variable (see ?mgcv::s or ?brms::gp for details). The dynamic() formula wrapper offers a way to automate this process, and will eventually allow for a broader variety of time-varying effects (such as random walk or AR processes). Depending on the arguments that are specified to dynamic, it will either set up a low-rank GP smooth function using s() with bs = 'gp' and a fixed value of the length scale parameter \\(\\rho\\), or it will set up a Hilbert space approximate GP using the gp() function with c=5/4 so that \\(\\rho\\) is estimated (see ?dynamic for details). In this first example we will use the s() option, and will mis-specify the \\(\\rho\\) parameter which, in practice, is never known. The call to dynamic() sets up the following smooth: s(time, by = temp, bs = \"gp\", m = c(-2, 8, 2), k = 40). Inspect the model summary, which shows how the dynamic() wrapper was used to construct a low-rank Gaussian Process smooth function: Because this model used a spline with a gp basis, its smooths can be visualised just like any other gam. Plot the estimated time-varying coefficient for the in-sample training period. We can also plot the estimates for the in-sample and out-of-sample periods to see how the Gaussian Process function produces sensible smooth forecasts. Here we supply the full dataset to the newdata argument in plot_mvgam_smooth to inspect posterior forecasts of the time-varying smooth function. Overlay the true simulated function to see that the model has adequately estimated its dynamics in both the training and testing data partitions. We can also use plot_predictions from the marginaleffects package to visualise how the time-varying coefficient is estimated at different values of \\(temperature\\): This results in sensible forecasts of the observations as well. The syntax is very similar if we instead wish to estimate the parameters of the underlying Gaussian Process, this time using a Hilbert space approximation. We simply omit the rho argument in dynamic to make this happen. This will set up a call similar to gp(time, by = 'temp', c = 5/4, k = 40). The model summary now contains estimates for the marginal deviation and length scale parameters of the underlying Gaussian Process function: Effects of gp() terms can also be plotted as smooths: This plot and the plot_predictions() call below show that the effect in this case is similar to the one estimated by the approximate GP smooth model above: Forecasts are also similar:","code":"mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40), family = gaussian(), data = data_train) summary(mod, include_betas = FALSE) ## GAM formula: ## out ~ s(time, by = temp, bs = \"gp\", m = c(-2, 8, 2), k = 40) ## ## Family: ## gaussian ## ## Link function: ## identity ## ## Trend model: ## None ## ## N series: ## 1 ## ## N timepoints: ## 190 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## Observation error parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## sigma_obs[1] 0.23 0.25 0.28 1 2222 ## ## GAM coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## (Intercept) 4 4 4.1 1 2893 ## ## Approximate significance of GAM observation smooths: ## edf F p-value ## s(time):temp 14 55.2 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhat looks reasonable for all parameters ## 0 of 2000 iterations ended with a divergence (0%) ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 3:58:50 PM 2024.
## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) plot(mod, type = 'smooths') plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 190, lty = 'dashed', lwd = 2) lines(beta_temp, lwd = 2.5, col = 'white') lines(beta_temp, lwd = 2) range_round = function(x){ round(range(x, na.rm = TRUE), 2) } plot_predictions(mod, newdata = datagrid(time = unique, temp = range_round), by = c('time', 'temp', 'temp'), type = 'link') plot(mod, type = 'forecast', newdata = data_test) ## Out of sample CRPS: ## [1] 1.280347 mod <- mvgam(out ~ dynamic(temp, k = 40), family = gaussian(), data = data_train) summary(mod, include_betas = FALSE) ## GAM formula: ## out ~ gp(time, by = temp, c = 5/4, k = 40, scale = TRUE) ## ## Family: ## gaussian ## ## Link function: ## identity ## ## Trend model: ## None ## ## N series: ## 1 ## ## N timepoints: ## 190 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## Observation error parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## sigma_obs[1] 0.24 0.26 0.29 1 2151 ## ## GAM coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## (Intercept) 4 4 4.1 1 2989 ## ## GAM gp term marginal deviation (alpha) and length scale (rho) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## alpha_gp(time):temp 0.640 0.890 1.400 1.01 745 ## rho_gp(time):temp 0.028 0.053 0.069 1.00 888 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhat looks reasonable for all parameters ## 1 of 2000 iterations ended with a divergence (0.05%) ## *Try running with larger adapt_delta to remove the divergences ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 3:59:45 PM 2024. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 190, lty = 'dashed', lwd = 2) lines(beta_temp, lwd = 2.5, col = 'white') lines(beta_temp, lwd = 2) plot_predictions(mod, newdata = datagrid(time = unique, temp = range_round), by = c('time', 'temp', 'temp'), type = 'link') plot(mod, type = 'forecast', newdata = data_test) ## Out of sample CRPS: ## [1] 1.667521"}
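The two out-of-sample CRPS values above were printed by plot(), but they can also be computed directly from forecast objects if you want to compare models programmatically. A rough sketch follows; mod_s and mod_gp are hypothetical names for the s()-based and gp()-based fits (this article overwrote both as mod), and the exact structure of the returned score list should be inspected before summing:

# Hypothetical: both dynamic-effect models kept under separate names
fc_gp <- forecast(mod_gp, newdata = data_test)
crps_gp <- score(fc_gp, score = 'crps')
# Inspect the per-series score structure before aggregating
str(crps_gp, max.level = 1)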
,{"path":"https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html","id":"salmon-survival-example","dir":"Articles","previous_headings":"","what":"Salmon survival example","title":"Time-varying effects in mvgam","text":"Here we will use openly available data on marine survival of Chinook salmon to illustrate how time-varying effects can be used to improve ecological time series models. Scheuerell and Williams (2005) used a dynamic linear model to examine the relationship between marine survival of Chinook salmon and an index of ocean upwelling strength along the west coast of the USA. The authors hypothesized that stronger upwelling in April would create better growing conditions for phytoplankton, which would then translate into more zooplankton and provide better foraging opportunities for juvenile salmon entering the ocean. The data on survival are measured as a proportional variable over 42 years (1964–2005) and are available in the MARSS package: First we need to prepare the data for modelling. The variable CUI.apr will be standardized to make it easier for the sampler to estimate the underlying GP parameters of the time-varying effect. We also need to convert the survival back to a proportion, as in its current form it has been logit-transformed (because most time series packages cannot handle proportional data). As usual, we also need to create a time indicator and a series indicator for working in mvgam: Inspect the data. Then plot some features of the outcome variable, which show that it is a proportional variable with particular restrictions that we want to model:","code":"load(url('https://github.com/atsa-es/MARSS/raw/master/data/SalmonSurvCUI.rda')) dplyr::glimpse(SalmonSurvCUI) ## Rows: 42 ## Columns: 3 ## $ year 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 19… ## $ logit.s -3.46, -3.32, -3.58, -3.03, -3.61, -3.35, -3.93, -4.19, -4.82,… ## $ CUI.apr 57, 5, 43, 11, 47, -21, 25, -2, -1, 43, 2, 35, 0, 1, -1, 6, -7… SalmonSurvCUI %>% # create a time variable dplyr::mutate(time = dplyr::row_number()) %>% # create a series variable dplyr::mutate(series = as.factor('salmon')) %>% # z-score the covariate CUI.apr dplyr::mutate(CUI.apr = as.vector(scale(CUI.apr))) %>% # convert logit-transformed survival back to proportional dplyr::mutate(survival = plogis(logit.s)) -> model_data dplyr::glimpse(model_data) ## Rows: 42 ## Columns: 6 ## $ year 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1… ## $ logit.s -3.46, -3.32, -3.58, -3.03, -3.61, -3.35, -3.93, -4.19, -4.82… ## $ CUI.apr 2.37949804, 0.03330223, 1.74782994, 0.30401713, 1.92830654, -… ## $ time 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18… ## $ series salmon, salmon, salmon, salmon, salmon, salmon, salmon, salmo… ## $ survival 0.030472033, 0.034891409, 0.027119717, 0.046088827, 0.0263393… plot_mvgam_series(data = model_data, y = 'survival')"},
{"path":"https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html","id":"a-state-space-beta-regression","dir":"Articles","previous_headings":"Salmon survival example","what":"A State-Space Beta regression","title":"Time-varying effects in mvgam","text":"mvgam can easily handle data that are bounded at 0 and 1 with a Beta observation model (using the mgcv function betar(), see ?mgcv::betar for details). First we will fit a simple State-Space model that uses a Random Walk dynamic process model with no predictors and a Beta observation model: The summary of this model shows good behaviour of the Hamiltonian Monte Carlo sampler and provides useful summaries of the Beta observation model parameters: A plot of the underlying dynamic component shows that it has easily handled the temporal evolution of the time series: Posterior hindcasts are also good and automatically respect the observational data bounding at 0 and 1:","code":"mod0 <- mvgam(formula = survival ~ 1, trend_model = 'RW', family = betar(), data = model_data) summary(mod0) ## GAM formula: ## survival ~ 1 ## ## Family: ## beta ## ## Link function: ## logit ## ## Trend model: ## RW ## ## N series: ## 1 ## ## N timepoints: ## 42 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## Observation precision parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## phi[1] 160 310 580 1.01 612 ## ## GAM coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## (Intercept) -4.2 -3.4 -2.4 1.02 125 ## ## Latent trend variance estimates: ## 2.5% 50% 97.5% Rhat n_eff ## sigma[1] 0.18 0.33 0.55 1.02 276 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhat looks reasonable for all parameters ## 0 of 2000 iterations ended with a divergence (0%) ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 4:00:41 PM 2024.
## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) plot(mod0, type = 'trend') plot(mod0, type = 'forecast')"},
{"path":"https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html","id":"including-time-varying-upwelling-effects","dir":"Articles","previous_headings":"Salmon survival example","what":"Including time-varying upwelling effects","title":"Time-varying effects in mvgam","text":"Now we can increase the complexity of the model by constructing and fitting a State-Space model that has a time-varying effect of the coastal upwelling index, in addition to the autoregressive dynamics. We again use a Beta observation model to capture the restrictions of our proportional observations, but this time we include a dynamic() effect of CUI.apr in the latent process model. We do not specify the \\(\\rho\\) parameter, instead opting to estimate it using a Hilbert space approximate GP: The summary for this model now includes estimates for the time-varying GP parameters: The estimates for the underlying dynamic process, and for the hindcasts, haven’t changed much: But the process error parameter \\(\\sigma\\) is slightly smaller for this model than for the first model: Why does the process error not need to be as flexible in the second model? Because the estimates of this dynamic process are now informed partly by the time-varying effect of upwelling, which we can visualise on the link scale using plot() with trend_effects = TRUE: And on the outcome scale, across the range of possible CUI.apr values, using plot_predictions():","code":"mod1 <- mvgam(formula = survival ~ 1, trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE), trend_model = 'RW', family = betar(), data = model_data) summary(mod1, include_betas = FALSE) ## GAM observation formula: ## survival ~ 1 ## ## GAM process formula: ## ~dynamic(CUI.apr, k = 25, scale = FALSE) ## ## Family: ## beta ## ## Link function: ## logit ## ## Trend model: ## RW ## ## N process models: ## 1 ## ## N series: ## 1 ## ## N timepoints: ## 42 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1000; warmup = 500; thin = 1 ## Total post-warmup draws = 2000 ## ## ## Observation precision parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## phi[1] 190 360 670 1 858 ## ## GAM observation model coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## (Intercept) -4.1 -3.2 -2.2 1.07 64 ## ## Process error parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## sigma[1] 0.18 0.31 0.51 1.02 274 ## ## GAM process model gp term marginal deviation (alpha) and length scale (rho) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## alpha_gp_time_byCUI_apr_trend 0.028 0.32 1.5 1.02 205 ## rho_gp_time_byCUI_apr_trend 1.400 6.50 40.0 1.02 236 ## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhats above 1.05 found for 30 parameters ## *Diagnose further to investigate why the chains have not mixed ## 89 of 2000 iterations ended with a divergence (4.45%) ## *Try running with larger adapt_delta to remove the divergences ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## E-FMI indicated no pathological behavior ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 4:01:44 PM 2024. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1)
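Note that this summary reports 89 divergent transitions and some high Rhats, which often points to a difficult posterior geometry for the GP parameters. One quick diagnostic, sketched below under the assumption that the parameter names match those printed in the summary, is a pairs plot to look for funnel-shaped dependence between the marginal deviation and length scale (the pairs() method for mvgam objects was used the same way earlier in these articles):

# Look for funnels between the GP marginal deviation and length scale
pairs(mod1, variable = c('alpha_gp_time_byCUI_apr_trend',
                         'rho_gp_time_byCUI_apr_trend'))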
plot(mod1, type = 'trend') plot(mod1, type = 'forecast') # Extract estimates of the process error 'sigma' for each model mod0_sigma <- as.data.frame(mod0, variable = 'sigma', regex = TRUE) %>% dplyr::mutate(model = 'Mod0') mod1_sigma <- as.data.frame(mod1, variable = 'sigma', regex = TRUE) %>% dplyr::mutate(model = 'Mod1') sigmas <- rbind(mod0_sigma, mod1_sigma) # Plot using ggplot2 library(ggplot2) ggplot(sigmas, aes(y = `sigma[1]`, fill = model)) + geom_density(alpha = 0.3, colour = NA) + coord_flip() plot(mod1, type = 'smooth', trend_effects = TRUE) plot_predictions(mod1, newdata = datagrid(CUI.apr = range_round, time = unique), by = c('time', 'CUI.apr', 'CUI.apr'))"},
{"path":"https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html","id":"comparing-model-predictive-performances","dir":"Articles","previous_headings":"Salmon survival example","what":"Comparing model predictive performances","title":"Time-varying effects in mvgam","text":"A key question when fitting multiple time series models is whether one of them provides better predictions than the others. There are several options in mvgam for exploring this quantitatively. First, we can compare models based on in-sample approximate leave-one-out cross-validation as implemented in the popular loo package: The second model has the larger Expected Log Predictive Density (ELPD), meaning that it is slightly favoured over the simpler model that did not include the time-varying upwelling effect. However, the two models certainly do not differ by much. And this metric only compares in-sample performance, whereas we are hoping to use these models to produce reasonable forecasts. Luckily, mvgam also has routines for comparing models using approximate leave-future-out cross-validation. Here we refit both models to a reduced training set (starting at time point 30) and produce approximate 1-step ahead forecasts. These forecasts are used to estimate the forecast ELPD while expanding the training set one time point at a time. We use Pareto-smoothed importance sampling to reweight posterior predictions, acting as a kind of particle filter so that we don’t need to refit the model too often (you can read more about how this process works in Bürkner et al. 2020). The model with the time-varying upwelling effect tends to provide better 1-step ahead forecasts, with a higher total forecast ELPD. We can also plot the ELPDs for each model as a contrast. Here, values less than zero suggest that the time-varying predictor model (Mod1) gives better 1-step ahead forecasts: A useful exercise would be to expand this model by thinking about what kinds of predictors might impact measurement error, which could easily be implemented in the observation formula in mvgam. For now, we will leave the model as-is.","code":"loo_compare(mod0, mod1) ## Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details. ## Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details. ## elpd_diff se_diff ## mod1 0.0 0.0 ## mod0 -2.3 1.6
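The Pareto k warnings above come from the individual loo objects that loo_compare() builds internally, and those objects can also be inspected directly to find which observations drive the warnings. A minimal sketch, assuming the loo() method for mvgam objects returns a standard loo-package object with a diagnostics slot:

# Compute each model's approximate LOO object and flag problematic
# observations (Pareto k > 0.7 is the usual rule of thumb)
loo0 <- loo(mod0)
loo1 <- loo(mod1)
which(loo0$diagnostics$pareto_k > 0.7)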
lfo_mod0 <- lfo_cv(mod0, min_t = 30) lfo_mod1 <- lfo_cv(mod1, min_t = 30) sum(lfo_mod0$elpds) ## [1] 34.73176 sum(lfo_mod1$elpds) ## [1] 36.05325 plot(x = 1:length(lfo_mod0$elpds) + 30, y = lfo_mod0$elpds - lfo_mod1$elpds, ylab = 'ELPDmod0 - ELPDmod1', xlab = 'Evaluation time point', pch = 16, col = 'darkred', bty = 'l') abline(h = 0, lty = 'dashed')"},
{"path":"https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html","id":"further-reading","dir":"Articles","previous_headings":"","what":"Further reading","title":"Time-varying effects in mvgam","text":"The following papers and resources offer a lot of useful material about how dynamic linear models can be applied / evaluated in practice: Bürkner, PC, Gabry, J and Vehtari, A. Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation 90:14 (2020): 2499-2523. Herrero, Asier, et al. From the individual to the landscape and back: time‐varying effects of climate and herbivory on tree sapling growth at distribution limits. Journal of Ecology 104.2 (2016): 430-442. Holmes, Elizabeth E., Eric J. Ward, and Wills Kellie. “MARSS: multivariate autoregressive state-space models for analyzing time-series data.” R Journal 4.1 (2012): 11. Scheuerell, Mark D., and John G. Williams. Forecasting climate induced changes in the survival of Snake River Spring/Summer Chinook Salmon (Oncorhynchus Tshawytscha). Fisheries Oceanography 14 (2005): 448–57.","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html","id":"state-space-models","dir":"Articles","previous_headings":"","what":"State-Space Models","title":"State-Space models in mvgam","text":"State-Space models allow us to separately make inferences about the underlying dynamic process model that we are interested in (i.e. the evolution of a time series or a collection of time series) and the observation model (i.e. the way that we survey / measure this underlying process). This is extremely useful in ecology because our observations are always imperfect / noisy measurements of the thing we are actually interested in measuring. It is also helpful because we often know that some covariates will impact our ability to measure accurately (i.e. we cannot take accurate counts of rodents if a thunderstorm is happening), while other covariates might impact the underlying process (it is highly unlikely that rodent abundance responds to a single storm; instead it probably responds to longer-term weather and climate variation). A State-Space model allows us to model both components in a single unified modelling framework. A major advantage of mvgam is that it can include nonlinear effects and random effects in BOTH model components while also capturing dynamic processes.","code":""}
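To make the two-component structure described above concrete, the simplest univariate State-Space model can be written with one observation equation and one process equation. This is a generic sketch (not a model from this article), using notation consistent with the fuller multivariate definition given later:

\[\begin{align*} y_t & \sim \text{Normal}(z_t, \sigma_{obs}) \\ z_t & \sim \text{Normal}(f(z_{t-1}) + \boldsymbol{x}_t\beta, \sigma_{process}) \end{align*}\]

Here the observations \(y_t\) are noisy measurements of a latent process \(z_t\), the function \(f()\) captures the temporal dynamics (for example a Random Walk, where \(f(z_{t-1}) = z_{t-1}\), or an AR1 process), and covariates \(\boldsymbol{x}_t\) can enter either equation depending on whether they affect the process itself or only our ability to measure it.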
,{"path":"https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html","id":"lake-washington-plankton-data","dir":"Articles","previous_headings":"State-Space Models","what":"Lake Washington plankton data","title":"State-Space models in mvgam","text":"The data we will use to illustrate how we can fit State-Space models in mvgam come from a long-term monitoring study of plankton counts (cells per mL) taken from Lake Washington in Washington, USA. The data are available as part of the MARSS package and can be downloaded using the following: We will work with five different groups of plankton: As usual, preparing the data into the correct format for mvgam modelling takes a little bit of wrangling in dplyr: Inspect the data structure. Note that we have z-scored the counts in this example to make it easier to specify priors (though this is not completely necessary; it is often better to build a model that respects the properties of the actual outcome variables). It is always helpful to check the data for NAs before attempting any models: We have some missing observations, but this isn’t an issue for modelling in mvgam. A useful property to understand about these counts is that they tend to be highly seasonal. Below are plots of the z-scored counts against the z-scored temperature measurements in the lake for each month: We will try to capture this seasonality in our process model, which should be easy to do given the flexibility of GAMs. Next we split the data into training and testing splits: Now it is time to fit some models. This requires a bit of thinking about how we can best tackle the seasonal variation and the likely dependence structure in the data. These algae are interacting as part of a complex system within the same lake, so we certainly expect lagged cross-dependencies in the underlying dynamics. But if we do not capture the seasonal variation, our multivariate dynamic model will be forced to try to capture it, which could lead to poor convergence and unstable results (we could feasibly capture cyclic dynamics with a more complex multi-species Lotka-Volterra model, but ordinary differential equation approaches are beyond the scope of mvgam).","code":"load(url('https://github.com/atsa-es/MARSS/raw/master/data/lakeWAplankton.rda')) outcomes <- c('Greens', 'Bluegreens', 'Diatoms', 'Unicells', 'Other.algae') # loop across each plankton group to create the long dataframe plankton_data <- do.call(rbind, lapply(outcomes, function(x){ # create a group-specific dataframe with counts labelled 'y' # and the group name in the 'series' variable data.frame(year = lakeWAplanktonTrans[, 'Year'], month = lakeWAplanktonTrans[, 'Month'], y = lakeWAplanktonTrans[, x], series = x, temp = lakeWAplanktonTrans[, 'Temp'])})) %>% # change the 'series' label to a factor dplyr::mutate(series = factor(series)) %>% # filter to only include some years in the data dplyr::filter(year >= 1965 & year < 1975) %>% dplyr::arrange(year, month) %>% dplyr::group_by(series) %>% # z-score the counts so they are approximately standard normal dplyr::mutate(y = as.vector(scale(y))) %>% # add the time indicator dplyr::mutate(time = dplyr::row_number()) %>% dplyr::ungroup() head(plankton_data) ## # A tibble: 6 × 6 ## year month y series temp time ## ## 1 1965 1 -0.542 Greens -1.23 1 ## 2 1965 1 -0.344 Bluegreens -1.23 1 ## 3 1965 1 -0.0768 Diatoms -1.23 1 ## 4 1965 1 -1.52 Unicells -1.23 1 ## 5 1965 1 -0.491 Other.algae -1.23 1 ## 6 1965 2 NA Greens -1.32 2 dplyr::glimpse(plankton_data) ## Rows: 600 ## Columns: 6 ## $ year 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 196… ## $ month 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, … ## $ y -0.54241769, -0.34410776, -0.07684901, -1.52243490, -0.49055442… ## $ series Greens, Bluegreens, Diatoms, Unicells, Other.algae, Greens, Blu… ## $ temp -1.2306562, -1.2306562, -1.2306562, -1.2306562, -1.2306562, -1.… ## $ time 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, … plot_mvgam_series(data = plankton_data, series = 'all') image(is.na(t(plankton_data)), axes = F, col = c('grey80', 'darkred')) axis(3, at = seq(0,1, len = NCOL(plankton_data)), labels = colnames(plankton_data)) plankton_data %>% dplyr::filter(series == 'Other.algae') %>% ggplot(aes(x = time, y = temp)) + geom_line(size = 1.1) + geom_line(aes(y = y), col = 'white', size = 1.3) + geom_line(aes(y = y), col = 'darkred', size = 1.1) + ylab('z-score') + xlab('Time') + ggtitle('Temperature (black) vs Other algae (red)') ## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0. ## ℹ Please use `linewidth` instead. ## This warning is displayed once every 8 hours. ## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was ## generated.
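# Editor's sketch (not from the original article): a tabular complement
# to the NA image above, giving the proportion of missing counts per
# series using the dplyr verbs already loaded for this analysis
plankton_data %>%
  dplyr::group_by(series) %>%
  dplyr::summarise(prop_missing = mean(is.na(y)))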
plankton_data %>% dplyr::filter(series == 'Diatoms') %>% ggplot(aes(x = time, y = temp)) + geom_line(size = 1.1) + geom_line(aes(y = y), col = 'white', size = 1.3) + geom_line(aes(y = y), col = 'darkred', size = 1.1) + ylab('z-score') + xlab('Time') + ggtitle('Temperature (black) vs Diatoms (red)') plankton_data %>% dplyr::filter(series == 'Greens') %>% ggplot(aes(x = time, y = temp)) + geom_line(size = 1.1) + geom_line(aes(y = y), col = 'white', size = 1.3) + geom_line(aes(y = y), col = 'darkred', size = 1.1) + ylab('z-score') + xlab('Time') + ggtitle('Temperature (black) vs Greens (red)') plankton_train <- plankton_data %>% dplyr::filter(time <= 112) plankton_test <- plankton_data %>% dplyr::filter(time > 112)"},
{"path":"https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html","id":"capturing-seasonality","dir":"Articles","previous_headings":"State-Space Models","what":"Capturing seasonality","title":"State-Space models in mvgam","text":"First we will fit a model that does not include a dynamic component, just to see whether it can reproduce the seasonal variation in the observations. This model introduces hierarchical multidimensional smooths, where all time series share a “global” tensor product of the month and temp variables, capturing our expectation that algal seasonality responds to temperature variation. But this response should depend on when in the year these temperatures are recorded (i.e. a response to warm temperatures in Spring should be different to a response to warm temperatures in Autumn). The model also fits series-specific deviation smooths (i.e. one tensor product per series) to capture how each algal group’s seasonality differs from the overall “global” seasonality. Note that we do not include series-specific intercepts in this model because each series was z-scored to have a mean of 0. The “global” tensor product smooth function can be quickly visualized: On this plot, red indicates below-average linear predictors and white indicates above-average. We can then plot the deviation smooths for each algal group to see how they vary from the “global” pattern: These multidimensional smooths have done a good job of capturing the seasonal variation in our observations: This basic model gives us confidence that we can capture the seasonal variation in the observations.
But it has not captured the remaining temporal dynamics, which is obvious when we inspect Dunn-Smyth residuals for a few series:","code":"notrend_mod <- mvgam(y ~ # tensor of temp and month to capture # \"global\" seasonality te(temp, month, k = c(4, 4)) + # series-specific deviation tensor products te(temp, month, k = c(4, 4), by = series), family = gaussian(), data = plankton_train, newdata = plankton_test, trend_model = 'None') plot_mvgam_smooth(notrend_mod, smooth = 1) plot_mvgam_smooth(notrend_mod, smooth = 2) plot_mvgam_smooth(notrend_mod, smooth = 3) plot_mvgam_smooth(notrend_mod, smooth = 4) plot_mvgam_smooth(notrend_mod, smooth = 5) plot_mvgam_smooth(notrend_mod, smooth = 6) plot(notrend_mod, type = 'forecast', series = 1) ## Out of sample CRPS: ## [1] 6.808547 plot(notrend_mod, type = 'forecast', series = 2) ## Out of sample CRPS: ## [1] 6.747564 plot(notrend_mod, type = 'forecast', series = 3) ## Out of sample CRPS: ## [1] 4.123851 plot(notrend_mod, type = 'forecast', series = 4) ## Out of sample CRPS: ## [1] 3.597626 plot(notrend_mod, type = 'forecast', series = 5) ## Out of sample CRPS: ## [1] 2.838391 plot(notrend_mod, type = 'residuals', series = 1) plot(notrend_mod, type = 'residuals', series = 2) plot(notrend_mod, type = 'residuals', series = 3) plot(notrend_mod, type = 'residuals', series = 4) plot(notrend_mod, type = 'residuals', series = 5)"},
{"path":"https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html","id":"multiseries-dynamics","dir":"Articles","previous_headings":"State-Space Models","what":"Multiseries dynamics","title":"State-Space models in mvgam","text":"Now it is time to get into multivariate State-Space models. We will fit two models that can incorporate lagged cross-dependencies in the latent process models. The first model assumes that the process errors operate independently from one another, while the second assumes that there may be contemporaneous correlations in the process errors. Both models include a Vector Autoregressive component for the process means, so both can model complex community dynamics. The models can be described mathematically as follows: \\[\\begin{align*} \\boldsymbol{count}_t & \\sim \\text{Normal}(\\mu_{obs[t]}, \\sigma_{obs}) \\\\ \\mu_{obs[t]} & = process_t \\\\ process_t & \\sim \\text{MVNormal}(\\mu_{process[t]}, \\Sigma_{process}) \\\\ \\mu_{process[t]} & = VAR * process_{t-1} + f_{global}(\\boldsymbol{month},\\boldsymbol{temp})_t + f_{series}(\\boldsymbol{month},\\boldsymbol{temp})_t \\\\ f_{global}(\\boldsymbol{month},\\boldsymbol{temp}) & = \\sum_{k=1}^{K}b_{global} * \\beta_{global} \\\\ f_{series}(\\boldsymbol{month},\\boldsymbol{temp}) & = \\sum_{k=1}^{K}b_{series} * \\beta_{series} \\end{align*}\\] Here you can see that there are no terms in the observation model apart from the underlying process model. But we could easily add covariates into the observation model if we felt they could explain some of the systematic observation errors. We also assume independent observation processes (there is no covariance structure in the observation errors \\(\\sigma_{obs}\\)). At present, mvgam does not support multivariate observation models, but this feature will be added in future versions. The underlying process model, however, is multivariate, and there is a lot going on here. This component has a Vector Autoregressive part, where the process mean at time \\(t\\) \\((\\mu_{process[t]})\\) is a vector that evolves as a function of where the vector-valued process was at time \\(t-1\\). The \\(VAR\\) matrix captures these dynamics, with self-dependencies on the diagonal and possibly asymmetric cross-dependencies on the off-diagonals, while also incorporating the nonlinear smooth functions that capture seasonality for each series. The contemporaneous process errors are modeled by \\(\\Sigma_{process}\\),
which can be constrained so that the process errors are independent from one another (i.e. by setting the off-diagonals to 0) or can be fully parameterized using a Cholesky decomposition (using Stan’s \\(LKJcorr\\) distribution to place a prior on the strength of inter-species correlations). For those interested in the inner workings, mvgam makes use of a recent breakthrough by Sarah Heaps to enforce stationarity of Bayesian VAR processes. This is advantageous because we often don’t expect forecast variance to increase without bound forever into the future, yet many estimated VARs tend to behave this way. Ok, so that was a lot to take in. Let’s fit some models and try to inspect what is going on and what they assume. But first, we need to update mvgam’s default priors for the observation and process errors. By default, mvgam uses a fairly wide Student-T prior on these parameters to avoid being overly informative. But our observations are z-scored, so we do not expect very large process or observation errors. However, we also do not expect very small observation errors either because we know these measurements are not perfect. So let’s update the priors for these parameters. In doing so, you will get to see how the formula for the latent process (i.e. trend) model is used in mvgam: Get the names of all parameters whose priors can be modified: And their default prior distributions: Setting priors is easy in mvgam because you can use brms routines. Here we use informative Normal priors for both error components, but we impose a lower bound of 0.2 on the observation errors: You may have noticed something else unique about this model: there is no intercept term in the observation formula. This is because a shared intercept parameter can sometimes be unidentifiable with respect to the latent VAR process, particularly if our series have similar long-run averages (which they do here because they were z-scored). We often get better convergence in these State-Space models if we drop this parameter. mvgam accomplishes this by fixing the coefficient for the intercept to zero. Now we can fit the first model, which assumes that the process errors are contemporaneously uncorrelated","code":"priors <- get_mvgam_priors( # observation formula, which has no terms in it y ~ -1, # process model formula, which includes the smooth functions trend_formula = ~ te(temp, month, k = c(4, 4)) + te(temp, month, k = c(4, 4), by = trend), # VAR1 model with uncorrelated process errors trend_model = 'VAR1', family = gaussian(), data = plankton_train) priors[, 3] ## [1] \"(Intercept)\" ## [2] \"process error sd\" ## [3] \"diagonal autocorrelation population mean\" ## [4] \"off-diagonal autocorrelation population mean\" ## [5] \"diagonal autocorrelation population variance\" ## [6] \"off-diagonal autocorrelation population variance\" ## [7] \"shape1 for diagonal autocorrelation precision\" ## [8] \"shape1 for off-diagonal autocorrelation precision\" ## [9] \"shape2 for diagonal autocorrelation precision\" ## [10] \"shape2 for off-diagonal autocorrelation precision\" ## [11] \"observation error sd\" ## [12] \"te(temp,month) smooth parameters, te(temp,month):trendtrend1 smooth parameters, te(temp,month):trendtrend2 smooth parameters, te(temp,month):trendtrend3 smooth parameters, te(temp,month):trendtrend4 smooth parameters, te(temp,month):trendtrend5 smooth parameters\" priors[, 4] ## [1] \"(Intercept) ~ student_t(3, -0.1, 2.5);\" ## [2] \"sigma ~ student_t(3, 0, 2.5);\" ## [3] \"es[1] = 0;\" ## [4] \"es[2] = 0;\" ## [5] \"fs[1] = sqrt(0.455);\" ## [6] \"fs[2] = sqrt(0.455);\" ## [7] \"gs[1] = 1.365;\" ## [8] \"gs[2] = 1.365;\" ## [9] \"hs[1] = 0.071175;\" ## [10] \"hs[2] = 0.071175;\" ## [11] \"sigma_obs ~ student_t(3, 0, 2.5);\" ## [12] \"lambda_trend ~ normal(5, 30);\" priors <- c(prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2), prior(normal(0.5, 0.25), class = sigma))
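Before fitting, it can help to build intuition for what the VAR(1) component of the process model does. Below is a small self-contained sketch (editor-added, not mvgam code) that simulates a bivariate VAR(1) with a stationary coefficient matrix, showing how the off-diagonal elements create the lagged cross-dependence described above:

# simulate a bivariate VAR(1): x_t = A %*% x_{t-1} + e_t
set.seed(1)
A <- matrix(c(0.7, -0.3,
              0.2, 0.5), nrow = 2, byrow = TRUE) # eigenvalues inside the unit circle
x <- matrix(0, nrow = 100, ncol = 2)
for (t in 2:100) {
  x[t, ] <- as.vector(A %*% x[t - 1, ]) + rnorm(2, sd = 0.25)
}
# the two processes track each other with a lag because of the
# off-diagonal coefficients in A
matplot(x, type = 'l', lty = 1, xlab = 'Time', ylab = 'Process')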
var_mod <- mvgam( # observation formula, which is empty y ~ -1, # process model formula, which includes the smooth functions trend_formula = ~ te(temp, month, k = c(4, 4)) + te(temp, month, k = c(4, 4), by = trend), # VAR1 model with uncorrelated process errors trend_model = 'VAR1', family = gaussian(), data = plankton_train, newdata = plankton_test, # include the updated priors priors = priors)"},
{"path":"https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html","id":"inspecting-ss-models","dir":"Articles","previous_headings":"State-Space Models","what":"Inspecting SS models","title":"State-Space models in mvgam","text":"This model’s summary is a bit different to other mvgam summaries. It separates parameters based on whether they belong to the observation model or to the latent process model. This is because we may often have covariates that impact the observations and other covariates that impact the latent process, so we can have fairly complex models for each component. You will notice that some parameters have not fully converged, particularly for the VAR coefficients (called A in the output) and for the process errors (Sigma). Note that we set include_betas = FALSE to stop the summary from printing output for all of the spline coefficients, which can be dense and hard to interpret: The convergence of this model isn’t fabulous (more on this in a moment). But we can again plot the smooth functions, which this time operate on the process model. We can see the same plot using trend_effects = TRUE in the plotting functions: The VAR matrix is of particular interest here, as it captures lagged dependencies and cross-dependencies in the latent process model: Unfortunately bayesplot doesn’t know this is a matrix of parameters, so what we see is actually the transpose of the VAR matrix. A little bit of wrangling gives us these histograms in the correct order: There is a lot happening in this matrix. Each cell captures the lagged effect of the process in the column on the process in the row in the next timestep. For example, the effect in cell [1,3], which is quite strongly negative, means that an increase in the process for series 3 (Greens) at time \\(t\\) is expected to lead to a subsequent decrease in the process for series 1 (Bluegreens) at time \\(t+1\\). The latent process model is now capturing these effects and the smooth seasonal effects, so the trend plot shows our best estimate of the true count for each time point: The process error \\((\\Sigma)\\) captures unmodelled variation in the process models. Here, we fixed the off-diagonals to 0, so the histograms for these parameters look like flat boxes: The observation error estimates \\((\\sigma_{obs})\\) represent how much the model thinks we might miss the true count when we take our imperfect measurements: These components are still a bit hard to identify overall, especially when trying to estimate both process and observation error. We often need to make strong assumptions about which of these is more important for determining unexplained variation in our observations.","code":"summary(var_mod, include_betas = FALSE) ## GAM observation formula: ## y ~ 1 ## ## GAM process formula: ## ~te(temp, month, k = c(4, 4)) + te(temp, month, k = c(4, 4), ## by = trend) ## ## Family: ## gaussian ## ## Link function: ## identity ## ## Trend model: ## VAR1 ## ## N process models: ## 5 ## ## N series: ## 5 ## ## N timepoints: ## 112 ## ## Status: ## Fitted using Stan ## 4 chains, each with iter = 1500; warmup = 1000; thin = 1 ## Total post-warmup draws = 2000 ## ## ## Observation error parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## sigma_obs[1] 0.20 0.25 0.34 1.01 367 ## sigma_obs[2] 0.25 0.40 0.53 1.03 151 ## sigma_obs[3] 0.41 0.64 0.82 1.09 53 ## sigma_obs[4] 0.23 0.37 0.49 1.01 191 ## sigma_obs[5] 0.32 0.43 0.54 1.02 330 ## ## GAM observation model coefficient (beta) estimates: ## 2.5% 50% 97.5% Rhat n_eff ## (Intercept) 0 0 0 NaN NaN ## ## Process model VAR parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## A[1,1] -0.014 0.520 0.880 1.05 83 ## A[1,2] -0.340 -0.028 0.200 1.01 384 ## A[1,3] -0.490 -0.042 0.360 1.01 382 ## A[1,4] -0.300 0.029 0.390 1.01 412 ## A[1,5] -0.092 0.130 0.510 1.03 153 ## A[2,1] -0.150 0.019 0.230 1.01 522 ## A[2,2] 0.610 0.790 0.920 1.03 272 ## A[2,3] -0.390 -0.130 0.051 1.02 223 ## A[2,4] -0.038 0.110 0.350 1.03 254 ## A[2,5] -0.060 0.056 0.200 1.01 467 ## A[3,1] -0.240 0.022 0.600 1.05 66 ## A[3,2] -0.490 -0.200 0.027 1.03 193 ## A[3,3] 0.062 0.410 0.730 1.01 233 ## A[3,4] -0.030 0.230 0.610 1.03 206 ## A[3,5] -0.089 0.120 0.360 1.02 180 ## A[4,1] -0.130 0.057 0.400 1.01 342 ## A[4,2] -0.094 0.057 0.270 1.01 368 ## A[4,3] -0.410 -0.110 0.140 1.02 281 ## A[4,4] 0.470 0.740 0.950 1.01 333 ## A[4,5] -0.210 -0.039 0.110 1.01 513 ## A[5,1] -0.190 0.080 0.760 1.07 63 ## A[5,2] -0.410 -0.120 0.080 1.04 163 ## A[5,3] -0.620 -0.180 0.130 1.02 190 ## A[5,4] -0.050 0.190 0.590 1.04 134 ## A[5,5] 0.510 0.730 0.920 1.01 417 ## ## Process error parameter estimates: ## 2.5% 50% 97.5% Rhat n_eff ## Sigma[1,1] 0.040 0.26 0.66 1.11 46 ## Sigma[1,2] 0.000 0.00 0.00 NaN NaN ## Sigma[1,3] 0.000 0.00 0.00 NaN NaN ## Sigma[1,4] 0.000 0.00 0.00 NaN NaN ## Sigma[1,5] 0.000 0.00 0.00 NaN NaN ## Sigma[2,1] 0.000 0.00 0.00 NaN NaN ## Sigma[2,2] 0.064 0.11 0.18 1.00 528 ## Sigma[2,3] 0.000 0.00 0.00 NaN NaN ## Sigma[2,4] 0.000 0.00 0.00 NaN NaN ## Sigma[2,5] 0.000 0.00 0.00 NaN NaN ## Sigma[3,1] 0.000 0.00 0.00 NaN NaN ## Sigma[3,2] 0.000 0.00 0.00 NaN NaN ## Sigma[3,3] 0.062 0.16 0.29 1.04 115 ## Sigma[3,4] 0.000 0.00 0.00 NaN NaN ## Sigma[3,5] 0.000 0.00 0.00 NaN NaN ## Sigma[4,1] 0.000 0.00 0.00 NaN NaN ## Sigma[4,2] 0.000 0.00 0.00 NaN NaN ## Sigma[4,3] 0.000 0.00 0.00 NaN NaN ## Sigma[4,4] 0.054 0.13 0.26 1.02 209 ## Sigma[4,5] 0.000 0.00 0.00 NaN NaN ## Sigma[5,1] 0.000 0.00 0.00 NaN NaN ## Sigma[5,2] 0.000 0.00 0.00 NaN NaN ## Sigma[5,3] 0.000 0.00 0.00 NaN NaN ## Sigma[5,4] 0.000 0.00 0.00 NaN NaN ## Sigma[5,5] 0.100 0.21 0.35 1.01 310 ## ## Approximate significance of GAM process smooths: ## edf F p-value ## te(temp,month) 4.303 0.74 0.0780 . ## te(temp,month):seriestrend1 0.777 0.03 1.0000 ## te(temp,month):seriestrend2 1.773 0.11 0.9687 ## te(temp,month):seriestrend3 5.408 1.84 0.0026 ** ## te(temp,month):seriestrend4 1.290 0.24 0.8507 ## te(temp,month):seriestrend5 2.231 0.08 0.9907 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## ## Stan MCMC diagnostics: ## n_eff / iter looks reasonable for all parameters ## Rhats above 1.05 found for 20 parameters ## *Diagnose further to investigate why the chains have not mixed ## 0 of 2000 iterations ended with a divergence (0%) ## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%) ## Chain 4: E-FMI = 0.1994 ## *E-FMI below 0.2 indicates you may need to reparameterize your model ## ## Samples were drawn using NUTS(diag_e) at Mon Jan 29 4:07:53 PM 2024. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split MCMC chains ## (at convergence, Rhat = 1) plot(var_mod, 'smooths', trend_effects = TRUE) mcmc_plot(var_mod, variable = 'A', regex = TRUE, type = 'hist') A_pars <- matrix(NA, nrow = 5, ncol = 5) for(i in 1:5){ for(j in 1:5){ A_pars[i, j] <- paste0('A[', i, ',', j, ']') } } mcmc_plot(var_mod, variable = as.vector(t(A_pars)), type = 'hist') plot(var_mod, type = 'trend', series = 1) plot(var_mod, type = 'trend', series = 3) Sigma_pars <- matrix(NA, nrow = 5, ncol = 5) for(i in 1:5){ for(j in 1:5){ Sigma_pars[i, j] <- paste0('Sigma[', i, ',', j, ']') } } mcmc_plot(var_mod, variable = as.vector(t(Sigma_pars)), type = 'hist') mcmc_plot(var_mod, variable = 'sigma_obs', regex = TRUE, type = 'hist')"},
{"path":"https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html","id":"correlated-process-errors","dir":"Articles","previous_headings":"State-Space Models","what":"Correlated process errors","title":"State-Space models in mvgam","text":"Let’s see if these estimates improve when we allow the process errors to be correlated. Once again, we need to first update the priors for the observation errors: And now we can fit the correlated process error model. Plotting convergence diagnostics for the two models shows that they display similar levels of convergence: The \\(\\Sigma\\) matrix now captures any evidence of contemporaneously correlated process errors: This symmetric matrix tells us there is support for correlated process errors. For example, series 1 and 3 (Bluegreens and Greens) show negatively correlated process errors, while series 1 and 4 (Bluegreens and Other.algae) show positively correlated errors. It is easier to interpret these estimates if we convert the covariance matrix to a correlation matrix. Here we compute the posterior median process error correlations: Because this model is able to capture correlated errors, the VAR matrix has also changed slightly: There is still evidence of lagged cross-dependence, but some of these interactions have now been pulled more toward zero. But which model is better? Forecasts don’t appear to differ much, at least not qualitatively (below are forecasts for three of the series, for each model): We can compute the variogram score for out-of-sample forecasts to get a sense of which model does a better job of capturing the dependence structure in the true evaluation set: And we can also compute the energy score for out-of-sample forecasts to get a sense of which model provides forecasts that are better calibrated: The models tend to provide similar forecasts, though the correlated error model does slightly better overall. We would probably need a more extensive rolling forecast evaluation exercise if we felt we needed to choose just one model for production. mvgam offers some utilities for this (e.g.,
see ?lfo_cv guidance).","code":"priors <- c(prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2), prior(normal(0.5, 0.25), class = sigma)) varcor_mod <- mvgam( # observation formula, which remains empty y ~ -1, # process model formula, which includes the smooth functions trend_formula = ~ te(temp, month, k = c(4, 4)) + te(temp, month, k = c(4, 4), by = trend), # VAR1 model with correlated process errors trend_model = 'VAR1cor', family = gaussian(), data = plankton_train, newdata = plankton_test, # include the updated priors priors = priors) mcmc_plot(varcor_mod, type = 'rhat') + labs(title = 'VAR1cor') mcmc_plot(var_mod, type = 'rhat') + labs(title = 'VAR1') Sigma_pars <- matrix(NA, nrow = 5, ncol = 5) for(i in 1:5){ for(j in 1:5){ Sigma_pars[i, j] <- paste0('Sigma[', i, ',', j, ']') } } mcmc_plot(varcor_mod, variable = as.vector(t(Sigma_pars)), type = 'hist') Sigma_post <- as.matrix(varcor_mod, variable = 'Sigma', regex = TRUE) median_correlations <- cov2cor(matrix(apply(Sigma_post, 2, median), nrow = 5, ncol = 5)) rownames(median_correlations) <- colnames(median_correlations) <- levels(plankton_train$series) round(median_correlations, 2) ## Bluegreens Diatoms Greens Other.algae Unicells ## Bluegreens 1.00 -0.04 0.16 -0.03 0.34 ## Diatoms -0.04 1.00 -0.20 0.50 0.16 ## Greens 0.16 -0.20 1.00 0.19 0.48 ## Other.algae -0.03 0.50 0.19 1.00 0.29 ## Unicells 0.34 0.16 0.48 0.29 1.00 A_pars <- matrix(NA, nrow = 5, ncol = 5) for(i in 1:5){ for(j in 1:5){ A_pars[i, j] <- paste0('A[', i, ',', j, ']') } } mcmc_plot(varcor_mod, variable = as.vector(t(A_pars)), type = 'hist') plot(var_mod, type = 'forecast', series = 1, newdata = plankton_test) ## Out of sample CRPS: ## [1] 3.04799 plot(varcor_mod, type = 'forecast', series = 1, newdata = plankton_test) ## Out of sample CRPS: ## [1] 3.10641 plot(var_mod, type = 'forecast', series = 2, newdata = plankton_test) ## Out of sample CRPS: ## [1] 5.87983 plot(varcor_mod, type = 'forecast', series = 2, newdata = plankton_test) ## Out of sample CRPS: ## [1] 5.566324 plot(var_mod, type = 'forecast', series = 3, newdata = plankton_test) ## Out of sample CRPS: ## [1] 3.985545 plot(varcor_mod, type = 'forecast', series = 3, newdata = plankton_test) ## Out of sample CRPS: ## [1] 3.975588 # create forecast objects for each model fcvar <- forecast(var_mod) fcvarcor <- forecast(varcor_mod) # plot the difference in variogram scores; a negative value means the VAR1cor model is better, while a positive value means the VAR1 model is better diff_scores <- score(fcvarcor, score = 'variogram')$all_series$score - score(fcvar, score = 'variogram')$all_series$score plot(diff_scores, pch = 16, cex = 1.25, col = 'darkred', ylim = c(-1*max(abs(diff_scores), na.rm = TRUE), max(abs(diff_scores), na.rm = TRUE)), bty = 'l', xlab = 'Forecast horizon', ylab = expression(variogram[VAR1cor]~-~variogram[VAR1])) abline(h = 0, lty = 'dashed') # plot the difference in energy scores; a negative value means the VAR1cor model is better, while a positive value means the VAR1 model is better diff_scores <- score(fcvarcor, score = 'energy')$all_series$score - score(fcvar, score = 'energy')$all_series$score plot(diff_scores, pch = 16, cex = 1.25, col = 'darkred', ylim = c(-1*max(abs(diff_scores), na.rm = TRUE), max(abs(diff_scores), na.rm = TRUE)), bty = 'l', xlab = 'Forecast horizon', ylab = expression(energy[VAR1cor]~-~energy[VAR1])) abline(h = 0, lty = 
'dashed')"},
{"path":"https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html","id":"further-reading","dir":"Articles","previous_headings":"State-Space Models","what":"Further reading","title":"State-Space models in mvgam","text":"The following papers and resources offer a lot of useful material about multivariate State-Space models and how they can be applied in practice: Heaps, Sarah E. “Enforcing stationarity through the prior in vector autoregressions.” Journal of Computational and Graphical Statistics 32.1 (2023): 74-83. Hannaford, Naomi E., et al. “A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant.” Computational Statistics & Data Analysis 179 (2023): 107659. Holmes, Elizabeth E., Eric J. Ward, and Wills Kellie. “MARSS: multivariate autoregressive state-space models for analyzing time-series data.” R Journal 4.1 (2012): 11. Ward, Eric J., et al. “Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico.” Journal of Applied Ecology 47.1 (2010): 47-56. Auger‐Méthé, Marie, et al. “A guide to state–space modeling of ecological time series.” Ecological Monographs 91.4 (2021): e01470.","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Nicholas J Clark. Author, maintainer.","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Nicholas J. Clark and Konstans Wells (2022). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. DOI: https://doi.org/10.1111/2041-210X.13974","code":"@Article{, title = {Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series}, author = {Nicholas J. Clark and Konstans Wells}, journal = {Methods in Ecology and Evolution}, year = {2022}, url = {https://doi.org/10.1111/2041-210X.13974}, }"},
{"path":"https://nicholasjclark.github.io/mvgam/index.html","id":"mvgam","dir":"","previous_headings":"","what":"mvgam","title":"Multivariate (Dynamic) Generalized Additive Models","text":"MultiVariate (Dynamic) Generalized Additive Models. The goal of mvgam is to use a Bayesian framework to estimate parameters of Dynamic Generalized Additive Models (DGAMs) for time series with dynamic trend components. The package provides an interface to fit Bayesian DGAMs using either JAGS or Stan as the backend, but note that users are strongly encouraged to opt for Stan over JAGS. The formula syntax is based on the package mgcv to provide a familiar GAM modelling interface. The motivation for the package and some of its primary objectives are described in detail in Clark & Wells 2022 (published in Methods in Ecology and Evolution). An introduction to the package and some worked examples are also shown in this seminar: Ecological Forecasting with Dynamic Generalized Additive Models.","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Multivariate (Dynamic) Generalized Additive Models","text":"Install the development version from GitHub using: devtools::install_github(\"nicholasjclark/mvgam\"). Note that to actually condition models with MCMC sampling, either the JAGS software must be installed (along with the R packages rjags and runjags) or the Stan software must be installed (along with either rstan and/or cmdstanr). Only rstan is listed as a dependency of mvgam to ensure that installation is less difficult. If users wish to fit the models using mvgam, please refer to the installation links for JAGS, for Stan with rstan, or for Stan with cmdstanr.
You will need a fairly recent version of Stan to ensure all model syntax is recognized. If you see warnings such as variable \"array\" does not exist, this is usually a sign that you need to update your version of Stan. We highly recommend you use Cmdstan through the cmdstanr interface as the backend. This is because Cmdstan is easier to install, is more up to date with new features, and uses less memory than Rstan. See the documentation from the Cmdstan team for more information.","code":""},{"path":[]},{"path":[]},
{"path":"https://nicholasjclark.github.io/mvgam/index.html","id":"getting-started","dir":"","previous_headings":"","what":"Getting started","title":"Multivariate (Dynamic) Generalized Additive Models","text":"mvgam was originally designed to analyse and forecast non-negative integer-valued data (counts). These data are traditionally challenging to analyse with existing time-series analysis packages. But further development of mvgam has resulted in support for a growing number of observation families that extend to other types of data. Currently, the package can handle data for the following families: gaussian() for real-valued data, student_t() for heavy-tailed real-valued data, lognormal() for non-negative real-valued data, Gamma() for non-negative real-valued data, betar() for proportional data on (0,1), poisson() for count data, nb() for overdispersed count data, nmix() for count data with imperfect detection, and tweedie() for overdispersed count data. Note that only poisson(), nb(), and tweedie() are available if using JAGS. All families, apart from tweedie(), are supported if using Stan. See ??mvgam_families for more information. Below is a simple example of simulating and modelling proportional data with Beta observations over a set of seasonal series with independent Gaussian Process dynamic trends: Plot the series to see how they evolve over time. Fit a DGAM to these series that uses a hierarchical cyclic seasonal smooth term to capture variation in seasonality among series. The model also includes series-specific latent Gaussian Processes with squared exponential covariance functions to capture temporal dynamics. Plot the estimated posterior hindcast and forecast distributions for each series. Various S3 functions can be used to inspect parameter estimates, plot smooth functions and residuals, and evaluate models through posterior predictive checks or forecast comparisons. Please see the package documentation for more detailed examples.","code":"data <- sim_mvgam(family = betar(), T = 80, trend_model = 'GP', trend_rel = 0.5, seasonality = 'shared') plot_mvgam_series(data = data$data_train, series = 'all') mod <- mvgam(y ~ s(season, bs = 'cc', k = 7) + s(season, by = series, m = 1, k = 5), trend_model = 'GP', data = data$data_train, newdata = data$data_test, family = betar()) layout(matrix(1:4, nrow = 2, byrow = TRUE)) for(i in 1:3){ plot(mod, type = 'forecast', series = i) }"},
{"path":"https://nicholasjclark.github.io/mvgam/index.html","id":"vignettes","dir":"","previous_headings":"","what":"Vignettes","title":"Multivariate (Dynamic) Generalized Additive Models","text":"You can set build_vignettes = TRUE when installing with either devtools::install_github or remotes::install_github, but be aware that this will slow down the installation drastically. Instead, you can always access the vignette htmls online at https://nicholasjclark.github.io/mvgam/articles/","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/index.html","id":"other-resources","dir":"","previous_headings":"","what":"Other resources","title":"Multivariate (Dynamic) Generalized Additive Models","text":"A number of case studies have been compiled to highlight how DGAMs can be estimated using MCMC sampling: Ecological Forecasting with Dynamic Generalized Additive Models; mvgam case study 1: model comparison and data assimilation; mvgam case study 2: multivariate models; mvgam case study 3: distributed lag models. The package can also be used to generate all the necessary data structures, initial value functions and modelling code necessary to fit DGAMs using Stan or JAGS.
This can be helpful if users wish to make changes to the model to better suit their own bespoke research / analysis goals. The following resources can be helpful for troubleshooting: Stan Discourse; JAGS Discourse","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2021 Nicholas Clark. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/add_tweedie_lines.html","id":null,"dir":"Reference","previous_headings":"","what":"Tweedie JAGS modifications — add_tweedie_lines","title":"Tweedie JAGS modifications — add_tweedie_lines","text":"Tweedie JAGS modifications","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/add_tweedie_lines.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tweedie JAGS modifications — add_tweedie_lines","text":"","code":"add_tweedie_lines(model_file, upper_bounds)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/add_tweedie_lines.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tweedie JAGS modifications — add_tweedie_lines","text":"model_file: the template JAGS model file to be modified. upper_bounds: Optional upper bounds for the truncated observation likelihood","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/add_tweedie_lines.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tweedie JAGS modifications — add_tweedie_lines","text":"A modified JAGS model file","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/all_neon_tick_data.html","id":null,"dir":"Reference","previous_headings":"","what":"NEON Amblyomma and Ixodes tick abundance survey data — all_neon_tick_data","title":"NEON Amblyomma and Ixodes tick abundance survey data — all_neon_tick_data","text":"A dataset containing timeseries of Amblyomma americanum and Ixodes scapularis nymph abundances at NEON sites","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/all_neon_tick_data.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"NEON Amblyomma and Ixodes tick abundance survey data — all_neon_tick_data","text":"","code":"all_neon_tick_data"},{"path":"https://nicholasjclark.github.io/mvgam/reference/all_neon_tick_data.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"NEON Amblyomma and Ixodes tick abundance survey data — all_neon_tick_data","text":"A tibble/dataframe containing covariate information alongside the main fields of: Year: Year of sampling; epiWeek: Epidemiological week of sampling; plot_ID: NEON plot ID for the survey location; siteID: NEON site ID for the survey location; amblyomma_americanum: Counts of A. americanum nymphs; ixodes_scapularis: Counts of I.
scapularis nymphs","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/all_neon_tick_data.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"NEON Amblyomma and Ixodes tick abundance survey data — all_neon_tick_data","text":"https://www.neonscience.org/data","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/code.html","id":null,"dir":"Reference","previous_headings":"","what":"Print the model code from an mvgam object — code","title":"Print the model code from an mvgam object — code","text":"Print the model code from an mvgam object","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/code.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print the model code from an mvgam object — code","text":"","code":"code(object)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/code.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print the model code from an mvgam object — code","text":"object: a list object returned from mvgam","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/code.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Print the model code from an mvgam object — code","text":"A character string containing the model code in a tidy format","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/conditional_effects.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Display Conditional Effects of Predictors — conditional_effects.mvgam","title":"Display Conditional Effects of Predictors — conditional_effects.mvgam","text":"Display conditional effects of one or more numeric and/or categorical predictors in mvgam models, including two-way interaction effects.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/conditional_effects.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Display Conditional Effects of Predictors — conditional_effects.mvgam","text":"","code":"# S3 method for mvgam conditional_effects( x, effects = NULL, type = \"response\", points = TRUE, rug = TRUE, ... ) # S3 method for mvgam_conditional_effects plot(x, plot = TRUE, ask = FALSE, ...) # S3 method for mvgam_conditional_effects print(x, ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/conditional_effects.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Display Conditional Effects of Predictors — conditional_effects.mvgam","text":"x: Object of class mvgam or mvgam_conditional_effects. effects: An optional character vector naming effects (main effects or interactions) for which to compute conditional plots. Interactions are specified using a ':' between variable names. If NULL (the default), plots are generated for all main effects and two-way interactions estimated in the model. When specifying effects manually, all two-way interactions (including grouping variables) may be plotted even if not originally modeled. type: character specifying the scale of predictions. If value = link (default), the linear predictor is calculated on the link scale. If expected is used, predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. If response is used, predictions take uncertainty in the observation process into account and are returned on the outcome scale. Two special cases are also allowed: type = latent_N will return the estimated latent abundances from an N-mixture distribution, and type = detection will return the estimated detection probability from an N-mixture distribution. points: Logical. Indicates if the original data points should be added, but only if type == 'response'. Default is TRUE.
rug Logical. Indicates displays tick marks plotted axes mark distribution raw data, type == 'response'. Default TRUE. ... arguments pass plot_predictions plot Logical; indicates plots plotted directly active graphic device. Defaults TRUE. ask Logical. Indicates user prompted new page plotted. used plot TRUE. Default FALSE.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/conditional_effects.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Display Conditional Effects of Predictors — conditional_effects.mvgam","text":"conditional_effects returns object class mvgam_conditional_effects named list one slot per effect containing ggplot object, can customized using ggplot2 package. corresponding plot method draw plots active graphic device","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/conditional_effects.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Display Conditional Effects of Predictors — conditional_effects.mvgam","text":"function acts wrapper flexible plot_predictions. creating conditional_effects particular predictor (interaction two predictors), one choose values predictors condition . default, mean used continuous variables reference category used factors. Use plot_predictions change create bespoke conditional effects plots.","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/conditional_effects.mvgam.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Display Conditional Effects of Predictors — conditional_effects.mvgam","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/conditional_effects.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Display Conditional Effects of Predictors — conditional_effects.mvgam","text":"","code":"if (FALSE) { # Simulate some data simdat <- sim_mvgam(family = poisson(), seasonality = 'hierarchical') # Fit a model mod <- mvgam(y ~ s(season, by = series) + year:series, family = poisson(), data = simdat$data_train) # Plot all main effects on the response scale plot(conditional_effects(mod), ask = FALSE) # Change the prediction interval to 70% using plot_predictions() argument # 'conf_level' plot(conditional_effects(mod, conf_level = 0.7), ask = FALSE) # Plot all main effects on the link scale plot(conditional_effects(mod, type = 'link'), ask = FALSE) # Works the same for smooth terms set.seed(0) dat <- mgcv::gamSim(1, n = 200, scale = 2) dat$time <- 1:NROW(dat) mod <- mvgam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, family = gaussian()) conditional_effects(mod) conditional_effects(mod, conf_level = 0.5, type = 'link') }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/dynamic.html","id":null,"dir":"Reference","previous_headings":"","what":"Defining dynamic coefficients in mvgam formulae — dynamic","title":"Defining dynamic coefficients in mvgam formulae — dynamic","text":"Set time-varying (dynamic) coefficients use mvgam models. 
Currently, low-rank Gaussian Process smooths available estimating dynamics time-varying coefficient.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/dynamic.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Defining dynamic coefficients in mvgam formulae — dynamic","text":"","code":"dynamic(variable, k, rho = 5, stationary = TRUE, scale = TRUE)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/dynamic.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Defining dynamic coefficients in mvgam formulae — dynamic","text":"variable variable dynamic smooth function k Optional number basis functions computing approximate GPs. missing, k set large possible accurately estimate nonlinear function rho Either positive numeric stating length scale used approximating squared exponential Gaussian Process smooth (see gp.smooth details) missing, case length scale estimated setting Hilbert space approximate GP stationary Logical. TRUE (default) rho supplied, latent Gaussian Process smooth will not have linear trend component. FALSE, linear trend covariate added Gaussian Process smooth. Leave TRUE if you do not believe coefficient evolving much trend, linear component basis functions can hard penalize zero. sometimes causes divergence issues Stan. See gp.smooth details. Ignored rho missing (case Hilbert space approximate GP used) scale Logical; TRUE (default) rho missing, predictors scaled maximum Euclidean distance two points 1. often improves sampling speed convergence. Scaling also affects estimated length-scale parameters resemble scaled predictors (not original predictors) scale TRUE.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/dynamic.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Defining dynamic coefficients in mvgam formulae — dynamic","text":"mvgam currently sets dynamic coefficients low-rank squared exponential Gaussian Process smooths via call s(time, by = variable, bs = \"gp\", m = c(2, rho, 2)). smooths, specified reasonable values length scale parameter, give realistic out-of-sample forecasts standard splines thin plate cubic. user must set value rho, currently no support estimating value mgcv. may not be a big problem, estimating latent length scales often difficult anyway. rho parameter thought prior smoothness latent dynamic coefficient function (higher values rho lead smoother functions temporal covariance structure).
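A minimal sketch of the translation described above (the covariate name x and the length scale value 10 are hypothetical placeholders): per the Details, these two formulae set up the same low-rank squared exponential GP smooth for a time-varying coefficient:

```r
# Hypothetical equivalence sketch: dynamic() expands to an mgcv 'gp' smooth
# of time, with the covariate supplied as the 'by' variable
f_dynamic <- y ~ dynamic(x, rho = 10, stationary = TRUE)
f_gp      <- y ~ s(time, by = x, bs = "gp", m = c(2, 10, 2))
```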
Values k set automatically ensure enough basis functions used approximate expected wiggliness underlying dynamic function (k increase rho decreases)","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/dynamic.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Defining dynamic coefficients in mvgam formulae — dynamic","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/dynamic.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Defining dynamic coefficients in mvgam formulae — dynamic","text":"","code":"if (FALSE) { # Simulate a time-varying coefficient # (as a Gaussian Process with length scale = 10) set.seed(1111) N <- 200 beta <- mvgam:::sim_gp(rnorm(1), alpha_gp = 0.75, rho_gp = 10, h = N) + 0.5 plot(beta, type = 'l', lwd = 3, bty = 'l', xlab = 'Time', ylab = 'Coefficient', col = 'darkred') # Simulate the predictor as a standard normal predictor <- rnorm(N, sd = 1) # Simulate a Gaussian outcome variable out <- rnorm(N, mean = 4 + beta * predictor, sd = 0.25) time <- seq_along(predictor) plot(out, type = 'l', lwd = 3, bty = 'l', xlab = 'Time', ylab = 'Outcome', col = 'darkred') # Gather into a data.frame and fit a dynamic coefficient model data <- data.frame(out, predictor, time) # Split into training and testing data_train <- data[1:190,] data_test <- data[191:200,] # Fit a model using the dynamic function mod <- mvgam(out ~ # mis-specify the length scale slightly as this # won't be known in practice dynamic(predictor, rho = 8, stationary = TRUE), family = gaussian(), data = data_train) # Inspect the summary summary(mod) # Plot the time-varying coefficient estimates plot(mod, type = 'smooths') # Extrapolate the coefficient forward in time plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 190, lty = 'dashed', lwd = 2) # Overlay the true simulated time-varying coefficient lines(beta, lwd = 2.5, col = 'white') lines(beta, lwd = 2) }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/evaluate_mvgams.html","id":null,"dir":"Reference","previous_headings":"","what":"Evaluate forecasts from fitted mvgam objects — evaluate_mvgams","title":"Evaluate forecasts from fitted mvgam objects — evaluate_mvgams","text":"Evaluate forecasts fitted mvgam objects","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/evaluate_mvgams.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Evaluate forecasts from fitted mvgam objects — evaluate_mvgams","text":"","code":"eval_mvgam( object, n_samples = 5000, eval_timepoint = 3, fc_horizon = 3, n_cores = 2, score = \"drps\", log = FALSE, weights ) roll_eval_mvgam( object, n_evaluations = 5, evaluation_seq, n_samples = 5000, fc_horizon = 3, n_cores = 2, score = \"drps\", log = FALSE, weights ) compare_mvgams( model1, model2, n_samples = 1000, fc_horizon = 3, n_evaluations = 10, n_cores = 2, score = \"drps\", log = FALSE, weights )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/evaluate_mvgams.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Evaluate forecasts from fitted mvgam objects — evaluate_mvgams","text":"object list object returned mvgam n_samples integer specifying number samples generate model's posterior distribution eval_timepoint integer indexing timepoint represents last 'observed' set outcome data fc_horizon integer specifying length forecast horizon evaluating forecasts n_cores integer
specifying number cores generating particle forecasts parallel score character specifying type ranked probability score use evaluation. Options : variogram, drps crps log logical. forecasts truths logged prior scoring? often appropriate comparing performance models series vary observation ranges weights optional vector weights (length(weights) == n_series) weighting pairwise correlations evaluating variogram score multivariate forecasts. Useful down-weighting series larger magnitude observations less interest forecasting. Ignored score != 'variogram' n_evaluations integer specifying total number evaluations perform evaluation_seq Optional integer sequence specifying exact set timepoints evaluating model's forecasts. sequence cannot have values < 3 or > max(training timepoints) - fc_horizon model1 list object returned mvgam representing first model evaluated model2 list object returned mvgam representing second model evaluated","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/evaluate_mvgams.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Evaluate forecasts from fitted mvgam objects — evaluate_mvgams","text":"eval_mvgam, list object containing information specific evaluations series (using drps crps score) vector scores using variogram. roll_eval_mvgam, list object containing information specific evaluations series well total evaluation summary (taken summing forecast score series evaluation averaging coverages evaluation) compare_mvgams, series plots comparing forecast Rank Probability Scores competing model. lower score preferred. Note however possible select model ultimately perform poorly true out-of-sample forecasting. example wiggly smooth function 'year' included model function learned prior evaluating rolling window forecasts, model generate tight predictions result. forecasting ahead timepoints model seen (.e. next year), smooth function end extrapolating, sometimes strange unexpected ways. therefore recommended only use smooth functions covariates adequately measured data (.e. 'seasonality', example) reduce possible extrapolation smooths let latent trends mvgam model capture temporal dependencies data. trends time series models provide much stable forecasts","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/evaluate_mvgams.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Evaluate forecasts from fitted mvgam objects — evaluate_mvgams","text":"eval_mvgam generates set samples representing fixed parameters estimated full mvgam model latent trend states given point time. trends rolled forward total fc_horizon timesteps according estimated state space dynamics generate 'out-of-sample' forecast evaluated true observations horizon window. function therefore simulates situation model's parameters already estimated observed data evaluation timepoint like generate forecasts latent trends observed timepoint. Evaluation involves calculating appropriate Rank Probability Score binary indicator whether true value lies within forecast's 90% prediction interval roll_eval_mvgam sets sequence evaluation timepoints along rolling window iteratively calls eval_mvgam evaluate 'out-of-sample' forecasts. Evaluation involves calculating Discrete Rank Probability Score binary indicator whether true value lies within forecast's 90% prediction interval compare_mvgams automates evaluation compare two fitted models using rolling window forecast evaluation provides series summary plots facilitate model selection.
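The 90% prediction interval indicator described above can be sketched in base R alone (all objects hypothetical; fc_draws stands in for posterior forecast draws with one column per horizon step):

```r
# Sketch of the interval-coverage indicator used during evaluation;
# 'fc_draws' (draws x fc_horizon) and 'truth' are hypothetical stand-ins
set.seed(1)
fc_draws <- matrix(rpois(1000 * 3, lambda = 5), nrow = 1000)
truth <- c(4, 6, 9)
in_interval <- vapply(seq_along(truth), function(h) {
  q <- quantile(fc_draws[, h], probs = c(0.05, 0.95))
  truth[h] >= q[1] && truth[h] <= q[2]
}, logical(1))
in_interval
```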
essentially wrapper roll_eval_mvgam","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/evaluate_mvgams.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Evaluate forecasts from fitted mvgam objects — evaluate_mvgams","text":"","code":"if (FALSE) { # Simulate from a Poisson-AR2 model with a seasonal smooth set.seed(100) dat <- sim_mvgam(T = 75, n_series = 1, prop_trend = 0.75, trend_model = 'AR2', family = poisson()) # Plot the time series plot_mvgam_series(data = dat$data_train, newdata = dat$data_test, series = 1) # Fit an appropriate model mod_ar2 <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR2', family = poisson(), data = dat$data_train, newdata = dat$data_test) # Fit a less appropriate model mod_rw <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'RW', family = poisson(), data = dat$data_train, newdata = dat$data_test) # Compare Discrete Ranked Probability Scores for the testing period fc_ar2 <- forecast(mod_ar2) fc_rw <- forecast(mod_rw) score_ar2 <- score(fc_ar2, score = 'drps') score_rw <- score(fc_rw, score = 'drps') sum(score_ar2$series_1$score) sum(score_rw$series_1$score) # Use rolling evaluation for approximate comparisons of 3-step ahead # forecasts across the training period compare_mvgams(mod_ar2, mod_rw, fc_horizon = 3, n_samples = 1000, n_evaluations = 5) # Now use approximate leave-future-out CV to compare # rolling forecasts; start at time point 40 to reduce # computational time and to ensure enough data is available # for estimating model parameters lfo_ar2 <- lfo_cv(mod_ar2, min_t = 40, fc_horizon = 3) lfo_rw <- lfo_cv(mod_rw, min_t = 40, fc_horizon = 3) # Plot Pareto-K values and ELPD estimates plot(lfo_ar2) plot(lfo_rw) # Proportion of timepoints in which AR2 model gives # better forecasts length(which((lfo_ar2$elpds - lfo_rw$elpds) > 0)) / length(lfo_ar2$elpds) # A higher total ELPD is preferred lfo_ar2$sum_ELPD lfo_rw$sum_ELPD }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/fitted.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Expected Values of the Posterior Predictive Distribution — fitted.mvgam","title":"Expected Values of the Posterior Predictive Distribution — fitted.mvgam","text":"method extracts posterior estimates fitted values (.e. actual predictions, included estimates trend states, obtained fitting model). also includes option obtaining summaries computed draws.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/fitted.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Expected Values of the Posterior Predictive Distribution — fitted.mvgam","text":"","code":"# S3 method for mvgam fitted( object, process_error = TRUE, scale = c(\"response\", \"linear\"), summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/fitted.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Expected Values of the Posterior Predictive Distribution — fitted.mvgam","text":"object object class mvgam process_error Logical. TRUE dynamic trend model fit, expected uncertainty process model accounted using draws latent trend SD parameters. FALSE, uncertainty latent trend component ignored calculating predictions scale Either \"response\" \"linear\". \"response\", results returned scale response variable. 
\"linear\", results returned scale linear predictor term, without applying inverse link function transformations. summary summary statistics returned instead raw values? Default TRUE.. robust FALSE (default) mean used measure central tendency standard deviation measure variability. TRUE, median median absolute deviation (MAD) applied instead. used summary TRUE. probs percentiles computed quantile function. used summary TRUE. ... arguments passed prepare_predictions control several aspects data validation prediction.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/fitted.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Expected Values of the Posterior Predictive Distribution — fitted.mvgam","text":"array predicted mean response values. summary = FALSE output resembles posterior_epred.mvgam predict.mvgam. summary = TRUE output n_observations x E matrix. number summary statistics E equal 2 + length(probs): Estimate column contains point estimates (either mean median depending argument robust), Est.Error column contains uncertainty estimates (either standard deviation median absolute deviation depending argument robust). remaining columns starting Q contain quantile estimates specified via argument probs.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/fitted.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Expected Values of the Posterior Predictive Distribution — fitted.mvgam","text":"method gives actual fitted values model (.e. see generate hindcasts fitted model using hindcast.mvgam type = 'expected'). predictions can overly precise flexible dynamic trend component included model. contrast set predict functions (.e. posterior_epred.mvgam predict.mvgam), assume dynamic trend component reached stationarity returning hypothetical predictions","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/fitted.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Expected Values of the Posterior Predictive Distribution — fitted.mvgam","text":"","code":"if (FALSE) { # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) # Extract fitted values (posterior expectations) expectations <- fitted(mod) str(expectations) }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/forecast.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract or compute hindcasts and forecasts for a fitted mvgam object — forecast.mvgam","title":"Extract or compute hindcasts and forecasts for a fitted mvgam object — forecast.mvgam","text":"Extract compute hindcasts forecasts fitted mvgam object","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/forecast.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract or compute hindcasts and forecasts for a fitted mvgam object — forecast.mvgam","text":"","code":"forecast(object, ...) # S3 method for mvgam forecast( object, newdata, data_test, series = \"all\", n_cores = 1, type = \"response\", ... 
)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/forecast.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract or compute hindcasts and forecasts for a fitted mvgam object — forecast.mvgam","text":"object list object returned mvgam. See mvgam() ... Ignored newdata Optional dataframe list test data containing least 'series' 'time' addition variables included linear predictor original formula. included, covariate information newdata used generate forecasts fitted model equations. newdata originally included call mvgam, forecasts already produced generative model simply extracted plotted. However newdata supplied original model call, assumption made newdata supplied comes sequentially data supplied data original model (.e. assume time gap last observation series 1 data first observation series 1 newdata) data_test Deprecated. Still works place newdata users recommended use newdata instead seamless integration R workflows series Either integer specifying series set forecast, character string '', specifying series forecast. preferable fitted model contained multivariate trends (either dynamic factor VAR process), saves recomputing full set trends series individually n_cores integer specifying number cores generating forecasts parallel type value link (default) linear predictor calculated link scale. expected used, predictions reflect expectation response (mean) ignore uncertainty observation process. response used, predictions take uncertainty observation process account return predictions outcome scale. variance used, variance response respect mean (mean-variance relationship) returned. Two special cases also allowed: type latent_N return estimated latent abundances N-mixture distribution, type detection return estimated detection probability N-mixture distribution","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/forecast.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extract or compute hindcasts and forecasts for a fitted mvgam object — forecast.mvgam","text":"object class mvgam_forecast containing hindcast forecast distributions. 
{"path":"https://nicholasjclark.github.io/mvgam/reference/forecast.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Extract or compute hindcasts and forecasts for a fitted mvgam object — forecast.mvgam","text":"Posterior predictions drawn fitted mvgam used simulate forecast distribution","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/forecast.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extract or compute hindcasts and forecasts for a fitted mvgam object — forecast.mvgam","text":"","code":"if (FALSE) { simdat <- sim_mvgam(n_series = 3, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) # Hindcasts on response scale hc <- hindcast(mod) str(hc) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Forecasts on response scale fc <- forecast(mod, newdata = simdat$data_test) str(fc) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Forecasts as expectations fc <- forecast(mod, newdata = simdat$data_test, type = 'expected') plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/formula.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract formulae from mvgam objects — formula.mvgam","title":"Extract formulae from mvgam objects — formula.mvgam","text":"Extract formulae mvgam objects","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/formula.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract formulae from mvgam objects — formula.mvgam","text":"","code":"# S3 method for mvgam formula(x, trend_effects = FALSE, ...) # S3 method for mvgam_prefit formula(x, trend_effects = FALSE, ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/formula.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract formulae from mvgam objects — formula.mvgam","text":"x mvgam mvgam_prefit object trend_effects logical, return formula observation model (FALSE) underlying process model (if TRUE) ...
Ignored","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/formula.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extract formulae from mvgam objects — formula.mvgam","text":"formula object","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/formula.mvgam.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Extract formulae from mvgam objects — formula.mvgam","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_monitor_pars.html","id":null,"dir":"Reference","previous_headings":"","what":"Return parameters to monitor during modelling — get_monitor_pars","title":"Return parameters to monitor during modelling — get_monitor_pars","text":"Return parameters monitor modelling","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_monitor_pars.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Return parameters to monitor during modelling — get_monitor_pars","text":"","code":"get_monitor_pars(family, smooths_included = TRUE, use_lv, trend_model, drift)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_monitor_pars.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Return parameters to monitor during modelling — get_monitor_pars","text":"family character smooths_included Logical. smooth terms included model formula? use_lv Logical (use latent variable trends ) trend_model type trend model used drift Logical (drift term estimated )","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_monitor_pars.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Return parameters to monitor during modelling — get_monitor_pars","text":"string parameters monitor","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_mvgam_priors.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract information on default prior distributions for an mvgam model — get_mvgam_priors","title":"Extract information on default prior distributions for an mvgam model — get_mvgam_priors","text":"function lists parameters can prior distributions changed given mvgam model, well listing default distributions","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_mvgam_priors.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract information on default prior distributions for an mvgam model — get_mvgam_priors","text":"","code":"get_mvgam_priors( formula, trend_formula, data, data_train, family = \"poisson\", use_lv = FALSE, n_lv, use_stan = TRUE, trend_model = \"None\", trend_map, drift = FALSE )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_mvgam_priors.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract information on default prior distributions for an mvgam model — get_mvgam_priors","text":"formula character string specifying GAM observation model formula. exactly like formula GLM except smooth terms, s(), te(), ti(), t2(), well time-varying dynamic() terms, can added right hand side specify linear predictor depends smooth functions predictors (linear functionals ). nmix() family models, formula used set linear predictor detection probability. 
Details formula syntax used mvgam can found mvgam_formulae trend_formula optional character string specifying GAM process model formula. supplied, linear predictor modelled latent trends capture process model evolution separately observation model. response variable specified left-hand side formula (.e. valid option ~ season + s(year)). Also note: do not use identifier series formula specify effects vary across time series. Instead use trend. ensure models trend_map supplied still work consistently (.e. allowing effects vary across process models, even time series share underlying process model). feature currently available RW(), AR() VAR() trend models. nmix() family models, trend_formula used set linear predictor underlying latent abundance data dataframe list containing model response variable covariates required GAM formula optional trend_formula. include columns: series (factor index series IDs; number levels identical number unique series labels (.e. n_series = length(levels(data$series)))) time (numeric integer index time point observation). variables included linear predictor formula must also present data data_train Deprecated. Still works place data users recommended use data instead seamless integration R workflows family family specifying exponential observation family series. Currently supported families : nb() count data poisson() count data gaussian() real-valued data betar() proportional data (0,1) lognormal() non-negative real-valued data student_t() real-valued data Gamma() non-negative real-valued data nmix() count data imperfect detection modeled via State-Space N-Mixture model. latent states Poisson, capturing 'true' latent abundance, observation process Binomial account imperfect detection. See mvgam_families example use family Note only nb() poisson() available using JAGS backend. Default poisson(). See mvgam_families details use_lv logical. TRUE, use dynamic factors estimate series' latent trends reduced dimension format. available RW(), AR() GP() trend models. Defaults FALSE n_lv integer number latent dynamic factors use use_lv == TRUE. Cannot be > n_series. Defaults arbitrarily min(2, floor(n_series / 2)) use_stan Logical. TRUE, model compiled sampled using Hamiltonian Monte Carlo call cmdstan_model call stan. Note many options using Stan vs JAGS trend_model character function specifying time series dynamics latent trend. Options : None (latent trend component; .e. GAM component contributes linear predictor, observation process source error; similarly estimated gam) 'RW' RW() 'AR1' AR(p = 1) 'AR2' AR(p = 2) 'AR3' AR(p = 3) 'VAR1' VAR() (available Stan) 'PWlogistic', 'PWlinear' PW() (available Stan) 'GP' GP() (Gaussian Process squared exponential kernel; available Stan) trend types apart GP() PW(), moving average /correlated process error terms can also estimated (example, RW(cor = TRUE) set multivariate Random Walk n_series > 1). See mvgam_trends details trend_map Optional data.frame specifying series depend latent trends. Useful allowing multiple series depend latent trend process, different observation processes. supplied, latent factor model set setting use_lv = TRUE using mapping set shared trends. Needs column names series trend, integer values trend column state trend series depend. series column single unique entry series data (names perfectly match factor levels series variable data). See examples details (small sketch also shown below) drift logical estimate drift parameter latent trend components. Useful latent trend expected broadly follow non-zero slope. available RW() AR() trend models.
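As flagged in the trend_map description above, a minimal sketch (hypothetical names) mapping three observed series onto two shared latent trend processes:

```r
# Hypothetical trend_map: series_1 and series_2 share latent trend 1,
# while series_3 follows its own trend; 'series' entries must match
# the factor levels of data$series
trend_map <- data.frame(
  series = c("series_1", "series_2", "series_3"),
  trend  = c(1, 1, 2)
)
```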
Note latent trend less stationary, drift parameter can become unidentifiable, especially intercept term included GAM linear predictor (default calling jagam). Drift parameters also likely unidentifiable using dynamic factor models. Therefore defaults FALSE","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_mvgam_priors.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extract information on default prior distributions for an mvgam model — get_mvgam_priors","text":"either data.frame containing prior definitions (suitable priors can altered user) NULL, indicating no priors model can modified mvgam interface","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_mvgam_priors.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Extract information on default prior distributions for an mvgam model — get_mvgam_priors","text":"Users can supply model formula, prior fitting model, default priors can inspected altered. make alterations, change contents prior column supplying data.frame mvgam function using argument priors. using Stan backend, users can also modify parameter bounds modifying new_lowerbound /new_upperbound columns. necessary using restrictive distributions parameters, Beta distribution trend sd parameters example (Beta support (0,1)), upperbound 1. Another option make use prior modification functions brms (.e. prior) change prior distributions bounds (just use name parameter like change class argument; see examples )","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_mvgam_priors.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Extract information on default prior distributions for an mvgam model — get_mvgam_priors","text":"Only prior, new_lowerbound /new_upperbound columns output altered defining user-defined priors mvgam model. Use familiar underlying probabilistic programming language. no sanity checks done ensure code legal (.e.
check lower bounds smaller upper bounds, example)","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_mvgam_priors.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Extract information on default prior distributions for an mvgam model — get_mvgam_priors","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/get_mvgam_priors.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extract information on default prior distributions for an mvgam model — get_mvgam_priors","text":"","code":"# Simulate three integer-valued time series library(mvgam) dat <- sim_mvgam(trend_rel = 0.5) # Get a model file that uses default mvgam priors for inspection (not always necessary, # but this can be useful for testing whether your updated priors are written correctly) mod_default <- mvgam(y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1, family = 'nb', data = dat$data_train, trend_model = 'AR2', run_model = FALSE) # Inspect the model file with default mvgam priors code(mod_default) #> // Stan model code generated by package mvgam #> functions { #> vector rep_each(vector x, int K) { #> int N = rows(x); #> vector[N * K] y; #> int pos = 1; #> for (n in 1 : N) { #> for (k in 1 : K) { #> y[pos] = x[n]; #> pos += 1; #> } #> } #> return y; #> } #> } #> data { #> int total_obs; // total number of observations #> int n; // number of timepoints per series #> int n_sp; // number of smoothing parameters #> int n_series; // number of series #> int num_basis; // total number of basis coefficients #> vector[num_basis] zero; // prior locations for basis coefficients #> matrix[total_obs, num_basis] X; // mgcv GAM design matrix #> array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) #> matrix[8, 8] S1; // mgcv smooth penalty matrix S1 #> int n_nonmissing; // number of nonmissing observations #> array[n_nonmissing] int flat_ys; // flattened nonmissing observations #> matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations #> array[n_nonmissing] int obs_ind; // indices of nonmissing observations #> } #> parameters { #> // raw basis coefficients #> vector[num_basis] b_raw; #> #> // random effect variances #> vector[1] sigma_raw; #> #> // random effect means #> vector[1] mu_raw; #> #> // negative binomial overdispersion #> vector[n_series] phi_inv; #> #> // latent trend AR1 terms #> vector[n_series] ar1; #> #> // latent trend AR2 terms #> vector[n_series] ar2; #> #> // latent trend variance parameters #> vector[n_series] sigma; #> #> // latent trends #> matrix[n, n_series] trend; #> #> // smoothing parameters #> vector[n_sp] lambda; #> } #> transformed parameters { #> // basis coefficients #> vector[num_basis] b; #> b[1 : 8] = b_raw[1 : 8]; #> b[9 : 11] = mu_raw[1] + b_raw[9 : 11] * sigma_raw[1]; #> } #> model { #> // prior for random effect population variances #> sigma_raw ~ student_t(3, 0, 2.5); #> #> // prior for random effect population means #> mu_raw ~ std_normal(); #> #> // prior for s(season)... #> b_raw[1 : 8] ~ multi_normal_prec(zero[1 : 8], S1[1 : 8, 1 : 8] * lambda[1]); #> #> // prior (non-centred) for s(series)... 
#> b_raw[9 : 11] ~ std_normal(); #> #> // priors for AR parameters #> ar1 ~ std_normal(); #> ar2 ~ std_normal(); #> #> // priors for smoothing parameters #> lambda ~ normal(5, 30); #> #> // priors for overdispersion parameters #> phi_inv ~ student_t(3, 0, 0.1); #> #> // priors for latent trend variance parameters #> sigma ~ student_t(3, 0, 2.5); #> #> // trend estimates #> trend[1, 1 : n_series] ~ normal(0, sigma); #> trend[2, 1 : n_series] ~ normal(trend[1, 1 : n_series] * ar1, sigma); #> for (s in 1 : n_series) { #> trend[3 : n, s] ~ normal(ar1[s] * trend[2 : (n - 1), s] #> + ar2[s] * trend[1 : (n - 2), s], sigma[s]); #> } #> { #> // likelihood functions #> vector[n_nonmissing] flat_trends; #> array[n_nonmissing] real flat_phis; #> flat_trends = to_vector(trend)[obs_ind]; #> flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]); #> flat_ys ~ neg_binomial_2(exp(append_col(flat_xs, flat_trends) #> * append_row(b, 1.0)), #> inv(flat_phis)); #> } #> } #> generated quantities { #> vector[total_obs] eta; #> matrix[n, n_series] mus; #> vector[n_sp] rho; #> vector[n_series] tau; #> array[n, n_series] int ypred; #> matrix[n, n_series] phi_vec; #> vector[n_series] phi; #> phi = inv(phi_inv); #> for (s in 1 : n_series) { #> phi_vec[1 : n, s] = rep_vector(phi[s], n); #> } #> rho = log(lambda); #> for (s in 1 : n_series) { #> tau[s] = pow(sigma[s], -2.0); #> } #> #> // posterior predictions #> eta = X * b; #> for (s in 1 : n_series) { #> mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s]; #> ypred[1 : n, s] = neg_binomial_2_rng(exp(mus[1 : n, s]), phi_vec[1 : n, s]); #> } #> } #> #> # Look at which priors can be updated in mvgam test_priors <- get_mvgam_priors(y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1, family = 'nb', data = dat$data_train, trend_model = 'AR2') test_priors #> param_name param_length #> 1 vector[n_sp] lambda; 2 #> 2 vector[1] mu_raw; 1 #> 3 vector[1] sigma_raw; 1 #> 4 vector[n_series] ar1; 3 #> 5 vector[n_series] ar2; 3 #> 6 vector[n_series] sigma; 3 #> 7 vector[n_series] phi_inv; 3 #> param_info prior #> 1 s(season) smooth parameters lambda ~ normal(5, 30); #> 2 s(series) pop mean mu_raw ~ std_normal(); #> 3 s(series) pop sd sigma_raw ~ student_t(3, 0, 2.5); #> 4 trend AR1 coefficient ar1 ~ std_normal(); #> 5 trend AR2 coefficient ar2 ~ std_normal(); #> 6 trend sd sigma ~ student_t(3, 0, 2.5); #> 7 inverse of NB dispsersion phi_inv ~ student_t(3, 0, 0.1); #> example_change new_lowerbound new_upperbound #> 1 lambda ~ exponential(0.03); NA NA #> 2 mu_raw ~ normal(0.56, 0.85); NA NA #> 3 sigma_raw ~ exponential(0.28); NA NA #> 4 ar1 ~ normal(0.03, 0.94); NA NA #> 5 ar2 ~ normal(0.08, 0.56); NA NA #> 6 sigma ~ exponential(0.97); NA NA #> 7 phi_inv ~ normal(0.75, 0.12); NA NA # Make a few changes; first, change the population mean for the series-level # random intercepts test_priors$prior[2] <- 'mu_raw ~ normal(0.2, 0.5);' # Now use stronger regularisation for the series-level AR2 coefficients test_priors$prior[5] <- 'ar2 ~ normal(0, 0.25);' # Check that the changes are made to the model file without any warnings by # setting 'run_model = FALSE' mod <- mvgam(y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1, family = 'nb', data = dat$data_train, trend_model = 'AR2', priors = test_priors, run_model = FALSE) code(mod) #> // Stan model code generated by package mvgam #> functions { #> vector rep_each(vector x, int K) { #> int N = rows(x); #> vector[N * K] y; #> int pos = 1; #> for (n in 1 : N) { #> for (k in 1 : K) { #> y[pos] = x[n]; #> pos += 1; #> } #> } #> return y; #> } 
#> } #> data { #> int total_obs; // total number of observations #> int n; // number of timepoints per series #> int n_sp; // number of smoothing parameters #> int n_series; // number of series #> int num_basis; // total number of basis coefficients #> vector[num_basis] zero; // prior locations for basis coefficients #> matrix[total_obs, num_basis] X; // mgcv GAM design matrix #> array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) #> matrix[8, 8] S1; // mgcv smooth penalty matrix S1 #> int n_nonmissing; // number of nonmissing observations #> array[n_nonmissing] int flat_ys; // flattened nonmissing observations #> matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations #> array[n_nonmissing] int obs_ind; // indices of nonmissing observations #> } #> parameters { #> // raw basis coefficients #> vector[num_basis] b_raw; #> #> // random effect variances #> vector[1] sigma_raw; #> #> // random effect means #> vector[1] mu_raw; #> #> // negative binomial overdispersion #> vector[n_series] phi_inv; #> #> // latent trend AR1 terms #> vector[n_series] ar1; #> #> // latent trend AR2 terms #> vector[n_series] ar2; #> #> // latent trend variance parameters #> vector[n_series] sigma; #> #> // latent trends #> matrix[n, n_series] trend; #> #> // smoothing parameters #> vector[n_sp] lambda; #> } #> transformed parameters { #> // basis coefficients #> vector[num_basis] b; #> b[1 : 8] = b_raw[1 : 8]; #> b[9 : 11] = mu_raw[1] + b_raw[9 : 11] * sigma_raw[1]; #> } #> model { #> // prior for random effect population variances #> sigma_raw ~ student_t(3, 0, 2.5); #> #> // prior for random effect population means #> mu_raw ~ normal(0.2, 0.5); #> #> // prior for s(season)... #> b_raw[1 : 8] ~ multi_normal_prec(zero[1 : 8], S1[1 : 8, 1 : 8] * lambda[1]); #> #> // prior (non-centred) for s(series)... 
#> b_raw[9 : 11] ~ std_normal(); #> #> // priors for AR parameters #> ar1 ~ std_normal(); #> ar2 ~ normal(0, 0.25); #> #> // priors for smoothing parameters #> lambda ~ normal(5, 30); #> #> // priors for overdispersion parameters #> phi_inv ~ student_t(3, 0, 0.1); #> #> // priors for latent trend variance parameters #> sigma ~ student_t(3, 0, 2.5); #> #> // trend estimates #> trend[1, 1 : n_series] ~ normal(0, sigma); #> trend[2, 1 : n_series] ~ normal(trend[1, 1 : n_series] * ar1, sigma); #> for (s in 1 : n_series) { #> trend[3 : n, s] ~ normal(ar1[s] * trend[2 : (n - 1), s] #> + ar2[s] * trend[1 : (n - 2), s], sigma[s]); #> } #> { #> // likelihood functions #> vector[n_nonmissing] flat_trends; #> array[n_nonmissing] real flat_phis; #> flat_trends = to_vector(trend)[obs_ind]; #> flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]); #> flat_ys ~ neg_binomial_2(exp(append_col(flat_xs, flat_trends) #> * append_row(b, 1.0)), #> inv(flat_phis)); #> } #> } #> generated quantities { #> vector[total_obs] eta; #> matrix[n, n_series] mus; #> vector[n_sp] rho; #> vector[n_series] tau; #> array[n, n_series] int ypred; #> matrix[n, n_series] phi_vec; #> vector[n_series] phi; #> phi = inv(phi_inv); #> for (s in 1 : n_series) { #> phi_vec[1 : n, s] = rep_vector(phi[s], n); #> } #> rho = log(lambda); #> for (s in 1 : n_series) { #> tau[s] = pow(sigma[s], -2.0); #> } #> #> // posterior predictions #> eta = X * b; #> for (s in 1 : n_series) { #> mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s]; #> ypred[1 : n, s] = neg_binomial_2_rng(exp(mus[1 : n, s]), phi_vec[1 : n, s]); #> } #> } #> #> # No warnings, the model is ready for fitting now in the usual way with the addition # of the 'priors' argument # The same can be done using brms functions; here we will also change the ar1 prior # and put some bounds on the ar coefficients to enforce stationarity; we set the # prior using the 'class' argument in all brms prior functions brmsprior <- c(prior(normal(0.2, 0.5), class = mu_raw), prior(normal(0, 0.25), class = ar1, lb = -1, ub = 1), prior(normal(0, 0.25), class = ar2, lb = -1, ub = 1)) brmsprior #> prior class coef group resp dpar nlpar lb ub source #> normal(0.2, 0.5) mu_raw user #> normal(0, 0.25) ar1 -1 1 user #> normal(0, 0.25) ar2 -1 1 user mod <- mvgam(y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1, family = 'nb', data = dat$data_train, trend_model = 'AR2', priors = brmsprior, run_model = FALSE) code(mod) #> // Stan model code generated by package mvgam #> functions { #> vector rep_each(vector x, int K) { #> int N = rows(x); #> vector[N * K] y; #> int pos = 1; #> for (n in 1 : N) { #> for (k in 1 : K) { #> y[pos] = x[n]; #> pos += 1; #> } #> } #> return y; #> } #> } #> data { #> int total_obs; // total number of observations #> int n; // number of timepoints per series #> int n_sp; // number of smoothing parameters #> int n_series; // number of series #> int num_basis; // total number of basis coefficients #> vector[num_basis] zero; // prior locations for basis coefficients #> matrix[total_obs, num_basis] X; // mgcv GAM design matrix #> array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) 
#> matrix[8, 8] S1; // mgcv smooth penalty matrix S1 #> int n_nonmissing; // number of nonmissing observations #> array[n_nonmissing] int flat_ys; // flattened nonmissing observations #> matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations #> array[n_nonmissing] int obs_ind; // indices of nonmissing observations #> } #> parameters { #> // raw basis coefficients #> vector[num_basis] b_raw; #> #> // random effect variances #> vector[1] sigma_raw; #> #> // random effect means #> vector[1] mu_raw; #> #> // negative binomial overdispersion #> vector[n_series] phi_inv; #> #> // latent trend AR1 terms #> vector[n_series] ar1; #> #> // latent trend AR2 terms #> vector[n_series] ar2; #> #> // latent trend variance parameters #> vector[n_series] sigma; #> #> // latent trends #> matrix[n, n_series] trend; #> #> // smoothing parameters #> vector[n_sp] lambda; #> } #> transformed parameters { #> // basis coefficients #> vector[num_basis] b; #> b[1 : 8] = b_raw[1 : 8]; #> b[9 : 11] = mu_raw[1] + b_raw[9 : 11] * sigma_raw[1]; #> } #> model { #> // prior for random effect population variances #> sigma_raw ~ student_t(3, 0, 2.5); #> #> // prior for random effect population means #> mu_raw ~ normal(0.2, 0.5); #> #> // prior for s(season)... #> b_raw[1 : 8] ~ multi_normal_prec(zero[1 : 8], S1[1 : 8, 1 : 8] * lambda[1]); #> #> // prior (non-centred) for s(series)... #> b_raw[9 : 11] ~ std_normal(); #> #> // priors for AR parameters #> ar1 ~ normal(0, 0.25); #> ar2 ~ normal(0, 0.25); #> #> // priors for smoothing parameters #> lambda ~ normal(5, 30); #> #> // priors for overdispersion parameters #> phi_inv ~ student_t(3, 0, 0.1); #> #> // priors for latent trend variance parameters #> sigma ~ student_t(3, 0, 2.5); #> #> // trend estimates #> trend[1, 1 : n_series] ~ normal(0, sigma); #> trend[2, 1 : n_series] ~ normal(trend[1, 1 : n_series] * ar1, sigma); #> for (s in 1 : n_series) { #> trend[3 : n, s] ~ normal(ar1[s] * trend[2 : (n - 1), s] #> + ar2[s] * trend[1 : (n - 2), s], sigma[s]); #> } #> { #> // likelihood functions #> vector[n_nonmissing] flat_trends; #> array[n_nonmissing] real flat_phis; #> flat_trends = to_vector(trend)[obs_ind]; #> flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]); #> flat_ys ~ neg_binomial_2(exp(append_col(flat_xs, flat_trends) #> * append_row(b, 1.0)), #> inv(flat_phis)); #> } #> } #> generated quantities { #> vector[total_obs] eta; #> matrix[n, n_series] mus; #> vector[n_sp] rho; #> vector[n_series] tau; #> array[n, n_series] int ypred; #> matrix[n, n_series] phi_vec; #> vector[n_series] phi; #> phi = inv(phi_inv); #> for (s in 1 : n_series) { #> phi_vec[1 : n, s] = rep_vector(phi[s], n); #> } #> rho = log(lambda); #> for (s in 1 : n_series) { #> tau[s] = pow(sigma[s], -2.0); #> } #> #> // posterior predictions #> eta = X * b; #> for (s in 1 : n_series) { #> mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s]; #> ypred[1 : n, s] = neg_binomial_2_rng(exp(mus[1 : n, s]), phi_vec[1 : n, s]); #> } #> } #> #> # Look at what is returned when an incorrect spelling is used test_priors$prior[5] <- 'ar2_bananas ~ normal(0, 0.25);' mod <- mvgam(y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1, family = 'nb', data = dat$data_train, trend_model = 'AR2', priors = test_priors, run_model = FALSE) #> Warning: no match found in model_file for parameter: ar2_bananas code(mod) #> // Stan model code generated by package mvgam #> functions { #> vector rep_each(vector x, int K) { #> int N = rows(x); #> vector[N * K] y; #> int pos = 1; #> for (n in 1 : N) { #> 
for (k in 1 : K) { #> y[pos] = x[n]; #> pos += 1; #> } #> } #> return y; #> } #> } #> data { #> int total_obs; // total number of observations #> int n; // number of timepoints per series #> int n_sp; // number of smoothing parameters #> int n_series; // number of series #> int num_basis; // total number of basis coefficients #> vector[num_basis] zero; // prior locations for basis coefficients #> matrix[total_obs, num_basis] X; // mgcv GAM design matrix #> array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) #> matrix[8, 8] S1; // mgcv smooth penalty matrix S1 #> int n_nonmissing; // number of nonmissing observations #> array[n_nonmissing] int flat_ys; // flattened nonmissing observations #> matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations #> array[n_nonmissing] int obs_ind; // indices of nonmissing observations #> } #> parameters { #> // raw basis coefficients #> vector[num_basis] b_raw; #> #> // random effect variances #> vector[1] sigma_raw; #> #> // random effect means #> vector[1] mu_raw; #> #> // negative binomial overdispersion #> vector[n_series] phi_inv; #> #> // latent trend AR1 terms #> vector[n_series] ar1; #> #> // latent trend AR2 terms #> vector[n_series] ar2; #> #> // latent trend variance parameters #> vector[n_series] sigma; #> #> // latent trends #> matrix[n, n_series] trend; #> #> // smoothing parameters #> vector[n_sp] lambda; #> } #> transformed parameters { #> // basis coefficients #> vector[num_basis] b; #> b[1 : 8] = b_raw[1 : 8]; #> b[9 : 11] = mu_raw[1] + b_raw[9 : 11] * sigma_raw[1]; #> } #> model { #> // prior for random effect population variances #> sigma_raw ~ student_t(3, 0, 2.5); #> #> // prior for random effect population means #> mu_raw ~ normal(0.2, 0.5); #> #> // prior for s(season)... #> b_raw[1 : 8] ~ multi_normal_prec(zero[1 : 8], S1[1 : 8, 1 : 8] * lambda[1]); #> #> // prior (non-centred) for s(series)... 
#> b_raw[9 : 11] ~ std_normal(); #> #> // priors for AR parameters #> ar1 ~ std_normal(); #> ar2 ~ std_normal(); #> #> // priors for smoothing parameters #> lambda ~ normal(5, 30); #> #> // priors for overdispersion parameters #> phi_inv ~ student_t(3, 0, 0.1); #> #> // priors for latent trend variance parameters #> sigma ~ student_t(3, 0, 2.5); #> #> // trend estimates #> trend[1, 1 : n_series] ~ normal(0, sigma); #> trend[2, 1 : n_series] ~ normal(trend[1, 1 : n_series] * ar1, sigma); #> for (s in 1 : n_series) { #> trend[3 : n, s] ~ normal(ar1[s] * trend[2 : (n - 1), s] #> + ar2[s] * trend[1 : (n - 2), s], sigma[s]); #> } #> { #> // likelihood functions #> vector[n_nonmissing] flat_trends; #> array[n_nonmissing] real flat_phis; #> flat_trends = to_vector(trend)[obs_ind]; #> flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]); #> flat_ys ~ neg_binomial_2(exp(append_col(flat_xs, flat_trends) #> * append_row(b, 1.0)), #> inv(flat_phis)); #> } #> } #> generated quantities { #> vector[total_obs] eta; #> matrix[n, n_series] mus; #> vector[n_sp] rho; #> vector[n_series] tau; #> array[n, n_series] int ypred; #> matrix[n, n_series] phi_vec; #> vector[n_series] phi; #> phi = inv(phi_inv); #> for (s in 1 : n_series) { #> phi_vec[1 : n, s] = rep_vector(phi[s], n); #> } #> rho = log(lambda); #> for (s in 1 : n_series) { #> tau[s] = pow(sigma[s], -2.0); #> } #> #> // posterior predictions #> eta = X * b; #> for (s in 1 : n_series) { #> mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s]; #> ypred[1 : n, s] = neg_binomial_2_rng(exp(mus[1 : n, s]), phi_vec[1 : n, s]); #> } #> } #> #> # Example of changing parametric (fixed effect) priors simdat <- sim_mvgam() # Add a fake covariate simdat$data_train$cov <- rnorm(NROW(simdat$data_train)) priors <- get_mvgam_priors(y ~ cov + s(season), data = simdat$data_train, family = poisson(), trend_model = 'AR1') # Change priors for the intercept and fake covariate effects priors$prior[1] <- '(Intercept) ~ normal(0, 1);' priors$prior[2] <- 'cov ~ normal(0, 0.1);' mod2 <- mvgam(y ~ cov + s(season), data = simdat$data_train, trend_model = 'AR1', family = poisson(), priors = priors, run_model = FALSE) code(mod2) #> // Stan model code generated by package mvgam #> data { #> int total_obs; // total number of observations #> int n; // number of timepoints per series #> int n_sp; // number of smoothing parameters #> int n_series; // number of series #> int num_basis; // total number of basis coefficients #> vector[num_basis] zero; // prior locations for basis coefficients #> matrix[total_obs, num_basis] X; // mgcv GAM design matrix #> array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) #> matrix[9, 18] S1; // mgcv smooth penalty matrix S1 #> int n_nonmissing; // number of nonmissing observations #> array[n_nonmissing] int flat_ys; // flattened nonmissing observations #> matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations #> array[n_nonmissing] int obs_ind; // indices of nonmissing observations #> } #> parameters { #> // raw basis coefficients #> vector[num_basis] b_raw; #> #> // latent trend AR1 terms #> vector[n_series] ar1; #> #> // latent trend variance parameters #> vector[n_series] sigma; #> #> // latent trends #> matrix[n, n_series] trend; #> #> // smoothing parameters #> vector[n_sp] lambda; #> } #> transformed parameters { #> // basis coefficients #> vector[num_basis] b; #> b[1 : num_basis] = b_raw[1 : num_basis]; #> } #> model { #> // prior for (Intercept)... 
#> b_raw[1] ~ normal(0, 1); #> #> // prior for cov... #> b_raw[2] ~ normal(0, 0.1); #> #> // prior for s(season)... #> b_raw[3 : 11] ~ multi_normal_prec(zero[3 : 11], #> S1[1 : 9, 1 : 9] * lambda[1] #> + S1[1 : 9, 10 : 18] * lambda[2]); #> #> // priors for AR parameters #> ar1 ~ std_normal(); #> #> // priors for smoothing parameters #> lambda ~ normal(5, 30); #> #> // priors for latent trend variance parameters #> sigma ~ student_t(3, 0, 2.5); #> #> // trend estimates #> trend[1, 1 : n_series] ~ normal(0, sigma); #> for (s in 1 : n_series) { #> trend[2 : n, s] ~ normal(ar1[s] * trend[1 : (n - 1), s], sigma[s]); #> } #> { #> // likelihood functions #> vector[n_nonmissing] flat_trends; #> flat_trends = to_vector(trend)[obs_ind]; #> flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0, #> append_row(b, 1.0)); #> } #> } #> generated quantities { #> vector[total_obs] eta; #> matrix[n, n_series] mus; #> vector[n_sp] rho; #> vector[n_series] tau; #> array[n, n_series] int ypred; #> rho = log(lambda); #> for (s in 1 : n_series) { #> tau[s] = pow(sigma[s], -2.0); #> } #> #> // posterior predictions #> eta = X * b; #> for (s in 1 : n_series) { #> mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s]; #> ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]); #> } #> } #> #> # Likewise using brms utilities (note that you can use # Intercept rather than `(Intercept)`) to change priors on the intercept brmsprior <- c(prior(normal(0.2, 0.5), class = cov), prior(normal(0, 0.25), class = Intercept)) brmsprior #> prior class coef group resp dpar nlpar lb ub source #> normal(0.2, 0.5) cov user #> normal(0, 0.25) Intercept user mod2 <- mvgam(y ~ cov + s(season), data = simdat$data_train, trend_model = 'AR1', family = poisson(), priors = brmsprior, run_model = FALSE) code(mod2) #> // Stan model code generated by package mvgam #> data { #> int total_obs; // total number of observations #> int n; // number of timepoints per series #> int n_sp; // number of smoothing parameters #> int n_series; // number of series #> int num_basis; // total number of basis coefficients #> vector[num_basis] zero; // prior locations for basis coefficients #> matrix[total_obs, num_basis] X; // mgcv GAM design matrix #> array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) #> matrix[9, 18] S1; // mgcv smooth penalty matrix S1 #> int n_nonmissing; // number of nonmissing observations #> array[n_nonmissing] int flat_ys; // flattened nonmissing observations #> matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations #> array[n_nonmissing] int obs_ind; // indices of nonmissing observations #> } #> parameters { #> // raw basis coefficients #> vector[num_basis] b_raw; #> #> // latent trend AR1 terms #> vector[n_series] ar1; #> #> // latent trend variance parameters #> vector[n_series] sigma; #> #> // latent trends #> matrix[n, n_series] trend; #> #> // smoothing parameters #> vector[n_sp] lambda; #> } #> transformed parameters { #> // basis coefficients #> vector[num_basis] b; #> b[1 : num_basis] = b_raw[1 : num_basis]; #> } #> model { #> // prior for (Intercept)... #> b_raw[1] ~ normal(0, 0.25); #> #> // prior for cov... #> b_raw[2] ~ normal(0.2, 0.5); #> #> // prior for s(season)... 
#> b_raw[3 : 11] ~ multi_normal_prec(zero[3 : 11], #> S1[1 : 9, 1 : 9] * lambda[1] #> + S1[1 : 9, 10 : 18] * lambda[2]); #> #> // priors for AR parameters #> ar1 ~ std_normal(); #> #> // priors for smoothing parameters #> lambda ~ normal(5, 30); #> #> // priors for latent trend variance parameters #> sigma ~ student_t(3, 0, 2.5); #> #> // trend estimates #> trend[1, 1 : n_series] ~ normal(0, sigma); #> for (s in 1 : n_series) { #> trend[2 : n, s] ~ normal(ar1[s] * trend[1 : (n - 1), s], sigma[s]); #> } #> { #> // likelihood functions #> vector[n_nonmissing] flat_trends; #> flat_trends = to_vector(trend)[obs_ind]; #> flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0, #> append_row(b, 1.0)); #> } #> } #> generated quantities { #> vector[total_obs] eta; #> matrix[n, n_series] mus; #> vector[n_sp] rho; #> vector[n_series] tau; #> array[n, n_series] int ypred; #> rho = log(lambda); #> for (s in 1 : n_series) { #> tau[s] = pow(sigma[s], -2.0); #> } #> #> // posterior predictions #> eta = X * b; #> for (s in 1 : n_series) { #> mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s]; #> ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]); #> } #> } #> #> # The \"class = 'b'\" shortcut can be used to put the same prior on all # 'fixed' effect coefficients (apart from any intercepts) set.seed(0) dat <- mgcv::gamSim(1, n = 200, scale = 2) #> Gu & Wahba 4 term additive model dat$time <- 1:NROW(dat) mod <- mvgam(y ~ x0 + x1 + s(x2) + s(x3), priors = prior(normal(0, 0.75), class = 'b'), data = dat, family = gaussian(), run_model = FALSE) code(mod) #> // Stan model code generated by package mvgam #> functions { #> vector rep_each(vector x, int K) { #> int N = rows(x); #> vector[N * K] y; #> int pos = 1; #> for (n in 1 : N) { #> for (k in 1 : K) { #> y[pos] = x[n]; #> pos += 1; #> } #> } #> return y; #> } #> } #> data { #> int total_obs; // total number of observations #> int n; // number of timepoints per series #> int n_sp; // number of smoothing parameters #> int n_series; // number of series #> int num_basis; // total number of basis coefficients #> vector[num_basis] zero; // prior locations for basis coefficients #> matrix[total_obs, num_basis] X; // mgcv GAM design matrix #> array[n, n_series] int ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) #> matrix[9, 18] S1; // mgcv smooth penalty matrix S1 #> matrix[9, 18] S2; // mgcv smooth penalty matrix S2 #> int n_nonmissing; // number of nonmissing observations #> vector[n_nonmissing] flat_ys; // flattened nonmissing observations #> matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations #> array[n_nonmissing] int obs_ind; // indices of nonmissing observations #> } #> parameters { #> // raw basis coefficients #> vector[num_basis] b_raw; #> #> // gaussian observation error #> vector[n_series] sigma_obs; #> #> // smoothing parameters #> vector[n_sp] lambda; #> } #> transformed parameters { #> // basis coefficients #> vector[num_basis] b; #> b[1 : num_basis] = b_raw[1 : num_basis]; #> } #> model { #> // prior for (Intercept)... #> b_raw[1] ~ student_t(3, 7.4, 3.7); #> #> // prior for x0... #> b_raw[2] ~ normal(0, 0.75); #> #> // prior for x1... #> b_raw[3] ~ normal(0, 0.75); #> #> // prior for s(x2)... #> b_raw[4 : 12] ~ multi_normal_prec(zero[4 : 12], #> S1[1 : 9, 1 : 9] * lambda[1] #> + S1[1 : 9, 10 : 18] * lambda[2]); #> #> // prior for s(x3)... 
#> b_raw[13 : 21] ~ multi_normal_prec(zero[13 : 21], #> S2[1 : 9, 1 : 9] * lambda[3] #> + S2[1 : 9, 10 : 18] * lambda[4]); #> #> // priors for smoothing parameters #> lambda ~ normal(5, 30); #> #> // priors for observation error parameters #> sigma_obs ~ student_t(3, 0, 3.7); #> { #> // likelihood functions #> vector[n_nonmissing] flat_sigma_obs; #> flat_sigma_obs = rep_each(sigma_obs, n)[obs_ind]; #> flat_ys ~ normal_id_glm(flat_xs, 0.0, b, flat_sigma_obs); #> } #> } #> generated quantities { #> vector[total_obs] eta; #> matrix[n, n_series] sigma_obs_vec; #> matrix[n, n_series] mus; #> vector[n_sp] rho; #> array[n, n_series] real ypred; #> rho = log(lambda); #> #> // posterior predictions #> eta = X * b; #> for (s in 1 : n_series) { #> sigma_obs_vec[1 : n, s] = rep_vector(sigma_obs[s], n); #> } #> for (s in 1 : n_series) { #> mus[1 : n, s] = eta[ytimes[1 : n, s]]; #> ypred[1 : n, s] = normal_rng(mus[1 : n, s], sigma_obs_vec[1 : n, s]); #> } #> } #> #>"},
{"path":"https://nicholasjclark.github.io/mvgam/reference/GP.html","id":null,"dir":"Reference","previous_headings":"","what":"Specify dynamic Gaussian processes — GP","title":"Specify dynamic Gaussian processes — GP","text":"Set up low-rank approximate Gaussian Process trend models using Hilbert basis expansions in mvgam. This function does not evaluate its arguments – it exists purely to help set up a model with particular GP trend models.","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/GP.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Specify dynamic Gaussian processes — GP","text":"","code":"GP()"},
{"path":"https://nicholasjclark.github.io/mvgam/reference/GP.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Specify dynamic Gaussian processes — GP","text":"An object of class mvgam_trend, which contains a list of arguments to be interpreted by the parsing functions in mvgam","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/GP.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Specify dynamic Gaussian processes — GP","text":"A GP trend is estimated for each series using Hilbert space approximate Gaussian Processes. In mvgam, latent squared exponential GP trends are approximated using by default 20 basis functions and a multiplicative factor of c = 5/4, which saves computational costs compared to fitting full GPs while still adequately estimating the GP alpha and rho parameters.","code":""},
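The entry above describes the Hilbert-space approximation but shows no call in context. A minimal sketch (our own illustration, not taken from the package docs), using simulated data and run_model = FALSE so that only the generated Stan code is built:

simdat <- sim_mvgam(n_series = 2, trend_model = 'GP')
mod_gp <- mvgam(y ~ s(season, bs = 'cc'),
                trend_model = GP(),
                data = simdat$data_train,
                run_model = FALSE)
# The latent trends appear as approximate GPs built from 20 basis functions
code(mod_gp)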
{"path":[]},
{"path":"https://nicholasjclark.github.io/mvgam/reference/hindcast.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract hindcasts for a fitted mvgam object — hindcast.mvgam","title":"Extract hindcasts for a fitted mvgam object — hindcast.mvgam","text":"Extract hindcasts for a fitted mvgam object","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/hindcast.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract hindcasts for a fitted mvgam object — hindcast.mvgam","text":"","code":"hindcast(object, ...) # S3 method for mvgam hindcast(object, series = \"all\", type = \"response\", ...)"},
{"path":"https://nicholasjclark.github.io/mvgam/reference/hindcast.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract hindcasts for a fitted mvgam object — hindcast.mvgam","text":"object A list object returned from mvgam. See mvgam() ... Ignored series Either an integer specifying which series to extract hindcasts for, or the character string 'all', specifying that hindcasts should be extracted for all series. The latter is preferable if the fitted model contained multivariate trends (either as a dynamic factor or VAR process), as it saves recomputing the full set of trends for each series individually type When this has the value link (default) the linear predictor is calculated on the link scale. If expected is used, predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. When response is used, the predictions take uncertainty in the observation process into account to return predictions on the outcome scale. When variance is used, the variance of the response with respect to the mean (mean-variance relationship) is returned. Two special cases are also allowed: type latent_N will return the estimated latent abundances from an N-mixture distribution, while type detection will return the estimated detection probability from an N-mixture distribution","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/hindcast.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extract hindcasts for a fitted mvgam object — hindcast.mvgam","text":"An object of class mvgam_forecast containing the hindcast distributions. See mvgam_forecast-class for details. See also forecast.mvgam","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/hindcast.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Extract hindcasts for a fitted mvgam object — hindcast.mvgam","text":"Posterior retrodictions are drawn from the fitted mvgam and organized into a convenient format","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/hindcast.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extract hindcasts for a fitted mvgam object — hindcast.mvgam","text":"","code":"if (FALSE) { simdat <- sim_mvgam(n_series = 3, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) # Hindcasts on response scale hc <- hindcast(mod) str(hc) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Hindcasts as expectations hc <- hindcast(mod, type = 'expected') str(hc) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Estimated latent trends hc <- hindcast(mod, type = 'trend') str(hc) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) }"},
{"path":"https://nicholasjclark.github.io/mvgam/reference/index-mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Index mvgam objects — index-mvgam","title":"Index mvgam objects — index-mvgam","text":"Index mvgam objects","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/index-mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Index mvgam objects — index-mvgam","text":"","code":"# S3 method for mvgam variables(x, ...)"},
{"path":"https://nicholasjclark.github.io/mvgam/reference/index-mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Index mvgam objects — index-mvgam","text":"x A list object returned from mvgam. See mvgam() ... Arguments passed on to individual methods (where applicable).","code":""},
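A brief sketch of the indexing method above (our own illustration; mod is assumed to be any fitted mvgam object, such as the one from the hindcast example):

# List the names of all parameters monitored in the fitted model
vars <- variables(mod)
str(vars)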
{"path":"https://nicholasjclark.github.io/mvgam/reference/lfo_cv.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Approximate leave-future-out cross-validation of fitted mvgam objects — lfo_cv.mvgam","title":"Approximate leave-future-out cross-validation of fitted mvgam objects — lfo_cv.mvgam","text":"Approximate leave-future-out cross-validation of fitted mvgam objects","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/lfo_cv.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Approximate leave-future-out cross-validation of fitted mvgam objects — lfo_cv.mvgam","text":"","code":"lfo_cv(object, ...) # S3 method for mvgam lfo_cv(object, data, min_t, fc_horizon = 1, pareto_k_threshold = 0.7, ...)"},
{"path":"https://nicholasjclark.github.io/mvgam/reference/lfo_cv.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Approximate leave-future-out cross-validation of fitted mvgam objects — lfo_cv.mvgam","text":"object A list object returned from mvgam. See mvgam() ... Ignored data A dataframe or list containing the model response variable and covariates required by the GAM formula. Should include columns: 'series' (a character or factor index of the series IDs) and 'time' (a numeric index of the time point for each observation). Any other variables included in the linear predictor of formula must also be present min_t Integer specifying the minimum training time required before making predictions from the data. Default is either 30, or whatever training time allows for at least 10 lfo-cv calculations (i.e. pmin(max(data$time) - 10, 30)) fc_horizon Integer specifying the number of time steps ahead for evaluating forecasts pareto_k_threshold Proportion specifying the threshold over which the Pareto shape parameter is considered unstable, triggering a model refit. Default is 0.7","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/lfo_cv.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Approximate leave-future-out cross-validation of fitted mvgam objects — lfo_cv.mvgam","text":"A list of class mvgam_lfo containing the approximate ELPD scores, the Pareto-k shape values and the specified pareto_k_threshold","code":""},
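The pmin() expression in the min_t entry can be read as: use a training window of 30 timepoints, unless the series is too short to leave at least 10 evaluation steps. A tiny illustration with hypothetical timepoints (not from the docs):

times <- 1:75                      # e.g. 75 total timepoints
min_t <- pmin(max(times) - 10, 30) # 30 here; would shrink to 25 for 35 timepoints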
{"path":"https://nicholasjclark.github.io/mvgam/reference/lfo_cv.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Approximate leave-future-out cross-validation of fitted mvgam objects — lfo_cv.mvgam","text":"Approximate leave-future-out cross-validation uses an expanding training window scheme to evaluate a model on its forecasting ability. The steps used in this function mirror those laid out in the lfo vignette from the loo package, written by Paul Bürkner, Jonah Gabry and Aki Vehtari. First, we refit the model using only the first min_t observations to perform a single exact fc_horizon-ahead forecast step. This forecast is evaluated against the min_t + fc_horizon out-of-sample observations using the Expected Log Predictive Density (ELPD). Next, we approximate each successive round of expanding window forecasts by moving forward one step at a time for i in 1:N_evaluations and re-weighting draws from the model's posterior predictive distribution using Pareto Smoothed Importance Sampling (PSIS). In each iteration i, PSIS weights are obtained for the next observation that would have been included in the model if we had re-fit it (i.e. the last observation in the new training data, min_t + i). If these importance ratios are stable, we consider the approximation adequate and use the re-weighted posterior's forecast for evaluating the next holdout set of testing observations ((min_t + i + 1):(min_t + i + fc_horizon)). At some point the importance ratio variability will become too large and importance sampling will fail. This is indicated by the estimated shape parameter k of the generalized Pareto distribution crossing a certain threshold pareto_k_threshold. Only then do we refit the model using all of the observations up to the time of the failure. We then restart the process and iterate forward until the next refit is triggered (Bürkner et al. 2020).","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/lfo_cv.mvgam.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Approximate leave-future-out cross-validation of fitted mvgam objects — lfo_cv.mvgam","text":"Paul-Christian Bürkner, Jonah Gabry & Aki Vehtari (2020). Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation. 90:14, 2499-2523.","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/lfo_cv.mvgam.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Approximate leave-future-out cross-validation of fitted mvgam objects — lfo_cv.mvgam","text":"Nicholas J Clark","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/lfo_cv.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Approximate leave-future-out cross-validation of fitted mvgam objects — lfo_cv.mvgam","text":"","code":"if (FALSE) { # Simulate from a Poisson-AR2 model with a seasonal smooth set.seed(100) dat <- sim_mvgam(T = 75, n_series = 1, prop_trend = 0.75, trend_model = 'AR2', family = poisson()) # Plot the time series plot_mvgam_series(data = dat$data_train, newdata = dat$data_test, series = 1) # Fit an appropriate model mod_ar2 <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR2', family = poisson(), data = dat$data_train, newdata = dat$data_test) # Fit a less appropriate model mod_rw <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'RW', family = poisson(), data = dat$data_train, newdata = dat$data_test) # Compare Discrete Ranked Probability Scores for the testing period fc_ar2 <- forecast(mod_ar2) fc_rw <- forecast(mod_rw) score_ar2 <- score(fc_ar2, score = 'drps') score_rw <- score(fc_rw, score = 'drps') sum(score_ar2$series_1$score) sum(score_rw$series_1$score) # Now use approximate leave-future-out CV to compare # rolling forecasts; start at time point 40 to reduce # computational time and to ensure enough data is available # for estimating model parameters lfo_ar2 <- lfo_cv(mod_ar2, min_t = 40, fc_horizon = 3) lfo_rw <- lfo_cv(mod_rw, min_t = 40, fc_horizon = 3) # Plot Pareto-K values and ELPD estimates plot(lfo_ar2) plot(lfo_rw) # Proportion of timepoints in which AR2 model gives # better forecasts length(which((lfo_ar2$elpds - lfo_rw$elpds) > 0)) / length(lfo_ar2$elpds) # A higher total ELPD is preferred lfo_ar2$sum_ELPD lfo_rw$sum_ELPD }"},
...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/logLik.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute pointwise Log-Likelihoods from fitted mvgam objects — logLik.mvgam","text":"object list object returned mvgam linpreds Optional matrix linear predictor draws use calculating poitwise log-likelihoods newdata Optional data.frame list object specifying series column linpreds belongs . linpreds supplied, newdata must also supplied family_pars Optional list containing posterior draws family-specific parameters (.e. shape, scale overdispersion parameters). Required linpreds newdata supplied include_forecast Logical. newdata fed model compute forecasts, log-likelihood draws observations also returned. Defaults TRUE ... Ignored","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/logLik.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute pointwise Log-Likelihoods from fitted mvgam objects — logLik.mvgam","text":"matrix dimension n_samples x n_observations containing pointwise log-likelihood draws observations newdata. newdata supplied, log-likelihood draws returned observations originally fed model (training observations , supplied original model via newdata argument mvgam, testing observations)","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/logLik.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute pointwise Log-Likelihoods from fitted mvgam objects — logLik.mvgam","text":"","code":"if (FALSE) { # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) # Extract logLikelihood values lls <- logLik(mod) str(lls) }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/loo.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"LOO information criteria for mvgam models — loo.mvgam","title":"LOO information criteria for mvgam models — loo.mvgam","text":"Extract LOOIC (leave-one-information criterion) using loo::loo()","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/loo.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"LOO information criteria for mvgam models — loo.mvgam","text":"","code":"# S3 method for mvgam loo(x, ...) # S3 method for mvgam loo_compare(x, ..., model_names = NULL)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/loo.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"LOO information criteria for mvgam models — loo.mvgam","text":"x Object class mvgam ... mvgam objects. model_names NULL (default) use model names derived deparsing call. 
{"path":"https://nicholasjclark.github.io/mvgam/reference/loo.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"LOO information criteria for mvgam models — loo.mvgam","text":"","code":"if (FALSE) { # Simulate 4 time series with hierarchical seasonality # and independent AR1 dynamic processes set.seed(111) simdat <- sim_mvgam(seasonality = 'hierarchical', trend_model = 'AR1', family = gaussian()) # Fit a model with shared seasonality mod1 <- mvgam(y ~ s(season, bs = 'cc', k = 6), data = rbind(simdat$data_train, simdat$data_test), family = gaussian()) plot(mod1, type = 'smooths') loo(mod1) # Now fit a model with hierarchical seasonality mod2 <- update(mod1, formula = y ~ s(season, bs = 'cc', k = 6) + s(season, series, bs = 'fs', xt = list(bs = 'cc'), k = 4)) plot(mod2, type = 'smooths') loo(mod2) # Now add AR1 dynamic errors to mod2 mod3 <- update(mod2, trend_model = 'AR1') plot(mod3, type = 'smooths') plot(mod3, type = 'trend') loo(mod3) # Compare models using LOO loo_compare(mod1, mod2, mod3) }"},
{"path":"https://nicholasjclark.github.io/mvgam/reference/lv_correlations.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate trend correlations based on mvgam latent factor loadings — lv_correlations","title":"Calculate trend correlations based on mvgam latent factor loadings — lv_correlations","text":"This function uses samples of latent trends for each series from a fitted mvgam model to calculate correlations among the series' trends","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/lv_correlations.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate trend correlations based on mvgam latent factor loadings — lv_correlations","text":"","code":"lv_correlations(object)"},
{"path":"https://nicholasjclark.github.io/mvgam/reference/lv_correlations.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate trend correlations based on mvgam latent factor loadings — lv_correlations","text":"object A list object returned from mvgam","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/lv_correlations.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate trend correlations based on mvgam latent factor loadings — lv_correlations","text":"A list object containing the mean posterior correlations and the full array of posterior correlations","code":""},
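The lv_correlations() entry above ships without a usage example; a minimal sketch under the assumption of a dynamic factor model (use_lv = TRUE), with the element name mean_correlations inferred from the Value description:

simdat <- sim_mvgam(n_series = 4, trend_model = 'AR1')
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = 'AR1',
             use_lv = TRUE, n_lv = 2,
             data = simdat$data_train)
lvcors <- lv_correlations(mod)
names(lvcors) # mean posterior correlations plus the full posterior array
lvcors$mean_correlations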
{"path":"https://nicholasjclark.github.io/mvgam/reference/mcmc_plot.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"MCMC plots as implemented in bayesplot — mcmc_plot.mvgam","title":"MCMC plots as implemented in bayesplot — mcmc_plot.mvgam","text":"Convenient way to call MCMC plotting functions implemented in the bayesplot package","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/mcmc_plot.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"MCMC plots as implemented in bayesplot — mcmc_plot.mvgam","text":"","code":"# S3 method for mvgam mcmc_plot( object, type = \"intervals\", variable = NULL, regex = FALSE, use_alias = TRUE, ... )"},
{"path":"https://nicholasjclark.github.io/mvgam/reference/mcmc_plot.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"MCMC plots as implemented in bayesplot — mcmc_plot.mvgam","text":"object An R object typically of class brmsfit type The type of the plot. Supported types are (as names) hist, dens, hist_by_chain, dens_overlay, violin, intervals, areas, areas_ridges, combo, acf, acf_bar, trace, trace_highlight, scatter, hex, pairs, rhat, rhat_hist, neff, neff_hist and nuts_energy. For an overview of the various plot types see MCMC-overview. variable Names of the variables (parameters) to plot, as given by a character vector or a regular expression (if regex = TRUE). By default, a hopefully not too large selection of variables is plotted. regex Logical; Indicates whether variable should be treated as regular expressions. Defaults to FALSE. use_alias Logical. If more informative names for parameters are available (i.e. for beta coefficients b or for smoothing parameters rho), replace the uninformative names with the informative alias. Defaults to TRUE ... Additional arguments passed to the plotting functions. See MCMC-overview for more details.","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/mcmc_plot.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"MCMC plots as implemented in bayesplot — mcmc_plot.mvgam","text":"A ggplot object that can be customized using the ggplot2 package.","code":""},
{"path":[]},
{"path":"https://nicholasjclark.github.io/mvgam/reference/mcmc_plot.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"MCMC plots as implemented in bayesplot — mcmc_plot.mvgam","text":"","code":"if (FALSE) { simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) mcmc_plot(mod) mcmc_plot(mod, type = 'neff_hist') mcmc_plot(mod, variable = 'betas', type = 'areas') mcmc_plot(mod, variable = 'trend_params', type = 'combo') }"},
Ignored","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/model.frame.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extract model.frame from a fitted mvgam object — model.frame.mvgam","text":"matrix containing fitted model frame","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/model.frame.mvgam.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Extract model.frame from a fitted mvgam object — model.frame.mvgam","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":null,"dir":"Reference","previous_headings":"","what":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"Uses constructors package splines2 build monotonically increasing decreasing splines. Details also Wang & Yan (2021).","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"","code":"# S3 method for moi.smooth.spec smooth.construct(object, data, knots) # S3 method for mod.smooth.spec smooth.construct(object, data, knots) # S3 method for moi.smooth Predict.matrix(object, data) # S3 method for mod.smooth Predict.matrix(object, data)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"object smooth specification object, usually generated term s(x, bs = \"moi\", ...) s(x, bs = \"mod\", ...) data list containing just data (including variable) required term, names corresponding object$term (object$). variable last element. knots list containing knots supplied basis setup --- order names data. Can NULL. See details information.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"object class \"moi.smooth\" \"mod.smooth\". addition usual elements smooth class documented smooth.construct, object contain slot called boundary defines endpoints beyond spline begin extrapolating (extrapolation flat due first order penalty placed smooth function)","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"constructor normally called directly, rather used internally mvgam. supplied knots spline placed evenly throughout covariate values term refers: example, fitting 101 data 11 knot spline x knot every 10th (ordered) x value. 
{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":null,"dir":"Reference","previous_headings":"","what":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"Uses constructors from package splines2 to build monotonically increasing or decreasing splines. Details are also in Wang & Yan (2021).","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"","code":"# S3 method for moi.smooth.spec smooth.construct(object, data, knots) # S3 method for mod.smooth.spec smooth.construct(object, data, knots) # S3 method for moi.smooth Predict.matrix(object, data) # S3 method for mod.smooth Predict.matrix(object, data)"},
{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"object A smooth specification object, usually generated by a term s(x, bs = \"moi\", ...) or s(x, bs = \"mod\", ...) data A list containing just the data (including any by variable) required by this term, with names corresponding to object$term (and object$by). The by variable is the last element. knots A list containing any knots supplied for basis setup --- in the same order and with the same names as data. Can be NULL. See details for further information.","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"An object of class \"moi.smooth\" or \"mod.smooth\". In addition to the usual elements of a smooth class documented under smooth.construct, this object will contain a slot called boundary that defines the endpoints beyond which the spline will begin extrapolating (extrapolation is flat due to the first order penalty placed on the smooth function)","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"This constructor will not normally be called directly, but rather is used internally by mvgam. If knots are not supplied, the spline knots will be placed evenly throughout the covariate values that the term refers to: for example, when fitting 101 data points with an 11 knot spline of x there would be a knot at every 10th (ordered) x value. The spline implementation uses the closed-form I-spline basis based on the recursion formula given by Ramsay (1988), in which the basis coefficients must be constrained to be either non-negative (for monotonically increasing functions) or non-positive (for monotonically decreasing functions) Take note that when using either monotonic basis, the number of basis functions k must be supplied as an even integer due to the manner in which the monotonic basis functions are constructed","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"This constructor will result in a valid smooth if using a call to gam or bam, however the resulting functions are not guaranteed to be monotonic because the constraints on the basis coefficients will not be enforced","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"Wang, Wenjie, and Jun Yan. \"Shape-Restricted Regression Splines with R Package splines2.\" Journal of Data Science 19.3 (2021). Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science, 3(4), 425--441.","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"Nicholas J Clark","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/monotonic.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Monotonic splines in mvgam — smooth.construct.moi.smooth.spec","text":"","code":"if (FALSE) { # Simulate data from a monotonically increasing function set.seed(123123) x <- runif(80) * 4 - 1 x <- sort(x) f <- exp(4 * x) / (1 + exp(4 * x)) y <- f + rnorm(80) * 0.1 plot(x, y) # A standard TPRS smooth doesn't capture monotonicity mod_data <- data.frame(y = y, x = x) mod <- gam(y ~ s(x, k = 16), data = mod_data, family = gaussian()) library(marginaleffects) plot_predictions(mod, by = 'x', newdata = data.frame(x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100)), points = 0.5) # Using the 'moi' basis in mvgam rectifies this mod_data$time <- 1:NROW(mod_data) mod2 <- mvgam(y ~ s(x, bs = 'moi', k = 18), data = mod_data, family = gaussian()) plot_predictions(mod2, by = 'x', newdata = data.frame(x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100)), points = 0.5) plot(mod2, type = 'smooth', realisations = TRUE) # 'by' terms that produce a different smooth for each level of the 'by' # factor are also allowed set.seed(123123) x <- runif(80) * 4 - 1 x <- sort(x) # Two different monotonic smooths, one for each factor level f <- exp(4 * x) / (1 + exp(4 * x)) f2 <- exp(3.5 * x) / (1 + exp(3 * x)) fac <- c(rep('a', 80), rep('b', 80)) y <- c(f + rnorm(80) * 0.1, f2 + rnorm(80) * 0.2) plot(x, y[1:80]) plot(x, y[81:160]) # Gather all data into a data.frame, including the factor 'by' variable mod_data <- data.frame(y, x, fac = as.factor(fac)) mod_data$time <- 1:NROW(mod_data) # Fit a model with different smooths per factor level mod <- mvgam(y ~ s(x, bs = 'moi', by = fac, k = 8), data = mod_data, family = gaussian()) # Visualise the different monotonic functions plot_predictions(mod, condition = c('x', 'fac', 'fac'), points = 0.5) plot(mod, type = 'smooth', realisations = TRUE) }"},
}"},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam-class.html","id":null,"dir":"Reference","previous_headings":"","what":"Fitted mvgam object description — mvgam-class","title":"Fitted mvgam object description — mvgam-class","text":"fitted mvgam object returned function mvgam. Run methods(class = \"mvgam\") see overview available methods.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam-class.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Fitted mvgam object description — mvgam-class","text":"mvgam object contains following elements: call original observation model formula trend_call trend_formula supplied, original trend model formula returned. Otherwise NULL family character description observation distribution trend_model character description latent trend model trend_map data.frame describing mapping trend states observations, supplied original model. Otherwise NULL drift Logical specifying whether drift term used trend model priors model priors updated defaults, prior dataframe returned. Otherwise NULL model_output MCMC object returned fitting engine. model fitted using Stan, object class stanfit (see stanfit-class details). JAGS used backend, object class runjags (see runjags-class details) model_file character string model file used describe model either Stan JAGS syntax model_data return_model_data set TRUE fitting model, list object containing data objects needed condition model returned. item list described detail top model_file. Otherwise NULL inits return_model_data set TRUE fitting model, initial value functions used initialise MCMC chains returned. Otherwise NULL monitor_pars parameters monitored MCMC sampling returned character vector sp_names character vector specifying names smoothing parameter mgcv_model object class gam containing mgcv version observation model. object used generating linear predictor matrix making predictions new data. coefficients model object contain posterior median coefficients GAM linear predictor, used generating plots smooth functions mvgam currently handle (plots three-dimensional smooths). model therefore used inference. See gamObject details trend_mgcv_model trend_formula supplied, object class gam containing mgcv version trend model. Otherwise NULL ytimes matrix object used model fitting indexing series timepoints observed row supplied data. Used internally downstream plotting prediction functions resids named list object containing posterior draws Dunn-Smyth randomized quantile residuals use_lv Logical flag indicating whether latent dynamic factors used model n_lv use_lv == TRUE, number latent dynamic factors used model upper_bounds bounds supplied original model fit, returned. Otherwise NULL obs_data original data object (either list dataframe) supplied model fitting. test_data test data supplied (argument newdata original model), returned. Othwerise NULL fit_engine Character describing fit engine, either stan jags backend Character describing backend used modelling, either rstan, cmdstanr rjags algorithm Character describing algorithm used finding posterior, either sampling, laplace, pathfinder, meanfield fullrank max_treedepth model fitted using Stan, value supplied maximum treedepth tuning parameter returned (see stan details). Otherwise NULL adapt_delta model fitted using Stan, value supplied adapt_delta tuning parameter returned (see stan details). 
Otherwise NULL","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam-class.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Fitted mvgam object description — mvgam-class","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series — mvgam","title":"Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series — mvgam","text":"function estimates posterior distribution Generalised Additive Models (GAMs) can include smooth spline functions, specified GAM formula, well latent temporal processes, specified trend_model. modelling options include State-Space representations allow covariates dynamic processes occur latent 'State' level also capturing observation-level effects. Prior specifications flexible explicitly encourage users apply prior distributions actually reflect beliefs. addition, model fits can easily assessed compared posterior predictive checks, forecast comparisons leave-one-/ leave-future-cross-validation.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series — mvgam","text":"","code":"mvgam( formula, trend_formula, knots, trend_knots, data, data_train, newdata, data_test, run_model = TRUE, prior_simulation = FALSE, return_model_data = FALSE, family = \"poisson\", use_lv = FALSE, n_lv, trend_map, trend_model = \"None\", drift = FALSE, chains = 4, burnin = 500, samples = 500, thin = 1, parallel = TRUE, threads = 1, priors, refit = FALSE, lfo = FALSE, use_stan = TRUE, backend = getOption(\"brms.backend\", \"cmdstanr\"), algorithm = getOption(\"brms.algorithm\", \"sampling\"), autoformat = TRUE, save_all_pars = FALSE, max_treedepth, adapt_delta, jags_path, ... )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series — mvgam","text":"formula character string specifying GAM observation model formula. exactly like formula GLM except smooth terms, s(), te(), ti(), t2(), well time-varying dynamic() terms, can added right hand side specify linear predictor depends smooth functions predictors (linear functionals ). nmix() family models, formula used set linear predictor detection probability. Details formula syntax used mvgam can found mvgam_formulae trend_formula optional character string specifying GAM process model formula. supplied, linear predictor modelled latent trends capture process model evolution separately observation model. response variable specified left-hand side formula (.e. valid option ~ season + s(year)). Also note use identifier series formula specify effects vary across time series. Instead use trend. ensure models trend_map supplied still work consistently (.e. allowing effects vary across process models, even time series share underlying process model). feature currently available RW(), AR() VAR() trend models. nmix() family models, trend_formula used set linear predictor underlying latent abundance knots optional list containing user specified knot values used basis construction. 
knots An optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). Different terms can use different numbers of knots, unless they share a covariate trend_knots As for knots above, but as an optional list of knot values for smooth functions within the trend_formula data A dataframe or list containing the model response variable and covariates required by the GAM formula and optional trend_formula. Should include columns: series (a factor index of the series IDs; the number of levels should be identical to the number of unique series labels (i.e. n_series = length(levels(data$series)))) time (a numeric or integer index of the time point for each observation). Any other variables to be included in the linear predictor of formula must also be present data_train Deprecated. Still works in place of data but users are recommended to use data instead for more seamless integration into R workflows newdata Optional dataframe or list of test data containing at least series and time in addition to any other variables included in the linear predictor of formula. If included, the observations in variable y will be set to NA when fitting the model so that posterior simulations can be obtained data_test Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows run_model logical. If FALSE, the model is not fitted but instead the function will return the model file and the data / initial values needed to fit the model outside of mvgam prior_simulation logical. If TRUE, no observations are fed to the model, and instead simulations from the prior distributions are returned return_model_data logical. If TRUE, the list of data that is needed to fit the model is returned, along with the initial values for smooth and AR parameters, once the model is fitted. This will be helpful if users wish to modify the model file to add other stochastic elements that are not currently available in mvgam. Default is FALSE to reduce the size of the returned object, unless run_model == FALSE family family specifying the exponential observation family for the series. Currently supported families are: nb() for count data poisson() for count data gaussian() for real-valued data betar() for proportional data on (0,1) lognormal() for non-negative real-valued data student_t() for real-valued data Gamma() for non-negative real-valued data nmix() for count data with imperfect detection modeled via a State-Space N-Mixture model. The latent states are Poisson, capturing the 'true' latent abundance, while the observation process is Binomial to account for imperfect detection. See mvgam_families for an example of how to use this family Note that only nb() and poisson() are available if using the JAGS backend. Default is poisson(). See mvgam_families for more details use_lv logical. If TRUE, use dynamic factors to estimate series' latent trends in a reduced dimension format. Only available for RW(), AR() and GP() trend models. Defaults to FALSE n_lv integer the number of latent dynamic factors to use if use_lv == TRUE. Cannot be > n_series. Defaults arbitrarily to min(2, floor(n_series / 2)) trend_map Optional data.frame specifying which series should depend on which latent trends. Useful for allowing multiple series to depend on the same latent trend process, but with different observation processes. If supplied, a latent factor model is set up by setting use_lv = TRUE and using the mapping to set up the shared trends. Needs to have column names series and trend, with integer values in the trend column to state which trend each series should depend on. The series column should have a single unique entry for each series in the data (names should perfectly match factor levels of the series variable in data). See the examples below for details
trend_model character or function specifying the time series dynamics for the latent trend. Options are: None (no latent trend component; i.e. the GAM component is all that contributes to the linear predictor, and the observation process is the only source of error; similar to what is estimated by gam) 'RW' or RW() 'AR1' or AR(p = 1) 'AR2' or AR(p = 2) 'AR3' or AR(p = 3) 'VAR1' or VAR() (only available in Stan) 'PWlogistic', 'PWlinear' or PW() (only available in Stan) 'GP' or GP() (Gaussian Process with squared exponential kernel; only available in Stan) For all trend types apart from GP() and PW(), moving average and/or correlated process error terms can also be estimated (for example, RW(cor = TRUE) will set up a multivariate Random Walk if n_series > 1); the character and function forms are interchangeable, as the sketch after this argument list shows. See mvgam_trends for more details drift logical, estimate a drift parameter in the latent trend components. Useful if the latent trend is expected to broadly follow a non-zero slope. Only available for RW() and AR() trend models. Note that if the latent trend is more or less stationary, the drift parameter can become unidentifiable, especially if an intercept term is included in the GAM linear predictor (which it is by default when calling jagam). Drift parameters will also likely be unidentifiable if using dynamic factor models. Therefore this defaults to FALSE chains integer specifying the number of parallel chains for the model. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace') burnin integer specifying the number of warmup iterations of the Markov chain to run to tune sampling algorithms. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace') samples integer specifying the number of post-warmup iterations of the Markov chain to run for sampling the posterior distribution thin Thinning interval for monitors. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace') parallel logical specifying whether multiple cores should be used for generating MCMC simulations in parallel. If TRUE, the number of cores to use will be min(c(chains, parallel::detectCores() - 1)) threads integer Experimental option to use multithreading for within-chain parallelisation in Stan. We recommend its use only if you are experienced with Stan's reduce_sum function and have a slow running model that cannot be sped up by any other means. Only available when using the Cmdstan backend priors An optional data.frame with prior definitions (in JAGS or Stan syntax). If using Stan, this can also be an object of class brmsprior (see prior for details). See get_mvgam_priors and 'Details' for more information on changing default prior distributions refit Logical indicating whether this is a refit, called using update.mvgam. Users should leave as FALSE lfo Logical indicating whether this is part of a call to lfo_cv.mvgam. Returns a lighter version of the model with no residuals and fewer monitored parameters to speed up post-processing. But many other downstream functions will not work properly, so users should always leave this set as FALSE use_stan Logical. If TRUE, the model will be compiled and sampled using Hamiltonian Monte Carlo with a call to cmdstan_model or a call to stan. Note that there are many more options when using Stan vs JAGS backend Character string naming the package to use as the backend for fitting the Stan model (if use_stan = TRUE). Options are \"cmdstanr\" (the default) or \"rstan\". Can be set globally for the current R session via the \"brms.backend\" option (see options). Details on the rstan and cmdstanr packages are available at https://mc-stan.org/rstan/ and https://mc-stan.org/cmdstanr/, respectively algorithm Character string naming the estimation approach to use. Options are \"sampling\" for MCMC (the default), \"meanfield\" for variational inference with factorized normal distributions, \"fullrank\" for variational inference with a multivariate normal distribution, \"laplace\" for a Laplace approximation (only available when using the cmdstanr backend) or \"pathfinder\" for the pathfinder algorithm (only currently available when using the cmdstanr backend). Can be set globally for the current R session via the \"brms.algorithm\" option (see options). Limited testing suggests that \"meanfield\" performs best out of the non-MCMC approximations for dynamic GAMs, possibly because of difficulties in estimating covariances among the many spline parameters and latent trend parameters. But rigorous testing has not been carried out autoformat Logical. Use the stanc parser to automatically format the Stan code and check for deprecations. Defaults to TRUE
save_all_pars Logical flag to indicate if draws from all variables defined in Stan's parameters block should be saved (default is FALSE). max_treedepth positive integer placing a cap on the number of simulation steps evaluated during each iteration when use_stan == TRUE. Default is 12. Increasing this value can sometimes help with exploration of complex posterior geometries, but it is rarely fruitful to go above a max_treedepth of 14 adapt_delta positive numeric between 0 and 1 defining the target average proposal acceptance probability during Stan's adaptation period, if use_stan == TRUE. Default is 0.8. In general you should not need to change adapt_delta unless you see a warning message about divergent transitions, in which case you can increase adapt_delta from the default to a value closer to 1 (e.g. from 0.95 to 0.99, or from 0.99 to 0.999, etc). The step size used by the numerical integrator is a function of adapt_delta in that increasing adapt_delta will result in a smaller step size and fewer divergences. Increasing adapt_delta will typically result in a slower sampler, but it will always lead to a more robust sampler jags_path Optional character vector specifying the path to the location of the JAGS executable (.exe) to use for modelling if use_stan == FALSE. If missing, the path will be recovered from a call to findjags ... Further arguments passed to Stan. For backend = \"rstan\" the arguments are passed to sampling or vb. For backend = \"cmdstanr\" the arguments are passed to the cmdstanr::sample, cmdstanr::variational, cmdstanr::laplace or cmdstanr::pathfinder method","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series — mvgam","text":"A list object of class mvgam containing model output, the text representation of the model file, the mgcv model output (for easily generating simulations at unsampled covariate values), Dunn-Smyth residuals for each series and key information needed for other functions in the package. See mvgam-class for details. Use methods(class = \"mvgam\") for an overview of available methods.","code":""},
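As flagged in the trend_model entry above, the character shortcuts and the function forms describe the same dynamics; a sketch of our own (no sampling, run_model = FALSE):

simdat <- sim_mvgam(trend_model = 'AR2')
m_chr <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR2',
               data = simdat$data_train, run_model = FALSE)
m_fun <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(p = 2),
               data = simdat$data_train, run_model = FALSE)
# Both generate the same latent AR2 process in the model code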
{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series — mvgam","text":"Dynamic GAMs are useful when we wish to predict future values from time series that show temporal dependence but we do not want to rely on extrapolating from a smooth term (which can sometimes lead to unpredictable and unrealistic behaviours). In addition, smooths can often try to wiggle excessively to capture any autocorrelation that is present in a time series, which exacerbates the problem of forecasting ahead. As GAMs are very naturally viewed through a Bayesian lens, and we often must model time series that show complex distributional features and missing data, parameters for mvgam models are estimated in a Bayesian framework using Markov Chain Monte Carlo by default. A general overview is provided in the primary vignettes: vignette(\"mvgam_overview\") and vignette(\"data_in_mvgam\"). For a full list of available vignettes see vignette(package = \"mvgam\") Formula syntax: Details of the formula syntax used by mvgam can be found in mvgam_formulae. Note that it is possible to supply an empty formula where there are no predictors or intercepts in the observation model (i.e. y ~ 0 or y ~ -1). In this case, an intercept-only observation model will be set up and the intercept coefficient will be fixed at zero. This can be handy if you wish to fit pure State-Space models where the variation in the dynamic trend controls the average expectation, and/or where intercepts are non-identifiable (as in piecewise trends, see the examples below) Families and link functions: Details of families supported by mvgam can be found in mvgam_families. Trend models: Details of latent trend models supported by mvgam can be found in mvgam_trends. Priors: A jagam model file is generated from formula and modified to include any latent temporal processes. Default priors for intercepts and any scale parameters are generated using the same practice as brms. Prior distributions for most important model parameters can be altered by the user to inspect model sensitivities to given priors (see get_mvgam_priors for details). Note that latent trends are estimated on the link scale so choose priors accordingly. However more control over the model specification can be accomplished by first using mvgam as a baseline, then editing the returned model accordingly. The model file can be edited and run outside of mvgam by setting run_model = FALSE, and this is encouraged for complex modelling tasks. Note, no priors are formally checked to ensure they are in the right syntax for the respective probabilistic modelling framework, so it is up to the user to ensure these are correct (i.e. use dnorm for normal densities in JAGS, with the mean and precision parameterisation; but use normal for normal densities in Stan, with the mean and standard deviation parameterisation) Random effects: For any smooth terms using the random effect basis (smooth.construct.re.smooth.spec), a non-centred parameterisation is automatically employed to avoid degeneracies that are common in hierarchical models. Note however that centred versions may perform better for series that are particularly informative, so as with any foray into Bayesian modelling, it is worth building an understanding of the model's assumptions and limitations by following a principled workflow. Also note that models are parameterised using drop.unused.levels = FALSE in jagam to ensure predictions can be made for any levels of the supplied factor variable Observation level parameters: When more than one series is included in data and an observation family that contains more than one parameter is used, additional observation family parameters (i.e. phi for nb() or sigma for gaussian()) are estimated independently for each series. Factor regularisation: When using a dynamic factor model for the trends with JAGS, the factor precisions are given regularized penalty priors to theoretically allow some factors to be dropped from the model by squeezing increasing factors' variances to zero. This is done to help protect against selecting more latent factors than are needed to capture dependencies in the data, so it can often be advantageous to set n_lv to a slightly larger number. However larger numbers of factors do come with additional computational costs so these should be balanced as well. When using Stan, all factors are parameterised with fixed variance parameters Residuals: For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics. If the fitted model is appropriate then Dunn-Smyth residuals will be standard normal in distribution and no autocorrelation will be evident. When a particular observation is missing, the residual is calculated by comparing independent draws from the model's posterior distribution Using Stan: mvgam is primarily designed to use Hamiltonian Monte Carlo for parameter estimation via the software Stan (using either the cmdstanr or rstan interface). There are great advantages when using Stan over Gibbs / Metropolis Hastings samplers, which includes the option to estimate smooth latent trends via Hilbert space approximate Gaussian Processes. This often makes sense for ecological series, which we expect to change smoothly. In mvgam, latent squared exponential GP trends are approximated using by default 20 basis functions, which saves computational costs compared to fitting full GPs while still adequately estimating the GP alpha and rho parameters. Because of the many advantages of Stan over JAGS, further development of the package will only be applied to Stan. This includes the planned addition of more response distributions, plans to handle zero-inflation, and plans to incorporate a greater variety of trend models. Users are strongly encouraged to opt for Stan over JAGS in any proceeding workflows","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series — mvgam","text":"Nicholas J Clark & Konstans Wells (2023). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. 14:3, 771-784.","code":""},
{"path":[]},
{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series — mvgam","text":"Nicholas J Clark","code":""},
{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Fit a Bayesian dynamic GAM to a univariate or multivariate set of time series — mvgam","text":"","code":"if (FALSE) { # Simulate a collection of three time series that have shared seasonal dynamics dat <- sim_mvgam(T = 80, n_series = 3, prop_missing = 0.1, prop_trend = 0.6) # Plot key summary statistics for a single series plot_mvgam_series(data = dat$data_train, series = 1) # Plot all series together plot_mvgam_series(data = dat$data_train, series = 'all') # Formulate a model using Stan where series share a cyclic smooth for # seasonality and each series has an independent random walk temporal process; # Set run_model = FALSE to inspect the returned objects mod1 <- mvgam(formula = y ~ s(season, bs = 'cc'), data = dat$data_train, trend_model = 'RW', family = 'poisson', use_stan = TRUE, run_model = FALSE) # View the model code in Stan language code(mod1) # Inspect the data objects needed to condition the model str(mod1$model_data) # The following code can be used to run the model outside of mvgam; first using rstan model_data <- mod1$model_data library(rstan) fit <- stan(model_code = mod1$model_file, data = model_data) # Now using cmdstanr library(cmdstanr) model_data <- mod1$model_data cmd_mod <- cmdstan_model(write_stan_file(mod1$model_file), stanc_options = list('canonicalize=deprecations,braces,parentheses')) cmd_mod$print() fit <- cmd_mod$sample(data = model_data, chains = 4, parallel_chains = 4, refresh = 100) # Now fit the model using mvgam with the Stan backend mod1 <- mvgam(formula = y ~ s(season, bs = 'cc'), data = dat$data_train, trend_model = 'RW', family = poisson(), use_stan = TRUE) # Extract the model summary summary(mod1) # Plot the estimated historical trend and forecast for one series plot(mod1, type = 'trend', series = 1) plot(mod1, type = 'forecast', series = 1) # Residual diagnostics plot(mod1, type = 'residuals', series = 1) resids <- residuals(mod1) str(resids) # Compute the forecast using covariate information in data_test fc <- forecast(mod1, newdata = dat$data_test) str(fc) plot(fc) # Plot the estimated seasonal smooth function plot(mod1, type = 'smooths') # Plot estimated first derivatives of the smooth plot(mod1, type = 'smooths', derivatives = TRUE) # Plot partial residuals of the smooth plot(mod1, type = 'smooths', residuals = TRUE) # Plot posterior realisations for the smooth plot(mod1, type = 'smooths', realisations = TRUE) # Plot conditional response predictions using marginaleffects plot(conditional_effects(mod1), ask = FALSE) plot_predictions(mod1, condition = 'season', points = 0.5) # Extract observation model beta coefficient draws as a data.frame beta_draws_df <- as.data.frame(mod1, variable = 'betas') head(beta_draws_df) str(beta_draws_df) # Investigate model fit loo(mod1) # Example of supplying a trend_map so that some series can share # latent trend processes sim <- sim_mvgam(n_series = 3) mod_data <- sim$data_train # Here, we specify only two latent trends; series 1 and 2 share a trend, # while series 3 has its own unique latent trend trend_map <- data.frame(series =
unique(mod_data$series), trend = c(1,1,2)) # Fit the model using AR1 trends mod <- mvgam(y ~ s(season, bs = 'cc'), trend_map = trend_map, trend_model = 'AR1', data = mod_data, return_model_data = TRUE) # The mapping matrix is now supplied as data to the model in the 'Z' element mod$model_data$Z code(mod) # The first two series share an identical latent trend; the third is different plot(mod, type = 'trend', series = 1) plot(mod, type = 'trend', series = 2) plot(mod, type = 'trend', series = 3) # Example of how to use dynamic coefficients # Simulate a time-varying coefficient for the effect of temperature set.seed(3) N = 200 beta_temp <- vector(length = N) beta_temp[1] <- 0.4 for(i in 2:N){ beta_temp[i] <- rnorm(1, mean = beta_temp[i - 1], sd = 0.025) } # Simulate a covariate called 'temp' temp <- rnorm(N, sd = 1) # Simulate the Gaussian observation process out <- rnorm(N, mean = 4 + beta_temp * temp, sd = 0.5) # Gather necessary data into a data.frame; split into training / testing data = data.frame(out, temp, time = seq_along(temp)) data_train <- data[1:180,] data_test <- data[181:200,] # Fit the model using the dynamic() formula helper mod <- mvgam(formula = out ~ dynamic(temp, rho = 8), family = gaussian(), data = data_train, newdata = data_test) # Inspect the model summary, forecast and time-varying coefficient distribution summary(mod) plot(mod, type = 'smooths') plot(mod, type = 'forecast', newdata = data_test) # Propagating the smooth term shows how the coefficient is expected to evolve plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 180, lty = 'dashed', lwd = 2) # Example showing how to incorporate an offset; simulate some count data # with different means per series set.seed(100) dat <- sim_mvgam(prop_trend = 0, mu = c(0, 2, 2), seasonality = 'hierarchical') # Add offset terms to the training and testing data dat$data_train$offset <- 0.5 * as.numeric(dat$data_train$series) dat$data_test$offset <- 0.5 * as.numeric(dat$data_test$series) # Fit a model that includes the offset in the linear predictor as well as # hierarchical seasonal smooths mod <- mvgam(formula = y ~ offset(offset) + s(series, bs = 're') + s(season, bs = 'cc') + s(season, by = series, m = 1, k = 5), data = dat$data_train, trend_model = 'None', use_stan = TRUE) # Inspect the model file to see the modification to the linear predictor # (eta) mod$model_file # Forecasts for the first two series will differ in magnitude layout(matrix(1:2, ncol = 2)) plot(mod, type = 'forecast', series = 1, newdata = dat$data_test, ylim = c(0, 75)) plot(mod, type = 'forecast', series = 2, newdata = dat$data_test, ylim = c(0, 75)) layout(1) # Changing the offset for the testing data should lead to changes in # the forecast dat$data_test$offset <- dat$data_test$offset - 2 plot(mod, 'forecast', newdata = dat$data_test) # Relative Risks can be computed by fixing the offset to the same value # for each series dat$data_test$offset <- rep(1, NROW(dat$data_test)) preds_rr <- predict(mod, type = 'link', newdata = dat$data_test) series1_inds <- which(dat$data_test$series == 'series_1') series2_inds <- which(dat$data_test$series == 'series_2') # Relative Risks are now more comparable among series layout(matrix(1:2, ncol = 2)) plot(preds_rr[1, series1_inds], type = 'l', col = 'grey75', ylim = range(preds_rr), ylab = 'Series1 Relative Risk', xlab = 'Time') for(i in 2:50){ lines(preds_rr[i, series1_inds], col = 'grey75') } plot(preds_rr[1, series2_inds], type = 'l', col = 'darkred', ylim = range(preds_rr), ylab = 'Series2 Relative Risk',
xlab = 'Time') for(i in 2:50){ lines(preds_rr[i, series2_inds], col = 'darkred') } layout(1) # Example of logistic growth with possible changepoints # Simple logistic growth model dNt = function(r, N, k){ r * N * (k - N) } # Iterate growth through time Nt = function(r, N, t, k) { for (i in 1:(t - 1)) { # population at next time step is current population + growth, # but we introduce several 'shocks' as changepoints if(i %in% c(5, 15, 25, 41, 45, 60, 80)){ N[i + 1] <- max(1, N[i] + dNt(r + runif(1, -0.1, 0.1), N[i], k)) } else { N[i + 1] <- max(1, N[i] + dNt(r, N[i], k)) } } N } # Simulate expected values set.seed(11) expected <- Nt(0.004, 2, 100, 30) plot(expected, xlab = 'Time') # Take Poisson draws y <- rpois(100, expected) plot(y, xlab = 'Time') # Assemble data into dataframe and model. We set a # fixed carrying capacity of 35 for this example, but note that # this value is not required to be fixed at each timepoint mod_data <- data.frame(y = y, time = 1:100, cap = 35, series = as.factor('series_1')) plot_mvgam_series(data = mod_data) # The intercept is nonidentifiable when using piecewise # trends because the trend functions have their own offset # parameters 'm'; it is recommended to always drop intercepts # when using these trend models mod <- mvgam(y ~ 0, trend_model = PW(growth = 'logistic'), family = poisson(), data = mod_data) summary(mod) # Plot the posterior hindcast plot(mod, type = 'forecast') # View the changepoints with ggplot2 utilities library(ggplot2) mcmc_plot(mod, variable = 'delta_trend', regex = TRUE) + scale_y_discrete(labels = mod$trend_model$changepoints) + labs(y = 'Potential changepoint', x = 'Rate change') }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_diagnostics.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract diagnostic quantities of mvgam models — mvgam_diagnostics","title":"Extract diagnostic quantities of mvgam models — mvgam_diagnostics","text":"Extract quantities can used diagnose sampling behavior algorithms applied Stan back-end mvgam.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_diagnostics.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract diagnostic quantities of mvgam models — mvgam_diagnostics","text":"","code":"# S3 method for mvgam nuts_params(object, pars = NULL, ...) # S3 method for mvgam log_posterior(object, ...) # S3 method for mvgam rhat(x, pars = NULL, ...) # S3 method for mvgam neff_ratio(object, pars = NULL, ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_diagnostics.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract diagnostic quantities of mvgam models — mvgam_diagnostics","text":"object, x mvgam object. pars optional character vector parameter names. nuts_params NUTS sampler parameter names rather model parameters. pars omitted parameters included. ...
Arguments passed individual methods.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_diagnostics.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extract diagnostic quantities of mvgam models — mvgam_diagnostics","text":"exact form output depends method.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_diagnostics.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Extract diagnostic quantities of mvgam models — mvgam_diagnostics","text":"details see bayesplot-extractors.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_diagnostics.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extract diagnostic quantities of mvgam models — mvgam_diagnostics","text":"","code":"if (FALSE) { simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) np <- nuts_params(mod) head(np) # extract the number of divergence transitions sum(subset(np, Parameter == \"divergent__\")$Value) head(neff_ratio(mod)) }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_draws.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract posterior draws from fitted mvgam objects — mvgam_draws","title":"Extract posterior draws from fitted mvgam objects — mvgam_draws","text":"Extract posterior draws conventional formats data.frames, matrices, arrays.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_draws.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract posterior draws from fitted mvgam objects — mvgam_draws","text":"","code":"# S3 method for mvgam as.data.frame( x, row.names = NULL, optional = TRUE, variable = \"betas\", use_alias = TRUE, regex = FALSE, ... ) # S3 method for mvgam as.matrix(x, variable = \"betas\", regex = FALSE, use_alias = TRUE, ...) # S3 method for mvgam as.array(x, variable = \"betas\", regex = FALSE, use_alias = TRUE, ...) # S3 method for mvgam as_draws( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) # S3 method for mvgam as_draws_matrix( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) # S3 method for mvgam as_draws_df( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) # S3 method for mvgam as_draws_array( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) # S3 method for mvgam as_draws_list( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) # S3 method for mvgam as_draws_rvars(x, variable = NULL, regex = FALSE, inc_warmup = FALSE, ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_draws.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract posterior draws from fitted mvgam objects — mvgam_draws","text":"x list object class mvgam row.names Ignored optional Ignored variable character specifying parameters extract. 
Can either one following options: obs_params (parameters specific observation model, overdispersions negative binomial models observation error SD gaussian / student-t models) betas (beta coefficients GAM observation model linear predictor; default) smooth_params (smoothing parameters GAM observation model) linpreds (estimated linear predictors whatever link scale used model) trend_params (parameters governing trend dynamics, AR parameters, trend SD parameters Gaussian Process parameters) trend_betas (beta coefficients GAM latent process model linear predictor; available trend_formula supplied original model) trend_smooth_params (process model GAM smoothing parameters; available trend_formula supplied original model) trend_linpreds (process model linear predictors identity scale; available trend_formula supplied original model) can character vector providing variables extract use_alias Logical. informative names parameters available (.e. beta coefficients b smoothing parameters rho), replace uninformative names informative alias. Defaults TRUE regex Logical. using one prespecified options extractions, variable treated (vector ) regular expressions? variable x matching least one regular expressions selected. Defaults FALSE. ... Ignored inc_warmup warmup draws included? Defaults FALSE.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_draws.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extract posterior draws from fitted mvgam objects — mvgam_draws","text":"data.frame, matrix, array containing posterior draws.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_draws.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extract posterior draws from fitted mvgam objects — mvgam_draws","text":"","code":"if (FALSE) { sim <- sim_mvgam(family = Gamma()) mod1 <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = sim$data_train, family = Gamma()) beta_draws_df <- as.data.frame(mod1, variable = 'betas') head(beta_draws_df) str(beta_draws_df) beta_draws_mat <- as.matrix(mod1, variable = 'betas') head(beta_draws_mat) str(beta_draws_mat) shape_pars <- as.matrix(mod1, variable = 'shape', regex = TRUE) head(shape_pars)}"},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_families.html","id":null,"dir":"Reference","previous_headings":"","what":"Supported mvgam families — mvgam_families","title":"Supported mvgam families — mvgam_families","text":"Supported mvgam families","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_families.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Supported mvgam families — mvgam_families","text":"","code":"tweedie(link = \"log\") student_t(link = \"identity\") nmix(link = \"log\")"},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_families.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Supported mvgam families — mvgam_families","text":"link specification family link function.
present changed","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_families.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Supported mvgam families — mvgam_families","text":"mvgam currently supports following standard observation families: gaussian identity link, real-valued data poisson log-link, count data Gamma log-link, non-negative real-valued data addition, following extended families mgcv brms packages supported: betar logit-link, proportional data (0,1) nb log-link, count data lognormal identity-link, non-negative real-valued data Finally, mvgam supports three extended families described : tweedie log-link, count data (power parameter p fixed 1.5) student_t() (student) identity-link, real-valued data nmix count data imperfect detection modeled via State-Space N-Mixture model. latent states Poisson (log link), capturing 'true' latent abundance, observation process Binomial account imperfect detection. observation formula models used set linear predictor detection probability (logit link). See example detailed worked explanation nmix() family poisson(), nb(), tweedie() available using JAGS. families, apart tweedie(), supported using Stan. Note currently possible change default link functions mvgam, call change silently ignored","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_families.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Supported mvgam families — mvgam_families","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_families.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Supported mvgam families — mvgam_families","text":"","code":"if (FALSE) { # An N-mixture example using family nmix() # Simulate data from a Poisson-Binomial N-Mixture model # True abundance is predicted by a single nonlinear function of temperature # as well as a nonlinear long-term trend (as a Gaussian Process function) set.seed(123) gamdat <- gamSim(n = 80); N <- NROW(gamdat) abund_linpred <- gamdat$y; temperature <- gamdat$x2 trend <- mvgam:::sim_gp(rnorm(3, 0, 0.1), alpha_gp = 3, rho_gp = 16, h = N) true_abund <- floor(10 + abund_linpred + trend) # Detection probability increases linearly with decreasing rainfall rainfall <- rnorm(N) detect_linpred <- 0.4 + -0.55 * rainfall detect_prob <- plogis(detect_linpred) # Simulate observed counts obs_abund <- rbinom(N, size = true_abund, prob = detect_prob) # Plot true and observed time series plot(true_abund, type = 'l', ylab = 'Abundance', xlab = 'Time', ylim = c(0, max(true_abund)), bty = 'l', lwd = 2) lines(obs_abund, col = 'darkred', lwd = 2) title(main = 'True = black; observed = red') # Gather data into a dataframe suitable for mvgam modelling; # This will require a 'cap' variable specifying the maximum K to marginalise # over when estimating latent abundance (it does NOT have to be a fixed value) model_dat <- data.frame(obs_abund, temperature, rainfall, cap = max(obs_abund) + 20, time = 1:N, series = as.factor('series1')) # Training and testing folds data_train <- model_dat %>% dplyr::filter(time <= 74) data_test <- model_dat %>% dplyr::filter(time > 74) # Fit a model with informative priors on the two intercept parameters # and on the length scale of the GP temporal trend parameter # Note that the 'trend_formula' applies to the latent count process # (a Poisson process with log-link), while the 'formula' applies to the # detection 
probability (logit link) mod <- mvgam(formula = obs_abund ~ rainfall, trend_formula = ~ s(temperature, k = 5) + gp(time, k = 10, c = 5/4, scale = FALSE), family = nmix(), data = data_train, newdata = data_test, priors = c(prior(std_normal(), class = '(Intercept)'), prior(normal(2, 2), class = '(Intercept)_trend'), prior(normal(15, 5), class = 'rho_gp_trend(time)'))) # Model summary and diagnostics summary(mod) plot(mod, type = 'residuals') # Intercept parameters mcmc_plot(mod, variable = \"Intercept\", regex = TRUE, type = 'hist') # Hindcasts and forecasts of latent abundance (with truth overlain) fc <- forecast(mod, type = 'latent_N') plot(fc); points(true_abund, pch = 16, cex = 0.8) # Latent abundance predictions are restricted based on 'cap' max(model_dat$cap); range(fc$forecasts[[1]]) # Hindcasts and forecasts of detection probability (with truth overlain) fc <- forecast(mod, type = 'detection') plot(fc); points(detect_prob, pch = 16, cex = 0.8) # Hindcasts and forecasts of observations # (after accounting for detection error) fc <- forecast(mod, type = 'response') plot(fc) # Hindcasts and forecasts of response expectations # (with truth overlain) fc <- forecast(object = mod, type = 'expected') plot(fc); points(detect_prob * true_abund, pch = 16, cex = 0.8) # Plot conditional effects library(ggplot2) # Effects on true abundance can be visualised using type = 'link' abund_plots <- plot(conditional_effects(mod, type = 'link', effects = c('temperature', 'time')), plot = FALSE) # Effect of temperature on abundance abund_plots[[1]] + ylab('Latent abundance') # Long-term trend in abundance abund_plots[[2]] + ylab('Latent abundance') # Effect of rainfall on detection probability det_plot <- plot(conditional_effects(mod, type = 'detection', effects = 'rainfall'), plot = FALSE) det_plot[[1]] + ylab('Pr(detection)') # More targeted plots can use marginaleffects capabilities; # Here visualise how response predictions might change # if we considered different possible 'cap' limits on latent # abundance and different rainfall measurements plot_predictions(mod, condition = list('temperature', cap = c(15, 20, 40), rainfall = c(-1, 1)), type = 'response', conf_level = 0.5) + ylab('Observed abundance') + theme_classic() }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_forecast-class.html","id":null,"dir":"Reference","previous_headings":"","what":"mvgam_forecast object description — mvgam_forecast-class","title":"mvgam_forecast object description — mvgam_forecast-class","text":"mvgam_forecast object returned function hindcast forecast. Run methods(class = \"mvgam_forecast\") see overview available methods.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_forecast-class.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"mvgam_forecast object description — mvgam_forecast-class","text":"mvgam_forecast object contains following elements: call original observation model formula trend_call trend_formula supplied, original trend model formula returned. Otherwise NULL family character description observation distribution family_pars list containing draws family-specific parameters (.e. shape, scale overdispersion parameters). returned type = link. 
Otherwise NULL trend_model character description latent trend model drift Logical specifying whether drift term used trend model use_lv Logical flag indicating whether latent dynamic factors used model fit_engine Character describing fit engine, either stan jags type type predictions included (either link, response trend) series_names Names time series, taken levels(data$series) original model fit train_observations list training observation vectors length n_series train_times vector unique training times test_observations forecast function used, list test observation vectors length n_series. Otherwise NULL test_times forecast function used, vector unique validation (testing) times. Otherwise NULL hindcasts list posterior hindcast distributions length n_series. forecasts forecast function used, list posterior forecast distributions length n_series. Otherwise NULL","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_forecast-class.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"mvgam_forecast object description — mvgam_forecast-class","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_formulae.html","id":null,"dir":"Reference","previous_headings":"","what":"Details of formula specifications in mvgam — mvgam_formulae","title":"Details of formula specifications in mvgam — mvgam_formulae","text":"Details formula specifications mvgam","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_formulae.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Details of formula specifications in mvgam — mvgam_formulae","text":"mvgam accept observation model formula optional process model formula (via argument trend_formula). Neither formulae can specified lists, contrary accepted behaviour mgcv brms models. Note possible supply empty formula predictors intercepts observation model (.e. y ~ 0 y ~ -1). case, intercept-observation model set intercept coefficient fixed zero. can handy wish fit pure State-Space models variation dynamic trend controls average expectation, /intercepts non-identifiable. formulae supplied mvgam exactly like supplied glm except smooth terms, s, te, ti t2, time-varying effects using dynamic, monotonically increasing (using s(x, bs = 'moi')) decreasing splines (using s(x, bs = 'mod'); see smooth.construct.moi.smooth.spec details), well Gaussian Process functions using gp, can added right hand side (. supported mvgam formulae). 
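To make the range of supported terms concrete, the following is a minimal illustrative sketch (not taken from the package documentation; the covariate names season, x0, x1 and time are hypothetical) of how these term types can be combined on the right-hand side of an mvgam formula:
# A hedged sketch of supported right-hand side terms; covariate names
# ('season', 'x0', 'x1', 'time') are hypothetical
obs_formula <- y ~
  s(season, bs = 'cc') +    # cyclic smooth, as in mgcv
  dynamic(x0, rho = 10) +   # time-varying coefficient for x0
  s(x1, bs = 'moi') +       # monotonically increasing smooth
  gp(time, c = 5/4)         # Gaussian Process function of time

# An empty observation formula for a pure State-Space model;
# the intercept coefficient is then internally fixed at zero
ss_formula <- y ~ 0
Because R formulae are not evaluated at creation, this sketch runs without fitting anything; the terms are only interpreted once the formula is passed to mvgam().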
details specifying different kinds smooth functions, control behaviours modifying potential complexities / penalties behave, can found extensive documentation mgcv package.","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_formulae.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Details of formula specifications in mvgam — mvgam_formulae","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_marginaleffects.html","id":null,"dir":"Reference","previous_headings":"","what":"Helper functions for mvgam marginaleffects calculations — mvgam_marginaleffects","title":"Helper functions for mvgam marginaleffects calculations — mvgam_marginaleffects","text":"Helper functions mvgam marginaleffects calculations Functions needed working marginaleffects Functions needed getting data / objects insight","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_marginaleffects.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Helper functions for mvgam marginaleffects calculations — mvgam_marginaleffects","text":"","code":"# S3 method for mvgam get_coef(model, trend_effects = FALSE, ...) # S3 method for mvgam set_coef(model, coefs, trend_effects = FALSE, ...) # S3 method for mvgam get_vcov(model, vcov = NULL, ...) # S3 method for mvgam get_predict(model, newdata, type = \"response\", process_error = FALSE, ...) # S3 method for mvgam get_data(x, source = \"environment\", verbose = TRUE, ...) # S3 method for mvgam_prefit get_data(x, source = \"environment\", verbose = TRUE, ...) # S3 method for mvgam find_predictors( x, effects = c(\"fixed\", \"random\", \"all\"), component = c(\"all\", \"conditional\", \"zi\", \"zero_inflated\", \"dispersion\", \"instruments\", \"correlation\", \"smooth_terms\"), flatten = FALSE, verbose = TRUE, ... ) # S3 method for mvgam_prefit find_predictors( x, effects = c(\"fixed\", \"random\", \"all\"), component = c(\"all\", \"conditional\", \"zi\", \"zero_inflated\", \"dispersion\", \"instruments\", \"correlation\", \"smooth_terms\"), flatten = FALSE, verbose = TRUE, ... )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_marginaleffects.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Helper functions for mvgam marginaleffects calculations — mvgam_marginaleffects","text":"model Model object trend_effects logical, extract process model component (applicable trend_formula specified model) ... Additional arguments passed predict() method supplied modeling package.arguments particularly useful mixed-effects bayesian models (see online vignettes marginaleffects website). Available arguments can vary model model, depending range supported arguments modeling package. See \"Model-Specific Arguments\" section ?slopes documentation non-exhaustive list available arguments. coefs vector coefficients insert model object vcov Type uncertainty estimates report (e.g., robust standard errors). Acceptable values: FALSE: compute standard errors. can speed computation considerably. TRUE: Unit-level standard errors using default vcov(model) variance-covariance matrix. String indicates kind uncertainty estimates return. Heteroskedasticity-consistent: \"HC\", \"HC0\", \"HC1\", \"HC2\", \"HC3\", \"HC4\", \"HC4m\", \"HC5\". 
See ?sandwich::vcovHC Heteroskedasticity autocorrelation consistent: \"HAC\" Mixed-Models degrees freedom: \"satterthwaite\", \"kenward-roger\" : \"NeweyWest\", \"KernHAC\", \"OPG\". See sandwich package documentation. One-sided formula indicates name cluster variables (e.g., ~unit_id). formula passed cluster argument sandwich::vcovCL function. Square covariance matrix Function returns covariance matrix (e.g., stats::vcov(model)) newdata Grid predictor values evaluate slopes. Warning: Please avoid modifying dataset fitting model calling marginaleffects function. can sometimes lead unexpected results. NULL (default): Unit-level slopes observed value dataset (empirical distribution). dataset retrieved using insight::get_data(), tries extract data environment. may produce unexpected results original data frame altered since fitting model. datagrid() call specify custom grid regressors. example: newdata = datagrid(cyl = c(4, 6)): cyl variable equal 4 6 regressors fixed means modes. See Examples section datagrid() documentation. string: \"mean\": Marginal Effects Mean. Slopes predictor held mean mode. \"median\": Marginal Effects Median. Slopes predictor held median mode. \"marginalmeans\": Marginal Effects Marginal Means. See Details section . \"tukey\": Marginal Effects Tukey's 5 numbers. \"grid\": Marginal Effects grid representative numbers (Tukey's 5 numbers unique values categorical predictors). type string indicates type (scale) predictions used compute contrasts slopes. can differ based model type, typically string : \"response\", \"link\", \"probs\", \"zero\". unsupported string entered, model-specific list acceptable values returned error message. type NULL, first entry error message used default. process_error logical. TRUE, uncertainty latent process (trend) model incorporated predictions x fitted model. source String, indicating data recovered. source = \"environment\" (default), data recovered environment (e.g. data workspace). option usually fastest way getting data ensures original variables used model fitting returned. Note always current data recovered environment. Hence, data modified model fitting (e.g., variables recoded rows filtered), returned data may longer equal model data. source = \"frame\" (\"mf\"), data taken model frame. transformed variables back-transformed, possible. option returns data even available environment, however, certain edge cases back-transforming original data may fail. source = \"environment\" fails recover data, tries extract data model frame; source = \"frame\" data extracted model frame, data recovered environment. ways returns observations missing data variables used model fitting. verbose Toggle messages warnings. effects model data fixed effects (\"fixed\"), random effects (\"random\") (\"\") returned? applies mixed gee models. component predictor variables, predictor variables conditional model, zero-inflated part model, dispersion term instrumental variables returned? Applies models zero-inflated /dispersion formula, models instrumental variable (called fixed-effects regressions). May abbreviated. Note conditional component also called count mean component, depending model. flatten Logical, TRUE, values returned character vector, list. 
Duplicated values removed.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_marginaleffects.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Helper functions for mvgam marginaleffects calculations — mvgam_marginaleffects","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_trends.html","id":null,"dir":"Reference","previous_headings":"","what":"Supported mvgam trend models — mvgam_trends","title":"Supported mvgam trend models — mvgam_trends","text":"Supported mvgam trend models","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_trends.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Supported mvgam trend models — mvgam_trends","text":"mvgam currently supports following dynamic trend models: None (latent trend component; .e. GAM component contributes linear predictor, observation process source error; similarly estimated gam) RW() AR(p = 1, 2, 3) VAR()(available Stan) PW() (piecewise linear logistic trends; available Stan) GP() (Gaussian Process squared exponential kernel; available Stan) types apart GP() PW(), moving average /correlated process error terms can also estimated (example, RW(cor = TRUE) set multivariate Random Walk data contains >1 series). Character strings can also supplied instead various trend functions. full list possible models currently supported : 'RW' 'RWMA' 'RWcor' 'RWMAcor' 'AR1' 'AR1MA' 'AR1cor' 'AR1MAcor' 'AR2' 'AR2MA' 'AR2cor' 'AR2MAcor' 'AR3' 'AR3MA' 'AR3cor' 'AR3MAcor' 'VAR' 'VARcor' 'VAR1' ('VAR') 'VAR1cor' ('VARcor') 'VARMA' 'VARMAcor' 'VARMA1,1cor' 'PWlinear' 'PWlogistic' 'GP' 'None' Note RW, AR1, AR2 AR3 available using JAGS. trend models supported using Stan. Dynamic factor models can used latent factors evolve either RW, AR1-3 GP. VAR models (.e. VAR VARcor models), users can either fix trend error covariances 0 (using VAR) estimate potentially allow contemporaneously correlated errors using VARcor. VAR models, stationarity latent process enforced prior using parameterisation given Heaps (2022). Stationarity enforced using AR1, AR2 AR3 models, though can changed user specifying lower upper bounds autoregressive parameters using functionality get_mvgam_priors priors argument mvgam. Piecewise trends follow formulation popular prophet package produced Facebook, users can allow changepoints control potential flexibility trend. See Taylor Letham (2018) details","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/mvgam_trends.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Supported mvgam trend models — mvgam_trends","text":"Sarah E. Heaps (2022) Enforcing stationarity prior Vector Autoregressions. Journal Computational Graphical Statistics. 32:1, 1-10. Sean J. Taylor Benjamin Letham (2018) Forecasting scale. 
American Statistician 72.1, 37-45.","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/pairs.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a matrix of output plots from a mvgam object — pairs.mvgam","title":"Create a matrix of output plots from a mvgam object — pairs.mvgam","text":"pairs method customized MCMC output.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pairs.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a matrix of output plots from a mvgam object — pairs.mvgam","text":"","code":"# S3 method for mvgam pairs(x, variable = NULL, regex = FALSE, use_alias = TRUE, ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/pairs.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a matrix of output plots from a mvgam object — pairs.mvgam","text":"x object class mvgam variable Names variables (parameters) plot, given character vector regular expression (regex = TRUE). default, hopefully large selection variables plotted. regex Logical; Indicates whether variable treated regular expressions. Defaults FALSE. use_alias Logical. informative names parameters available (.e. beta coefficients b smoothing parameters rho), replace uninformative names informative alias. Defaults TRUE ... arguments passed mcmc_pairs.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pairs.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a matrix of output plots from a mvgam object — pairs.mvgam","text":"detailed description see mcmc_pairs.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pairs.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a matrix of output plots from a mvgam object — pairs.mvgam","text":"","code":"if (FALSE) { simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) pairs(mod) pairs(mod, variable = c('ar1', 'sigma'), regex = TRUE) }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_fc.html","id":null,"dir":"Reference","previous_headings":"","what":"Forecast from a particle filtered mvgam object — pfilter_mvgam_fc","title":"Forecast from a particle filtered mvgam object — pfilter_mvgam_fc","text":"function generates forecast set particles capture unique proposal current state system modelled mvgam object. covariate timepoint information data_test used generate GAM component forecast, trends run forward time according state space dynamics. 
forecast weighted ensemble, weights determined particle's proposal likelihood prior recent assimilation step","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_fc.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Forecast from a particle filtered mvgam object — pfilter_mvgam_fc","text":"","code":"pfilter_mvgam_fc( file_path = \"pfilter\", n_cores = 2, newdata, data_test, plot_legend = TRUE, legend_position = \"topleft\", ylim, return_forecasts = FALSE )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_fc.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Forecast from a particle filtered mvgam object — pfilter_mvgam_fc","text":"file_path character string specifying file path particles saved n_cores integer specifying number cores generating particle forecasts parallel newdata dataframe list test data containing least 'series' time', addition variables included linear predictor formula data_test Deprecated. Still works place newdata users recommended use newdata instead seamless integration R workflows plot_legend logical stating whether include legend highlight observations used calibration assimilated particle filter legend_position legend location may specified setting x single keyword list \"bottomright\", \"bottom\", \"bottomleft\", \"left\", \"topleft\", \"top\", \"topright\", \"right\" \"center\". places legend inside plot frame given location. ylim Optional vector y-axis limits (min, max). limits used plots return_forecasts logical. TRUE, returned list object contain plots forecasts well forecast objects (matrix dimension n_particles x horizon)","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_fc.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Forecast from a particle filtered mvgam object — pfilter_mvgam_fc","text":"named list containing functions call base R plots series' forecast. Optionally actual forecasts returned within list separate list matrices","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_init.html","id":null,"dir":"Reference","previous_headings":"","what":"Initiate particles for online filtering from a fitted mvgam object — pfilter_mvgam_init","title":"Initiate particles for online filtering from a fitted mvgam object — pfilter_mvgam_init","text":"function generates set particles captures unique proposal current state system. next observation data_assim assimilated particles weighted proposal's multivariate composite likelihood update model's forecast distribution","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_init.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Initiate particles for online filtering from a fitted mvgam object — pfilter_mvgam_init","text":"","code":"pfilter_mvgam_init( object, newdata, data_assim, n_particles = 1000, file_path = \"pfilter\", n_cores = 2 )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_init.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Initiate particles for online filtering from a fitted mvgam object — pfilter_mvgam_init","text":"object list object returned mvgam newdata dataframe list test data containing least one observation per series (beyond last observation seen model object) assimilated particle filter. 
least contain 'series' 'time' one-step ahead horizon, addition variables included linear predictor object data_assim Deprecated. Still works place newdata users recommended use newdata instead seamless integration R workflows n_particles integer specifying number unique particles generate tracking latent system state file_path character string specifying file path saving initiated particles n_cores integer specifying number cores generating particle forecasts parallel","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_init.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Initiate particles for online filtering from a fitted mvgam object — pfilter_mvgam_init","text":"list object length = n_particles containing information parameters current state estimates particle generated saved, along important information original model, .rda object file_path","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_online.html","id":null,"dir":"Reference","previous_headings":"","what":"Automatic online particle filtering for assimilating new observations into a fitted mvgam model — pfilter_mvgam_online","title":"Automatic online particle filtering for assimilating new observations into a fitted mvgam model — pfilter_mvgam_online","text":"function operates sequentially new observations data_assim update posterior forecast distribution. wrapper calls pfilter_mvgam_smooth. iteration, next observation assimilated particles weighted proposal's multivariate composite likelihood","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_online.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Automatic online particle filtering for assimilating new observations into a fitted mvgam model — pfilter_mvgam_online","text":"","code":"pfilter_mvgam_online( newdata, data_assim, file_path = \"pfilter\", threshold = 0.5, use_resampling = FALSE, kernel_lambda = 0.25, n_cores = 1 )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_online.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Automatic online particle filtering for assimilating new observations into a fitted mvgam model — pfilter_mvgam_online","text":"newdata dataframe list test data containing least one observation per series (beyond last observation seen model initialising particles pfilter_mvgam_init previous calls pfilter_mvgam_online. least contain 'series' 'time' one-step ahead horizon, addition variables included linear predictor object data_assim Deprecated. Still works place newdata users recommended use newdata instead seamless integration R workflows file_path character string specifying file path locating particles threshold proportional numeric specifying Effective Sample Size limit resampling particles triggered (calculated ESS / n_particles) use_resampling == TRUE. 0 1 use_resampling logical specifying whether resampling used ESS falls specified threshold. Default option FALSE, relying instead kernel smoothing maintain particle diversity kernel_lambda proportional numeric specifying strength kernel smoothing use pulling low weight particles toward high likelihood state space. 
0 1 n_cores integer specifying number cores generating particle forecasts parallel","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_online.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Automatic online particle filtering for assimilating new observations into a fitted mvgam model — pfilter_mvgam_online","text":"list object length = n_particles containing information parameters current state estimates particle generated saved, along important information original model, .rda object file_path","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_smooth.html","id":null,"dir":"Reference","previous_headings":"","what":"Assimilate new observations into a fitted mvgam model using resampling and kernel smoothing — pfilter_mvgam_smooth","title":"Assimilate new observations into a fitted mvgam model using resampling and kernel smoothing — pfilter_mvgam_smooth","text":"function operates new observation next_assim update posterior forecast distribution. next observation assimilated particle weights updated light recent multivariate composite likelihood. Low weight particles smoothed towards high weight state space using importance sampling, options given using resampling high weight particles Effective Sample Size falls user-specified threshold","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_smooth.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Assimilate new observations into a fitted mvgam model using resampling and kernel smoothing — pfilter_mvgam_smooth","text":"","code":"pfilter_mvgam_smooth( particles, mgcv_model, next_assim, threshold = 0.25, n_cores = 1, use_resampling = FALSE, kernel_lambda = 0.5 )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_smooth.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Assimilate new observations into a fitted mvgam model using resampling and kernel smoothing — pfilter_mvgam_smooth","text":"particles list particles run one observation prior observation next_assim mgcv_model gam model returned call link{mvgam} next_assim dataframe test data containing one observation per series (beyond last observation seen model initialising particles pfilter_mvgam_init previous calls pfilter_mvgam_online. least contain 'series' 'time' one-step ahead horizon, addition variables included linear predictor object threshold proportional numeric specifying Effective Sample Size limit resampling particles triggered (calculated ESS / n_particles) use_resampling == TRUE. 0 1 n_cores integer specifying number cores generating particle forecasts parallel use_resampling logical specifying whether resampling used ESS falls specified threshold. Note resampling can result loss original model's diversity GAM beta coefficients, may undesirable consequences forecast distribution. use_resampling TRUE, effort made remedy assigning randomly sampled draws GAM beta coefficients original model's distribution particle. however guarantee loss diversity, especially successive resampling take place. Default option therefore FALSE kernel_lambda proportional numeric specifying strength smoothing use pulling low weight particles toward high likelihood state space. 
0 1","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_smooth.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Assimilate new observations into a fitted mvgam model using resampling and kernel smoothing — pfilter_mvgam_smooth","text":"list object length = n_particles containing information parameters current state estimates particle","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/piecewise_trends.html","id":null,"dir":"Reference","previous_headings":"","what":"Specify piecewise linear or logistic trends — PW","title":"Specify piecewise linear or logistic trends — PW","text":"Set piecewise linear logistic trend models mvgam. functions evaluate arguments – exist purely help set model particular piecewise trend models.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/piecewise_trends.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Specify piecewise linear or logistic trends — PW","text":"","code":"PW( n_changepoints = 10, changepoint_range = 0.8, changepoint_scale = 0.05, growth = \"linear\" )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/piecewise_trends.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Specify piecewise linear or logistic trends — PW","text":"n_changepoints non-negative integer specifying number potential changepoints. Potential changepoints selected uniformly first changepoint_range proportion timepoints data. Default 10 changepoint_range Proportion history data trend changepoints estimated. Defaults 0.8 first 80%. changepoint_scale Parameter modulating flexibility automatic changepoint selection altering scale parameter Laplace distribution. resulting prior double_exponential(0, changepoint_scale). Large values allow many changepoints flexible trend, small values allow changepoints. Default 0.05. growth Character string specifying either 'linear' 'logistic' growth trend. 'logistic', variable labelled cap MUST data specify maximum saturation point trend (see details examples mvgam information). Default 'linear'.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/piecewise_trends.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Specify piecewise linear or logistic trends — PW","text":"object class mvgam_trend, contains list arguments interpreted parsing functions mvgam","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/piecewise_trends.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Specify piecewise linear or logistic trends — PW","text":"Offsets intercepts: trend models, offset parameter included trend estimation process. parameter incredibly difficult identify also include intercept observation formula. reason, highly recommended drop intercept formula (.e. y ~ x + 0 y ~ x - 1, x optional predictor terms). Logistic growth cap variable: forecasting growth, often maximum achievable point time series can reach. example, total market size, total population size carrying capacity population dynamics. can advantageous forecast saturate near point predictions sensible. function allows make forecasts using logistic growth trend model, specified carrying capacity. Note capacity need static time, can vary series x timepoint combination necessary. must supply cap value observation data using growth = 'logistic'. 
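As a hedged sketch of this requirement (all numbers are purely illustrative), cap simply needs to be a column of the observation data, and it may change at every timepoint:
library(mvgam)
# Hypothetical data with a carrying capacity that varies through time;
# 'cap' is specified on the scale of the outcome
mod_data <- data.frame(y = rpois(60, lambda = 10),
                       time = 1:60,
                       cap = 30 + 0.25 * (1:60),
                       series = as.factor('series_1'))
# Intercept dropped, as recommended for piecewise trends
mod <- mvgam(y ~ 0,
             trend_model = PW(growth = 'logistic'),
             family = poisson(),
             data = mod_data)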
observation families use non-identity link function, cap value internally transformed link scale (.e. specified cap log transformed using poisson() nb() family). therefore important specify cap values scale outcome. Note also missing values allowed cap.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/piecewise_trends.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Specify piecewise linear or logistic trends — PW","text":"Taylor, Sean J., Benjamin Letham. \"Forecasting scale.\" American Statistician 72.1 (2018): 37-45.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pipe.html","id":null,"dir":"Reference","previous_headings":"","what":"Pipe operator — %>%","title":"Pipe operator — %>%","text":"See magrittr::%>% details.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pipe.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pipe operator — %>%","text":"","code":"lhs %>% rhs"},{"path":"https://nicholasjclark.github.io/mvgam/reference/pipe.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pipe operator — %>%","text":"lhs value magrittr placeholder. rhs function call using magrittr semantics.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/pipe.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Pipe operator — %>%","text":"result calling rhs(lhs).","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Default mvgam plots — plot.mvgam","title":"Default mvgam plots — plot.mvgam","text":"function takes fitted mvgam object produces plots smooth functions, forecasts, trends uncertainty components","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Default mvgam plots — plot.mvgam","text":"","code":"# S3 method for mvgam plot( x, type = \"residuals\", series = 1, residuals = FALSE, newdata, data_test, trend_effects = FALSE, ... )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Default mvgam plots — plot.mvgam","text":"x list object returned mvgam. See mvgam() type character specifying type plot return. Options : series, residuals, smooths, re (random effect smooths), pterms (parametric effects), forecast, trend, uncertainty, factors series integer specifying series set plotted. ignored type == 're' residuals logical. TRUE type = 'smooths', posterior quantiles partial residuals added plots 1-D smooths series ribbon rectangles. Partial residuals smooth term median Dunn-Smyth residuals obtained dropping term concerned model, leaving estimates fixed (.e. estimates term plus original median Dunn-Smyth residuals). Note mvgam works Dunn-Smyth residuals working residuals, used mgcv, magnitudes partial residuals different expect plot.gam. Interpretation similar though, partial residuals evenly scattered around smooth function function well estimated newdata Optional dataframe list test data containing least 'series' 'time' addition variables included linear predictor original formula. argument optional plotting sample forecast period observations (type = forecast) required plotting uncertainty components (type = uncertainty). data_test Deprecated. 
Still works place newdata users recommended use newdata instead seamless integration R workflows trend_effects logical. TRUE trend_formula used model fitting, terms trend (.e. process) model plotted ... Additional arguments individual plotting function.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Default mvgam plots — plot.mvgam","text":"base R plot set plots","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Default mvgam plots — plot.mvgam","text":"plots useful getting overview fitted model estimated random effects smooth functions, individual plotting functions functions marginaleffects package offer far customisation.","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot.mvgam.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Default mvgam plots — plot.mvgam","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Default mvgam plots — plot.mvgam","text":"","code":"if (FALSE) { # Simulate some time series dat <- sim_mvgam(T = 80, n_series = 3) # Fit a basic model mod <- mvgam(y ~ s(season, bs = 'cc') + s(series, bs = 're'), data = dat$data_train, trend_model = 'RW') # Plot predictions and residuals for each series plot(mod, type = 'forecast', series = 1) plot(mod, type = 'forecast', series = 2) plot(mod, type = 'forecast', series = 3) plot(mod, type = 'residuals', series = 1) plot(mod, type = 'residuals', series = 2) plot(mod, type = 'residuals', series = 3) # Plot model effects plot(mod, type = 'smooths') plot(mod, type = 're') # More flexible plots with marginaleffects utilities plot_predictions(mod, condition = 'season', type = 'link') plot_predictions(mod, condition = c('season', 'series', 'series'), type = 'link') plot_predictions(mod, condition = 'series', type = 'link') # When using a State-Space model with predictors on the process # model, set trend_effects = TRUE to visualise process effects mod <- mvgam(y ~ -1, trend_formula = ~ s(season, bs = 'cc'), data = dat$data_train, trend_model = 'RW') plot(mod, type = 'smooths', trend_effects = TRUE) # But marginaleffects functions work without any modification plot_predictions(mod, condition = 'season', type = 'link') }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot.mvgam_lfo.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot Pareto-k and ELPD values from a leave-future-out object — plot.mvgam_lfo","title":"Plot Pareto-k and ELPD values from a leave-future-out object — plot.mvgam_lfo","text":"function takes object class mvgam_lfo create several informative diagnostic plots","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot.mvgam_lfo.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot Pareto-k and ELPD values from a leave-future-out object — plot.mvgam_lfo","text":"","code":"# S3 method for mvgam_lfo plot(x, ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot.mvgam_lfo.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot Pareto-k and ELPD values from a leave-future-out object — plot.mvgam_lfo","text":"x object class mvgam_lfo ... 
Ignored","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot.mvgam_lfo.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot Pareto-k and ELPD values from a leave-future-out object — plot.mvgam_lfo","text":"base R plot Pareto-k ELPD values evaluation timepoints. Pareto-k plot, dashed red line indicates specified threshold chosen triggering model refits. ELPD plot, dashed red line indicates bottom 10% quantile ELPD values. Points threshold may represent outliers difficult forecast","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_effects.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Effect plot as implemented in marginaleffects — plot_effects.mvgam","title":"Effect plot as implemented in marginaleffects — plot_effects.mvgam","text":"Convenient way call marginal conditional effect plotting functions implemented marginaleffects package","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_effects.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Effect plot as implemented in marginaleffects — plot_effects.mvgam","text":"","code":"plot_effects(object, ...) # S3 method for mvgam plot_effects( object, condition = NULL, by = NULL, newdata = NULL, type = NULL, conf_level = 0.95, wts = NULL, transform = NULL, points = 0, rug = FALSE, ... )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_effects.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Effect plot as implemented in marginaleffects — plot_effects.mvgam","text":"... Additional arguments passed predict() method supplied modeling package. Arguments particularly useful mixed-effects bayesian models (see online vignettes marginaleffects website). Available arguments can vary model model, depending range supported arguments modeling package. See \"Model-Specific Arguments\" section ?marginaleffects documentation non-exhaustive list available arguments. condition Conditional predictions Character vector (max length 3): Names predictors display. Named list (max length 3): List names correspond predictors. List elements can : Numeric vector Function returns numeric vector set unique categorical values Shortcut strings common reference values: \"minmax\", \"quartile\", \"threenum\" 1: x-axis. 2: color/shape. 3: facets. Numeric variables positions 2 3 summarized Tukey's five numbers ?stats::fivenum Marginal predictions Character vector (max length 3): Names categorical predictors marginalize across. 1: x-axis. 2: color. 3: facets. newdata newdata NULL, grid determined condition argument. newdata NULL, argument behaves way predictions() function. type string indicates type (scale) predictions used compute contrasts slopes. can differ based model type, typically string : \"response\", \"link\", \"probs\", \"zero\". unsupported string entered, model-specific list acceptable values returned error message. type NULL, default value used. default first model-related row marginaleffects:::type_dictionary dataframe. conf_level numeric value 0 1. Confidence level use build confidence interval. wts string numeric: weights use computing average contrasts slopes. weights affect averaging avg_*() argument, unit-level estimates . Internally, estimates weights passed weighted.mean() function. string: column name weights variable newdata.
supplying column name wts, recommended supply original data (including weights variable) explicitly newdata. numeric: vector length equal number rows original data newdata (supplied). transform function applied unit-level adjusted predictions confidence intervals just function returns results. bayesian models, function applied individual draws posterior distribution, computing summaries. points Number 0 1 controls transparency raw data points. 0 (default) display points. rug TRUE displays tick marks axes mark distribution raw data.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_effects.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Effect plot as implemented in marginaleffects — plot_effects.mvgam","text":"ggplot object can customized using ggplot2 package","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_factors.html","id":null,"dir":"Reference","previous_headings":"","what":"Latent factor summaries for a fitted mvgam object — plot_mvgam_factors","title":"Latent factor summaries for a fitted mvgam object — plot_mvgam_factors","text":"function takes fitted mvgam object returns plots summary statistics latent dynamic factors","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_factors.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Latent factor summaries for a fitted mvgam object — plot_mvgam_factors","text":"","code":"plot_mvgam_factors(object, plot = TRUE)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_factors.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Latent factor summaries for a fitted mvgam object — plot_mvgam_factors","text":"object list object returned mvgam. See mvgam() plot logical specifying whether factors plotted","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_factors.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Latent factor summaries for a fitted mvgam object — plot_mvgam_factors","text":"dataframe factor contributions , optionally, series base R plots","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_factors.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Latent factor summaries for a fitted mvgam object — plot_mvgam_factors","text":"model object estimated using dynamic factors, possible factors contributed estimated trends. due regularisation penalty acts independently factor's Gaussian precision, squeeze un-needed factors white noise process (effectively dropping factor model). function, factor tested null hypothesis white noise calculating sum factor's 2nd derivatives. factor larger contribution larger sum due weaker penalty factor's precision. 
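A hedged usage sketch (simulated data; the choice of two latent factors is arbitrary) of inspecting factor contributions after fitting a dynamic factor model:
library(mvgam)
simdat <- sim_mvgam(n_series = 4, trend_model = 'RW')
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = 'RW',
             use_lv = TRUE,
             n_lv = 2,
             data = simdat$data_train)
# Returns a dataframe of factor contributions; factors shrunk towards
# white noise will show small second-derivative sums
plot_mvgam_factors(mod, plot = FALSE)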
If plot == TRUE, the factors will also be plotted.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_factors.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Latent factor summaries for a fitted mvgam object — plot_mvgam_factors","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_forecasts.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot mvgam posterior predictions for a specified series — plot_mvgam_forecasts","title":"Plot mvgam posterior predictions for a specified series — plot_mvgam_forecasts","text":"Plot mvgam posterior predictions for a specified series","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_forecasts.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot mvgam posterior predictions for a specified series — plot_mvgam_forecasts","text":"","code":"plot_mvgam_fc( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, hide_xlabels = FALSE, xlab, ylab, ylim, n_cores = 1, return_forecasts = FALSE, return_score = FALSE, ... ) # S3 method for mvgam_forecast plot( x, series = 1, realisations = FALSE, n_realisations = 15, hide_xlabels = FALSE, xlab, ylab, ylim, return_score = FALSE, ... )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_forecasts.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot mvgam posterior predictions for a specified series — plot_mvgam_forecasts","text":"object A list object returned from mvgam. See mvgam() series An integer specifying which series in the set is to be plotted newdata Optional dataframe or list of test data containing at least 'series' and 'time', in addition to any other variables included in the linear predictor of the original formula. If included, the covariate information in newdata will be used to generate forecasts from the fitted model equations. If this same newdata was originally included in the call to mvgam, then forecasts have already been produced by the generative model and these will simply be extracted and plotted. However if no newdata was supplied in the original model call, an assumption is made that the newdata supplied here comes sequentially after the data supplied as data in the original model (i.e. we assume there is no time gap between the last observation of series 1 in data and the first observation for series 1 in newdata). If newdata contains observations in column y, these observations will be used to compute a Discrete Rank Probability Score for the forecast distribution data_test Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows realisations logical. If TRUE, forecast realisations are shown as a spaghetti plot, making it easier to visualise the diversity of possible forecasts. If FALSE, the default, empirical quantiles of the forecast distribution are shown n_realisations An integer specifying the number of posterior realisations to plot, if realisations = TRUE. Ignored otherwise hide_xlabels logical. If TRUE, no xlabels are printed to allow the user to add custom labels using axis from base R xlab label for x axis. ylab label for y axis. ylim Optional vector of y-axis limits (min, max) n_cores An integer specifying the number of cores to use for generating forecasts in parallel return_forecasts logical. If TRUE, the function will plot the forecast as well as returning the forecast object (a matrix of dimension n_samples x horizon) return_score logical. If TRUE and a sample of test data is provided in newdata, a probabilistic score will be calculated and returned. The score used will depend on the observation family from the fitted model. Discrete families (poisson, negative binomial, tweedie) use the Discrete Rank Probability Score. Other families use the Continuous Rank Probability Score. The value returned is the sum of all scores within the out of sample forecast horizon ... further par graphical parameters.
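Since this page documents the main forecast plotting workflow, a minimal hypothetical sketch may help; the fitted model mod and the held-out frame data_test are assumptions:

# Hypothetical sketch: forecast distribution for series 1 of a fitted
# model 'mod', using held-out observations in 'data_test'
plot_mvgam_fc(mod, series = 1, newdata = data_test)
# Spaghetti plot of individual forecast paths instead of quantile ribbons
plot_mvgam_fc(mod, series = 1, newdata = data_test,
              realisations = TRUE, n_realisations = 15)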
x An object of class mvgam_forecast","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_forecasts.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot mvgam posterior predictions for a specified series — plot_mvgam_forecasts","text":"A base R graphics plot and an optional list containing the forecast distribution and the out of sample probabilistic forecast score","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_forecasts.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Plot mvgam posterior predictions for a specified series — plot_mvgam_forecasts","text":"plot_mvgam_fc generates posterior predictions from an object of class mvgam and calculates the posterior empirical quantiles. plot.mvgam_forecast takes an object of class mvgam_forecast, in which the forecasts have already been computed, and plots the resulting forecast distribution. If realisations = FALSE, the posterior quantiles are plotted along with the true observed data that was used to train the model. Otherwise, a spaghetti plot is returned to show possible forecast paths.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_pterms.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot mvgam parametric term partial effects — plot_mvgam_pterms","title":"Plot mvgam parametric term partial effects — plot_mvgam_pterms","text":"This function plots posterior empirical quantiles for the partial effects of parametric terms","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_pterms.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot mvgam parametric term partial effects — plot_mvgam_pterms","text":"","code":"plot_mvgam_pterms(object, trend_effects = FALSE)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_pterms.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot mvgam parametric term partial effects — plot_mvgam_pterms","text":"object A list object returned from mvgam. See mvgam() trend_effects logical. If TRUE and a trend_formula was used in model fitting, terms from the trend (i.e. process) model will be plotted","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_pterms.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot mvgam parametric term partial effects — plot_mvgam_pterms","text":"A base R graphics plot","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_pterms.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Plot mvgam parametric term partial effects — plot_mvgam_pterms","text":"Posterior empirical quantiles of each parametric term's partial effect estimates (on the link scale) are calculated and visualised as ribbon plots.
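A brief hypothetical sketch of the two trend_effects modes (mod is assumed; the second call is only relevant if a trend_formula was supplied when fitting):

# Hypothetical sketch: partial effects of parametric terms in the
# observation model of a fitted mvgam 'mod'
plot_mvgam_pterms(mod)
# If 'mod' was fitted with a trend_formula, plot process-model terms instead
plot_mvgam_pterms(mod, trend_effects = TRUE)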
These effects can be interpreted as the partial effect that a parametric term contributes when all other terms in the model have been set to 0","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_randomeffects.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot mvgam random effect terms — plot_mvgam_randomeffects","title":"Plot mvgam random effect terms — plot_mvgam_randomeffects","text":"This function plots posterior empirical quantiles for random effect smooths (bs = re)","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_randomeffects.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot mvgam random effect terms — plot_mvgam_randomeffects","text":"","code":"plot_mvgam_randomeffects(object, trend_effects = FALSE)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_randomeffects.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot mvgam random effect terms — plot_mvgam_randomeffects","text":"object A list object returned from mvgam. See mvgam() trend_effects logical. If TRUE and a trend_formula was used in model fitting, terms from the trend (i.e. process) model will be plotted","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_randomeffects.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot mvgam random effect terms — plot_mvgam_randomeffects","text":"A base R graphics plot","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_randomeffects.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Plot mvgam random effect terms — plot_mvgam_randomeffects","text":"Posterior empirical quantiles of random effect coefficient estimates (on the link scale) are calculated and visualised as ribbon plots. Labels for the coefficients are taken from the levels of the original factor variable that was used to specify the smooth in the model's formula","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_resids.html","id":null,"dir":"Reference","previous_headings":"","what":"Residual diagnostics for a fitted mvgam object — plot_mvgam_resids","title":"Residual diagnostics for a fitted mvgam object — plot_mvgam_resids","text":"This function takes a fitted mvgam object and returns various residual diagnostic plots","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_resids.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Residual diagnostics for a fitted mvgam object — plot_mvgam_resids","text":"","code":"plot_mvgam_resids(object, series = 1, newdata, data_test)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_resids.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Residual diagnostics for a fitted mvgam object — plot_mvgam_resids","text":"object A list object returned from mvgam. See mvgam() series An integer specifying which series in the set is to be plotted newdata Optional dataframe or list of test data containing at least 'series', 'y', and 'time' in addition to any other variables included in the linear predictor of formula. If included, the covariate information in newdata will be used to generate forecasts from the fitted model equations. If this same newdata was originally included in the call to mvgam, then forecasts have already been produced by the generative model and these will simply be extracted and used to calculate residuals. However if no newdata was supplied in the original model call, an assumption is made that the newdata supplied here comes sequentially after the data supplied as data in the original model (i.e. we assume there is no time gap between the last observation of series 1 in data_train and the first observation for series 1 in newdata). data_test Deprecated.
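A one-line hypothetical sketch for the residual diagnostics (the fitted model mod is an assumption):

# Hypothetical sketch: Dunn-Smyth residual diagnostics for series 1 of 'mod'
# (residuals vs fitted, Q-Q plot and two temporal autocorrelation checks)
plot_mvgam_resids(mod, series = 1)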
Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_resids.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Residual diagnostics for a fitted mvgam object — plot_mvgam_resids","text":"A series of base R plots","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_resids.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Residual diagnostics for a fitted mvgam object — plot_mvgam_resids","text":"A total of four base R plots are generated to examine the Dunn-Smyth residuals for the specified series. Plots include a residuals vs fitted values plot, a Q-Q plot, and two plots to check for any remaining temporal autocorrelation in the residuals. Note, all plots only use the posterior medians of fitted values / residuals, so uncertainty is not represented.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_resids.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Residual diagnostics for a fitted mvgam object — plot_mvgam_resids","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_series.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot observed time series used for mvgam modelling — plot_mvgam_series","title":"Plot observed time series used for mvgam modelling — plot_mvgam_series","text":"This function takes either a fitted mvgam object or a data_train object and produces plots of the observed time series, along with ACF, CDF and histogram plots for exploratory data analysis","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_series.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot observed time series used for mvgam modelling — plot_mvgam_series","text":"","code":"plot_mvgam_series( object, data, data_train, newdata, data_test, y = \"y\", lines = TRUE, series = 1, n_bins, log_scale = FALSE )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_series.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot observed time series used for mvgam modelling — plot_mvgam_series","text":"object Optional list object returned from mvgam. Either object or data_train must be supplied. data Optional dataframe or list of training data containing at least 'series' and 'time'. Use this argument if the training data have been gathered in the correct format for mvgam modelling but a model has not yet been fitted. data_train Deprecated. Still works in place of data but users are recommended to use data instead for more seamless integration into R workflows newdata Optional dataframe or list of test data containing at least 'series' and 'time' for the forecast horizon, in addition to any other variables included in the linear predictor of formula. If included, the observed values in the test data are compared to the model's forecast distribution for exploring biases in model predictions. data_test Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows y Character. What is the name of the outcome variable in the supplied data? Defaults to 'y' lines Logical. If TRUE, line plots are used for visualising the time series. If FALSE, points are used. series Either an integer specifying which series in the set is to be plotted or the string 'all', which plots all series available in the supplied data n_bins An integer specifying the number of bins to use for binning observed values when plotting the histogram. Default is to use the number of bins returned by a call to hist in base R log_scale logical. If series == 'all', this flag is used to control whether the time series plot is shown on the log scale (using log(Y + 1)).
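A hypothetical sketch of pre-model exploratory plotting; the dataframe dat is an assumption and is expected to hold 'y', 'series' and 'time' columns in mvgam format:

# Hypothetical sketch: explore the data before fitting a model
plot_mvgam_series(data = dat, series = 1)      # one focal series
plot_mvgam_series(data = dat, series = 'all',  # overlay all series,
                  log_scale = TRUE)            # shown on the log(Y + 1) scale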
This can be useful when visualising many series that may have very different observed ranges. Default is FALSE","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_series.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot observed time series used for mvgam modelling — plot_mvgam_series","text":"A set of base R graphics plots. If series is an integer, the plots will show the observed time series, the autocorrelation and cumulative distribution functions, and a histogram for the series. If series == 'all', a set of observed time series plots is returned in which all series are shown on each plot but only a single focal series is highlighted, with the remaining series shown as faint gray lines.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_series.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Plot observed time series used for mvgam modelling — plot_mvgam_series","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_smooth.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot mvgam smooth terms — plot_mvgam_smooth","title":"Plot mvgam smooth terms — plot_mvgam_smooth","text":"This function plots posterior empirical quantiles for a series-specific smooth term","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_smooth.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot mvgam smooth terms — plot_mvgam_smooth","text":"","code":"plot_mvgam_smooth( object, trend_effects = FALSE, series = 1, smooth, residuals = FALSE, n_resid_bins = 25, realisations = FALSE, n_realisations = 15, derivatives = FALSE, newdata )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_smooth.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot mvgam smooth terms — plot_mvgam_smooth","text":"object A list object returned from mvgam. See mvgam() trend_effects logical. If TRUE and a trend_formula was used in model fitting, terms from the trend (i.e. process) model will be plotted series An integer specifying which series in the set is to be plotted smooth either a character or integer specifying which smooth term is to be plotted residuals logical. If TRUE then posterior quantiles of partial residuals are added to plots of 1-D smooths as a series of ribbon rectangles. Partial residuals for a smooth term are the median Dunn-Smyth residuals that would be obtained by dropping the term concerned from the model, while leaving all other estimates fixed (i.e. the estimates for the term plus the original median Dunn-Smyth residuals). Note that because mvgam works with Dunn-Smyth residuals and not with working residuals, which are used by mgcv, the magnitudes of the partial residuals will be different to what you might expect from plot.gam. Interpretation is similar though, as the partial residuals should be evenly scattered around the smooth function if the function is well estimated n_resid_bins An integer specifying the number of bins to group the covariate into when plotting partial residuals. Setting this argument too high can make for messy plots that are difficult to interpret, while setting it too low will likely mask potentially useful patterns in the partial residuals. Default is 25 realisations logical. If TRUE, posterior realisations are shown as a spaghetti plot, making it easier to visualise the diversity of possible functions. If FALSE, the default, empirical quantiles of the posterior distribution are shown n_realisations An integer specifying the number of posterior realisations to plot, if realisations = TRUE. Ignored otherwise derivatives logical. If TRUE, an additional plot will be returned to show the estimated 1st derivative for the specified smooth (Note, this only works for univariate smooths) newdata Optional dataframe for predicting the smooth, containing at least 'series' in addition to any other variables included in the linear predictor of the original model's formula.
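A hypothetical sketch combining the main arguments; mod and the smooth name 'season' are assumptions, and the argument combination is illustrative only:

# Hypothetical sketch: smooth of 'season' for series 1 of 'mod', with
# partial residuals and the estimated first derivative overlaid
plot_mvgam_smooth(mod, series = 1, smooth = 'season',
                  residuals = TRUE, derivatives = TRUE)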
Note that this is currently only supported for plotting univariate smooths","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_smooth.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot mvgam smooth terms — plot_mvgam_smooth","text":"A base R graphics plot","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_smooth.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Plot mvgam smooth terms — plot_mvgam_smooth","text":"Smooth functions are shown as empirical quantiles (or spaghetti plots) of posterior partial expectations across a sequence of 500 values between the variable's min and max, while zeroing the effects of all other variables. At present, only univariate and bivariate smooth plots are allowed, though note that bivariate smooths rely on default behaviour from plot.gam. For more nuanced visualisation, supply newdata just as you would when predicting from a gam model","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_trend.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot mvgam latent trend for a specified series — plot_mvgam_trend","title":"Plot mvgam latent trend for a specified series — plot_mvgam_trend","text":"Plot mvgam latent trend for a specified series","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_trend.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot mvgam latent trend for a specified series — plot_mvgam_trend","text":"","code":"plot_mvgam_trend( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, n_cores = 1, derivatives = FALSE, hide_xlabels = FALSE, xlab, ylab, ... )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_trend.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot mvgam latent trend for a specified series — plot_mvgam_trend","text":"object A list object returned from mvgam. See mvgam() series An integer specifying which series in the set is to be plotted newdata Optional dataframe or list of test data containing at least 'series' and 'time', in addition to any other variables included in the linear predictor of the original formula. data_test Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows realisations logical. If TRUE, posterior trend realisations are shown as a spaghetti plot, making it easier to visualise the diversity of possible trend paths. If FALSE, the default, empirical quantiles of the posterior distribution are shown n_realisations An integer specifying the number of posterior realisations to plot, if realisations = TRUE. Ignored otherwise n_cores An integer specifying the number of cores to use for generating trend forecasts in parallel derivatives logical. If TRUE, an additional plot will be returned to show the estimated 1st derivative for the estimated trend hide_xlabels logical. If TRUE, no xlabels are printed to allow the user to add custom labels using axis from base R. Ignored if derivatives = TRUE xlab label for x axis. ylab label for y axis. ...
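A short hypothetical sketch for the latent trend plot (the fitted model mod is an assumption):

# Hypothetical sketch: latent trend for series 1 of 'mod', plus its
# estimated first derivative
plot_mvgam_trend(mod, series = 1, derivatives = TRUE)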
further par graphical parameters.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_uncertainty.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot mvgam forecast uncertainty contributions for a specified series — plot_mvgam_uncertainty","title":"Plot mvgam forecast uncertainty contributions for a specified series — plot_mvgam_uncertainty","text":"Plot mvgam forecast uncertainty contributions for a specified series","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_uncertainty.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot mvgam forecast uncertainty contributions for a specified series — plot_mvgam_uncertainty","text":"","code":"plot_mvgam_uncertainty( object, series = 1, newdata, data_test, legend_position = \"topleft\", hide_xlabels = FALSE )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_uncertainty.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot mvgam forecast uncertainty contributions for a specified series — plot_mvgam_uncertainty","text":"object A list object returned from mvgam. See mvgam() series An integer specifying which series in the set is to be plotted newdata A dataframe or list containing at least 'series' and 'time' for the forecast horizon, in addition to any other variables included in the linear predictor of formula data_test Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows legend_position The location of the legend, which may be specified by setting x to a single keyword from the list: \"none\", \"bottomright\", \"bottom\", \"bottomleft\", \"left\", \"topleft\", \"top\", \"topright\", \"right\" and \"center\". This places the legend on the inside of the plot frame at the given location (if it is not \"none\"). hide_xlabels logical. If TRUE, no xlabels are printed to allow the user to add custom labels using axis from base R","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/portal_data.html","id":null,"dir":"Reference","previous_headings":"","what":"Portal Project rodent capture survey data — portal_data","title":"Portal Project rodent capture survey data — portal_data","text":"A dataset containing timeseries of total captures (across all control plots) for select rodent species from the Portal Project","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/portal_data.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Portal Project rodent capture survey data — portal_data","text":"","code":"portal_data"},{"path":"https://nicholasjclark.github.io/mvgam/reference/portal_data.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Portal Project rodent capture survey data — portal_data","text":"A dataframe containing the following fields: moon time of sampling in lunar cycles DM Total captures of species Dipodomys merriami DO Total captures of species Dipodomys ordii PP Total captures of species Chaetodipus penicillatus OT Total captures of species Onychomys torridus year Sampling year month Sampling month mintemp Monthly mean minimum temperature precipitation Monthly mean precipitation ndvi Monthly mean Normalised Difference Vegetation Index","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/portal_data.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Portal Project rodent capture survey data — 
portal_data","text":"https://github.com/weecology/PortalData/blob/main/SiteandMethods/Methods.md","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_epred.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Draws from the Expected Value of the Posterior Predictive Distribution — posterior_epred.mvgam","title":"Draws from the Expected Value of the Posterior Predictive Distribution — posterior_epred.mvgam","text":"Compute posterior draws expected value posterior predictive distribution (.e. conditional expectation). Can performed data used fit model (posterior predictive checks) new data. definition, predictions smaller variance posterior predictions performed posterior_predict.mvgam method. uncertainty expected value posterior predictive distribution incorporated draws computed posterior_epred residual error ignored . However, estimated means methods averaged across draws similar.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_epred.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Draws from the Expected Value of the Posterior Predictive Distribution — posterior_epred.mvgam","text":"","code":"# S3 method for mvgam posterior_epred(object, newdata, data_test, process_error = TRUE, ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_epred.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Draws from the Expected Value of the Posterior Predictive Distribution — posterior_epred.mvgam","text":"object list object returned mvgam. See mvgam() newdata Optional dataframe list test data containing variables included linear predictor formula. supplied, predictions generated original observations used model fit. data_test Deprecated. Still works place newdata users recommended use newdata instead seamless integration R workflows process_error Logical. TRUE newdata supplied, expected uncertainty process model accounted using draws latent trend SD parameters. FALSE, uncertainty latent trend component ignored calculating predictions. newdata supplied, draws fitted model's posterior predictive distribution used (always include uncertainty latent trend components) ... Ignored","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_epred.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Draws from the Expected Value of the Posterior Predictive Distribution — posterior_epred.mvgam","text":"matrix dimension n_samples x new_obs, n_samples number posterior samples fitted object n_obs number observations newdata","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_epred.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Draws from the Expected Value of the Posterior Predictive Distribution — posterior_epred.mvgam","text":"Note types predictions models include trend_formula, uncertainty dynamic trend component can ignored setting process_error = FALSE. However, trend_formula supplied model, predictions component ignored. process_error = TRUE, trend predictions ignore autocorrelation coefficients GP length scale coefficients, ultimately assuming process stationary. method similar types posterior predictions returned brms models using autocorrelated error predictions newdata. 
This function is therefore more suited to posterior simulation from the GAM components of a mvgam model, while the forecasting functions plot_mvgam_fc and forecast.mvgam are better suited to generate h-step ahead forecasts that respect the temporal dynamics of estimated latent trends.","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_epred.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Draws from the Expected Value of the Posterior Predictive Distribution — posterior_epred.mvgam","text":"","code":"if (FALSE) { # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) # Compute posterior expectations expectations <- posterior_epred(mod) str(expectations) }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_linpred.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Posterior Draws of the Linear Predictor — posterior_linpred.mvgam","title":"Posterior Draws of the Linear Predictor — posterior_linpred.mvgam","text":"Compute posterior draws of the linear predictor, that is draws before applying any link functions or other transformations. Can be performed for the data used to fit the model (posterior predictive checks) or for new data.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_linpred.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Posterior Draws of the Linear Predictor — posterior_linpred.mvgam","text":"","code":"# S3 method for mvgam posterior_linpred( object, transform = FALSE, newdata, data_test, process_error = TRUE, ... )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_linpred.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Posterior Draws of the Linear Predictor — posterior_linpred.mvgam","text":"object A list object returned from mvgam. See mvgam() transform Logical; if FALSE (the default), draws of the linear predictor are returned. If TRUE, draws of the transformed linear predictor, i.e. the conditional expectation, are returned. newdata Optional dataframe or list of test data containing the variables included in the linear predictor of formula. If not supplied, predictions are generated for the original observations used for the model fit. data_test Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows process_error Logical. If TRUE and newdata is supplied, expected uncertainty in the process model is accounted for by using draws from any latent trend SD parameters. If FALSE, uncertainty in the latent trend component is ignored when calculating predictions. If no newdata is supplied, draws from the fitted model's posterior predictive distribution will be used (which will always include uncertainty in any latent trend components) ... Ignored","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_linpred.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Posterior Draws of the Linear Predictor — posterior_linpred.mvgam","text":"A matrix of dimension n_samples x new_obs, where n_samples is the number of posterior samples from the fitted object and n_obs is the number of observations in newdata","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_linpred.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Posterior Draws of the Linear Predictor — posterior_linpred.mvgam","text":"Note that for all types of predictions for models that did not include a trend_formula, uncertainty in the dynamic trend component can be ignored by setting process_error = FALSE.
However, if a trend_formula was supplied in the model, predictions for this component cannot be ignored. If process_error = TRUE, trend predictions will ignore autocorrelation coefficients or GP length scale coefficients, ultimately assuming the process is stationary. This method is similar to the types of posterior predictions returned from brms models when using autocorrelated error predictions for newdata. This function is therefore more suited to posterior simulation from the GAM components of a mvgam model, while the forecasting functions plot_mvgam_fc and forecast.mvgam are better suited to generate h-step ahead forecasts that respect the temporal dynamics of estimated latent trends.","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_linpred.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Posterior Draws of the Linear Predictor — posterior_linpred.mvgam","text":"","code":"if (FALSE) { # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) # Extract linear predictor values linpreds <- posterior_linpred(mod) str(linpreds) }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_predict.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Draws from the Posterior Predictive Distribution — posterior_predict.mvgam","title":"Draws from the Posterior Predictive Distribution — posterior_predict.mvgam","text":"Compute posterior draws of the posterior predictive distribution. Can be performed for the data used to fit the model (posterior predictive checks) or for new data. By definition, these draws have higher variance than draws of the expected value of the posterior predictive distribution computed by posterior_epred.mvgam. This is because the residual error is incorporated in posterior_predict. However, the estimated means of both methods averaged across draws should be very similar.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_predict.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Draws from the Posterior Predictive Distribution — posterior_predict.mvgam","text":"","code":"# S3 method for mvgam posterior_predict(object, newdata, data_test, process_error = TRUE, ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_predict.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Draws from the Posterior Predictive Distribution — posterior_predict.mvgam","text":"object A list object returned from mvgam. See mvgam() newdata Optional dataframe or list of test data containing the variables included in the linear predictor of formula. If not supplied, predictions are generated for the original observations used for the model fit. data_test Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows process_error Logical. If TRUE and newdata is supplied, expected uncertainty in the process model is accounted for by using draws from any latent trend SD parameters. If FALSE, uncertainty in the latent trend component is ignored when calculating predictions. If no newdata is supplied, draws from the fitted model's posterior predictive distribution will be used (which will always include uncertainty in any latent trend components) ...
Ignored","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_predict.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Draws from the Posterior Predictive Distribution — posterior_predict.mvgam","text":"matrix dimension n_samples x new_obs, n_samples number posterior samples fitted object n_obs number observations newdata","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_predict.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Draws from the Posterior Predictive Distribution — posterior_predict.mvgam","text":"Note types predictions models include trend_formula, uncertainty dynamic trend component can ignored setting process_error = FALSE. However, trend_formula supplied model, predictions component ignored. process_error = TRUE, trend predictions ignore autocorrelation coefficients GP length scale coefficients, ultimately assuming process stationary. method similar types posterior predictions returned brms models using autocorrelated error predictions newdata. function therefore suited posterior simulation GAM components mvgam model, forecasting functions plot_mvgam_fc forecast.mvgam better suited generate h-step ahead forecasts respect temporal dynamics estimated latent trends.","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/posterior_predict.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Draws from the Posterior Predictive Distribution — posterior_predict.mvgam","text":"","code":"if (FALSE) { # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) # Compute posterior predictions predictions <- posterior_predict(mod) str(predictions) }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/ppc.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot mvgam posterior predictive checks for a specified series — ppc.mvgam","title":"Plot mvgam posterior predictive checks for a specified series — ppc.mvgam","text":"Plot mvgam posterior predictive checks specified series","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/ppc.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot mvgam posterior predictive checks for a specified series — ppc.mvgam","text":"","code":"ppc(object, ...) # S3 method for mvgam ppc( object, newdata, data_test, series = 1, type = \"hist\", n_bins, legend_position, xlab, ylab, ... )"},{"path":"https://nicholasjclark.github.io/mvgam/reference/ppc.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot mvgam posterior predictive checks for a specified series — ppc.mvgam","text":"object list object returned mvgam. See mvgam() ... par graphical parameters. newdata Optional dataframe list test data containing least 'series' 'time' forecast horizon, addition variables included linear predictor formula. included, observed values test data compared model's forecast distribution exploring biases model predictions. Note useful newdata also included fitting original model. data_test Deprecated. Still works place newdata users recommended use newdata instead seamless integration R workflows series integer specifying series set plotted type character specifying type posterior predictive check calculate plot. 
Valid options are: 'rootogram', 'mean', 'hist', 'density', 'prop_zero', 'pit' and 'cdf' n_bins An integer specifying the number of bins to use for binning observed values when plotting a rootogram or histogram. Default is 50 bins for a rootogram, which means that if there are more than 50 unique observed values, bins will be used to prevent overplotting and to facilitate interpretation. Default for a histogram is to use the number of bins returned by a call to hist in base R legend_position The location of the legend, which may be specified by setting x to a single keyword from the list \"bottomright\", \"bottom\", \"bottomleft\", \"left\", \"topleft\", \"top\", \"topright\", \"right\" and \"center\". This places the legend on the inside of the plot frame at the given location. Alternatively, use \"none\" to hide the legend. xlab label for x axis. ylab label for y axis.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/ppc.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot mvgam posterior predictive checks for a specified series — ppc.mvgam","text":"A base R graphics plot showing either a posterior rootogram (type == 'rootogram'), the predicted vs observed mean for the series (type == 'mean'), the predicted vs observed proportion of zeroes for the series (type == 'prop_zero'), the predicted vs observed histogram for the series (type == 'hist'), kernel density or empirical CDF estimates for posterior predictions (type == 'density' or type == 'cdf') or a Probability Integral Transform histogram (type == 'pit').","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/ppc.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Plot mvgam posterior predictive checks for a specified series — ppc.mvgam","text":"Posterior predictions are drawn from the fitted mvgam and compared against the empirical distribution of the observed data for a specified series, to help evaluate the model's ability to generate unbiased predictions. For all plots apart from 'rootogram', the posterior predictions can also be compared to out of sample observations, as long as these observations were included as 'data_test' in the original model fit and are supplied here. Rootograms are currently only plotted using the 'hanging' style","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/ppc.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot mvgam posterior predictive checks for a specified series — ppc.mvgam","text":"","code":"if (FALSE) { # Simulate some smooth effects and fit a model set.seed(0) dat <- mgcv::gamSim(1, n = 200, scale = 2) dat$time <- 1:NROW(dat) mod <- mvgam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, family = gaussian()) # Posterior checks ppc(mod, type = 'hist') ppc(mod, type = 'density') ppc(mod, type = 'cdf') }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/predict.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Predict from the GAM component of an mvgam model — predict.mvgam","title":"Predict from the GAM component of an mvgam model — predict.mvgam","text":"Predict from the GAM component of an mvgam model","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/predict.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Predict from the GAM component of an mvgam model — predict.mvgam","text":"","code":"# S3 method for mvgam predict(object, newdata, data_test, type = \"link\", process_error = TRUE, ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/predict.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Predict from the GAM component of an mvgam model — predict.mvgam","text":"object A list object returned from mvgam.
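The type argument of predict.mvgam(), described in the arguments that follow, controls the prediction scale; a hypothetical sketch with an assumed fitted model mod:

# Hypothetical sketch: prediction scales for a fitted mvgam 'mod'; each
# call returns a matrix of dimension n_samples x n_obs
linpred <- predict(mod, type = 'link')     # linear predictor (link) scale
mu      <- predict(mod, type = 'expected') # mean of the response
ypred   <- predict(mod, type = 'response') # outcome scale with observation error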
See mvgam() newdata Optional dataframe or list of test data containing the variables included in the linear predictor of formula. If not supplied, predictions are generated for the original observations used for the model fit. data_test Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows type When this has the value link (the default), the linear predictor is calculated on the link scale. If expected is used, predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. When response is used, the predictions take uncertainty in the observation process into account to return predictions on the outcome scale. When variance is used, the variance of the response with respect to the mean (the mean-variance relationship) is returned. Two special cases are also allowed: type latent_N will return the estimated latent abundances from an N-mixture distribution, while type detection will return the estimated detection probability from an N-mixture distribution process_error Logical. If TRUE and a dynamic trend model was fit, expected uncertainty in the process model is accounted for by using draws from any latent trend SD parameters. If FALSE, uncertainty in the latent trend component is ignored when calculating predictions ... Ignored","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/predict.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Predict from the GAM component of an mvgam model — predict.mvgam","text":"A matrix of dimension n_samples x new_obs, where n_samples is the number of posterior samples from the fitted object and n_obs is the number of test observations in newdata","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/predict.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Predict from the GAM component of an mvgam model — predict.mvgam","text":"Note that for all types of predictions for models that did not include a trend_formula, uncertainty in the dynamic trend component can be ignored by setting process_error = FALSE. However, if a trend_formula was supplied in the model, predictions for this component cannot be ignored. If process_error = TRUE, trend predictions will ignore autocorrelation coefficients or GP length scale coefficients, ultimately assuming the process is stationary. This method is similar to the types of posterior predictions returned from brms models when using autocorrelated error predictions for newdata. This function is therefore more suited to posterior simulation from the GAM components of a mvgam model, while the forecasting functions plot_mvgam_fc and forecast.mvgam are better suited to generate h-step ahead forecasts that respect the temporal dynamics of estimated latent trends.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/print.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Summary for a fitted mvgam object — print.mvgam","title":"Summary for a fitted mvgam object — print.mvgam","text":"This function takes a fitted mvgam object and prints a quick summary","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/print.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Summary for a fitted mvgam object — print.mvgam","text":"","code":"# S3 method for mvgam print(x, ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/print.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Summary for a fitted mvgam object — print.mvgam","text":"x A list object returned from mvgam ...
Ignored","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/print.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Summary for a fitted mvgam object — print.mvgam","text":"list printed -screen","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/print.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Summary for a fitted mvgam object — print.mvgam","text":"brief summary model's call printed","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/print.mvgam.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Summary for a fitted mvgam object — print.mvgam","text":"Nicholas J Clark","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/residuals.mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"Posterior draws of mvgam residuals — residuals.mvgam","title":"Posterior draws of mvgam residuals — residuals.mvgam","text":"method extracts posterior draws Dunn-Smyth (randomized quantile) residuals order data supplied model. included additional arguments obtaining summaries computed residuals","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/residuals.mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Posterior draws of mvgam residuals — residuals.mvgam","text":"","code":"# S3 method for mvgam residuals(object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/residuals.mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Posterior draws of mvgam residuals — residuals.mvgam","text":"object object class mvgam summary summary statistics returned instead raw values? Default TRUE.. robust FALSE (default) mean used measure central tendency standard deviation measure variability. TRUE, median median absolute deviation (MAD) applied instead. used summary TRUE. probs percentiles computed quantile function. used summary TRUE. ... arguments passed prepare_predictions control several aspects data validation prediction.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/residuals.mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Posterior draws of mvgam residuals — residuals.mvgam","text":"array randomized quantile residual values. summary = FALSE output resembles posterior_epred.mvgam predict.mvgam. summary = TRUE output n_observations x E matrix. number summary statistics E equal 2 + length(probs): Estimate column contains point estimates (either mean median depending argument robust), Est.Error column contains uncertainty estimates (either standard deviation median absolute deviation depending argument robust). remaining columns starting Q contain quantile estimates specified via argument probs.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/residuals.mvgam.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Posterior draws of mvgam residuals — residuals.mvgam","text":"method gives residuals Dunn-Smyth (randomized quantile) residuals. observations missing (.e. 
NA) in the original data will also have missing values in the residuals","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/residuals.mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Posterior draws of mvgam residuals — residuals.mvgam","text":"","code":"if (FALSE) { # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) # Extract posterior residuals resids <- residuals(mod) str(resids) }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/RW.html","id":null,"dir":"Reference","previous_headings":"","what":"Specify autoregressive dynamic processes — RW","title":"Specify autoregressive dynamic processes — RW","text":"Set up autoregressive or autoregressive moving average trend models in mvgam. These functions do not evaluate their arguments – they exist purely to help set up a model with particular autoregressive trend models.","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/RW.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Specify autoregressive dynamic processes — RW","text":"","code":"RW(ma = FALSE, cor = FALSE) AR(p = 1, ma = FALSE, cor = FALSE) VAR(ma = FALSE, cor = FALSE)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/RW.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Specify autoregressive dynamic processes — RW","text":"ma Logical. Include moving average terms of order 1? Default is FALSE. cor Logical. Include correlated process errors as part of a multivariate normal process model? If TRUE and if n_series > 1 in the supplied data, a fully structured covariance matrix will be estimated for the process errors. Default is FALSE. p A non-negative integer specifying the autoregressive (AR) order. Default is 1. Cannot currently be larger than 3","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/RW.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Specify autoregressive dynamic processes — RW","text":"An object of class mvgam_trend, which contains a list of arguments that are interpreted by the parsing functions in mvgam","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/score.mvgam_forecast.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute probabilistic forecast scores for mvgam objects — score.mvgam_forecast","title":"Compute probabilistic forecast scores for mvgam objects — score.mvgam_forecast","text":"Compute probabilistic forecast scores for mvgam objects","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/score.mvgam_forecast.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute probabilistic forecast scores for mvgam objects — score.mvgam_forecast","text":"","code":"# S3 method for mvgam_forecast score( object, score = \"crps\", log = FALSE, weights, interval_width = 0.9, n_cores = 1, ... ) score(object, ...)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/score.mvgam_forecast.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute probabilistic forecast scores for mvgam objects — score.mvgam_forecast","text":"object An mvgam_forecast object. See forecast.mvgam(). score A character specifying the type of proper scoring rule to use for evaluation. Options are: sis (i.e. the Scaled Interval Score), energy, variogram, elpd (i.e. the Expected log pointwise Predictive Density), drps (i.e. the Discrete Rank Probability Score) or crps (the Continuous Rank Probability Score).
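Returning to the RW(), AR() and VAR() constructors documented just above, a hypothetical sketch; passing the result through mvgam's trend_model argument is the typical pattern suggested by the other examples on these pages, and dat is an assumed data frame in mvgam format:

# Hypothetical sketch: lazily-evaluated trend specifications
AR(p = 2, cor = TRUE) # AR(2) with correlated process errors across series
RW(ma = TRUE)         # random walk with MA(1) terms
# e.g. mod <- mvgam(y ~ 1, trend_model = AR(p = 2), data = dat)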
Note that when choosing elpd, the supplied object must have forecasts on the link scale so that expectations can be calculated prior to scoring. For all other scores, forecasts should be supplied on the response scale (i.e. posterior predictions) log Logical. Should the forecasts and truths be logged prior to scoring? This is often appropriate for comparing performance of models when the series vary in their observation ranges weights An optional vector of weights (where length(weights) == n_series) for weighting pairwise correlations when evaluating the variogram score for multivariate forecasts. Useful for down-weighting series that have larger magnitude observations or that are of less interest when forecasting. Ignored if score != 'variogram' interval_width A proportional value on [0.05,0.95] defining the forecast interval for calculating coverage and, if score = 'sis', for calculating the interval score n_cores An integer specifying the number of cores to use for calculating scores in parallel ... Ignored","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/score.mvgam_forecast.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute probabilistic forecast scores for mvgam objects — score.mvgam_forecast","text":"A list containing scores and interval coverages per forecast horizon. If score %in% c('drps', 'crps', 'elpd'), the list will also contain the sum of all series-level scores per horizon. If score %in% c('energy','variogram'), no series-level scores are computed and the only score returned will be for all series. For all scores apart from elpd, the in_interval column in each series-level slot is a binary indicator of whether or not the true value was within the forecast's corresponding posterior empirical quantiles. Intervals are not calculated when using elpd because those forecasts will only contain the linear predictors","code":""},{"path":[]},{"path":"https://nicholasjclark.github.io/mvgam/reference/score.mvgam_forecast.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute probabilistic forecast scores for mvgam objects — score.mvgam_forecast","text":"","code":"if (FALSE) { # Simulate observations for three count-valued time series data <- sim_mvgam() # Fit a dynamic model using 'newdata' to automatically produce forecasts mod <- mvgam(y ~ 1, trend_model = 'RW', data = data$data_train, newdata = data$data_test) # Extract forecasts into a 'mvgam_forecast' object fc <- forecast(mod) # Compute Discrete Rank Probability Scores and 0.90 interval coverages fc_scores <- score(fc, score = 'drps') str(fc_scores) }"},{"path":"https://nicholasjclark.github.io/mvgam/reference/series_to_mvgam.html","id":null,"dir":"Reference","previous_headings":"","what":"This function converts univariate or multivariate time series (xts or ts objects) to the format necessary for mvgam — series_to_mvgam","title":"This function converts univariate or multivariate time series (xts or ts objects) to the format necessary for mvgam — series_to_mvgam","text":"This function converts univariate or multivariate time series (xts or ts objects) to the format necessary for mvgam","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/series_to_mvgam.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"This function converts univariate or multivariate time series (xts or ts objects) to the format necessary for mvgam — series_to_mvgam","text":"","code":"series_to_mvgam(series, freq, train_prop = 0.85)"},{"path":"https://nicholasjclark.github.io/mvgam/reference/series_to_mvgam.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"This function converts univariate or multivariate time series (xts or ts objects) to the format necessary for mvgam — series_to_mvgam","text":"series An xts or ts object to be converted to mvgam format freq An integer.
The seasonal frequency of the series train_prop A numeric stating the proportion of data to use for training. Should be between 0.25 and 0.95","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/series_to_mvgam.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"This function converts univariate or multivariate time series (xts or ts objects) to the format necessary for mvgam — series_to_mvgam","text":"A list object containing the outputs needed for mvgam, including 'data_train' and 'data_test'","code":""},{"path":"https://nicholasjclark.github.io/mvgam/reference/series_to_mvgam.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"This function converts univariate or multivariate time series (xts or ts objects) to the format necessary for mvgam — series_to_mvgam","text":"","code":"# A ts object example data(\"sunspots\") series <- cbind(sunspots, sunspots) colnames(series) <- c('blood', 'bone') head(series) #> blood bone #> [1,] 58.0 58.0 #> [2,] 62.6 62.6 #> [3,] 70.0 70.0 #> [4,] 55.7 55.7 #> [5,] 85.0 85.0 #> [6,] 83.5 83.5 series_to_mvgam(series, frequency(series), 0.85) #> $data_train #> y season year date series time #> 1 58.0 1 1749 1749-01-01 00:00:00 blood 1 #> 2 58.0 1 1749 1749-01-01 00:00:00 bone 1 #> 3 62.6 2 1749 1749-01-31 10:00:00 blood 2 #> 4 62.6 2 1749 1749-01-31 10:00:00 bone 2 #> 5 70.0 3 1749 1749-03-02 20:00:01 blood 3 #> 6 70.0 3 1749 1749-03-02 20:00:01 bone 3 #> 7 55.7 4 1749 1749-04-02 06:00:00 blood 4 #> 8 55.7 4 1749 1749-04-02 06:00:00 bone 4 #> 9 85.0 5 1749 1749-05-02 16:00:00 blood 5 #> 10 85.0 5 1749 1749-05-02 16:00:00 bone 5 #> 11 83.5 6 1749 1749-06-02 02:00:01 blood 6 #> 12 83.5 6 1749 1749-06-02 02:00:01 bone 6 #> 13 94.8 7 1749 1749-07-02 12:00:00 blood 7 #> 14 94.8 7 1749 1749-07-02 12:00:00 bone 7 #> 15 66.3 8 1749 1749-08-01 22:00:00 blood 8 #> 16 66.3 8 1749 1749-08-01 22:00:00 bone 8 #> 17 75.9 9 1749 1749-09-01 08:00:01 blood 9 #> 18 75.9 9 1749 1749-09-01 08:00:01 bone 9 #> 19 75.5 10 1749 1749-10-01 18:00:00 blood 10 #> 20 75.5 10 1749 1749-10-01 18:00:00 bone 10 #> 21 158.6 11 1749 1749-11-01 04:00:00 blood 11 #> 22 158.6 11 1749 1749-11-01 04:00:00 bone 11 #> 23 85.2 12 1749 1749-12-01 14:00:01 blood 12 #> 24 85.2 12 1749 1749-12-01 14:00:01 bone 12 #> 25 73.3 1 1750 1750-01-01 00:00:00 blood 13 #> 26 73.3 1 1750 1750-01-01 00:00:00 bone 13 #> 27 75.9 2 1750 1750-01-31 10:00:00 blood 14 #> 28 75.9 2 1750 1750-01-31 10:00:00 bone 14 #> 29 89.2 3 1750 1750-03-02 20:00:01 blood 15 #> 30 89.2 3 1750 1750-03-02 20:00:01 bone 15 #> 31 88.3 4 1750 1750-04-02 06:00:00 blood 16 #> 32 88.3 4 1750 1750-04-02 06:00:00 bone 16 #> 33 90.0 5 1750 1750-05-02 16:00:00 blood 17 #> 34 90.0 5 1750 1750-05-02 16:00:00 bone 17 #> 35 100.0 6 1750 1750-06-02 02:00:01 blood 18 #> 36 100.0 6 1750 1750-06-02 02:00:01 bone 18 #> 37 85.4 7 1750 1750-07-02 12:00:00 blood 19 #> 38 85.4 7 1750 1750-07-02 12:00:00 bone 19 #> 39 103.0 8 1750 1750-08-01 22:00:00 blood 20 #> 40 103.0 8 1750 1750-08-01 22:00:00 bone 20 #> 41 91.2 9 1750 1750-09-01 08:00:01 blood 21 #> 42 91.2 9 1750 1750-09-01 08:00:01 bone 21 #> 43 65.7 10 1750 1750-10-01 18:00:00 blood 22 #> 44 65.7 10 1750 1750-10-01 18:00:00 bone 22 #> 45 63.3 11 1750 1750-11-01 04:00:00 blood 23 #> 46 63.3 11 1750 1750-11-01 04:00:00 bone 23 #> 47 75.4 12 1750 1750-12-01 14:00:01 blood 24 #> 48 75.4 12 1750 1750-12-01 14:00:01 bone 24 #> 49 70.0 1 1751 1751-01-01 00:00:00 blood 25 #> 50 70.0 1 1751 1751-01-01 00:00:00 bone 25 #> 51 43.5 2 1751 1751-01-31 10:00:00 blood 26 #> 52 43.5 2 1751
#> ... [rows 25-1430 omitted: the long-format output continues at monthly timesteps, with identical 'blood' and 'bone' rows at each date because the example duplicates the sunspots series] #> 1431 8.0 8 1808 1808-08-01 12:00:00 blood 716 #> 1432 8.0 8 1808
1808-08-01 12:00:00 bone 716 #> 1433 11.7 9 1808 1808-09-01 00:00:01 blood 717 #> 1434 11.7 9 1808 1808-09-01 00:00:01 bone 717 #> 1435 4.7 10 1808 1808-10-01 12:00:00 blood 718 #> 1436 4.7 10 1808 1808-10-01 12:00:00 bone 718 #> 1437 10.5 11 1808 1808-11-01 00:00:00 blood 719 #> 1438 10.5 11 1808 1808-11-01 00:00:00 bone 719 #> 1439 12.3 12 1808 1808-12-01 12:00:01 blood 720 #> 1440 12.3 12 1808 1808-12-01 12:00:01 bone 720 #> 1441 7.2 1 1809 1809-01-01 00:00:00 blood 721 #> 1442 7.2 1 1809 1809-01-01 00:00:00 bone 721 #> 1443 9.2 2 1809 1809-01-31 10:00:00 blood 722 #> 1444 9.2 2 1809 1809-01-31 10:00:00 bone 722 #> 1445 0.9 3 1809 1809-03-02 20:00:01 blood 723 #> 1446 0.9 3 1809 1809-03-02 20:00:01 bone 723 #> 1447 2.5 4 1809 1809-04-02 06:00:00 blood 724 #> 1448 2.5 4 1809 1809-04-02 06:00:00 bone 724 #> 1449 2.0 5 1809 1809-05-02 16:00:00 blood 725 #> 1450 2.0 5 1809 1809-05-02 16:00:00 bone 725 #> 1451 7.7 6 1809 1809-06-02 02:00:01 blood 726 #> 1452 7.7 6 1809 1809-06-02 02:00:01 bone 726 #> 1453 0.3 7 1809 1809-07-02 12:00:00 blood 727 #> 1454 0.3 7 1809 1809-07-02 12:00:00 bone 727 #> 1455 0.2 8 1809 1809-08-01 22:00:00 blood 728 #> 1456 0.2 8 1809 1809-08-01 22:00:00 bone 728 #> 1457 0.4 9 1809 1809-09-01 08:00:01 blood 729 #> 1458 0.4 9 1809 1809-09-01 08:00:01 bone 729 #> 1459 0.0 10 1809 1809-10-01 18:00:00 blood 730 #> 1460 0.0 10 1809 1809-10-01 18:00:00 bone 730 #> 1461 0.0 11 1809 1809-11-01 04:00:00 blood 731 #> 1462 0.0 11 1809 1809-11-01 04:00:00 bone 731 #> 1463 0.0 12 1809 1809-12-01 14:00:01 blood 732 #> 1464 0.0 12 1809 1809-12-01 14:00:01 bone 732 #> 1465 0.0 1 1810 1810-01-01 00:00:00 blood 733 #> 1466 0.0 1 1810 1810-01-01 00:00:00 bone 733 #> 1467 0.0 2 1810 1810-01-31 10:00:00 blood 734 #> 1468 0.0 2 1810 1810-01-31 10:00:00 bone 734 #> 1469 0.0 3 1810 1810-03-02 20:00:01 blood 735 #> 1470 0.0 3 1810 1810-03-02 20:00:01 bone 735 #> 1471 0.0 4 1810 1810-04-02 06:00:00 blood 736 #> 1472 0.0 4 1810 1810-04-02 06:00:00 bone 736 #> 1473 0.0 5 1810 1810-05-02 16:00:00 blood 737 #> 1474 0.0 5 1810 1810-05-02 16:00:00 bone 737 #> 1475 0.0 6 1810 1810-06-02 02:00:01 blood 738 #> 1476 0.0 6 1810 1810-06-02 02:00:01 bone 738 #> 1477 0.0 7 1810 1810-07-02 12:00:00 blood 739 #> 1478 0.0 7 1810 1810-07-02 12:00:00 bone 739 #> 1479 0.0 8 1810 1810-08-01 22:00:00 blood 740 #> 1480 0.0 8 1810 1810-08-01 22:00:00 bone 740 #> 1481 0.0 9 1810 1810-09-01 08:00:01 blood 741 #> 1482 0.0 9 1810 1810-09-01 08:00:01 bone 741 #> 1483 0.0 10 1810 1810-10-01 18:00:00 blood 742 #> 1484 0.0 10 1810 1810-10-01 18:00:00 bone 742 #> 1485 0.0 11 1810 1810-11-01 04:00:00 blood 743 #> 1486 0.0 11 1810 1810-11-01 04:00:00 bone 743 #> 1487 0.0 12 1810 1810-12-01 14:00:01 blood 744 #> 1488 0.0 12 1810 1810-12-01 14:00:01 bone 744 #> 1489 0.0 1 1811 1811-01-01 00:00:00 blood 745 #> 1490 0.0 1 1811 1811-01-01 00:00:00 bone 745 #> 1491 0.0 2 1811 1811-01-31 10:00:00 blood 746 #> 1492 0.0 2 1811 1811-01-31 10:00:00 bone 746 #> 1493 0.0 3 1811 1811-03-02 20:00:01 blood 747 #> 1494 0.0 3 1811 1811-03-02 20:00:01 bone 747 #> 1495 0.0 4 1811 1811-04-02 06:00:00 blood 748 #> 1496 0.0 4 1811 1811-04-02 06:00:00 bone 748 #> 1497 0.0 5 1811 1811-05-02 16:00:00 blood 749 #> 1498 0.0 5 1811 1811-05-02 16:00:00 bone 749 #> 1499 0.0 6 1811 1811-06-02 02:00:01 blood 750 #> 1500 0.0 6 1811 1811-06-02 02:00:01 bone 750 #> 1501 6.6 7 1811 1811-07-02 12:00:00 blood 751 #> 1502 6.6 7 1811 1811-07-02 12:00:00 bone 751 #> 1503 0.0 8 1811 1811-08-01 22:00:00 blood 752 #> 1504 0.0 8 1811 1811-08-01 22:00:00 bone 752 #> 1505 
2.4 9 1811 1811-09-01 08:00:01 blood 753 #> 1506 2.4 9 1811 1811-09-01 08:00:01 bone 753 #> 1507 6.1 10 1811 1811-10-01 18:00:00 blood 754 #> 1508 6.1 10 1811 1811-10-01 18:00:00 bone 754 #> 1509 0.8 11 1811 1811-11-01 04:00:00 blood 755 #> 1510 0.8 11 1811 1811-11-01 04:00:00 bone 755 #> 1511 1.1 12 1811 1811-12-01 14:00:01 blood 756 #> 1512 1.1 12 1811 1811-12-01 14:00:01 bone 756 #> 1513 11.3 1 1812 1812-01-01 00:00:00 blood 757 #> 1514 11.3 1 1812 1812-01-01 00:00:00 bone 757 #> 1515 1.9 2 1812 1812-01-31 12:00:00 blood 758 #> 1516 1.9 2 1812 1812-01-31 12:00:00 bone 758 #> 1517 0.7 3 1812 1812-03-02 00:00:01 blood 759 #> 1518 0.7 3 1812 1812-03-02 00:00:01 bone 759 #> 1519 0.0 4 1812 1812-04-01 12:00:00 blood 760 #> 1520 0.0 4 1812 1812-04-01 12:00:00 bone 760 #> 1521 1.0 5 1812 1812-05-02 00:00:00 blood 761 #> 1522 1.0 5 1812 1812-05-02 00:00:00 bone 761 #> 1523 1.3 6 1812 1812-06-01 12:00:01 blood 762 #> 1524 1.3 6 1812 1812-06-01 12:00:01 bone 762 #> 1525 0.5 7 1812 1812-07-02 00:00:00 blood 763 #> 1526 0.5 7 1812 1812-07-02 00:00:00 bone 763 #> 1527 15.6 8 1812 1812-08-01 12:00:00 blood 764 #> 1528 15.6 8 1812 1812-08-01 12:00:00 bone 764 #> 1529 5.2 9 1812 1812-09-01 00:00:01 blood 765 #> 1530 5.2 9 1812 1812-09-01 00:00:01 bone 765 #> 1531 3.9 10 1812 1812-10-01 12:00:00 blood 766 #> 1532 3.9 10 1812 1812-10-01 12:00:00 bone 766 #> 1533 7.9 11 1812 1812-11-01 00:00:00 blood 767 #> 1534 7.9 11 1812 1812-11-01 00:00:00 bone 767 #> 1535 10.1 12 1812 1812-12-01 12:00:01 blood 768 #> 1536 10.1 12 1812 1812-12-01 12:00:01 bone 768 #> 1537 0.0 1 1813 1813-01-01 00:00:00 blood 769 #> 1538 0.0 1 1813 1813-01-01 00:00:00 bone 769 #> 1539 10.3 2 1813 1813-01-31 10:00:00 blood 770 #> 1540 10.3 2 1813 1813-01-31 10:00:00 bone 770 #> 1541 1.9 3 1813 1813-03-02 20:00:01 blood 771 #> 1542 1.9 3 1813 1813-03-02 20:00:01 bone 771 #> 1543 16.6 4 1813 1813-04-02 06:00:00 blood 772 #> 1544 16.6 4 1813 1813-04-02 06:00:00 bone 772 #> 1545 5.5 5 1813 1813-05-02 16:00:00 blood 773 #> 1546 5.5 5 1813 1813-05-02 16:00:00 bone 773 #> 1547 11.2 6 1813 1813-06-02 02:00:01 blood 774 #> 1548 11.2 6 1813 1813-06-02 02:00:01 bone 774 #> 1549 18.3 7 1813 1813-07-02 12:00:00 blood 775 #> 1550 18.3 7 1813 1813-07-02 12:00:00 bone 775 #> 1551 8.4 8 1813 1813-08-01 22:00:00 blood 776 #> 1552 8.4 8 1813 1813-08-01 22:00:00 bone 776 #> 1553 15.3 9 1813 1813-09-01 08:00:01 blood 777 #> 1554 15.3 9 1813 1813-09-01 08:00:01 bone 777 #> 1555 27.8 10 1813 1813-10-01 18:00:00 blood 778 #> 1556 27.8 10 1813 1813-10-01 18:00:00 bone 778 #> 1557 16.7 11 1813 1813-11-01 04:00:00 blood 779 #> 1558 16.7 11 1813 1813-11-01 04:00:00 bone 779 #> 1559 14.3 12 1813 1813-12-01 14:00:01 blood 780 #> 1560 14.3 12 1813 1813-12-01 14:00:01 bone 780 #> 1561 22.2 1 1814 1814-01-01 00:00:00 blood 781 #> 1562 22.2 1 1814 1814-01-01 00:00:00 bone 781 #> 1563 12.0 2 1814 1814-01-31 10:00:00 blood 782 #> 1564 12.0 2 1814 1814-01-31 10:00:00 bone 782 #> 1565 5.7 3 1814 1814-03-02 20:00:01 blood 783 #> 1566 5.7 3 1814 1814-03-02 20:00:01 bone 783 #> 1567 23.8 4 1814 1814-04-02 06:00:00 blood 784 #> 1568 23.8 4 1814 1814-04-02 06:00:00 bone 784 #> 1569 5.8 5 1814 1814-05-02 16:00:00 blood 785 #> 1570 5.8 5 1814 1814-05-02 16:00:00 bone 785 #> 1571 14.9 6 1814 1814-06-02 02:00:01 blood 786 #> 1572 14.9 6 1814 1814-06-02 02:00:01 bone 786 #> 1573 18.5 7 1814 1814-07-02 12:00:00 blood 787 #> 1574 18.5 7 1814 1814-07-02 12:00:00 bone 787 #> 1575 2.3 8 1814 1814-08-01 22:00:00 blood 788 #> 1576 2.3 8 1814 1814-08-01 22:00:00 bone 788 #> 1577 8.1 9 1814 
1814-09-01 08:00:01 blood 789 #> 1578 8.1 9 1814 1814-09-01 08:00:01 bone 789 #> 1579 19.3 10 1814 1814-10-01 18:00:00 blood 790 #> 1580 19.3 10 1814 1814-10-01 18:00:00 bone 790 #> 1581 14.5 11 1814 1814-11-01 04:00:00 blood 791 #> 1582 14.5 11 1814 1814-11-01 04:00:00 bone 791 #> 1583 20.1 12 1814 1814-12-01 14:00:01 blood 792 #> 1584 20.1 12 1814 1814-12-01 14:00:01 bone 792 #> 1585 19.2 1 1815 1815-01-01 00:00:00 blood 793 #> 1586 19.2 1 1815 1815-01-01 00:00:00 bone 793 #> 1587 32.2 2 1815 1815-01-31 10:00:00 blood 794 #> 1588 32.2 2 1815 1815-01-31 10:00:00 bone 794 #> 1589 26.2 3 1815 1815-03-02 20:00:01 blood 795 #> 1590 26.2 3 1815 1815-03-02 20:00:01 bone 795 #> 1591 31.6 4 1815 1815-04-02 06:00:00 blood 796 #> 1592 31.6 4 1815 1815-04-02 06:00:00 bone 796 #> 1593 9.8 5 1815 1815-05-02 16:00:00 blood 797 #> 1594 9.8 5 1815 1815-05-02 16:00:00 bone 797 #> 1595 55.9 6 1815 1815-06-02 02:00:01 blood 798 #> 1596 55.9 6 1815 1815-06-02 02:00:01 bone 798 #> 1597 35.5 7 1815 1815-07-02 12:00:00 blood 799 #> 1598 35.5 7 1815 1815-07-02 12:00:00 bone 799 #> 1599 47.2 8 1815 1815-08-01 22:00:00 blood 800 #> 1600 47.2 8 1815 1815-08-01 22:00:00 bone 800 #> 1601 31.5 9 1815 1815-09-01 08:00:01 blood 801 #> 1602 31.5 9 1815 1815-09-01 08:00:01 bone 801 #> 1603 33.5 10 1815 1815-10-01 18:00:00 blood 802 #> 1604 33.5 10 1815 1815-10-01 18:00:00 bone 802 #> 1605 37.2 11 1815 1815-11-01 04:00:00 blood 803 #> 1606 37.2 11 1815 1815-11-01 04:00:00 bone 803 #> 1607 65.0 12 1815 1815-12-01 14:00:01 blood 804 #> 1608 65.0 12 1815 1815-12-01 14:00:01 bone 804 #> 1609 26.3 1 1816 1816-01-01 00:00:00 blood 805 #> 1610 26.3 1 1816 1816-01-01 00:00:00 bone 805 #> 1611 68.8 2 1816 1816-01-31 12:00:00 blood 806 #> 1612 68.8 2 1816 1816-01-31 12:00:00 bone 806 #> 1613 73.7 3 1816 1816-03-02 00:00:01 blood 807 #> 1614 73.7 3 1816 1816-03-02 00:00:01 bone 807 #> 1615 58.8 4 1816 1816-04-01 12:00:00 blood 808 #> 1616 58.8 4 1816 1816-04-01 12:00:00 bone 808 #> 1617 44.3 5 1816 1816-05-02 00:00:00 blood 809 #> 1618 44.3 5 1816 1816-05-02 00:00:00 bone 809 #> 1619 43.6 6 1816 1816-06-01 12:00:01 blood 810 #> 1620 43.6 6 1816 1816-06-01 12:00:01 bone 810 #> 1621 38.8 7 1816 1816-07-02 00:00:00 blood 811 #> 1622 38.8 7 1816 1816-07-02 00:00:00 bone 811 #> 1623 23.2 8 1816 1816-08-01 12:00:00 blood 812 #> 1624 23.2 8 1816 1816-08-01 12:00:00 bone 812 #> 1625 47.8 9 1816 1816-09-01 00:00:01 blood 813 #> 1626 47.8 9 1816 1816-09-01 00:00:01 bone 813 #> 1627 56.4 10 1816 1816-10-01 12:00:00 blood 814 #> 1628 56.4 10 1816 1816-10-01 12:00:00 bone 814 #> 1629 38.1 11 1816 1816-11-01 00:00:00 blood 815 #> 1630 38.1 11 1816 1816-11-01 00:00:00 bone 815 #> 1631 29.9 12 1816 1816-12-01 12:00:01 blood 816 #> 1632 29.9 12 1816 1816-12-01 12:00:01 bone 816 #> 1633 36.4 1 1817 1817-01-01 00:00:00 blood 817 #> 1634 36.4 1 1817 1817-01-01 00:00:00 bone 817 #> 1635 57.9 2 1817 1817-01-31 10:00:00 blood 818 #> 1636 57.9 2 1817 1817-01-31 10:00:00 bone 818 #> 1637 96.2 3 1817 1817-03-02 20:00:01 blood 819 #> 1638 96.2 3 1817 1817-03-02 20:00:01 bone 819 #> 1639 26.4 4 1817 1817-04-02 06:00:00 blood 820 #> 1640 26.4 4 1817 1817-04-02 06:00:00 bone 820 #> 1641 21.2 5 1817 1817-05-02 16:00:00 blood 821 #> 1642 21.2 5 1817 1817-05-02 16:00:00 bone 821 #> 1643 40.0 6 1817 1817-06-02 02:00:01 blood 822 #> 1644 40.0 6 1817 1817-06-02 02:00:01 bone 822 #> 1645 50.0 7 1817 1817-07-02 12:00:00 blood 823 #> 1646 50.0 7 1817 1817-07-02 12:00:00 bone 823 #> 1647 45.0 8 1817 1817-08-01 22:00:00 blood 824 #> 1648 45.0 8 1817 1817-08-01 22:00:00 bone 
824 #> 1649 36.7 9 1817 1817-09-01 08:00:01 blood 825 #> 1650 36.7 9 1817 1817-09-01 08:00:01 bone 825 #> 1651 25.6 10 1817 1817-10-01 18:00:00 blood 826 #> 1652 25.6 10 1817 1817-10-01 18:00:00 bone 826 #> 1653 28.9 11 1817 1817-11-01 04:00:00 blood 827 #> 1654 28.9 11 1817 1817-11-01 04:00:00 bone 827 #> 1655 28.4 12 1817 1817-12-01 14:00:01 blood 828 #> 1656 28.4 12 1817 1817-12-01 14:00:01 bone 828 #> 1657 34.9 1 1818 1818-01-01 00:00:00 blood 829 #> 1658 34.9 1 1818 1818-01-01 00:00:00 bone 829 #> 1659 22.4 2 1818 1818-01-31 10:00:00 blood 830 #> 1660 22.4 2 1818 1818-01-31 10:00:00 bone 830 #> 1661 25.4 3 1818 1818-03-02 20:00:01 blood 831 #> 1662 25.4 3 1818 1818-03-02 20:00:01 bone 831 #> 1663 34.5 4 1818 1818-04-02 06:00:00 blood 832 #> 1664 34.5 4 1818 1818-04-02 06:00:00 bone 832 #> 1665 53.1 5 1818 1818-05-02 16:00:00 blood 833 #> 1666 53.1 5 1818 1818-05-02 16:00:00 bone 833 #> 1667 36.4 6 1818 1818-06-02 02:00:01 blood 834 #> 1668 36.4 6 1818 1818-06-02 02:00:01 bone 834 #> 1669 28.0 7 1818 1818-07-02 12:00:00 blood 835 #> 1670 28.0 7 1818 1818-07-02 12:00:00 bone 835 #> 1671 31.5 8 1818 1818-08-01 22:00:00 blood 836 #> 1672 31.5 8 1818 1818-08-01 22:00:00 bone 836 #> 1673 26.1 9 1818 1818-09-01 08:00:01 blood 837 #> 1674 26.1 9 1818 1818-09-01 08:00:01 bone 837 #> 1675 31.7 10 1818 1818-10-01 18:00:00 blood 838 #> 1676 31.7 10 1818 1818-10-01 18:00:00 bone 838 #> 1677 10.9 11 1818 1818-11-01 04:00:00 blood 839 #> 1678 10.9 11 1818 1818-11-01 04:00:00 bone 839 #> 1679 25.8 12 1818 1818-12-01 14:00:01 blood 840 #> 1680 25.8 12 1818 1818-12-01 14:00:01 bone 840 #> 1681 32.5 1 1819 1819-01-01 00:00:00 blood 841 #> 1682 32.5 1 1819 1819-01-01 00:00:00 bone 841 #> 1683 20.7 2 1819 1819-01-31 10:00:00 blood 842 #> 1684 20.7 2 1819 1819-01-31 10:00:00 bone 842 #> 1685 3.7 3 1819 1819-03-02 20:00:01 blood 843 #> 1686 3.7 3 1819 1819-03-02 20:00:01 bone 843 #> 1687 20.2 4 1819 1819-04-02 06:00:00 blood 844 #> 1688 20.2 4 1819 1819-04-02 06:00:00 bone 844 #> 1689 19.6 5 1819 1819-05-02 16:00:00 blood 845 #> 1690 19.6 5 1819 1819-05-02 16:00:00 bone 845 #> 1691 35.0 6 1819 1819-06-02 02:00:01 blood 846 #> 1692 35.0 6 1819 1819-06-02 02:00:01 bone 846 #> 1693 31.4 7 1819 1819-07-02 12:00:00 blood 847 #> 1694 31.4 7 1819 1819-07-02 12:00:00 bone 847 #> 1695 26.1 8 1819 1819-08-01 22:00:00 blood 848 #> 1696 26.1 8 1819 1819-08-01 22:00:00 bone 848 #> 1697 14.9 9 1819 1819-09-01 08:00:01 blood 849 #> 1698 14.9 9 1819 1819-09-01 08:00:01 bone 849 #> 1699 27.5 10 1819 1819-10-01 18:00:00 blood 850 #> 1700 27.5 10 1819 1819-10-01 18:00:00 bone 850 #> 1701 25.1 11 1819 1819-11-01 04:00:00 blood 851 #> 1702 25.1 11 1819 1819-11-01 04:00:00 bone 851 #> 1703 30.6 12 1819 1819-12-01 14:00:01 blood 852 #> 1704 30.6 12 1819 1819-12-01 14:00:01 bone 852 #> 1705 19.2 1 1820 1820-01-01 00:00:00 blood 853 #> 1706 19.2 1 1820 1820-01-01 00:00:00 bone 853 #> 1707 26.6 2 1820 1820-01-31 12:00:00 blood 854 #> 1708 26.6 2 1820 1820-01-31 12:00:00 bone 854 #> 1709 4.5 3 1820 1820-03-02 00:00:01 blood 855 #> 1710 4.5 3 1820 1820-03-02 00:00:01 bone 855 #> 1711 19.4 4 1820 1820-04-01 12:00:00 blood 856 #> 1712 19.4 4 1820 1820-04-01 12:00:00 bone 856 #> 1713 29.3 5 1820 1820-05-02 00:00:00 blood 857 #> 1714 29.3 5 1820 1820-05-02 00:00:00 bone 857 #> 1715 10.8 6 1820 1820-06-01 12:00:01 blood 858 #> 1716 10.8 6 1820 1820-06-01 12:00:01 bone 858 #> 1717 20.6 7 1820 1820-07-02 00:00:00 blood 859 #> 1718 20.6 7 1820 1820-07-02 00:00:00 bone 859 #> 1719 25.9 8 1820 1820-08-01 12:00:00 blood 860 #> 1720 25.9 8 1820 
1820-08-01 12:00:00 bone 860 #> 1721 5.2 9 1820 1820-09-01 00:00:01 blood 861 #> 1722 5.2 9 1820 1820-09-01 00:00:01 bone 861 #> 1723 9.0 10 1820 1820-10-01 12:00:00 blood 862 #> 1724 9.0 10 1820 1820-10-01 12:00:00 bone 862 #> 1725 7.9 11 1820 1820-11-01 00:00:00 blood 863 #> 1726 7.9 11 1820 1820-11-01 00:00:00 bone 863 #> 1727 9.7 12 1820 1820-12-01 12:00:01 blood 864 #> 1728 9.7 12 1820 1820-12-01 12:00:01 bone 864 #> 1729 21.5 1 1821 1821-01-01 00:00:00 blood 865 #> 1730 21.5 1 1821 1821-01-01 00:00:00 bone 865 #> 1731 4.3 2 1821 1821-01-31 10:00:00 blood 866 #> 1732 4.3 2 1821 1821-01-31 10:00:00 bone 866 #> 1733 5.7 3 1821 1821-03-02 20:00:01 blood 867 #> 1734 5.7 3 1821 1821-03-02 20:00:01 bone 867 #> 1735 9.2 4 1821 1821-04-02 06:00:00 blood 868 #> 1736 9.2 4 1821 1821-04-02 06:00:00 bone 868 #> 1737 1.7 5 1821 1821-05-02 16:00:00 blood 869 #> 1738 1.7 5 1821 1821-05-02 16:00:00 bone 869 #> 1739 1.8 6 1821 1821-06-02 02:00:01 blood 870 #> 1740 1.8 6 1821 1821-06-02 02:00:01 bone 870 #> 1741 2.5 7 1821 1821-07-02 12:00:00 blood 871 #> 1742 2.5 7 1821 1821-07-02 12:00:00 bone 871 #> 1743 4.8 8 1821 1821-08-01 22:00:00 blood 872 #> 1744 4.8 8 1821 1821-08-01 22:00:00 bone 872 #> 1745 4.4 9 1821 1821-09-01 08:00:01 blood 873 #> 1746 4.4 9 1821 1821-09-01 08:00:01 bone 873 #> 1747 18.8 10 1821 1821-10-01 18:00:00 blood 874 #> 1748 18.8 10 1821 1821-10-01 18:00:00 bone 874 #> 1749 4.4 11 1821 1821-11-01 04:00:00 blood 875 #> 1750 4.4 11 1821 1821-11-01 04:00:00 bone 875 #> 1751 0.0 12 1821 1821-12-01 14:00:01 blood 876 #> 1752 0.0 12 1821 1821-12-01 14:00:01 bone 876 #> 1753 0.0 1 1822 1822-01-01 00:00:00 blood 877 #> 1754 0.0 1 1822 1822-01-01 00:00:00 bone 877 #> 1755 0.9 2 1822 1822-01-31 10:00:00 blood 878 #> 1756 0.9 2 1822 1822-01-31 10:00:00 bone 878 #> 1757 16.1 3 1822 1822-03-02 20:00:01 blood 879 #> 1758 16.1 3 1822 1822-03-02 20:00:01 bone 879 #> 1759 13.5 4 1822 1822-04-02 06:00:00 blood 880 #> 1760 13.5 4 1822 1822-04-02 06:00:00 bone 880 #> 1761 1.5 5 1822 1822-05-02 16:00:00 blood 881 #> 1762 1.5 5 1822 1822-05-02 16:00:00 bone 881 #> 1763 5.6 6 1822 1822-06-02 02:00:01 blood 882 #> 1764 5.6 6 1822 1822-06-02 02:00:01 bone 882 #> 1765 7.9 7 1822 1822-07-02 12:00:00 blood 883 #> 1766 7.9 7 1822 1822-07-02 12:00:00 bone 883 #> 1767 2.1 8 1822 1822-08-01 22:00:00 blood 884 #> 1768 2.1 8 1822 1822-08-01 22:00:00 bone 884 #> 1769 0.0 9 1822 1822-09-01 08:00:01 blood 885 #> 1770 0.0 9 1822 1822-09-01 08:00:01 bone 885 #> 1771 0.4 10 1822 1822-10-01 18:00:00 blood 886 #> 1772 0.4 10 1822 1822-10-01 18:00:00 bone 886 #> 1773 0.0 11 1822 1822-11-01 04:00:00 blood 887 #> 1774 0.0 11 1822 1822-11-01 04:00:00 bone 887 #> 1775 0.0 12 1822 1822-12-01 14:00:01 blood 888 #> 1776 0.0 12 1822 1822-12-01 14:00:01 bone 888 #> 1777 0.0 1 1823 1823-01-01 00:00:00 blood 889 #> 1778 0.0 1 1823 1823-01-01 00:00:00 bone 889 #> 1779 0.0 2 1823 1823-01-31 10:00:00 blood 890 #> 1780 0.0 2 1823 1823-01-31 10:00:00 bone 890 #> 1781 0.6 3 1823 1823-03-02 20:00:01 blood 891 #> 1782 0.6 3 1823 1823-03-02 20:00:01 bone 891 #> 1783 0.0 4 1823 1823-04-02 06:00:00 blood 892 #> 1784 0.0 4 1823 1823-04-02 06:00:00 bone 892 #> 1785 0.0 5 1823 1823-05-02 16:00:00 blood 893 #> 1786 0.0 5 1823 1823-05-02 16:00:00 bone 893 #> 1787 0.0 6 1823 1823-06-02 02:00:01 blood 894 #> 1788 0.0 6 1823 1823-06-02 02:00:01 bone 894 #> 1789 0.5 7 1823 1823-07-02 12:00:00 blood 895 #> 1790 0.5 7 1823 1823-07-02 12:00:00 bone 895 #> 1791 0.0 8 1823 1823-08-01 22:00:00 blood 896 #> 1792 0.0 8 1823 1823-08-01 22:00:00 bone 896 #> 1793 
0.0 9 1823 1823-09-01 08:00:01 blood 897 #> 1794 0.0 9 1823 1823-09-01 08:00:01 bone 897 #> 1795 0.0 10 1823 1823-10-01 18:00:00 blood 898 #> 1796 0.0 10 1823 1823-10-01 18:00:00 bone 898 #> 1797 0.0 11 1823 1823-11-01 04:00:00 blood 899 #> 1798 0.0 11 1823 1823-11-01 04:00:00 bone 899 #> 1799 20.4 12 1823 1823-12-01 14:00:01 blood 900 #> 1800 20.4 12 1823 1823-12-01 14:00:01 bone 900 #> 1801 21.6 1 1824 1824-01-01 00:00:00 blood 901 #> 1802 21.6 1 1824 1824-01-01 00:00:00 bone 901 #> 1803 10.8 2 1824 1824-01-31 12:00:00 blood 902 #> 1804 10.8 2 1824 1824-01-31 12:00:00 bone 902 #> 1805 0.0 3 1824 1824-03-02 00:00:01 blood 903 #> 1806 0.0 3 1824 1824-03-02 00:00:01 bone 903 #> 1807 19.4 4 1824 1824-04-01 12:00:00 blood 904 #> 1808 19.4 4 1824 1824-04-01 12:00:00 bone 904 #> 1809 2.8 5 1824 1824-05-02 00:00:00 blood 905 #> 1810 2.8 5 1824 1824-05-02 00:00:00 bone 905 #> 1811 0.0 6 1824 1824-06-01 12:00:01 blood 906 #> 1812 0.0 6 1824 1824-06-01 12:00:01 bone 906 #> 1813 0.0 7 1824 1824-07-02 00:00:00 blood 907 #> 1814 0.0 7 1824 1824-07-02 00:00:00 bone 907 #> 1815 1.4 8 1824 1824-08-01 12:00:00 blood 908 #> 1816 1.4 8 1824 1824-08-01 12:00:00 bone 908 #> 1817 20.5 9 1824 1824-09-01 00:00:01 blood 909 #> 1818 20.5 9 1824 1824-09-01 00:00:01 bone 909 #> 1819 25.2 10 1824 1824-10-01 12:00:00 blood 910 #> 1820 25.2 10 1824 1824-10-01 12:00:00 bone 910 #> 1821 0.0 11 1824 1824-11-01 00:00:00 blood 911 #> 1822 0.0 11 1824 1824-11-01 00:00:00 bone 911 #> 1823 0.8 12 1824 1824-12-01 12:00:01 blood 912 #> 1824 0.8 12 1824 1824-12-01 12:00:01 bone 912 #> 1825 5.0 1 1825 1825-01-01 00:00:00 blood 913 #> 1826 5.0 1 1825 1825-01-01 00:00:00 bone 913 #> 1827 15.5 2 1825 1825-01-31 10:00:00 blood 914 #> 1828 15.5 2 1825 1825-01-31 10:00:00 bone 914 #> 1829 22.4 3 1825 1825-03-02 20:00:01 blood 915 #> 1830 22.4 3 1825 1825-03-02 20:00:01 bone 915 #> 1831 3.8 4 1825 1825-04-02 06:00:00 blood 916 #> 1832 3.8 4 1825 1825-04-02 06:00:00 bone 916 #> 1833 15.4 5 1825 1825-05-02 16:00:00 blood 917 #> 1834 15.4 5 1825 1825-05-02 16:00:00 bone 917 #> 1835 15.4 6 1825 1825-06-02 02:00:01 blood 918 #> 1836 15.4 6 1825 1825-06-02 02:00:01 bone 918 #> 1837 30.9 7 1825 1825-07-02 12:00:00 blood 919 #> 1838 30.9 7 1825 1825-07-02 12:00:00 bone 919 #> 1839 25.4 8 1825 1825-08-01 22:00:00 blood 920 #> 1840 25.4 8 1825 1825-08-01 22:00:00 bone 920 #> 1841 15.7 9 1825 1825-09-01 08:00:01 blood 921 #> 1842 15.7 9 1825 1825-09-01 08:00:01 bone 921 #> 1843 15.6 10 1825 1825-10-01 18:00:00 blood 922 #> 1844 15.6 10 1825 1825-10-01 18:00:00 bone 922 #> 1845 11.7 11 1825 1825-11-01 04:00:00 blood 923 #> 1846 11.7 11 1825 1825-11-01 04:00:00 bone 923 #> 1847 22.0 12 1825 1825-12-01 14:00:01 blood 924 #> 1848 22.0 12 1825 1825-12-01 14:00:01 bone 924 #> 1849 17.7 1 1826 1826-01-01 00:00:00 blood 925 #> 1850 17.7 1 1826 1826-01-01 00:00:00 bone 925 #> 1851 18.2 2 1826 1826-01-31 10:00:00 blood 926 #> 1852 18.2 2 1826 1826-01-31 10:00:00 bone 926 #> 1853 36.7 3 1826 1826-03-02 20:00:01 blood 927 #> 1854 36.7 3 1826 1826-03-02 20:00:01 bone 927 #> 1855 24.0 4 1826 1826-04-02 06:00:00 blood 928 #> 1856 24.0 4 1826 1826-04-02 06:00:00 bone 928 #> 1857 32.4 5 1826 1826-05-02 16:00:00 blood 929 #> 1858 32.4 5 1826 1826-05-02 16:00:00 bone 929 #> 1859 37.1 6 1826 1826-06-02 02:00:01 blood 930 #> 1860 37.1 6 1826 1826-06-02 02:00:01 bone 930 #> 1861 52.5 7 1826 1826-07-02 12:00:00 blood 931 #> 1862 52.5 7 1826 1826-07-02 12:00:00 bone 931 #> 1863 39.6 8 1826 1826-08-01 22:00:00 blood 932 #> 1864 39.6 8 1826 1826-08-01 22:00:00 bone 932 #> 
1865 18.9 9 1826 1826-09-01 08:00:01 blood 933 #> 1866 18.9 9 1826 1826-09-01 08:00:01 bone 933 #> 1867 50.6 10 1826 1826-10-01 18:00:00 blood 934 #> 1868 50.6 10 1826 1826-10-01 18:00:00 bone 934 #> 1869 39.5 11 1826 1826-11-01 04:00:00 blood 935 #> 1870 39.5 11 1826 1826-11-01 04:00:00 bone 935 #> 1871 68.1 12 1826 1826-12-01 14:00:01 blood 936 #> 1872 68.1 12 1826 1826-12-01 14:00:01 bone 936 #> 1873 34.6 1 1827 1827-01-01 00:00:00 blood 937 #> 1874 34.6 1 1827 1827-01-01 00:00:00 bone 937 #> 1875 47.4 2 1827 1827-01-31 10:00:00 blood 938 #> 1876 47.4 2 1827 1827-01-31 10:00:00 bone 938 #> 1877 57.8 3 1827 1827-03-02 20:00:01 blood 939 #> 1878 57.8 3 1827 1827-03-02 20:00:01 bone 939 #> 1879 46.0 4 1827 1827-04-02 06:00:00 blood 940 #> 1880 46.0 4 1827 1827-04-02 06:00:00 bone 940 #> 1881 56.3 5 1827 1827-05-02 16:00:00 blood 941 #> 1882 56.3 5 1827 1827-05-02 16:00:00 bone 941 #> 1883 56.7 6 1827 1827-06-02 02:00:01 blood 942 #> 1884 56.7 6 1827 1827-06-02 02:00:01 bone 942 #> 1885 42.9 7 1827 1827-07-02 12:00:00 blood 943 #> 1886 42.9 7 1827 1827-07-02 12:00:00 bone 943 #> 1887 53.7 8 1827 1827-08-01 22:00:00 blood 944 #> 1888 53.7 8 1827 1827-08-01 22:00:00 bone 944 #> 1889 49.6 9 1827 1827-09-01 08:00:01 blood 945 #> 1890 49.6 9 1827 1827-09-01 08:00:01 bone 945 #> 1891 57.2 10 1827 1827-10-01 18:00:00 blood 946 #> 1892 57.2 10 1827 1827-10-01 18:00:00 bone 946 #> 1893 48.2 11 1827 1827-11-01 04:00:00 blood 947 #> 1894 48.2 11 1827 1827-11-01 04:00:00 bone 947 #> 1895 46.1 12 1827 1827-12-01 14:00:01 blood 948 #> 1896 46.1 12 1827 1827-12-01 14:00:01 bone 948 #> 1897 52.8 1 1828 1828-01-01 00:00:00 blood 949 #> 1898 52.8 1 1828 1828-01-01 00:00:00 bone 949 #> 1899 64.4 2 1828 1828-01-31 12:00:00 blood 950 #> 1900 64.4 2 1828 1828-01-31 12:00:00 bone 950 #> 1901 65.0 3 1828 1828-03-02 00:00:01 blood 951 #> 1902 65.0 3 1828 1828-03-02 00:00:01 bone 951 #> 1903 61.1 4 1828 1828-04-01 12:00:00 blood 952 #> 1904 61.1 4 1828 1828-04-01 12:00:00 bone 952 #> 1905 89.1 5 1828 1828-05-02 00:00:00 blood 953 #> 1906 89.1 5 1828 1828-05-02 00:00:00 bone 953 #> 1907 98.0 6 1828 1828-06-01 12:00:01 blood 954 #> 1908 98.0 6 1828 1828-06-01 12:00:01 bone 954 #> 1909 54.3 7 1828 1828-07-02 00:00:00 blood 955 #> 1910 54.3 7 1828 1828-07-02 00:00:00 bone 955 #> 1911 76.4 8 1828 1828-08-01 12:00:00 blood 956 #> 1912 76.4 8 1828 1828-08-01 12:00:00 bone 956 #> 1913 50.4 9 1828 1828-09-01 00:00:01 blood 957 #> 1914 50.4 9 1828 1828-09-01 00:00:01 bone 957 #> 1915 54.7 10 1828 1828-10-01 12:00:00 blood 958 #> 1916 54.7 10 1828 1828-10-01 12:00:00 bone 958 #> 1917 57.0 11 1828 1828-11-01 00:00:00 blood 959 #> 1918 57.0 11 1828 1828-11-01 00:00:00 bone 959 #> 1919 46.6 12 1828 1828-12-01 12:00:01 blood 960 #> 1920 46.6 12 1828 1828-12-01 12:00:01 bone 960 #> 1921 43.0 1 1829 1829-01-01 00:00:00 blood 961 #> 1922 43.0 1 1829 1829-01-01 00:00:00 bone 961 #> 1923 49.4 2 1829 1829-01-31 10:00:00 blood 962 #> 1924 49.4 2 1829 1829-01-31 10:00:00 bone 962 #> 1925 72.3 3 1829 1829-03-02 20:00:01 blood 963 #> 1926 72.3 3 1829 1829-03-02 20:00:01 bone 963 #> 1927 95.0 4 1829 1829-04-02 06:00:00 blood 964 #> 1928 95.0 4 1829 1829-04-02 06:00:00 bone 964 #> 1929 67.5 5 1829 1829-05-02 16:00:00 blood 965 #> 1930 67.5 5 1829 1829-05-02 16:00:00 bone 965 #> 1931 73.9 6 1829 1829-06-02 02:00:01 blood 966 #> 1932 73.9 6 1829 1829-06-02 02:00:01 bone 966 #> 1933 90.8 7 1829 1829-07-02 12:00:00 blood 967 #> 1934 90.8 7 1829 1829-07-02 12:00:00 bone 967 #> 1935 78.3 8 1829 1829-08-01 22:00:00 blood 968 #> 1936 78.3 8 1829 
1829-08-01 22:00:00 bone 968 #> 1937 52.8 9 1829 1829-09-01 08:00:01 blood 969 #> 1938 52.8 9 1829 1829-09-01 08:00:01 bone 969 #> 1939 57.2 10 1829 1829-10-01 18:00:00 blood 970 #> 1940 57.2 10 1829 1829-10-01 18:00:00 bone 970 #> 1941 67.6 11 1829 1829-11-01 04:00:00 blood 971 #> 1942 67.6 11 1829 1829-11-01 04:00:00 bone 971 #> 1943 56.5 12 1829 1829-12-01 14:00:01 blood 972 #> 1944 56.5 12 1829 1829-12-01 14:00:01 bone 972 #> 1945 52.2 1 1830 1830-01-01 00:00:00 blood 973 #> 1946 52.2 1 1830 1830-01-01 00:00:00 bone 973 #> 1947 72.1 2 1830 1830-01-31 10:00:00 blood 974 #> 1948 72.1 2 1830 1830-01-31 10:00:00 bone 974 #> 1949 84.6 3 1830 1830-03-02 20:00:01 blood 975 #> 1950 84.6 3 1830 1830-03-02 20:00:01 bone 975 #> 1951 107.1 4 1830 1830-04-02 06:00:00 blood 976 #> 1952 107.1 4 1830 1830-04-02 06:00:00 bone 976 #> 1953 66.3 5 1830 1830-05-02 16:00:00 blood 977 #> 1954 66.3 5 1830 1830-05-02 16:00:00 bone 977 #> 1955 65.1 6 1830 1830-06-02 02:00:01 blood 978 #> 1956 65.1 6 1830 1830-06-02 02:00:01 bone 978 #> 1957 43.9 7 1830 1830-07-02 12:00:00 blood 979 #> 1958 43.9 7 1830 1830-07-02 12:00:00 bone 979 #> 1959 50.7 8 1830 1830-08-01 22:00:00 blood 980 #> 1960 50.7 8 1830 1830-08-01 22:00:00 bone 980 #> 1961 62.1 9 1830 1830-09-01 08:00:01 blood 981 #> 1962 62.1 9 1830 1830-09-01 08:00:01 bone 981 #> 1963 84.4 10 1830 1830-10-01 18:00:00 blood 982 #> 1964 84.4 10 1830 1830-10-01 18:00:00 bone 982 #> 1965 81.2 11 1830 1830-11-01 04:00:00 blood 983 #> 1966 81.2 11 1830 1830-11-01 04:00:00 bone 983 #> 1967 82.1 12 1830 1830-12-01 14:00:01 blood 984 #> 1968 82.1 12 1830 1830-12-01 14:00:01 bone 984 #> 1969 47.5 1 1831 1831-01-01 00:00:00 blood 985 #> 1970 47.5 1 1831 1831-01-01 00:00:00 bone 985 #> 1971 50.1 2 1831 1831-01-31 10:00:00 blood 986 #> 1972 50.1 2 1831 1831-01-31 10:00:00 bone 986 #> 1973 93.4 3 1831 1831-03-02 20:00:01 blood 987 #> 1974 93.4 3 1831 1831-03-02 20:00:01 bone 987 #> 1975 54.6 4 1831 1831-04-02 06:00:00 blood 988 #> 1976 54.6 4 1831 1831-04-02 06:00:00 bone 988 #> 1977 38.1 5 1831 1831-05-02 16:00:00 blood 989 #> 1978 38.1 5 1831 1831-05-02 16:00:00 bone 989 #> 1979 33.4 6 1831 1831-06-02 02:00:01 blood 990 #> 1980 33.4 6 1831 1831-06-02 02:00:01 bone 990 #> 1981 45.2 7 1831 1831-07-02 12:00:00 blood 991 #> 1982 45.2 7 1831 1831-07-02 12:00:00 bone 991 #> 1983 54.9 8 1831 1831-08-01 22:00:00 blood 992 #> 1984 54.9 8 1831 1831-08-01 22:00:00 bone 992 #> 1985 37.9 9 1831 1831-09-01 08:00:01 blood 993 #> 1986 37.9 9 1831 1831-09-01 08:00:01 bone 993 #> 1987 46.2 10 1831 1831-10-01 18:00:00 blood 994 #> 1988 46.2 10 1831 1831-10-01 18:00:00 bone 994 #> 1989 43.5 11 1831 1831-11-01 04:00:00 blood 995 #> 1990 43.5 11 1831 1831-11-01 04:00:00 bone 995 #> 1991 28.9 12 1831 1831-12-01 14:00:01 blood 996 #> 1992 28.9 12 1831 1831-12-01 14:00:01 bone 996 #> 1993 30.9 1 1832 1832-01-01 00:00:00 blood 997 #> 1994 30.9 1 1832 1832-01-01 00:00:00 bone 997 #> 1995 55.5 2 1832 1832-01-31 12:00:00 blood 998 #> 1996 55.5 2 1832 1832-01-31 12:00:00 bone 998 #> 1997 55.1 3 1832 1832-03-02 00:00:01 blood 999 #> 1998 55.1 3 1832 1832-03-02 00:00:01 bone 999 #> 1999 26.9 4 1832 1832-04-01 12:00:00 blood 1000 #> 2000 26.9 4 1832 1832-04-01 12:00:00 bone 1000 #> 2001 41.3 5 1832 1832-05-02 00:00:00 blood 1001 #> 2002 41.3 5 1832 1832-05-02 00:00:00 bone 1001 #> 2003 26.7 6 1832 1832-06-01 12:00:01 blood 1002 #> 2004 26.7 6 1832 1832-06-01 12:00:01 bone 1002 #> 2005 13.9 7 1832 1832-07-02 00:00:00 blood 1003 #> 2006 13.9 7 1832 1832-07-02 00:00:00 bone 1003 #> 2007 8.9 8 1832 1832-08-01 
12:00:00 blood 1004 #> 2008 8.9 8 1832 1832-08-01 12:00:00 bone 1004 #> 2009 8.2 9 1832 1832-09-01 00:00:01 blood 1005 #> 2010 8.2 9 1832 1832-09-01 00:00:01 bone 1005 #> 2011 21.1 10 1832 1832-10-01 12:00:00 blood 1006 #> 2012 21.1 10 1832 1832-10-01 12:00:00 bone 1006 #> 2013 14.3 11 1832 1832-11-01 00:00:00 blood 1007 #> 2014 14.3 11 1832 1832-11-01 00:00:00 bone 1007 #> 2015 27.5 12 1832 1832-12-01 12:00:01 blood 1008 #> 2016 27.5 12 1832 1832-12-01 12:00:01 bone 1008 #> 2017 11.3 1 1833 1833-01-01 00:00:00 blood 1009 #> 2018 11.3 1 1833 1833-01-01 00:00:00 bone 1009 #> 2019 14.9 2 1833 1833-01-31 10:00:00 blood 1010 #> 2020 14.9 2 1833 1833-01-31 10:00:00 bone 1010 #> 2021 11.8 3 1833 1833-03-02 20:00:01 blood 1011 #> 2022 11.8 3 1833 1833-03-02 20:00:01 bone 1011 #> 2023 2.8 4 1833 1833-04-02 06:00:00 blood 1012 #> 2024 2.8 4 1833 1833-04-02 06:00:00 bone 1012 #> 2025 12.9 5 1833 1833-05-02 16:00:00 blood 1013 #> 2026 12.9 5 1833 1833-05-02 16:00:00 bone 1013 #> 2027 1.0 6 1833 1833-06-02 02:00:01 blood 1014 #> 2028 1.0 6 1833 1833-06-02 02:00:01 bone 1014 #> 2029 7.0 7 1833 1833-07-02 12:00:00 blood 1015 #> 2030 7.0 7 1833 1833-07-02 12:00:00 bone 1015 #> 2031 5.7 8 1833 1833-08-01 22:00:00 blood 1016 #> 2032 5.7 8 1833 1833-08-01 22:00:00 bone 1016 #> 2033 11.6 9 1833 1833-09-01 08:00:01 blood 1017 #> 2034 11.6 9 1833 1833-09-01 08:00:01 bone 1017 #> 2035 7.5 10 1833 1833-10-01 18:00:00 blood 1018 #> 2036 7.5 10 1833 1833-10-01 18:00:00 bone 1018 #> 2037 5.9 11 1833 1833-11-01 04:00:00 blood 1019 #> 2038 5.9 11 1833 1833-11-01 04:00:00 bone 1019 #> 2039 9.9 12 1833 1833-12-01 14:00:01 blood 1020 #> 2040 9.9 12 1833 1833-12-01 14:00:01 bone 1020 #> 2041 4.9 1 1834 1834-01-01 00:00:00 blood 1021 #> 2042 4.9 1 1834 1834-01-01 00:00:00 bone 1021 #> 2043 18.1 2 1834 1834-01-31 10:00:00 blood 1022 #> 2044 18.1 2 1834 1834-01-31 10:00:00 bone 1022 #> 2045 3.9 3 1834 1834-03-02 20:00:01 blood 1023 #> 2046 3.9 3 1834 1834-03-02 20:00:01 bone 1023 #> 2047 1.4 4 1834 1834-04-02 06:00:00 blood 1024 #> 2048 1.4 4 1834 1834-04-02 06:00:00 bone 1024 #> 2049 8.8 5 1834 1834-05-02 16:00:00 blood 1025 #> 2050 8.8 5 1834 1834-05-02 16:00:00 bone 1025 #> 2051 7.8 6 1834 1834-06-02 02:00:01 blood 1026 #> 2052 7.8 6 1834 1834-06-02 02:00:01 bone 1026 #> 2053 8.7 7 1834 1834-07-02 12:00:00 blood 1027 #> 2054 8.7 7 1834 1834-07-02 12:00:00 bone 1027 #> 2055 4.0 8 1834 1834-08-01 22:00:00 blood 1028 #> 2056 4.0 8 1834 1834-08-01 22:00:00 bone 1028 #> 2057 11.5 9 1834 1834-09-01 08:00:01 blood 1029 #> 2058 11.5 9 1834 1834-09-01 08:00:01 bone 1029 #> 2059 24.8 10 1834 1834-10-01 18:00:00 blood 1030 #> 2060 24.8 10 1834 1834-10-01 18:00:00 bone 1030 #> 2061 30.5 11 1834 1834-11-01 04:00:00 blood 1031 #> 2062 30.5 11 1834 1834-11-01 04:00:00 bone 1031 #> 2063 34.5 12 1834 1834-12-01 14:00:01 blood 1032 #> 2064 34.5 12 1834 1834-12-01 14:00:01 bone 1032 #> 2065 7.5 1 1835 1835-01-01 00:00:00 blood 1033 #> 2066 7.5 1 1835 1835-01-01 00:00:00 bone 1033 #> 2067 24.5 2 1835 1835-01-31 10:00:00 blood 1034 #> 2068 24.5 2 1835 1835-01-31 10:00:00 bone 1034 #> 2069 19.7 3 1835 1835-03-02 20:00:01 blood 1035 #> 2070 19.7 3 1835 1835-03-02 20:00:01 bone 1035 #> 2071 61.5 4 1835 1835-04-02 06:00:00 blood 1036 #> 2072 61.5 4 1835 1835-04-02 06:00:00 bone 1036 #> 2073 43.6 5 1835 1835-05-02 16:00:00 blood 1037 #> 2074 43.6 5 1835 1835-05-02 16:00:00 bone 1037 #> 2075 33.2 6 1835 1835-06-02 02:00:01 blood 1038 #> 2076 33.2 6 1835 1835-06-02 02:00:01 bone 1038 #> 2077 59.8 7 1835 1835-07-02 12:00:00 blood 1039 #> 2078 59.8 7 
1835 1835-07-02 12:00:00 bone 1039 #> 2079 59.0 8 1835 1835-08-01 22:00:00 blood 1040 #> 2080 59.0 8 1835 1835-08-01 22:00:00 bone 1040 #> 2081 100.8 9 1835 1835-09-01 08:00:01 blood 1041 #> 2082 100.8 9 1835 1835-09-01 08:00:01 bone 1041 #> 2083 95.2 10 1835 1835-10-01 18:00:00 blood 1042 #> 2084 95.2 10 1835 1835-10-01 18:00:00 bone 1042 #> 2085 100.0 11 1835 1835-11-01 04:00:00 blood 1043 #> 2086 100.0 11 1835 1835-11-01 04:00:00 bone 1043 #> 2087 77.5 12 1835 1835-12-01 14:00:01 blood 1044 #> 2088 77.5 12 1835 1835-12-01 14:00:01 bone 1044 #> 2089 88.6 1 1836 1836-01-01 00:00:00 blood 1045 #> 2090 88.6 1 1836 1836-01-01 00:00:00 bone 1045 #> 2091 107.6 2 1836 1836-01-31 12:00:00 blood 1046 #> 2092 107.6 2 1836 1836-01-31 12:00:00 bone 1046 #> 2093 98.1 3 1836 1836-03-02 00:00:01 blood 1047 #> 2094 98.1 3 1836 1836-03-02 00:00:01 bone 1047 #> 2095 142.9 4 1836 1836-04-01 12:00:00 blood 1048 #> 2096 142.9 4 1836 1836-04-01 12:00:00 bone 1048 #> 2097 111.4 5 1836 1836-05-02 00:00:00 blood 1049 #> 2098 111.4 5 1836 1836-05-02 00:00:00 bone 1049 #> 2099 124.7 6 1836 1836-06-01 12:00:01 blood 1050 #> 2100 124.7 6 1836 1836-06-01 12:00:01 bone 1050 #> 2101 116.7 7 1836 1836-07-02 00:00:00 blood 1051 #> 2102 116.7 7 1836 1836-07-02 00:00:00 bone 1051 #> 2103 107.8 8 1836 1836-08-01 12:00:00 blood 1052 #> 2104 107.8 8 1836 1836-08-01 12:00:00 bone 1052 #> 2105 95.1 9 1836 1836-09-01 00:00:01 blood 1053 #> 2106 95.1 9 1836 1836-09-01 00:00:01 bone 1053 #> 2107 137.4 10 1836 1836-10-01 12:00:00 blood 1054 #> 2108 137.4 10 1836 1836-10-01 12:00:00 bone 1054 #> 2109 120.9 11 1836 1836-11-01 00:00:00 blood 1055 #> 2110 120.9 11 1836 1836-11-01 00:00:00 bone 1055 #> 2111 206.2 12 1836 1836-12-01 12:00:01 blood 1056 #> 2112 206.2 12 1836 1836-12-01 12:00:01 bone 1056 #> 2113 188.0 1 1837 1837-01-01 00:00:00 blood 1057 #> 2114 188.0 1 1837 1837-01-01 00:00:00 bone 1057 #> 2115 175.6 2 1837 1837-01-31 10:00:00 blood 1058 #> 2116 175.6 2 1837 1837-01-31 10:00:00 bone 1058 #> 2117 134.6 3 1837 1837-03-02 20:00:01 blood 1059 #> 2118 134.6 3 1837 1837-03-02 20:00:01 bone 1059 #> 2119 138.2 4 1837 1837-04-02 06:00:00 blood 1060 #> 2120 138.2 4 1837 1837-04-02 06:00:00 bone 1060 #> 2121 111.3 5 1837 1837-05-02 16:00:00 blood 1061 #> 2122 111.3 5 1837 1837-05-02 16:00:00 bone 1061 #> 2123 158.0 6 1837 1837-06-02 02:00:01 blood 1062 #> 2124 158.0 6 1837 1837-06-02 02:00:01 bone 1062 #> 2125 162.8 7 1837 1837-07-02 12:00:00 blood 1063 #> 2126 162.8 7 1837 1837-07-02 12:00:00 bone 1063 #> 2127 134.0 8 1837 1837-08-01 22:00:00 blood 1064 #> 2128 134.0 8 1837 1837-08-01 22:00:00 bone 1064 #> 2129 96.3 9 1837 1837-09-01 08:00:01 blood 1065 #> 2130 96.3 9 1837 1837-09-01 08:00:01 bone 1065 #> 2131 123.7 10 1837 1837-10-01 18:00:00 blood 1066 #> 2132 123.7 10 1837 1837-10-01 18:00:00 bone 1066 #> 2133 107.0 11 1837 1837-11-01 04:00:00 blood 1067 #> 2134 107.0 11 1837 1837-11-01 04:00:00 bone 1067 #> 2135 129.8 12 1837 1837-12-01 14:00:01 blood 1068 #> 2136 129.8 12 1837 1837-12-01 14:00:01 bone 1068 #> 2137 144.9 1 1838 1838-01-01 00:00:00 blood 1069 #> 2138 144.9 1 1838 1838-01-01 00:00:00 bone 1069 #> 2139 84.8 2 1838 1838-01-31 10:00:00 blood 1070 #> 2140 84.8 2 1838 1838-01-31 10:00:00 bone 1070 #> 2141 140.8 3 1838 1838-03-02 20:00:01 blood 1071 #> 2142 140.8 3 1838 1838-03-02 20:00:01 bone 1071 #> 2143 126.6 4 1838 1838-04-02 06:00:00 blood 1072 #> 2144 126.6 4 1838 1838-04-02 06:00:00 bone 1072 #> 2145 137.6 5 1838 1838-05-02 16:00:00 blood 1073 #> 2146 137.6 5 1838 1838-05-02 16:00:00 bone 1073 #> 2147 94.5 6 
1838 1838-06-02 02:00:01 blood 1074 #> 2148 94.5 6 1838 1838-06-02 02:00:01 bone 1074 #> 2149 108.2 7 1838 1838-07-02 12:00:00 blood 1075 #> 2150 108.2 7 1838 1838-07-02 12:00:00 bone 1075 #> 2151 78.8 8 1838 1838-08-01 22:00:00 blood 1076 #> 2152 78.8 8 1838 1838-08-01 22:00:00 bone 1076 #> 2153 73.6 9 1838 1838-09-01 08:00:01 blood 1077 #> 2154 73.6 9 1838 1838-09-01 08:00:01 bone 1077 #> 2155 90.8 10 1838 1838-10-01 18:00:00 blood 1078 #> 2156 90.8 10 1838 1838-10-01 18:00:00 bone 1078 #> 2157 77.4 11 1838 1838-11-01 04:00:00 blood 1079 #> 2158 77.4 11 1838 1838-11-01 04:00:00 bone 1079 #> 2159 79.8 12 1838 1838-12-01 14:00:01 blood 1080 #> 2160 79.8 12 1838 1838-12-01 14:00:01 bone 1080 #> 2161 107.6 1 1839 1839-01-01 00:00:00 blood 1081 #> 2162 107.6 1 1839 1839-01-01 00:00:00 bone 1081 #> 2163 102.5 2 1839 1839-01-31 10:00:00 blood 1082 #> 2164 102.5 2 1839 1839-01-31 10:00:00 bone 1082 #> 2165 77.7 3 1839 1839-03-02 20:00:01 blood 1083 #> 2166 77.7 3 1839 1839-03-02 20:00:01 bone 1083 #> 2167 61.8 4 1839 1839-04-02 06:00:00 blood 1084 #> 2168 61.8 4 1839 1839-04-02 06:00:00 bone 1084 #> 2169 53.8 5 1839 1839-05-02 16:00:00 blood 1085 #> 2170 53.8 5 1839 1839-05-02 16:00:00 bone 1085 #> 2171 54.6 6 1839 1839-06-02 02:00:01 blood 1086 #> 2172 54.6 6 1839 1839-06-02 02:00:01 bone 1086 #> 2173 84.7 7 1839 1839-07-02 12:00:00 blood 1087 #> 2174 84.7 7 1839 1839-07-02 12:00:00 bone 1087 #> 2175 131.2 8 1839 1839-08-01 22:00:00 blood 1088 #> 2176 131.2 8 1839 1839-08-01 22:00:00 bone 1088 #> 2177 132.7 9 1839 1839-09-01 08:00:01 blood 1089 #> 2178 132.7 9 1839 1839-09-01 08:00:01 bone 1089 #> 2179 90.8 10 1839 1839-10-01 18:00:00 blood 1090 #> 2180 90.8 10 1839 1839-10-01 18:00:00 bone 1090 #> 2181 68.8 11 1839 1839-11-01 04:00:00 blood 1091 #> 2182 68.8 11 1839 1839-11-01 04:00:00 bone 1091 #> 2183 63.6 12 1839 1839-12-01 14:00:01 blood 1092 #> 2184 63.6 12 1839 1839-12-01 14:00:01 bone 1092 #> 2185 81.2 1 1840 1840-01-01 00:00:00 blood 1093 #> 2186 81.2 1 1840 1840-01-01 00:00:00 bone 1093 #> 2187 87.7 2 1840 1840-01-31 12:00:00 blood 1094 #> 2188 87.7 2 1840 1840-01-31 12:00:00 bone 1094 #> 2189 55.5 3 1840 1840-03-02 00:00:01 blood 1095 #> 2190 55.5 3 1840 1840-03-02 00:00:01 bone 1095 #> 2191 65.9 4 1840 1840-04-01 12:00:00 blood 1096 #> 2192 65.9 4 1840 1840-04-01 12:00:00 bone 1096 #> 2193 69.2 5 1840 1840-05-02 00:00:00 blood 1097 #> 2194 69.2 5 1840 1840-05-02 00:00:00 bone 1097 #> 2195 48.5 6 1840 1840-06-01 12:00:01 blood 1098 #> 2196 48.5 6 1840 1840-06-01 12:00:01 bone 1098 #> 2197 60.7 7 1840 1840-07-02 00:00:00 blood 1099 #> 2198 60.7 7 1840 1840-07-02 00:00:00 bone 1099 #> 2199 57.8 8 1840 1840-08-01 12:00:00 blood 1100 #> 2200 57.8 8 1840 1840-08-01 12:00:00 bone 1100 #> 2201 74.0 9 1840 1840-09-01 00:00:01 blood 1101 #> 2202 74.0 9 1840 1840-09-01 00:00:01 bone 1101 #> 2203 49.8 10 1840 1840-10-01 12:00:00 blood 1102 #> 2204 49.8 10 1840 1840-10-01 12:00:00 bone 1102 #> 2205 54.3 11 1840 1840-11-01 00:00:00 blood 1103 #> 2206 54.3 11 1840 1840-11-01 00:00:00 bone 1103 #> 2207 53.7 12 1840 1840-12-01 12:00:01 blood 1104 #> 2208 53.7 12 1840 1840-12-01 12:00:01 bone 1104 #> 2209 24.0 1 1841 1841-01-01 00:00:00 blood 1105 #> 2210 24.0 1 1841 1841-01-01 00:00:00 bone 1105 #> 2211 29.9 2 1841 1841-01-31 10:00:00 blood 1106 #> 2212 29.9 2 1841 1841-01-31 10:00:00 bone 1106 #> 2213 29.7 3 1841 1841-03-02 20:00:01 blood 1107 #> 2214 29.7 3 1841 1841-03-02 20:00:01 bone 1107 #> 2215 42.6 4 1841 1841-04-02 06:00:00 blood 1108 #> 2216 42.6 4 1841 1841-04-02 06:00:00 bone 1108 #> 
2217 67.4 5 1841 1841-05-02 16:00:00 blood 1109 #> 2218 67.4 5 1841 1841-05-02 16:00:00 bone 1109 #> 2219 55.7 6 1841 1841-06-02 02:00:01 blood 1110 #> 2220 55.7 6 1841 1841-06-02 02:00:01 bone 1110 #> 2221 30.8 7 1841 1841-07-02 12:00:00 blood 1111 #> 2222 30.8 7 1841 1841-07-02 12:00:00 bone 1111 #> 2223 39.3 8 1841 1841-08-01 22:00:00 blood 1112 #> 2224 39.3 8 1841 1841-08-01 22:00:00 bone 1112 #> 2225 35.1 9 1841 1841-09-01 08:00:01 blood 1113 #> 2226 35.1 9 1841 1841-09-01 08:00:01 bone 1113 #> 2227 28.5 10 1841 1841-10-01 18:00:00 blood 1114 #> 2228 28.5 10 1841 1841-10-01 18:00:00 bone 1114 #> 2229 19.8 11 1841 1841-11-01 04:00:00 blood 1115 #> 2230 19.8 11 1841 1841-11-01 04:00:00 bone 1115 #> 2231 38.8 12 1841 1841-12-01 14:00:01 blood 1116 #> 2232 38.8 12 1841 1841-12-01 14:00:01 bone 1116 #> 2233 20.4 1 1842 1842-01-01 00:00:00 blood 1117 #> 2234 20.4 1 1842 1842-01-01 00:00:00 bone 1117 #> 2235 22.1 2 1842 1842-01-31 10:00:00 blood 1118 #> 2236 22.1 2 1842 1842-01-31 10:00:00 bone 1118 #> 2237 21.7 3 1842 1842-03-02 20:00:01 blood 1119 #> 2238 21.7 3 1842 1842-03-02 20:00:01 bone 1119 #> 2239 26.9 4 1842 1842-04-02 06:00:00 blood 1120 #> 2240 26.9 4 1842 1842-04-02 06:00:00 bone 1120 #> 2241 24.9 5 1842 1842-05-02 16:00:00 blood 1121 #> 2242 24.9 5 1842 1842-05-02 16:00:00 bone 1121 #> 2243 20.5 6 1842 1842-06-02 02:00:01 blood 1122 #> 2244 20.5 6 1842 1842-06-02 02:00:01 bone 1122 #> 2245 12.6 7 1842 1842-07-02 12:00:00 blood 1123 #> 2246 12.6 7 1842 1842-07-02 12:00:00 bone 1123 #> 2247 26.5 8 1842 1842-08-01 22:00:00 blood 1124 #> 2248 26.5 8 1842 1842-08-01 22:00:00 bone 1124 #> 2249 18.5 9 1842 1842-09-01 08:00:01 blood 1125 #> 2250 18.5 9 1842 1842-09-01 08:00:01 bone 1125 #> 2251 38.1 10 1842 1842-10-01 18:00:00 blood 1126 #> 2252 38.1 10 1842 1842-10-01 18:00:00 bone 1126 #> 2253 40.5 11 1842 1842-11-01 04:00:00 blood 1127 #> 2254 40.5 11 1842 1842-11-01 04:00:00 bone 1127 #> 2255 17.6 12 1842 1842-12-01 14:00:01 blood 1128 #> 2256 17.6 12 1842 1842-12-01 14:00:01 bone 1128 #> 2257 13.3 1 1843 1843-01-01 00:00:00 blood 1129 #> 2258 13.3 1 1843 1843-01-01 00:00:00 bone 1129 #> 2259 3.5 2 1843 1843-01-31 10:00:00 blood 1130 #> 2260 3.5 2 1843 1843-01-31 10:00:00 bone 1130 #> 2261 8.3 3 1843 1843-03-02 20:00:01 blood 1131 #> 2262 8.3 3 1843 1843-03-02 20:00:01 bone 1131 #> 2263 8.8 4 1843 1843-04-02 06:00:00 blood 1132 #> 2264 8.8 4 1843 1843-04-02 06:00:00 bone 1132 #> 2265 21.1 5 1843 1843-05-02 16:00:00 blood 1133 #> 2266 21.1 5 1843 1843-05-02 16:00:00 bone 1133 #> 2267 10.5 6 1843 1843-06-02 02:00:01 blood 1134 #> 2268 10.5 6 1843 1843-06-02 02:00:01 bone 1134 #> 2269 9.5 7 1843 1843-07-02 12:00:00 blood 1135 #> 2270 9.5 7 1843 1843-07-02 12:00:00 bone 1135 #> 2271 11.8 8 1843 1843-08-01 22:00:00 blood 1136 #> 2272 11.8 8 1843 1843-08-01 22:00:00 bone 1136 #> 2273 4.2 9 1843 1843-09-01 08:00:01 blood 1137 #> 2274 4.2 9 1843 1843-09-01 08:00:01 bone 1137 #> 2275 5.3 10 1843 1843-10-01 18:00:00 blood 1138 #> 2276 5.3 10 1843 1843-10-01 18:00:00 bone 1138 #> 2277 19.1 11 1843 1843-11-01 04:00:00 blood 1139 #> 2278 19.1 11 1843 1843-11-01 04:00:00 bone 1139 #> 2279 12.7 12 1843 1843-12-01 14:00:01 blood 1140 #> 2280 12.7 12 1843 1843-12-01 14:00:01 bone 1140 #> 2281 9.4 1 1844 1844-01-01 00:00:00 blood 1141 #> 2282 9.4 1 1844 1844-01-01 00:00:00 bone 1141 #> 2283 14.7 2 1844 1844-01-31 12:00:00 blood 1142 #> 2284 14.7 2 1844 1844-01-31 12:00:00 bone 1142 #> 2285 13.6 3 1844 1844-03-02 00:00:01 blood 1143 #> 2286 13.6 3 1844 1844-03-02 00:00:01 bone 1143 #> 2287 20.8 4 
1844 1844-04-01 12:00:00 blood 1144 #> 2288 20.8 4 1844 1844-04-01 12:00:00 bone 1144 #> 2289 12.0 5 1844 1844-05-02 00:00:00 blood 1145 #> 2290 12.0 5 1844 1844-05-02 00:00:00 bone 1145 #> 2291 3.7 6 1844 1844-06-01 12:00:01 blood 1146 #> 2292 3.7 6 1844 1844-06-01 12:00:01 bone 1146 #> 2293 21.2 7 1844 1844-07-02 00:00:00 blood 1147 #> 2294 21.2 7 1844 1844-07-02 00:00:00 bone 1147 #> 2295 23.9 8 1844 1844-08-01 12:00:00 blood 1148 #> 2296 23.9 8 1844 1844-08-01 12:00:00 bone 1148 #> 2297 6.9 9 1844 1844-09-01 00:00:01 blood 1149 #> 2298 6.9 9 1844 1844-09-01 00:00:01 bone 1149 #> 2299 21.5 10 1844 1844-10-01 12:00:00 blood 1150 #> 2300 21.5 10 1844 1844-10-01 12:00:00 bone 1150 #> 2301 10.7 11 1844 1844-11-01 00:00:00 blood 1151 #> 2302 10.7 11 1844 1844-11-01 00:00:00 bone 1151 #> 2303 21.6 12 1844 1844-12-01 12:00:01 blood 1152 #> 2304 21.6 12 1844 1844-12-01 12:00:01 bone 1152 #> 2305 25.7 1 1845 1845-01-01 00:00:00 blood 1153 #> 2306 25.7 1 1845 1845-01-01 00:00:00 bone 1153 #> 2307 43.6 2 1845 1845-01-31 10:00:00 blood 1154 #> 2308 43.6 2 1845 1845-01-31 10:00:00 bone 1154 #> 2309 43.3 3 1845 1845-03-02 20:00:01 blood 1155 #> 2310 43.3 3 1845 1845-03-02 20:00:01 bone 1155 #> 2311 56.9 4 1845 1845-04-02 06:00:00 blood 1156 #> 2312 56.9 4 1845 1845-04-02 06:00:00 bone 1156 #> 2313 47.8 5 1845 1845-05-02 16:00:00 blood 1157 #> 2314 47.8 5 1845 1845-05-02 16:00:00 bone 1157 #> 2315 31.1 6 1845 1845-06-02 02:00:01 blood 1158 #> 2316 31.1 6 1845 1845-06-02 02:00:01 bone 1158 #> 2317 30.6 7 1845 1845-07-02 12:00:00 blood 1159 #> 2318 30.6 7 1845 1845-07-02 12:00:00 bone 1159 #> 2319 32.3 8 1845 1845-08-01 22:00:00 blood 1160 #> 2320 32.3 8 1845 1845-08-01 22:00:00 bone 1160 #> 2321 29.6 9 1845 1845-09-01 08:00:01 blood 1161 #> 2322 29.6 9 1845 1845-09-01 08:00:01 bone 1161 #> 2323 40.7 10 1845 1845-10-01 18:00:00 blood 1162 #> 2324 40.7 10 1845 1845-10-01 18:00:00 bone 1162 #> 2325 39.4 11 1845 1845-11-01 04:00:00 blood 1163 #> 2326 39.4 11 1845 1845-11-01 04:00:00 bone 1163 #> 2327 59.7 12 1845 1845-12-01 14:00:01 blood 1164 #> 2328 59.7 12 1845 1845-12-01 14:00:01 bone 1164 #> 2329 38.7 1 1846 1846-01-01 00:00:00 blood 1165 #> 2330 38.7 1 1846 1846-01-01 00:00:00 bone 1165 #> 2331 51.0 2 1846 1846-01-31 10:00:00 blood 1166 #> 2332 51.0 2 1846 1846-01-31 10:00:00 bone 1166 #> 2333 63.9 3 1846 1846-03-02 20:00:01 blood 1167 #> 2334 63.9 3 1846 1846-03-02 20:00:01 bone 1167 #> 2335 69.2 4 1846 1846-04-02 06:00:00 blood 1168 #> 2336 69.2 4 1846 1846-04-02 06:00:00 bone 1168 #> 2337 59.9 5 1846 1846-05-02 16:00:00 blood 1169 #> 2338 59.9 5 1846 1846-05-02 16:00:00 bone 1169 #> 2339 65.1 6 1846 1846-06-02 02:00:01 blood 1170 #> 2340 65.1 6 1846 1846-06-02 02:00:01 bone 1170 #> 2341 46.5 7 1846 1846-07-02 12:00:00 blood 1171 #> 2342 46.5 7 1846 1846-07-02 12:00:00 bone 1171 #> 2343 54.8 8 1846 1846-08-01 22:00:00 blood 1172 #> 2344 54.8 8 1846 1846-08-01 22:00:00 bone 1172 #> 2345 107.1 9 1846 1846-09-01 08:00:01 blood 1173 #> 2346 107.1 9 1846 1846-09-01 08:00:01 bone 1173 #> 2347 55.9 10 1846 1846-10-01 18:00:00 blood 1174 #> 2348 55.9 10 1846 1846-10-01 18:00:00 bone 1174 #> 2349 60.4 11 1846 1846-11-01 04:00:00 blood 1175 #> 2350 60.4 11 1846 1846-11-01 04:00:00 bone 1175 #> 2351 65.5 12 1846 1846-12-01 14:00:01 blood 1176 #> 2352 65.5 12 1846 1846-12-01 14:00:01 bone 1176 #> 2353 62.6 1 1847 1847-01-01 00:00:00 blood 1177 #> 2354 62.6 1 1847 1847-01-01 00:00:00 bone 1177 #> 2355 44.9 2 1847 1847-01-31 10:00:00 blood 1178 #> 2356 44.9 2 1847 1847-01-31 10:00:00 bone 1178 #> 2357 85.7 3 
1847 1847-03-02 20:00:01 blood 1179 #> 2358 85.7 3 1847 1847-03-02 20:00:01 bone 1179 #> 2359 44.7 4 1847 1847-04-02 06:00:00 blood 1180 #> 2360 44.7 4 1847 1847-04-02 06:00:00 bone 1180 #> 2361 75.4 5 1847 1847-05-02 16:00:00 blood 1181 #> 2362 75.4 5 1847 1847-05-02 16:00:00 bone 1181 #> 2363 85.3 6 1847 1847-06-02 02:00:01 blood 1182 #> 2364 85.3 6 1847 1847-06-02 02:00:01 bone 1182 #> 2365 52.2 7 1847 1847-07-02 12:00:00 blood 1183 #> 2366 52.2 7 1847 1847-07-02 12:00:00 bone 1183 #> 2367 140.6 8 1847 1847-08-01 22:00:00 blood 1184 #> 2368 140.6 8 1847 1847-08-01 22:00:00 bone 1184 #> 2369 161.2 9 1847 1847-09-01 08:00:01 blood 1185 #> 2370 161.2 9 1847 1847-09-01 08:00:01 bone 1185 #> 2371 180.4 10 1847 1847-10-01 18:00:00 blood 1186 #> 2372 180.4 10 1847 1847-10-01 18:00:00 bone 1186 #> 2373 138.9 11 1847 1847-11-01 04:00:00 blood 1187 #> 2374 138.9 11 1847 1847-11-01 04:00:00 bone 1187 #> 2375 109.6 12 1847 1847-12-01 14:00:01 blood 1188 #> 2376 109.6 12 1847 1847-12-01 14:00:01 bone 1188 #> 2377 159.1 1 1848 1848-01-01 00:00:00 blood 1189 #> 2378 159.1 1 1848 1848-01-01 00:00:00 bone 1189 #> 2379 111.8 2 1848 1848-01-31 12:00:00 blood 1190 #> 2380 111.8 2 1848 1848-01-31 12:00:00 bone 1190 #> 2381 108.9 3 1848 1848-03-02 00:00:01 blood 1191 #> 2382 108.9 3 1848 1848-03-02 00:00:01 bone 1191 #> 2383 107.1 4 1848 1848-04-01 12:00:00 blood 1192 #> 2384 107.1 4 1848 1848-04-01 12:00:00 bone 1192 #> 2385 102.2 5 1848 1848-05-02 00:00:00 blood 1193 #> 2386 102.2 5 1848 1848-05-02 00:00:00 bone 1193 #> 2387 123.8 6 1848 1848-06-01 12:00:01 blood 1194 #> 2388 123.8 6 1848 1848-06-01 12:00:01 bone 1194 #> 2389 139.2 7 1848 1848-07-02 00:00:00 blood 1195 #> 2390 139.2 7 1848 1848-07-02 00:00:00 bone 1195 #> 2391 132.5 8 1848 1848-08-01 12:00:00 blood 1196 #> 2392 132.5 8 1848 1848-08-01 12:00:00 bone 1196 #> 2393 100.3 9 1848 1848-09-01 00:00:01 blood 1197 #> 2394 100.3 9 1848 1848-09-01 00:00:01 bone 1197 #> 2395 132.4 10 1848 1848-10-01 12:00:00 blood 1198 #> 2396 132.4 10 1848 1848-10-01 12:00:00 bone 1198 #> 2397 114.6 11 1848 1848-11-01 00:00:00 blood 1199 #> 2398 114.6 11 1848 1848-11-01 00:00:00 bone 1199 #> 2399 159.9 12 1848 1848-12-01 12:00:01 blood 1200 #> 2400 159.9 12 1848 1848-12-01 12:00:01 bone 1200 #> 2401 156.7 1 1849 1849-01-01 00:00:00 blood 1201 #> 2402 156.7 1 1849 1849-01-01 00:00:00 bone 1201 #> 2403 131.7 2 1849 1849-01-31 10:00:00 blood 1202 #> 2404 131.7 2 1849 1849-01-31 10:00:00 bone 1202 #> 2405 96.5 3 1849 1849-03-02 20:00:01 blood 1203 #> 2406 96.5 3 1849 1849-03-02 20:00:01 bone 1203 #> 2407 102.5 4 1849 1849-04-02 06:00:00 blood 1204 #> 2408 102.5 4 1849 1849-04-02 06:00:00 bone 1204 #> 2409 80.6 5 1849 1849-05-02 16:00:00 blood 1205 #> 2410 80.6 5 1849 1849-05-02 16:00:00 bone 1205 #> 2411 81.2 6 1849 1849-06-02 02:00:01 blood 1206 #> 2412 81.2 6 1849 1849-06-02 02:00:01 bone 1206 #> 2413 78.0 7 1849 1849-07-02 12:00:00 blood 1207 #> 2414 78.0 7 1849 1849-07-02 12:00:00 bone 1207 #> 2415 61.3 8 1849 1849-08-01 22:00:00 blood 1208 #> 2416 61.3 8 1849 1849-08-01 22:00:00 bone 1208 #> 2417 93.7 9 1849 1849-09-01 08:00:01 blood 1209 #> 2418 93.7 9 1849 1849-09-01 08:00:01 bone 1209 #> 2419 71.5 10 1849 1849-10-01 18:00:00 blood 1210 #> 2420 71.5 10 1849 1849-10-01 18:00:00 bone 1210 #> 2421 99.7 11 1849 1849-11-01 04:00:00 blood 1211 #> 2422 99.7 11 1849 1849-11-01 04:00:00 bone 1211 #> 2423 97.0 12 1849 1849-12-01 14:00:01 blood 1212 #> 2424 97.0 12 1849 1849-12-01 14:00:01 bone 1212 #> 2425 78.0 1 1850 1850-01-01 00:00:00 blood 1213 #> 2426 78.0 1 1850 
#> ... [rows 2427-3830 omitted: monthly observations for series 'blood' and 'bone' continue in the same pattern through 1908]
1908 1908-07-02 00:00:00 bone 1915 #> 3831 90.5 8 1908 1908-08-01 12:00:01 blood 1916 #> 3832 90.5 8 1908 1908-08-01 12:00:01 bone 1916 #> 3833 86.9 9 1908 1908-09-01 00:00:01 blood 1917 #> 3834 86.9 9 1908 1908-09-01 00:00:01 bone 1917 #> 3835 32.3 10 1908 1908-10-01 12:00:00 blood 1918 #> 3836 32.3 10 1908 1908-10-01 12:00:00 bone 1918 #> 3837 45.5 11 1908 1908-11-01 00:00:01 blood 1919 #> 3838 45.5 11 1908 1908-11-01 00:00:01 bone 1919 #> 3839 39.5 12 1908 1908-12-01 12:00:01 blood 1920 #> 3840 39.5 12 1908 1908-12-01 12:00:01 bone 1920 #> 3841 56.7 1 1909 1909-01-01 00:00:00 blood 1921 #> 3842 56.7 1 1909 1909-01-01 00:00:00 bone 1921 #> 3843 46.6 2 1909 1909-01-31 10:00:01 blood 1922 #> 3844 46.6 2 1909 1909-01-31 10:00:01 bone 1922 #> 3845 66.3 3 1909 1909-03-02 20:00:01 blood 1923 #> 3846 66.3 3 1909 1909-03-02 20:00:01 bone 1923 #> 3847 32.3 4 1909 1909-04-02 06:00:00 blood 1924 #> 3848 32.3 4 1909 1909-04-02 06:00:00 bone 1924 #> 3849 36.0 5 1909 1909-05-02 16:00:01 blood 1925 #> 3850 36.0 5 1909 1909-05-02 16:00:01 bone 1925 #> 3851 22.6 6 1909 1909-06-02 02:00:01 blood 1926 #> 3852 22.6 6 1909 1909-06-02 02:00:01 bone 1926 #> 3853 35.8 7 1909 1909-07-02 12:00:00 blood 1927 #> 3854 35.8 7 1909 1909-07-02 12:00:00 bone 1927 #> 3855 23.1 8 1909 1909-08-01 22:00:01 blood 1928 #> 3856 23.1 8 1909 1909-08-01 22:00:01 bone 1928 #> 3857 38.8 9 1909 1909-09-01 08:00:01 blood 1929 #> 3858 38.8 9 1909 1909-09-01 08:00:01 bone 1929 #> 3859 58.4 10 1909 1909-10-01 18:00:00 blood 1930 #> 3860 58.4 10 1909 1909-10-01 18:00:00 bone 1930 #> 3861 55.8 11 1909 1909-11-01 04:00:01 blood 1931 #> 3862 55.8 11 1909 1909-11-01 04:00:01 bone 1931 #> 3863 54.2 12 1909 1909-12-01 14:00:01 blood 1932 #> 3864 54.2 12 1909 1909-12-01 14:00:01 bone 1932 #> 3865 26.4 1 1910 1910-01-01 00:00:00 blood 1933 #> 3866 26.4 1 1910 1910-01-01 00:00:00 bone 1933 #> 3867 31.5 2 1910 1910-01-31 10:00:01 blood 1934 #> 3868 31.5 2 1910 1910-01-31 10:00:01 bone 1934 #> 3869 21.4 3 1910 1910-03-02 20:00:01 blood 1935 #> 3870 21.4 3 1910 1910-03-02 20:00:01 bone 1935 #> 3871 8.4 4 1910 1910-04-02 06:00:00 blood 1936 #> 3872 8.4 4 1910 1910-04-02 06:00:00 bone 1936 #> 3873 22.2 5 1910 1910-05-02 16:00:01 blood 1937 #> 3874 22.2 5 1910 1910-05-02 16:00:01 bone 1937 #> 3875 12.3 6 1910 1910-06-02 02:00:01 blood 1938 #> 3876 12.3 6 1910 1910-06-02 02:00:01 bone 1938 #> 3877 14.1 7 1910 1910-07-02 12:00:00 blood 1939 #> 3878 14.1 7 1910 1910-07-02 12:00:00 bone 1939 #> 3879 11.5 8 1910 1910-08-01 22:00:01 blood 1940 #> 3880 11.5 8 1910 1910-08-01 22:00:01 bone 1940 #> 3881 26.2 9 1910 1910-09-01 08:00:01 blood 1941 #> 3882 26.2 9 1910 1910-09-01 08:00:01 bone 1941 #> 3883 38.3 10 1910 1910-10-01 18:00:00 blood 1942 #> 3884 38.3 10 1910 1910-10-01 18:00:00 bone 1942 #> 3885 4.9 11 1910 1910-11-01 04:00:01 blood 1943 #> 3886 4.9 11 1910 1910-11-01 04:00:01 bone 1943 #> 3887 5.8 12 1910 1910-12-01 14:00:01 blood 1944 #> 3888 5.8 12 1910 1910-12-01 14:00:01 bone 1944 #> 3889 3.4 1 1911 1911-01-01 00:00:00 blood 1945 #> 3890 3.4 1 1911 1911-01-01 00:00:00 bone 1945 #> 3891 9.0 2 1911 1911-01-31 10:00:01 blood 1946 #> 3892 9.0 2 1911 1911-01-31 10:00:01 bone 1946 #> 3893 7.8 3 1911 1911-03-02 20:00:01 blood 1947 #> 3894 7.8 3 1911 1911-03-02 20:00:01 bone 1947 #> 3895 16.5 4 1911 1911-04-02 06:00:00 blood 1948 #> 3896 16.5 4 1911 1911-04-02 06:00:00 bone 1948 #> 3897 9.0 5 1911 1911-05-02 16:00:01 blood 1949 #> 3898 9.0 5 1911 1911-05-02 16:00:01 bone 1949 #> 3899 2.2 6 1911 1911-06-02 02:00:01 blood 1950 #> 3900 2.2 6 1911 1911-06-02 
02:00:01 bone 1950 #> 3901 3.5 7 1911 1911-07-02 12:00:00 blood 1951 #> 3902 3.5 7 1911 1911-07-02 12:00:00 bone 1951 #> 3903 4.0 8 1911 1911-08-01 22:00:01 blood 1952 #> 3904 4.0 8 1911 1911-08-01 22:00:01 bone 1952 #> 3905 4.0 9 1911 1911-09-01 08:00:01 blood 1953 #> 3906 4.0 9 1911 1911-09-01 08:00:01 bone 1953 #> 3907 2.6 10 1911 1911-10-01 18:00:00 blood 1954 #> 3908 2.6 10 1911 1911-10-01 18:00:00 bone 1954 #> 3909 4.2 11 1911 1911-11-01 04:00:01 blood 1955 #> 3910 4.2 11 1911 1911-11-01 04:00:01 bone 1955 #> 3911 2.2 12 1911 1911-12-01 14:00:01 blood 1956 #> 3912 2.2 12 1911 1911-12-01 14:00:01 bone 1956 #> 3913 0.3 1 1912 1912-01-01 00:00:00 blood 1957 #> 3914 0.3 1 1912 1912-01-01 00:00:00 bone 1957 #> 3915 0.0 2 1912 1912-01-31 12:00:01 blood 1958 #> 3916 0.0 2 1912 1912-01-31 12:00:01 bone 1958 #> 3917 4.9 3 1912 1912-03-02 00:00:01 blood 1959 #> 3918 4.9 3 1912 1912-03-02 00:00:01 bone 1959 #> 3919 4.5 4 1912 1912-04-01 12:00:00 blood 1960 #> 3920 4.5 4 1912 1912-04-01 12:00:00 bone 1960 #> 3921 4.4 5 1912 1912-05-02 00:00:01 blood 1961 #> 3922 4.4 5 1912 1912-05-02 00:00:01 bone 1961 #> 3923 4.1 6 1912 1912-06-01 12:00:01 blood 1962 #> 3924 4.1 6 1912 1912-06-01 12:00:01 bone 1962 #> 3925 3.0 7 1912 1912-07-02 00:00:00 blood 1963 #> 3926 3.0 7 1912 1912-07-02 00:00:00 bone 1963 #> 3927 0.3 8 1912 1912-08-01 12:00:01 blood 1964 #> 3928 0.3 8 1912 1912-08-01 12:00:01 bone 1964 #> 3929 9.5 9 1912 1912-09-01 00:00:01 blood 1965 #> 3930 9.5 9 1912 1912-09-01 00:00:01 bone 1965 #> 3931 4.6 10 1912 1912-10-01 12:00:00 blood 1966 #> 3932 4.6 10 1912 1912-10-01 12:00:00 bone 1966 #> 3933 1.1 11 1912 1912-11-01 00:00:01 blood 1967 #> 3934 1.1 11 1912 1912-11-01 00:00:01 bone 1967 #> 3935 6.4 12 1912 1912-12-01 12:00:01 blood 1968 #> 3936 6.4 12 1912 1912-12-01 12:00:01 bone 1968 #> 3937 2.3 1 1913 1913-01-01 00:00:00 blood 1969 #> 3938 2.3 1 1913 1913-01-01 00:00:00 bone 1969 #> 3939 2.9 2 1913 1913-01-31 10:00:01 blood 1970 #> 3940 2.9 2 1913 1913-01-31 10:00:01 bone 1970 #> 3941 0.5 3 1913 1913-03-02 20:00:01 blood 1971 #> 3942 0.5 3 1913 1913-03-02 20:00:01 bone 1971 #> 3943 0.9 4 1913 1913-04-02 06:00:00 blood 1972 #> 3944 0.9 4 1913 1913-04-02 06:00:00 bone 1972 #> 3945 0.0 5 1913 1913-05-02 16:00:01 blood 1973 #> 3946 0.0 5 1913 1913-05-02 16:00:01 bone 1973 #> 3947 0.0 6 1913 1913-06-02 02:00:01 blood 1974 #> 3948 0.0 6 1913 1913-06-02 02:00:01 bone 1974 #> 3949 1.7 7 1913 1913-07-02 12:00:00 blood 1975 #> 3950 1.7 7 1913 1913-07-02 12:00:00 bone 1975 #> 3951 0.2 8 1913 1913-08-01 22:00:01 blood 1976 #> 3952 0.2 8 1913 1913-08-01 22:00:01 bone 1976 #> 3953 1.2 9 1913 1913-09-01 08:00:01 blood 1977 #> 3954 1.2 9 1913 1913-09-01 08:00:01 bone 1977 #> 3955 3.1 10 1913 1913-10-01 18:00:00 blood 1978 #> 3956 3.1 10 1913 1913-10-01 18:00:00 bone 1978 #> 3957 0.7 11 1913 1913-11-01 04:00:01 blood 1979 #> 3958 0.7 11 1913 1913-11-01 04:00:01 bone 1979 #> 3959 3.8 12 1913 1913-12-01 14:00:01 blood 1980 #> 3960 3.8 12 1913 1913-12-01 14:00:01 bone 1980 #> 3961 2.8 1 1914 1914-01-01 00:00:00 blood 1981 #> 3962 2.8 1 1914 1914-01-01 00:00:00 bone 1981 #> 3963 2.6 2 1914 1914-01-31 10:00:01 blood 1982 #> 3964 2.6 2 1914 1914-01-31 10:00:01 bone 1982 #> 3965 3.1 3 1914 1914-03-02 20:00:01 blood 1983 #> 3966 3.1 3 1914 1914-03-02 20:00:01 bone 1983 #> 3967 17.3 4 1914 1914-04-02 06:00:00 blood 1984 #> 3968 17.3 4 1914 1914-04-02 06:00:00 bone 1984 #> 3969 5.2 5 1914 1914-05-02 16:00:01 blood 1985 #> 3970 5.2 5 1914 1914-05-02 16:00:01 bone 1985 #> 3971 11.4 6 1914 1914-06-02 02:00:01 blood 1986 
#> 3972 11.4 6 1914 1914-06-02 02:00:01 bone 1986 #> 3973 5.4 7 1914 1914-07-02 12:00:00 blood 1987 #> 3974 5.4 7 1914 1914-07-02 12:00:00 bone 1987 #> 3975 7.7 8 1914 1914-08-01 22:00:01 blood 1988 #> 3976 7.7 8 1914 1914-08-01 22:00:01 bone 1988 #> 3977 12.7 9 1914 1914-09-01 08:00:01 blood 1989 #> 3978 12.7 9 1914 1914-09-01 08:00:01 bone 1989 #> 3979 8.2 10 1914 1914-10-01 18:00:00 blood 1990 #> 3980 8.2 10 1914 1914-10-01 18:00:00 bone 1990 #> 3981 16.4 11 1914 1914-11-01 04:00:01 blood 1991 #> 3982 16.4 11 1914 1914-11-01 04:00:01 bone 1991 #> 3983 22.3 12 1914 1914-12-01 14:00:01 blood 1992 #> 3984 22.3 12 1914 1914-12-01 14:00:01 bone 1992 #> 3985 23.0 1 1915 1915-01-01 00:00:00 blood 1993 #> 3986 23.0 1 1915 1915-01-01 00:00:00 bone 1993 #> 3987 42.3 2 1915 1915-01-31 10:00:01 blood 1994 #> 3988 42.3 2 1915 1915-01-31 10:00:01 bone 1994 #> 3989 38.8 3 1915 1915-03-02 20:00:01 blood 1995 #> 3990 38.8 3 1915 1915-03-02 20:00:01 bone 1995 #> 3991 41.3 4 1915 1915-04-02 06:00:00 blood 1996 #> 3992 41.3 4 1915 1915-04-02 06:00:00 bone 1996 #> 3993 33.0 5 1915 1915-05-02 16:00:01 blood 1997 #> 3994 33.0 5 1915 1915-05-02 16:00:01 bone 1997 #> 3995 68.8 6 1915 1915-06-02 02:00:01 blood 1998 #> 3996 68.8 6 1915 1915-06-02 02:00:01 bone 1998 #> 3997 71.6 7 1915 1915-07-02 12:00:00 blood 1999 #> 3998 71.6 7 1915 1915-07-02 12:00:00 bone 1999 #> 3999 69.6 8 1915 1915-08-01 22:00:01 blood 2000 #> 4000 69.6 8 1915 1915-08-01 22:00:01 bone 2000 #> 4001 49.5 9 1915 1915-09-01 08:00:01 blood 2001 #> 4002 49.5 9 1915 1915-09-01 08:00:01 bone 2001 #> 4003 53.5 10 1915 1915-10-01 18:00:00 blood 2002 #> 4004 53.5 10 1915 1915-10-01 18:00:00 bone 2002 #> 4005 42.5 11 1915 1915-11-01 04:00:01 blood 2003 #> 4006 42.5 11 1915 1915-11-01 04:00:01 bone 2003 #> 4007 34.5 12 1915 1915-12-01 14:00:01 blood 2004 #> 4008 34.5 12 1915 1915-12-01 14:00:01 bone 2004 #> 4009 45.3 1 1916 1916-01-01 00:00:00 blood 2005 #> 4010 45.3 1 1916 1916-01-01 00:00:00 bone 2005 #> 4011 55.4 2 1916 1916-01-31 12:00:01 blood 2006 #> 4012 55.4 2 1916 1916-01-31 12:00:01 bone 2006 #> 4013 67.0 3 1916 1916-03-02 00:00:01 blood 2007 #> 4014 67.0 3 1916 1916-03-02 00:00:01 bone 2007 #> 4015 71.8 4 1916 1916-04-01 12:00:00 blood 2008 #> 4016 71.8 4 1916 1916-04-01 12:00:00 bone 2008 #> 4017 74.5 5 1916 1916-05-02 00:00:01 blood 2009 #> 4018 74.5 5 1916 1916-05-02 00:00:01 bone 2009 #> 4019 67.7 6 1916 1916-06-01 12:00:01 blood 2010 #> 4020 67.7 6 1916 1916-06-01 12:00:01 bone 2010 #> 4021 53.5 7 1916 1916-07-02 00:00:00 blood 2011 #> 4022 53.5 7 1916 1916-07-02 00:00:00 bone 2011 #> 4023 35.2 8 1916 1916-08-01 12:00:01 blood 2012 #> 4024 35.2 8 1916 1916-08-01 12:00:01 bone 2012 #> 4025 45.1 9 1916 1916-09-01 00:00:01 blood 2013 #> 4026 45.1 9 1916 1916-09-01 00:00:01 bone 2013 #> 4027 50.7 10 1916 1916-10-01 12:00:00 blood 2014 #> 4028 50.7 10 1916 1916-10-01 12:00:00 bone 2014 #> 4029 65.6 11 1916 1916-11-01 00:00:01 blood 2015 #> 4030 65.6 11 1916 1916-11-01 00:00:01 bone 2015 #> 4031 53.0 12 1916 1916-12-01 12:00:01 blood 2016 #> 4032 53.0 12 1916 1916-12-01 12:00:01 bone 2016 #> 4033 74.7 1 1917 1917-01-01 00:00:00 blood 2017 #> 4034 74.7 1 1917 1917-01-01 00:00:00 bone 2017 #> 4035 71.9 2 1917 1917-01-31 10:00:01 blood 2018 #> 4036 71.9 2 1917 1917-01-31 10:00:01 bone 2018 #> 4037 94.8 3 1917 1917-03-02 20:00:01 blood 2019 #> 4038 94.8 3 1917 1917-03-02 20:00:01 bone 2019 #> 4039 74.7 4 1917 1917-04-02 06:00:00 blood 2020 #> 4040 74.7 4 1917 1917-04-02 06:00:00 bone 2020 #> 4041 114.1 5 1917 1917-05-02 16:00:01 blood 2021 #> 
4042 114.1 5 1917 1917-05-02 16:00:01 bone 2021 #> 4043 114.9 6 1917 1917-06-02 02:00:01 blood 2022 #> 4044 114.9 6 1917 1917-06-02 02:00:01 bone 2022 #> 4045 119.8 7 1917 1917-07-02 12:00:00 blood 2023 #> 4046 119.8 7 1917 1917-07-02 12:00:00 bone 2023 #> 4047 154.5 8 1917 1917-08-01 22:00:01 blood 2024 #> 4048 154.5 8 1917 1917-08-01 22:00:01 bone 2024 #> 4049 129.4 9 1917 1917-09-01 08:00:01 blood 2025 #> 4050 129.4 9 1917 1917-09-01 08:00:01 bone 2025 #> 4051 72.2 10 1917 1917-10-01 18:00:00 blood 2026 #> 4052 72.2 10 1917 1917-10-01 18:00:00 bone 2026 #> 4053 96.4 11 1917 1917-11-01 04:00:01 blood 2027 #> 4054 96.4 11 1917 1917-11-01 04:00:01 bone 2027 #> 4055 129.3 12 1917 1917-12-01 14:00:01 blood 2028 #> 4056 129.3 12 1917 1917-12-01 14:00:01 bone 2028 #> 4057 96.0 1 1918 1918-01-01 00:00:00 blood 2029 #> 4058 96.0 1 1918 1918-01-01 00:00:00 bone 2029 #> 4059 65.3 2 1918 1918-01-31 10:00:01 blood 2030 #> 4060 65.3 2 1918 1918-01-31 10:00:01 bone 2030 #> 4061 72.2 3 1918 1918-03-02 20:00:01 blood 2031 #> 4062 72.2 3 1918 1918-03-02 20:00:01 bone 2031 #> 4063 80.5 4 1918 1918-04-02 06:00:00 blood 2032 #> 4064 80.5 4 1918 1918-04-02 06:00:00 bone 2032 #> 4065 76.7 5 1918 1918-05-02 16:00:01 blood 2033 #> 4066 76.7 5 1918 1918-05-02 16:00:01 bone 2033 #> 4067 59.4 6 1918 1918-06-02 02:00:01 blood 2034 #> 4068 59.4 6 1918 1918-06-02 02:00:01 bone 2034 #> 4069 107.6 7 1918 1918-07-02 12:00:00 blood 2035 #> 4070 107.6 7 1918 1918-07-02 12:00:00 bone 2035 #> 4071 101.7 8 1918 1918-08-01 22:00:01 blood 2036 #> 4072 101.7 8 1918 1918-08-01 22:00:01 bone 2036 #> 4073 79.9 9 1918 1918-09-01 08:00:01 blood 2037 #> 4074 79.9 9 1918 1918-09-01 08:00:01 bone 2037 #> 4075 85.0 10 1918 1918-10-01 18:00:00 blood 2038 #> 4076 85.0 10 1918 1918-10-01 18:00:00 bone 2038 #> 4077 83.4 11 1918 1918-11-01 04:00:01 blood 2039 #> 4078 83.4 11 1918 1918-11-01 04:00:01 bone 2039 #> 4079 59.2 12 1918 1918-12-01 14:00:01 blood 2040 #> 4080 59.2 12 1918 1918-12-01 14:00:01 bone 2040 #> 4081 48.1 1 1919 1919-01-01 00:00:00 blood 2041 #> 4082 48.1 1 1919 1919-01-01 00:00:00 bone 2041 #> 4083 79.5 2 1919 1919-01-31 10:00:01 blood 2042 #> 4084 79.5 2 1919 1919-01-31 10:00:01 bone 2042 #> 4085 66.5 3 1919 1919-03-02 20:00:01 blood 2043 #> 4086 66.5 3 1919 1919-03-02 20:00:01 bone 2043 #> 4087 51.8 4 1919 1919-04-02 06:00:00 blood 2044 #> 4088 51.8 4 1919 1919-04-02 06:00:00 bone 2044 #> 4089 88.1 5 1919 1919-05-02 16:00:01 blood 2045 #> 4090 88.1 5 1919 1919-05-02 16:00:01 bone 2045 #> 4091 111.2 6 1919 1919-06-02 02:00:01 blood 2046 #> 4092 111.2 6 1919 1919-06-02 02:00:01 bone 2046 #> 4093 64.7 7 1919 1919-07-02 12:00:00 blood 2047 #> 4094 64.7 7 1919 1919-07-02 12:00:00 bone 2047 #> 4095 69.0 8 1919 1919-08-01 22:00:01 blood 2048 #> 4096 69.0 8 1919 1919-08-01 22:00:01 bone 2048 #> 4097 54.7 9 1919 1919-09-01 08:00:01 blood 2049 #> 4098 54.7 9 1919 1919-09-01 08:00:01 bone 2049 #> 4099 52.8 10 1919 1919-10-01 18:00:00 blood 2050 #> 4100 52.8 10 1919 1919-10-01 18:00:00 bone 2050 #> 4101 42.0 11 1919 1919-11-01 04:00:01 blood 2051 #> 4102 42.0 11 1919 1919-11-01 04:00:01 bone 2051 #> 4103 34.9 12 1919 1919-12-01 14:00:01 blood 2052 #> 4104 34.9 12 1919 1919-12-01 14:00:01 bone 2052 #> 4105 51.1 1 1920 1920-01-01 00:00:00 blood 2053 #> 4106 51.1 1 1920 1920-01-01 00:00:00 bone 2053 #> 4107 53.9 2 1920 1920-01-31 12:00:01 blood 2054 #> 4108 53.9 2 1920 1920-01-31 12:00:01 bone 2054 #> 4109 70.2 3 1920 1920-03-02 00:00:01 blood 2055 #> 4110 70.2 3 1920 1920-03-02 00:00:01 bone 2055 #> 4111 14.8 4 1920 1920-04-01 
12:00:00 blood 2056 #> 4112 14.8 4 1920 1920-04-01 12:00:00 bone 2056 #> 4113 33.3 5 1920 1920-05-02 00:00:01 blood 2057 #> 4114 33.3 5 1920 1920-05-02 00:00:01 bone 2057 #> 4115 38.7 6 1920 1920-06-01 12:00:01 blood 2058 #> 4116 38.7 6 1920 1920-06-01 12:00:01 bone 2058 #> 4117 27.5 7 1920 1920-07-02 00:00:00 blood 2059 #> 4118 27.5 7 1920 1920-07-02 00:00:00 bone 2059 #> 4119 19.2 8 1920 1920-08-01 12:00:01 blood 2060 #> 4120 19.2 8 1920 1920-08-01 12:00:01 bone 2060 #> 4121 36.3 9 1920 1920-09-01 00:00:01 blood 2061 #> 4122 36.3 9 1920 1920-09-01 00:00:01 bone 2061 #> 4123 49.6 10 1920 1920-10-01 12:00:00 blood 2062 #> 4124 49.6 10 1920 1920-10-01 12:00:00 bone 2062 #> 4125 27.2 11 1920 1920-11-01 00:00:01 blood 2063 #> 4126 27.2 11 1920 1920-11-01 00:00:01 bone 2063 #> 4127 29.9 12 1920 1920-12-01 12:00:01 blood 2064 #> 4128 29.9 12 1920 1920-12-01 12:00:01 bone 2064 #> 4129 31.5 1 1921 1921-01-01 00:00:00 blood 2065 #> 4130 31.5 1 1921 1921-01-01 00:00:00 bone 2065 #> 4131 28.3 2 1921 1921-01-31 10:00:01 blood 2066 #> 4132 28.3 2 1921 1921-01-31 10:00:01 bone 2066 #> 4133 26.7 3 1921 1921-03-02 20:00:01 blood 2067 #> 4134 26.7 3 1921 1921-03-02 20:00:01 bone 2067 #> 4135 32.4 4 1921 1921-04-02 06:00:00 blood 2068 #> 4136 32.4 4 1921 1921-04-02 06:00:00 bone 2068 #> 4137 22.2 5 1921 1921-05-02 16:00:01 blood 2069 #> 4138 22.2 5 1921 1921-05-02 16:00:01 bone 2069 #> 4139 33.7 6 1921 1921-06-02 02:00:01 blood 2070 #> 4140 33.7 6 1921 1921-06-02 02:00:01 bone 2070 #> 4141 41.9 7 1921 1921-07-02 12:00:00 blood 2071 #> 4142 41.9 7 1921 1921-07-02 12:00:00 bone 2071 #> 4143 22.8 8 1921 1921-08-01 22:00:01 blood 2072 #> 4144 22.8 8 1921 1921-08-01 22:00:01 bone 2072 #> 4145 17.8 9 1921 1921-09-01 08:00:01 blood 2073 #> 4146 17.8 9 1921 1921-09-01 08:00:01 bone 2073 #> 4147 18.2 10 1921 1921-10-01 18:00:00 blood 2074 #> 4148 18.2 10 1921 1921-10-01 18:00:00 bone 2074 #> 4149 17.8 11 1921 1921-11-01 04:00:01 blood 2075 #> 4150 17.8 11 1921 1921-11-01 04:00:01 bone 2075 #> 4151 20.3 12 1921 1921-12-01 14:00:01 blood 2076 #> 4152 20.3 12 1921 1921-12-01 14:00:01 bone 2076 #> 4153 11.8 1 1922 1922-01-01 00:00:00 blood 2077 #> 4154 11.8 1 1922 1922-01-01 00:00:00 bone 2077 #> 4155 26.4 2 1922 1922-01-31 10:00:01 blood 2078 #> 4156 26.4 2 1922 1922-01-31 10:00:01 bone 2078 #> 4157 54.7 3 1922 1922-03-02 20:00:01 blood 2079 #> 4158 54.7 3 1922 1922-03-02 20:00:01 bone 2079 #> 4159 11.0 4 1922 1922-04-02 06:00:00 blood 2080 #> 4160 11.0 4 1922 1922-04-02 06:00:00 bone 2080 #> 4161 8.0 5 1922 1922-05-02 16:00:01 blood 2081 #> 4162 8.0 5 1922 1922-05-02 16:00:01 bone 2081 #> 4163 5.8 6 1922 1922-06-02 02:00:01 blood 2082 #> 4164 5.8 6 1922 1922-06-02 02:00:01 bone 2082 #> 4165 10.9 7 1922 1922-07-02 12:00:00 blood 2083 #> 4166 10.9 7 1922 1922-07-02 12:00:00 bone 2083 #> 4167 6.5 8 1922 1922-08-01 22:00:01 blood 2084 #> 4168 6.5 8 1922 1922-08-01 22:00:01 bone 2084 #> 4169 4.7 9 1922 1922-09-01 08:00:01 blood 2085 #> 4170 4.7 9 1922 1922-09-01 08:00:01 bone 2085 #> 4171 6.2 10 1922 1922-10-01 18:00:00 blood 2086 #> 4172 6.2 10 1922 1922-10-01 18:00:00 bone 2086 #> 4173 7.4 11 1922 1922-11-01 04:00:01 blood 2087 #> 4174 7.4 11 1922 1922-11-01 04:00:01 bone 2087 #> 4175 17.5 12 1922 1922-12-01 14:00:01 blood 2088 #> 4176 17.5 12 1922 1922-12-01 14:00:01 bone 2088 #> 4177 4.5 1 1923 1923-01-01 00:00:00 blood 2089 #> 4178 4.5 1 1923 1923-01-01 00:00:00 bone 2089 #> 4179 1.5 2 1923 1923-01-31 10:00:01 blood 2090 #> 4180 1.5 2 1923 1923-01-31 10:00:01 bone 2090 #> 4181 3.3 3 1923 1923-03-02 20:00:01 blood 
2091 #> 4182 3.3 3 1923 1923-03-02 20:00:01 bone 2091 #> 4183 6.1 4 1923 1923-04-02 06:00:00 blood 2092 #> 4184 6.1 4 1923 1923-04-02 06:00:00 bone 2092 #> 4185 3.2 5 1923 1923-05-02 16:00:01 blood 2093 #> 4186 3.2 5 1923 1923-05-02 16:00:01 bone 2093 #> 4187 9.1 6 1923 1923-06-02 02:00:01 blood 2094 #> 4188 9.1 6 1923 1923-06-02 02:00:01 bone 2094 #> 4189 3.5 7 1923 1923-07-02 12:00:00 blood 2095 #> 4190 3.5 7 1923 1923-07-02 12:00:00 bone 2095 #> 4191 0.5 8 1923 1923-08-01 22:00:01 blood 2096 #> 4192 0.5 8 1923 1923-08-01 22:00:01 bone 2096 #> 4193 13.2 9 1923 1923-09-01 08:00:01 blood 2097 #> 4194 13.2 9 1923 1923-09-01 08:00:01 bone 2097 #> 4195 11.6 10 1923 1923-10-01 18:00:00 blood 2098 #> 4196 11.6 10 1923 1923-10-01 18:00:00 bone 2098 #> 4197 10.0 11 1923 1923-11-01 04:00:01 blood 2099 #> 4198 10.0 11 1923 1923-11-01 04:00:01 bone 2099 #> 4199 2.8 12 1923 1923-12-01 14:00:01 blood 2100 #> 4200 2.8 12 1923 1923-12-01 14:00:01 bone 2100 #> 4201 0.5 1 1924 1924-01-01 00:00:00 blood 2101 #> 4202 0.5 1 1924 1924-01-01 00:00:00 bone 2101 #> 4203 5.1 2 1924 1924-01-31 12:00:01 blood 2102 #> 4204 5.1 2 1924 1924-01-31 12:00:01 bone 2102 #> 4205 1.8 3 1924 1924-03-02 00:00:01 blood 2103 #> 4206 1.8 3 1924 1924-03-02 00:00:01 bone 2103 #> 4207 11.3 4 1924 1924-04-01 12:00:00 blood 2104 #> 4208 11.3 4 1924 1924-04-01 12:00:00 bone 2104 #> 4209 20.8 5 1924 1924-05-02 00:00:01 blood 2105 #> 4210 20.8 5 1924 1924-05-02 00:00:01 bone 2105 #> 4211 24.0 6 1924 1924-06-01 12:00:01 blood 2106 #> 4212 24.0 6 1924 1924-06-01 12:00:01 bone 2106 #> 4213 28.1 7 1924 1924-07-02 00:00:00 blood 2107 #> 4214 28.1 7 1924 1924-07-02 00:00:00 bone 2107 #> 4215 19.3 8 1924 1924-08-01 12:00:01 blood 2108 #> 4216 19.3 8 1924 1924-08-01 12:00:01 bone 2108 #> 4217 25.1 9 1924 1924-09-01 00:00:01 blood 2109 #> 4218 25.1 9 1924 1924-09-01 00:00:01 bone 2109 #> 4219 25.6 10 1924 1924-10-01 12:00:00 blood 2110 #> 4220 25.6 10 1924 1924-10-01 12:00:00 bone 2110 #> 4221 22.5 11 1924 1924-11-01 00:00:01 blood 2111 #> 4222 22.5 11 1924 1924-11-01 00:00:01 bone 2111 #> 4223 16.5 12 1924 1924-12-01 12:00:01 blood 2112 #> 4224 16.5 12 1924 1924-12-01 12:00:01 bone 2112 #> 4225 5.5 1 1925 1925-01-01 00:00:00 blood 2113 #> 4226 5.5 1 1925 1925-01-01 00:00:00 bone 2113 #> 4227 23.2 2 1925 1925-01-31 10:00:01 blood 2114 #> 4228 23.2 2 1925 1925-01-31 10:00:01 bone 2114 #> 4229 18.0 3 1925 1925-03-02 20:00:01 blood 2115 #> 4230 18.0 3 1925 1925-03-02 20:00:01 bone 2115 #> 4231 31.7 4 1925 1925-04-02 06:00:00 blood 2116 #> 4232 31.7 4 1925 1925-04-02 06:00:00 bone 2116 #> 4233 42.8 5 1925 1925-05-02 16:00:01 blood 2117 #> 4234 42.8 5 1925 1925-05-02 16:00:01 bone 2117 #> 4235 47.5 6 1925 1925-06-02 02:00:01 blood 2118 #> 4236 47.5 6 1925 1925-06-02 02:00:01 bone 2118 #> 4237 38.5 7 1925 1925-07-02 12:00:00 blood 2119 #> 4238 38.5 7 1925 1925-07-02 12:00:00 bone 2119 #> 4239 37.9 8 1925 1925-08-01 22:00:01 blood 2120 #> 4240 37.9 8 1925 1925-08-01 22:00:01 bone 2120 #> 4241 60.2 9 1925 1925-09-01 08:00:01 blood 2121 #> 4242 60.2 9 1925 1925-09-01 08:00:01 bone 2121 #> 4243 69.2 10 1925 1925-10-01 18:00:00 blood 2122 #> 4244 69.2 10 1925 1925-10-01 18:00:00 bone 2122 #> 4245 58.6 11 1925 1925-11-01 04:00:01 blood 2123 #> 4246 58.6 11 1925 1925-11-01 04:00:01 bone 2123 #> 4247 98.6 12 1925 1925-12-01 14:00:01 blood 2124 #> 4248 98.6 12 1925 1925-12-01 14:00:01 bone 2124 #> 4249 71.8 1 1926 1926-01-01 00:00:00 blood 2125 #> 4250 71.8 1 1926 1926-01-01 00:00:00 bone 2125 #> 4251 70.0 2 1926 1926-01-31 10:00:01 blood 2126 #> 4252 70.0 2 
1926 1926-01-31 10:00:01 bone 2126 #> 4253 62.5 3 1926 1926-03-02 20:00:01 blood 2127 #> 4254 62.5 3 1926 1926-03-02 20:00:01 bone 2127 #> 4255 38.5 4 1926 1926-04-02 06:00:00 blood 2128 #> 4256 38.5 4 1926 1926-04-02 06:00:00 bone 2128 #> 4257 64.3 5 1926 1926-05-02 16:00:01 blood 2129 #> 4258 64.3 5 1926 1926-05-02 16:00:01 bone 2129 #> 4259 73.5 6 1926 1926-06-02 02:00:01 blood 2130 #> 4260 73.5 6 1926 1926-06-02 02:00:01 bone 2130 #> 4261 52.3 7 1926 1926-07-02 12:00:00 blood 2131 #> 4262 52.3 7 1926 1926-07-02 12:00:00 bone 2131 #> 4263 61.6 8 1926 1926-08-01 22:00:01 blood 2132 #> 4264 61.6 8 1926 1926-08-01 22:00:01 bone 2132 #> 4265 60.8 9 1926 1926-09-01 08:00:01 blood 2133 #> 4266 60.8 9 1926 1926-09-01 08:00:01 bone 2133 #> 4267 71.5 10 1926 1926-10-01 18:00:00 blood 2134 #> 4268 71.5 10 1926 1926-10-01 18:00:00 bone 2134 #> 4269 60.5 11 1926 1926-11-01 04:00:01 blood 2135 #> 4270 60.5 11 1926 1926-11-01 04:00:01 bone 2135 #> 4271 79.4 12 1926 1926-12-01 14:00:01 blood 2136 #> 4272 79.4 12 1926 1926-12-01 14:00:01 bone 2136 #> 4273 81.6 1 1927 1927-01-01 00:00:00 blood 2137 #> 4274 81.6 1 1927 1927-01-01 00:00:00 bone 2137 #> 4275 93.0 2 1927 1927-01-31 10:00:01 blood 2138 #> 4276 93.0 2 1927 1927-01-31 10:00:01 bone 2138 #> 4277 69.6 3 1927 1927-03-02 20:00:01 blood 2139 #> 4278 69.6 3 1927 1927-03-02 20:00:01 bone 2139 #> 4279 93.5 4 1927 1927-04-02 06:00:00 blood 2140 #> 4280 93.5 4 1927 1927-04-02 06:00:00 bone 2140 #> 4281 79.1 5 1927 1927-05-02 16:00:01 blood 2141 #> 4282 79.1 5 1927 1927-05-02 16:00:01 bone 2141 #> 4283 59.1 6 1927 1927-06-02 02:00:01 blood 2142 #> 4284 59.1 6 1927 1927-06-02 02:00:01 bone 2142 #> 4285 54.9 7 1927 1927-07-02 12:00:00 blood 2143 #> 4286 54.9 7 1927 1927-07-02 12:00:00 bone 2143 #> 4287 53.8 8 1927 1927-08-01 22:00:01 blood 2144 #> 4288 53.8 8 1927 1927-08-01 22:00:01 bone 2144 #> 4289 68.4 9 1927 1927-09-01 08:00:01 blood 2145 #> 4290 68.4 9 1927 1927-09-01 08:00:01 bone 2145 #> 4291 63.1 10 1927 1927-10-01 18:00:00 blood 2146 #> 4292 63.1 10 1927 1927-10-01 18:00:00 bone 2146 #> 4293 67.2 11 1927 1927-11-01 04:00:01 blood 2147 #> 4294 67.2 11 1927 1927-11-01 04:00:01 bone 2147 #> 4295 45.2 12 1927 1927-12-01 14:00:01 blood 2148 #> 4296 45.2 12 1927 1927-12-01 14:00:01 bone 2148 #> 4297 83.5 1 1928 1928-01-01 00:00:00 blood 2149 #> 4298 83.5 1 1928 1928-01-01 00:00:00 bone 2149 #> 4299 73.5 2 1928 1928-01-31 12:00:01 blood 2150 #> 4300 73.5 2 1928 1928-01-31 12:00:01 bone 2150 #> 4301 85.4 3 1928 1928-03-02 00:00:01 blood 2151 #> 4302 85.4 3 1928 1928-03-02 00:00:01 bone 2151 #> 4303 80.6 4 1928 1928-04-01 12:00:00 blood 2152 #> 4304 80.6 4 1928 1928-04-01 12:00:00 bone 2152 #> 4305 76.9 5 1928 1928-05-02 00:00:01 blood 2153 #> 4306 76.9 5 1928 1928-05-02 00:00:01 bone 2153 #> 4307 91.4 6 1928 1928-06-01 12:00:01 blood 2154 #> 4308 91.4 6 1928 1928-06-01 12:00:01 bone 2154 #> 4309 98.0 7 1928 1928-07-02 00:00:00 blood 2155 #> 4310 98.0 7 1928 1928-07-02 00:00:00 bone 2155 #> 4311 83.8 8 1928 1928-08-01 12:00:01 blood 2156 #> 4312 83.8 8 1928 1928-08-01 12:00:01 bone 2156 #> 4313 89.7 9 1928 1928-09-01 00:00:01 blood 2157 #> 4314 89.7 9 1928 1928-09-01 00:00:01 bone 2157 #> 4315 61.4 10 1928 1928-10-01 12:00:00 blood 2158 #> 4316 61.4 10 1928 1928-10-01 12:00:00 bone 2158 #> 4317 50.3 11 1928 1928-11-01 00:00:01 blood 2159 #> 4318 50.3 11 1928 1928-11-01 00:00:01 bone 2159 #> 4319 59.0 12 1928 1928-12-01 12:00:01 blood 2160 #> 4320 59.0 12 1928 1928-12-01 12:00:01 bone 2160 #> 4321 68.9 1 1929 1929-01-01 00:00:00 blood 2161 #> 4322 68.9 1 
1929 1929-01-01 00:00:00 bone 2161 #> 4323 64.1 2 1929 1929-01-31 10:00:01 blood 2162 #> 4324 64.1 2 1929 1929-01-31 10:00:01 bone 2162 #> 4325 50.2 3 1929 1929-03-02 20:00:01 blood 2163 #> 4326 50.2 3 1929 1929-03-02 20:00:01 bone 2163 #> 4327 52.8 4 1929 1929-04-02 06:00:00 blood 2164 #> 4328 52.8 4 1929 1929-04-02 06:00:00 bone 2164 #> 4329 58.2 5 1929 1929-05-02 16:00:01 blood 2165 #> 4330 58.2 5 1929 1929-05-02 16:00:01 bone 2165 #> 4331 71.9 6 1929 1929-06-02 02:00:01 blood 2166 #> 4332 71.9 6 1929 1929-06-02 02:00:01 bone 2166 #> 4333 70.2 7 1929 1929-07-02 12:00:00 blood 2167 #> 4334 70.2 7 1929 1929-07-02 12:00:00 bone 2167 #> 4335 65.8 8 1929 1929-08-01 22:00:01 blood 2168 #> 4336 65.8 8 1929 1929-08-01 22:00:01 bone 2168 #> 4337 34.4 9 1929 1929-09-01 08:00:01 blood 2169 #> 4338 34.4 9 1929 1929-09-01 08:00:01 bone 2169 #> 4339 54.0 10 1929 1929-10-01 18:00:00 blood 2170 #> 4340 54.0 10 1929 1929-10-01 18:00:00 bone 2170 #> 4341 81.1 11 1929 1929-11-01 04:00:01 blood 2171 #> 4342 81.1 11 1929 1929-11-01 04:00:01 bone 2171 #> 4343 108.0 12 1929 1929-12-01 14:00:01 blood 2172 #> 4344 108.0 12 1929 1929-12-01 14:00:01 bone 2172 #> 4345 65.3 1 1930 1930-01-01 00:00:00 blood 2173 #> 4346 65.3 1 1930 1930-01-01 00:00:00 bone 2173 #> 4347 49.2 2 1930 1930-01-31 10:00:01 blood 2174 #> 4348 49.2 2 1930 1930-01-31 10:00:01 bone 2174 #> 4349 35.0 3 1930 1930-03-02 20:00:01 blood 2175 #> 4350 35.0 3 1930 1930-03-02 20:00:01 bone 2175 #> 4351 38.2 4 1930 1930-04-02 06:00:00 blood 2176 #> 4352 38.2 4 1930 1930-04-02 06:00:00 bone 2176 #> 4353 36.8 5 1930 1930-05-02 16:00:01 blood 2177 #> 4354 36.8 5 1930 1930-05-02 16:00:01 bone 2177 #> 4355 28.8 6 1930 1930-06-02 02:00:01 blood 2178 #> 4356 28.8 6 1930 1930-06-02 02:00:01 bone 2178 #> 4357 21.9 7 1930 1930-07-02 12:00:00 blood 2179 #> 4358 21.9 7 1930 1930-07-02 12:00:00 bone 2179 #> 4359 24.9 8 1930 1930-08-01 22:00:01 blood 2180 #> 4360 24.9 8 1930 1930-08-01 22:00:01 bone 2180 #> 4361 32.1 9 1930 1930-09-01 08:00:01 blood 2181 #> 4362 32.1 9 1930 1930-09-01 08:00:01 bone 2181 #> 4363 34.4 10 1930 1930-10-01 18:00:00 blood 2182 #> 4364 34.4 10 1930 1930-10-01 18:00:00 bone 2182 #> 4365 35.6 11 1930 1930-11-01 04:00:01 blood 2183 #> 4366 35.6 11 1930 1930-11-01 04:00:01 bone 2183 #> 4367 25.8 12 1930 1930-12-01 14:00:01 blood 2184 #> 4368 25.8 12 1930 1930-12-01 14:00:01 bone 2184 #> 4369 14.6 1 1931 1931-01-01 00:00:00 blood 2185 #> 4370 14.6 1 1931 1931-01-01 00:00:00 bone 2185 #> 4371 43.1 2 1931 1931-01-31 10:00:01 blood 2186 #> 4372 43.1 2 1931 1931-01-31 10:00:01 bone 2186 #> 4373 30.0 3 1931 1931-03-02 20:00:01 blood 2187 #> 4374 30.0 3 1931 1931-03-02 20:00:01 bone 2187 #> 4375 31.2 4 1931 1931-04-02 06:00:00 blood 2188 #> 4376 31.2 4 1931 1931-04-02 06:00:00 bone 2188 #> 4377 24.6 5 1931 1931-05-02 16:00:01 blood 2189 #> 4378 24.6 5 1931 1931-05-02 16:00:01 bone 2189 #> 4379 15.3 6 1931 1931-06-02 02:00:01 blood 2190 #> 4380 15.3 6 1931 1931-06-02 02:00:01 bone 2190 #> 4381 17.4 7 1931 1931-07-02 12:00:00 blood 2191 #> 4382 17.4 7 1931 1931-07-02 12:00:00 bone 2191 #> 4383 13.0 8 1931 1931-08-01 22:00:01 blood 2192 #> 4384 13.0 8 1931 1931-08-01 22:00:01 bone 2192 #> 4385 19.0 9 1931 1931-09-01 08:00:01 blood 2193 #> 4386 19.0 9 1931 1931-09-01 08:00:01 bone 2193 #> 4387 10.0 10 1931 1931-10-01 18:00:00 blood 2194 #> 4388 10.0 10 1931 1931-10-01 18:00:00 bone 2194 #> 4389 18.7 11 1931 1931-11-01 04:00:01 blood 2195 #> 4390 18.7 11 1931 1931-11-01 04:00:01 bone 2195 #> 4391 17.8 12 1931 1931-12-01 14:00:01 blood 2196 #> 4392 17.8 12 
1931 1931-12-01 14:00:01 bone 2196 #> 4393 12.1 1 1932 1932-01-01 00:00:00 blood 2197 #> 4394 12.1 1 1932 1932-01-01 00:00:00 bone 2197 #> 4395 10.6 2 1932 1932-01-31 12:00:01 blood 2198 #> 4396 10.6 2 1932 1932-01-31 12:00:01 bone 2198 #> 4397 11.2 3 1932 1932-03-02 00:00:01 blood 2199 #> 4398 11.2 3 1932 1932-03-02 00:00:01 bone 2199 #> 4399 11.2 4 1932 1932-04-01 12:00:00 blood 2200 #> 4400 11.2 4 1932 1932-04-01 12:00:00 bone 2200 #> 4401 17.9 5 1932 1932-05-02 00:00:01 blood 2201 #> 4402 17.9 5 1932 1932-05-02 00:00:01 bone 2201 #> 4403 22.2 6 1932 1932-06-01 12:00:01 blood 2202 #> 4404 22.2 6 1932 1932-06-01 12:00:01 bone 2202 #> 4405 9.6 7 1932 1932-07-02 00:00:00 blood 2203 #> 4406 9.6 7 1932 1932-07-02 00:00:00 bone 2203 #> 4407 6.8 8 1932 1932-08-01 12:00:01 blood 2204 #> 4408 6.8 8 1932 1932-08-01 12:00:01 bone 2204 #> 4409 4.0 9 1932 1932-09-01 00:00:01 blood 2205 #> 4410 4.0 9 1932 1932-09-01 00:00:01 bone 2205 #> 4411 8.9 10 1932 1932-10-01 12:00:00 blood 2206 #> 4412 8.9 10 1932 1932-10-01 12:00:00 bone 2206 #> 4413 8.2 11 1932 1932-11-01 00:00:01 blood 2207 #> 4414 8.2 11 1932 1932-11-01 00:00:01 bone 2207 #> 4415 11.0 12 1932 1932-12-01 12:00:01 blood 2208 #> 4416 11.0 12 1932 1932-12-01 12:00:01 bone 2208 #> 4417 12.3 1 1933 1933-01-01 00:00:00 blood 2209 #> 4418 12.3 1 1933 1933-01-01 00:00:00 bone 2209 #> 4419 22.2 2 1933 1933-01-31 10:00:01 blood 2210 #> 4420 22.2 2 1933 1933-01-31 10:00:01 bone 2210 #> 4421 10.1 3 1933 1933-03-02 20:00:01 blood 2211 #> 4422 10.1 3 1933 1933-03-02 20:00:01 bone 2211 #> 4423 2.9 4 1933 1933-04-02 06:00:00 blood 2212 #> 4424 2.9 4 1933 1933-04-02 06:00:00 bone 2212 #> 4425 3.2 5 1933 1933-05-02 16:00:01 blood 2213 #> 4426 3.2 5 1933 1933-05-02 16:00:01 bone 2213 #> 4427 5.2 6 1933 1933-06-02 02:00:01 blood 2214 #> 4428 5.2 6 1933 1933-06-02 02:00:01 bone 2214 #> 4429 2.8 7 1933 1933-07-02 12:00:00 blood 2215 #> 4430 2.8 7 1933 1933-07-02 12:00:00 bone 2215 #> 4431 0.2 8 1933 1933-08-01 22:00:01 blood 2216 #> 4432 0.2 8 1933 1933-08-01 22:00:01 bone 2216 #> 4433 5.1 9 1933 1933-09-01 08:00:01 blood 2217 #> 4434 5.1 9 1933 1933-09-01 08:00:01 bone 2217 #> 4435 3.0 10 1933 1933-10-01 18:00:00 blood 2218 #> 4436 3.0 10 1933 1933-10-01 18:00:00 bone 2218 #> 4437 0.6 11 1933 1933-11-01 04:00:01 blood 2219 #> 4438 0.6 11 1933 1933-11-01 04:00:01 bone 2219 #> 4439 0.3 12 1933 1933-12-01 14:00:01 blood 2220 #> 4440 0.3 12 1933 1933-12-01 14:00:01 bone 2220 #> 4441 3.4 1 1934 1934-01-01 00:00:00 blood 2221 #> 4442 3.4 1 1934 1934-01-01 00:00:00 bone 2221 #> 4443 7.8 2 1934 1934-01-31 10:00:01 blood 2222 #> 4444 7.8 2 1934 1934-01-31 10:00:01 bone 2222 #> 4445 4.3 3 1934 1934-03-02 20:00:01 blood 2223 #> 4446 4.3 3 1934 1934-03-02 20:00:01 bone 2223 #> 4447 11.3 4 1934 1934-04-02 06:00:00 blood 2224 #> 4448 11.3 4 1934 1934-04-02 06:00:00 bone 2224 #> 4449 19.7 5 1934 1934-05-02 16:00:01 blood 2225 #> 4450 19.7 5 1934 1934-05-02 16:00:01 bone 2225 #> 4451 6.7 6 1934 1934-06-02 02:00:01 blood 2226 #> 4452 6.7 6 1934 1934-06-02 02:00:01 bone 2226 #> 4453 9.3 7 1934 1934-07-02 12:00:00 blood 2227 #> 4454 9.3 7 1934 1934-07-02 12:00:00 bone 2227 #> 4455 8.3 8 1934 1934-08-01 22:00:01 blood 2228 #> 4456 8.3 8 1934 1934-08-01 22:00:01 bone 2228 #> 4457 4.0 9 1934 1934-09-01 08:00:01 blood 2229 #> 4458 4.0 9 1934 1934-09-01 08:00:01 bone 2229 #> 4459 5.7 10 1934 1934-10-01 18:00:00 blood 2230 #> 4460 5.7 10 1934 1934-10-01 18:00:00 bone 2230 #> 4461 8.7 11 1934 1934-11-01 04:00:01 blood 2231 #> 4462 8.7 11 1934 1934-11-01 04:00:01 bone 2231 #> 4463 15.4 
12 1934 1934-12-01 14:00:01 blood 2232 #> 4464 15.4 12 1934 1934-12-01 14:00:01 bone 2232 #> 4465 18.9 1 1935 1935-01-01 00:00:00 blood 2233 #> 4466 18.9 1 1935 1935-01-01 00:00:00 bone 2233 #> 4467 20.5 2 1935 1935-01-31 10:00:01 blood 2234 #> 4468 20.5 2 1935 1935-01-31 10:00:01 bone 2234 #> 4469 23.1 3 1935 1935-03-02 20:00:01 blood 2235 #> 4470 23.1 3 1935 1935-03-02 20:00:01 bone 2235 #> 4471 12.2 4 1935 1935-04-02 06:00:00 blood 2236 #> 4472 12.2 4 1935 1935-04-02 06:00:00 bone 2236 #> 4473 27.3 5 1935 1935-05-02 16:00:01 blood 2237 #> 4474 27.3 5 1935 1935-05-02 16:00:01 bone 2237 #> 4475 45.7 6 1935 1935-06-02 02:00:01 blood 2238 #> 4476 45.7 6 1935 1935-06-02 02:00:01 bone 2238 #> 4477 33.9 7 1935 1935-07-02 12:00:00 blood 2239 #> 4478 33.9 7 1935 1935-07-02 12:00:00 bone 2239 #> 4479 30.1 8 1935 1935-08-01 22:00:01 blood 2240 #> 4480 30.1 8 1935 1935-08-01 22:00:01 bone 2240 #> 4481 42.1 9 1935 1935-09-01 08:00:01 blood 2241 #> 4482 42.1 9 1935 1935-09-01 08:00:01 bone 2241 #> 4483 53.2 10 1935 1935-10-01 18:00:00 blood 2242 #> 4484 53.2 10 1935 1935-10-01 18:00:00 bone 2242 #> 4485 64.2 11 1935 1935-11-01 04:00:01 blood 2243 #> 4486 64.2 11 1935 1935-11-01 04:00:01 bone 2243 #> 4487 61.5 12 1935 1935-12-01 14:00:01 blood 2244 #> 4488 61.5 12 1935 1935-12-01 14:00:01 bone 2244 #> 4489 62.8 1 1936 1936-01-01 00:00:00 blood 2245 #> 4490 62.8 1 1936 1936-01-01 00:00:00 bone 2245 #> 4491 74.3 2 1936 1936-01-31 12:00:01 blood 2246 #> 4492 74.3 2 1936 1936-01-31 12:00:01 bone 2246 #> 4493 77.1 3 1936 1936-03-02 00:00:01 blood 2247 #> 4494 77.1 3 1936 1936-03-02 00:00:01 bone 2247 #> 4495 74.9 4 1936 1936-04-01 12:00:00 blood 2248 #> 4496 74.9 4 1936 1936-04-01 12:00:00 bone 2248 #> 4497 54.6 5 1936 1936-05-02 00:00:01 blood 2249 #> 4498 54.6 5 1936 1936-05-02 00:00:01 bone 2249 #> 4499 70.0 6 1936 1936-06-01 12:00:01 blood 2250 #> 4500 70.0 6 1936 1936-06-01 12:00:01 bone 2250 #> 4501 52.3 7 1936 1936-07-02 00:00:00 blood 2251 #> 4502 52.3 7 1936 1936-07-02 00:00:00 bone 2251 #> 4503 87.0 8 1936 1936-08-01 12:00:01 blood 2252 #> 4504 87.0 8 1936 1936-08-01 12:00:01 bone 2252 #> 4505 76.0 9 1936 1936-09-01 00:00:01 blood 2253 #> 4506 76.0 9 1936 1936-09-01 00:00:01 bone 2253 #> 4507 89.0 10 1936 1936-10-01 12:00:00 blood 2254 #> 4508 89.0 10 1936 1936-10-01 12:00:00 bone 2254 #> 4509 115.4 11 1936 1936-11-01 00:00:01 blood 2255 #> 4510 115.4 11 1936 1936-11-01 00:00:01 bone 2255 #> 4511 123.4 12 1936 1936-12-01 12:00:01 blood 2256 #> 4512 123.4 12 1936 1936-12-01 12:00:01 bone 2256 #> 4513 132.5 1 1937 1937-01-01 00:00:00 blood 2257 #> 4514 132.5 1 1937 1937-01-01 00:00:00 bone 2257 #> 4515 128.5 2 1937 1937-01-31 10:00:01 blood 2258 #> 4516 128.5 2 1937 1937-01-31 10:00:01 bone 2258 #> 4517 83.9 3 1937 1937-03-02 20:00:01 blood 2259 #> 4518 83.9 3 1937 1937-03-02 20:00:01 bone 2259 #> 4519 109.3 4 1937 1937-04-02 06:00:00 blood 2260 #> 4520 109.3 4 1937 1937-04-02 06:00:00 bone 2260 #> 4521 116.7 5 1937 1937-05-02 16:00:01 blood 2261 #> 4522 116.7 5 1937 1937-05-02 16:00:01 bone 2261 #> 4523 130.3 6 1937 1937-06-02 02:00:01 blood 2262 #> 4524 130.3 6 1937 1937-06-02 02:00:01 bone 2262 #> 4525 145.1 7 1937 1937-07-02 12:00:00 blood 2263 #> 4526 145.1 7 1937 1937-07-02 12:00:00 bone 2263 #> 4527 137.7 8 1937 1937-08-01 22:00:01 blood 2264 #> 4528 137.7 8 1937 1937-08-01 22:00:01 bone 2264 #> 4529 100.7 9 1937 1937-09-01 08:00:01 blood 2265 #> 4530 100.7 9 1937 1937-09-01 08:00:01 bone 2265 #> 4531 124.9 10 1937 1937-10-01 18:00:00 blood 2266 #> 4532 124.9 10 1937 1937-10-01 18:00:00 bone 
2266 #> 4533 74.4 11 1937 1937-11-01 04:00:01 blood 2267 #> 4534 74.4 11 1937 1937-11-01 04:00:01 bone 2267 #> 4535 88.8 12 1937 1937-12-01 14:00:01 blood 2268 #> 4536 88.8 12 1937 1937-12-01 14:00:01 bone 2268 #> 4537 98.4 1 1938 1938-01-01 00:00:00 blood 2269 #> 4538 98.4 1 1938 1938-01-01 00:00:00 bone 2269 #> 4539 119.2 2 1938 1938-01-31 10:00:01 blood 2270 #> 4540 119.2 2 1938 1938-01-31 10:00:01 bone 2270 #> 4541 86.5 3 1938 1938-03-02 20:00:01 blood 2271 #> 4542 86.5 3 1938 1938-03-02 20:00:01 bone 2271 #> 4543 101.0 4 1938 1938-04-02 06:00:00 blood 2272 #> 4544 101.0 4 1938 1938-04-02 06:00:00 bone 2272 #> 4545 127.4 5 1938 1938-05-02 16:00:01 blood 2273 #> 4546 127.4 5 1938 1938-05-02 16:00:01 bone 2273 #> 4547 97.5 6 1938 1938-06-02 02:00:01 blood 2274 #> 4548 97.5 6 1938 1938-06-02 02:00:01 bone 2274 #> 4549 165.3 7 1938 1938-07-02 12:00:00 blood 2275 #> 4550 165.3 7 1938 1938-07-02 12:00:00 bone 2275 #> 4551 115.7 8 1938 1938-08-01 22:00:01 blood 2276 #> 4552 115.7 8 1938 1938-08-01 22:00:01 bone 2276 #> 4553 89.6 9 1938 1938-09-01 08:00:01 blood 2277 #> 4554 89.6 9 1938 1938-09-01 08:00:01 bone 2277 #> 4555 99.1 10 1938 1938-10-01 18:00:00 blood 2278 #> 4556 99.1 10 1938 1938-10-01 18:00:00 bone 2278 #> 4557 122.2 11 1938 1938-11-01 04:00:01 blood 2279 #> 4558 122.2 11 1938 1938-11-01 04:00:01 bone 2279 #> 4559 92.7 12 1938 1938-12-01 14:00:01 blood 2280 #> 4560 92.7 12 1938 1938-12-01 14:00:01 bone 2280 #> 4561 80.3 1 1939 1939-01-01 00:00:00 blood 2281 #> 4562 80.3 1 1939 1939-01-01 00:00:00 bone 2281 #> 4563 77.4 2 1939 1939-01-31 10:00:01 blood 2282 #> 4564 77.4 2 1939 1939-01-31 10:00:01 bone 2282 #> 4565 64.6 3 1939 1939-03-02 20:00:01 blood 2283 #> 4566 64.6 3 1939 1939-03-02 20:00:01 bone 2283 #> 4567 109.1 4 1939 1939-04-02 06:00:00 blood 2284 #> 4568 109.1 4 1939 1939-04-02 06:00:00 bone 2284 #> 4569 118.3 5 1939 1939-05-02 16:00:01 blood 2285 #> 4570 118.3 5 1939 1939-05-02 16:00:01 bone 2285 #> 4571 101.0 6 1939 1939-06-02 02:00:01 blood 2286 #> 4572 101.0 6 1939 1939-06-02 02:00:01 bone 2286 #> 4573 97.6 7 1939 1939-07-02 12:00:00 blood 2287 #> 4574 97.6 7 1939 1939-07-02 12:00:00 bone 2287 #> 4575 105.8 8 1939 1939-08-01 22:00:01 blood 2288 #> 4576 105.8 8 1939 1939-08-01 22:00:01 bone 2288 #> 4577 112.6 9 1939 1939-09-01 08:00:01 blood 2289 #> 4578 112.6 9 1939 1939-09-01 08:00:01 bone 2289 #> 4579 88.1 10 1939 1939-10-01 18:00:00 blood 2290 #> 4580 88.1 10 1939 1939-10-01 18:00:00 bone 2290 #> 4581 68.1 11 1939 1939-11-01 04:00:01 blood 2291 #> 4582 68.1 11 1939 1939-11-01 04:00:01 bone 2291 #> 4583 42.1 12 1939 1939-12-01 14:00:01 blood 2292 #> 4584 42.1 12 1939 1939-12-01 14:00:01 bone 2292 #> 4585 50.5 1 1940 1940-01-01 00:00:00 blood 2293 #> 4586 50.5 1 1940 1940-01-01 00:00:00 bone 2293 #> 4587 59.4 2 1940 1940-01-31 12:00:01 blood 2294 #> 4588 59.4 2 1940 1940-01-31 12:00:01 bone 2294 #> 4589 83.3 3 1940 1940-03-02 00:00:01 blood 2295 #> 4590 83.3 3 1940 1940-03-02 00:00:01 bone 2295 #> 4591 60.7 4 1940 1940-04-01 12:00:00 blood 2296 #> 4592 60.7 4 1940 1940-04-01 12:00:00 bone 2296 #> 4593 54.4 5 1940 1940-05-02 00:00:01 blood 2297 #> 4594 54.4 5 1940 1940-05-02 00:00:01 bone 2297 #> 4595 83.9 6 1940 1940-06-01 12:00:01 blood 2298 #> 4596 83.9 6 1940 1940-06-01 12:00:01 bone 2298 #> 4597 67.5 7 1940 1940-07-02 00:00:00 blood 2299 #> 4598 67.5 7 1940 1940-07-02 00:00:00 bone 2299 #> 4599 105.5 8 1940 1940-08-01 12:00:01 blood 2300 #> 4600 105.5 8 1940 1940-08-01 12:00:01 bone 2300 #> 4601 66.5 9 1940 1940-09-01 00:00:01 blood 2301 #> 4602 66.5 9 1940 
1940-09-01 00:00:01 bone 2301 #> 4603 55.0 10 1940 1940-10-01 12:00:00 blood 2302 #> 4604 55.0 10 1940 1940-10-01 12:00:00 bone 2302 #> 4605 58.4 11 1940 1940-11-01 00:00:01 blood 2303 #> 4606 58.4 11 1940 1940-11-01 00:00:01 bone 2303 #> 4607 68.3 12 1940 1940-12-01 12:00:01 blood 2304 #> 4608 68.3 12 1940 1940-12-01 12:00:01 bone 2304 #> 4609 45.6 1 1941 1941-01-01 00:00:00 blood 2305 #> 4610 45.6 1 1941 1941-01-01 00:00:00 bone 2305 #> 4611 44.5 2 1941 1941-01-31 10:00:01 blood 2306 #> 4612 44.5 2 1941 1941-01-31 10:00:01 bone 2306 #> 4613 46.4 3 1941 1941-03-02 20:00:01 blood 2307 #> 4614 46.4 3 1941 1941-03-02 20:00:01 bone 2307 #> 4615 32.8 4 1941 1941-04-02 06:00:00 blood 2308 #> 4616 32.8 4 1941 1941-04-02 06:00:00 bone 2308 #> 4617 29.5 5 1941 1941-05-02 16:00:01 blood 2309 #> 4618 29.5 5 1941 1941-05-02 16:00:01 bone 2309 #> 4619 59.8 6 1941 1941-06-02 02:00:01 blood 2310 #> 4620 59.8 6 1941 1941-06-02 02:00:01 bone 2310 #> 4621 66.9 7 1941 1941-07-02 12:00:00 blood 2311 #> 4622 66.9 7 1941 1941-07-02 12:00:00 bone 2311 #> 4623 60.0 8 1941 1941-08-01 22:00:01 blood 2312 #> 4624 60.0 8 1941 1941-08-01 22:00:01 bone 2312 #> 4625 65.9 9 1941 1941-09-01 08:00:01 blood 2313 #> 4626 65.9 9 1941 1941-09-01 08:00:01 bone 2313 #> 4627 46.3 10 1941 1941-10-01 18:00:00 blood 2314 #> 4628 46.3 10 1941 1941-10-01 18:00:00 bone 2314 #> 4629 38.3 11 1941 1941-11-01 04:00:01 blood 2315 #> 4630 38.3 11 1941 1941-11-01 04:00:01 bone 2315 #> 4631 33.7 12 1941 1941-12-01 14:00:01 blood 2316 #> 4632 33.7 12 1941 1941-12-01 14:00:01 bone 2316 #> 4633 35.6 1 1942 1942-01-01 00:00:00 blood 2317 #> 4634 35.6 1 1942 1942-01-01 00:00:00 bone 2317 #> 4635 52.8 2 1942 1942-01-31 10:00:01 blood 2318 #> 4636 52.8 2 1942 1942-01-31 10:00:01 bone 2318 #> 4637 54.2 3 1942 1942-03-02 20:00:01 blood 2319 #> 4638 54.2 3 1942 1942-03-02 20:00:01 bone 2319 #> 4639 60.7 4 1942 1942-04-02 06:00:00 blood 2320 #> 4640 60.7 4 1942 1942-04-02 06:00:00 bone 2320 #> 4641 25.0 5 1942 1942-05-02 16:00:01 blood 2321 #> 4642 25.0 5 1942 1942-05-02 16:00:01 bone 2321 #> 4643 11.4 6 1942 1942-06-02 02:00:01 blood 2322 #> 4644 11.4 6 1942 1942-06-02 02:00:01 bone 2322 #> 4645 17.7 7 1942 1942-07-02 12:00:00 blood 2323 #> 4646 17.7 7 1942 1942-07-02 12:00:00 bone 2323 #> 4647 20.2 8 1942 1942-08-01 22:00:01 blood 2324 #> 4648 20.2 8 1942 1942-08-01 22:00:01 bone 2324 #> 4649 17.2 9 1942 1942-09-01 08:00:01 blood 2325 #> 4650 17.2 9 1942 1942-09-01 08:00:01 bone 2325 #> 4651 19.2 10 1942 1942-10-01 18:00:00 blood 2326 #> 4652 19.2 10 1942 1942-10-01 18:00:00 bone 2326 #> 4653 30.7 11 1942 1942-11-01 04:00:01 blood 2327 #> 4654 30.7 11 1942 1942-11-01 04:00:01 bone 2327 #> 4655 22.5 12 1942 1942-12-01 14:00:01 blood 2328 #> 4656 22.5 12 1942 1942-12-01 14:00:01 bone 2328 #> 4657 12.4 1 1943 1943-01-01 00:00:00 blood 2329 #> 4658 12.4 1 1943 1943-01-01 00:00:00 bone 2329 #> 4659 28.9 2 1943 1943-01-31 10:00:01 blood 2330 #> 4660 28.9 2 1943 1943-01-31 10:00:01 bone 2330 #> 4661 27.4 3 1943 1943-03-02 20:00:01 blood 2331 #> 4662 27.4 3 1943 1943-03-02 20:00:01 bone 2331 #> 4663 26.1 4 1943 1943-04-02 06:00:00 blood 2332 #> 4664 26.1 4 1943 1943-04-02 06:00:00 bone 2332 #> 4665 14.1 5 1943 1943-05-02 16:00:01 blood 2333 #> 4666 14.1 5 1943 1943-05-02 16:00:01 bone 2333 #> 4667 7.6 6 1943 1943-06-02 02:00:01 blood 2334 #> 4668 7.6 6 1943 1943-06-02 02:00:01 bone 2334 #> 4669 13.2 7 1943 1943-07-02 12:00:00 blood 2335 #> 4670 13.2 7 1943 1943-07-02 12:00:00 bone 2335 #> 4671 19.4 8 1943 1943-08-01 22:00:01 blood 2336 #> 4672 19.4 8 1943 
1943-08-01 22:00:01 bone 2336 #> 4673 10.0 9 1943 1943-09-01 08:00:01 blood 2337 #> 4674 10.0 9 1943 1943-09-01 08:00:01 bone 2337 #> 4675 7.8 10 1943 1943-10-01 18:00:00 blood 2338 #> 4676 7.8 10 1943 1943-10-01 18:00:00 bone 2338 #> 4677 10.2 11 1943 1943-11-01 04:00:01 blood 2339 #> 4678 10.2 11 1943 1943-11-01 04:00:01 bone 2339 #> 4679 18.8 12 1943 1943-12-01 14:00:01 blood 2340 #> 4680 18.8 12 1943 1943-12-01 14:00:01 bone 2340 #> 4681 3.7 1 1944 1944-01-01 00:00:00 blood 2341 #> 4682 3.7 1 1944 1944-01-01 00:00:00 bone 2341 #> 4683 0.5 2 1944 1944-01-31 12:00:01 blood 2342 #> 4684 0.5 2 1944 1944-01-31 12:00:01 bone 2342 #> 4685 11.0 3 1944 1944-03-02 00:00:01 blood 2343 #> 4686 11.0 3 1944 1944-03-02 00:00:01 bone 2343 #> 4687 0.3 4 1944 1944-04-01 12:00:00 blood 2344 #> 4688 0.3 4 1944 1944-04-01 12:00:00 bone 2344 #> 4689 2.5 5 1944 1944-05-02 00:00:01 blood 2345 #> 4690 2.5 5 1944 1944-05-02 00:00:01 bone 2345 #> 4691 5.0 6 1944 1944-06-01 12:00:01 blood 2346 #> 4692 5.0 6 1944 1944-06-01 12:00:01 bone 2346 #> 4693 5.0 7 1944 1944-07-02 00:00:00 blood 2347 #> 4694 5.0 7 1944 1944-07-02 00:00:00 bone 2347 #> 4695 16.7 8 1944 1944-08-01 12:00:01 blood 2348 #> 4696 16.7 8 1944 1944-08-01 12:00:01 bone 2348 #> 4697 14.3 9 1944 1944-09-01 00:00:01 blood 2349 #> 4698 14.3 9 1944 1944-09-01 00:00:01 bone 2349 #> 4699 16.9 10 1944 1944-10-01 12:00:00 blood 2350 #> 4700 16.9 10 1944 1944-10-01 12:00:00 bone 2350 #> 4701 10.8 11 1944 1944-11-01 00:00:01 blood 2351 #> 4702 10.8 11 1944 1944-11-01 00:00:01 bone 2351 #> 4703 28.4 12 1944 1944-12-01 12:00:01 blood 2352 #> 4704 28.4 12 1944 1944-12-01 12:00:01 bone 2352 #> 4705 18.5 1 1945 1945-01-01 00:00:00 blood 2353 #> 4706 18.5 1 1945 1945-01-01 00:00:00 bone 2353 #> 4707 12.7 2 1945 1945-01-31 10:00:01 blood 2354 #> 4708 12.7 2 1945 1945-01-31 10:00:01 bone 2354 #> 4709 21.5 3 1945 1945-03-02 20:00:01 blood 2355 #> 4710 21.5 3 1945 1945-03-02 20:00:01 bone 2355 #> 4711 32.0 4 1945 1945-04-02 06:00:00 blood 2356 #> 4712 32.0 4 1945 1945-04-02 06:00:00 bone 2356 #> 4713 30.6 5 1945 1945-05-02 16:00:01 blood 2357 #> 4714 30.6 5 1945 1945-05-02 16:00:01 bone 2357 #> 4715 36.2 6 1945 1945-06-02 02:00:01 blood 2358 #> 4716 36.2 6 1945 1945-06-02 02:00:01 bone 2358 #> 4717 42.6 7 1945 1945-07-02 12:00:00 blood 2359 #> 4718 42.6 7 1945 1945-07-02 12:00:00 bone 2359 #> 4719 25.9 8 1945 1945-08-01 22:00:01 blood 2360 #> 4720 25.9 8 1945 1945-08-01 22:00:01 bone 2360 #> 4721 34.9 9 1945 1945-09-01 08:00:01 blood 2361 #> 4722 34.9 9 1945 1945-09-01 08:00:01 bone 2361 #> 4723 68.8 10 1945 1945-10-01 18:00:00 blood 2362 #> 4724 68.8 10 1945 1945-10-01 18:00:00 bone 2362 #> 4725 46.0 11 1945 1945-11-01 04:00:01 blood 2363 #> 4726 46.0 11 1945 1945-11-01 04:00:01 bone 2363 #> 4727 27.4 12 1945 1945-12-01 14:00:01 blood 2364 #> 4728 27.4 12 1945 1945-12-01 14:00:01 bone 2364 #> 4729 47.6 1 1946 1946-01-01 00:00:00 blood 2365 #> 4730 47.6 1 1946 1946-01-01 00:00:00 bone 2365 #> 4731 86.2 2 1946 1946-01-31 10:00:01 blood 2366 #> 4732 86.2 2 1946 1946-01-31 10:00:01 bone 2366 #> 4733 76.6 3 1946 1946-03-02 20:00:01 blood 2367 #> 4734 76.6 3 1946 1946-03-02 20:00:01 bone 2367 #> 4735 75.7 4 1946 1946-04-02 06:00:00 blood 2368 #> 4736 75.7 4 1946 1946-04-02 06:00:00 bone 2368 #> 4737 84.9 5 1946 1946-05-02 16:00:01 blood 2369 #> 4738 84.9 5 1946 1946-05-02 16:00:01 bone 2369 #> 4739 73.5 6 1946 1946-06-02 02:00:01 blood 2370 #> 4740 73.5 6 1946 1946-06-02 02:00:01 bone 2370 #> 4741 116.2 7 1946 1946-07-02 12:00:00 blood 2371 #> 4742 116.2 7 1946 1946-07-02 
#> ...  [remaining training rows truncated; the duplicated 'blood' and 'bone' series continue monthly to September 1948, time 2397]
#> 
#> $data_test 
#>        y season year                date series time 
#> 1  136.3     10 1948 1948-10-01 12:00:00  blood 2398 
#> 2  136.3     10 1948 1948-10-01 12:00:00   bone 2398 
#> 3   95.8     11 1948 1948-11-01 00:00:01  blood 2399 
#> 4   95.8     11 1948 1948-11-01 00:00:01   bone 2399 
#> ...  [rows 5-846 truncated; the test set continues monthly to December 1983, time 2820]

# An xts object example
library(xts)
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
dates <- seq(as.Date("2001-05-01"), length = 30, by = "quarter")
data <- cbind(
  c(gas = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001)))),
  c(oil = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001))))
)
series <- xts(x = data, order.by = dates)
colnames(series) <- c('gas', 'oil')
head(series)
#>            gas oil
#> 2001-05-01   1   0
#> 2001-08-01   1   5
#> 2001-11-01   0   0
#> 2002-02-01   0   1
#> 2002-05-01   2   1
#> 2002-08-01   0   1
series_to_mvgam(series, freq = 4, train_prop = 0.85)
#> $data_train 
#>    y season year       date series time 
#> 1  1      2 2001 2001-05-01    gas    1 
#> 2  0      2 2001 2001-05-01    oil    1 
#> 3  1      3 2001 2001-08-01    gas    2 
#> 4  5      3 2001 2001-08-01    oil    2 
#> ...  [rows 5-50 truncated; the quarterly training set runs to May 2007, time 25]
#> 
#> $data_test 
#>    y season year       date series time 
#> 1  1      3 2007 2007-08-01    gas   26 
#> 2  3      3 2007 2007-08-01    oil   26 
#> 3  0      4 2007 2007-11-01    gas   27 
#> 4  2      4 2007 2007-11-01    oil   27 
#> 5  0      1 2008 2008-02-01    gas   28 
#> 6  0      1 2008 2008-02-01    oil   28 
#> 7  3      2 2008 2008-05-01    gas   29 
#> 8  2      2 2008 2008-05-01    oil   29 
#> 9  2      3 2008 2008-08-01    gas   30 
#> 10 2      3 2008 2008-08-01    oil   30

Simulate a set of discrete time series for mvgam modelling — sim_mvgam
(https://nicholasjclark.github.io/mvgam/reference/sim_mvgam.html)

This function simulates sets of discrete time series data for fitting a multivariate GAM that includes shared seasonality and dependence on state-space latent dynamic factors. Random dependencies among series, i.e. correlations in their long-term trends, are included in the form of correlated loadings on the latent dynamic factors.

Usage:

sim_mvgam(
  T = 100,
  n_series = 3,
  seasonality = "shared",
  use_lv = FALSE,
  n_lv = 1,
  trend_model = "RW",
  drift = FALSE,
  prop_trend = 0.2,
  trend_rel,
  freq = 12,
  family = poisson(),
  phi,
  shape,
  sigma,
  nu,
  mu,
  prop_missing = 0,
  prop_train = 0.85
)

Arguments:

T: integer. Number of observations (timepoints).
n_series: integer. Number of discrete time series.
seasonality: character. Either 'shared', meaning that all series share the exact same seasonal pattern, or 'hierarchical', meaning that there is a global seasonality but each series' pattern can deviate slightly.
use_lv: logical. If TRUE, use dynamic factors to estimate the series' latent trends in a reduced dimension format. If FALSE, estimate independent latent trends for each series.
n_lv: integer. Number of latent dynamic factors for generating the series' trends.
trend_model: character specifying the time series dynamics for the latent trend. Options are: None (no latent trend component, i.e. the GAM component is all that contributes to the linear predictor and the observation process is the only source of error, similar to what is estimated by gam), RW (random walk with possible drift), AR1, AR2 or AR3 (each with possible drift), VAR1 (contemporaneously uncorrelated VAR1), VAR1cor (contemporaneously correlated VAR1), or GP (Gaussian Process with squared exponential kernel). See mvgam_trends for details.
drift: logical; whether to simulate a drift term for each trend.
prop_trend: numeric. Relative importance of the trend for each series. Should be between 0 and 1.
trend_rel: Deprecated. Use prop_trend instead.
freq: integer. The seasonal frequency of the series.
family: family specifying the exponential observation family for the series. Currently supported families are: nb(), poisson(), tweedie(), gaussian(), betar(), lognormal(), student() and Gamma().
phi: vector of dispersion parameters for the series (i.e. size for Negative Binomial, or phi for Tweedie or Beta). If length(phi) < n_series, the first element of phi will be replicated n_series times. Defaults to 5 for Negative Binomial and Tweedie; 10 for Beta.
shape: vector of shape parameters for the series (i.e. shape for Gamma). If length(shape) < n_series, the first element of shape will be replicated n_series times. Defaults to 10.
sigma: vector of scale parameters for the series (i.e. sd for Normal or Student-T, log(sd) for LogNormal). If length(sigma) < n_series, the first element of sigma will be replicated n_series times. Defaults to 0.5 for Normal and Student-T; 0.2 for LogNormal.
nu: vector of degrees of freedom parameters for the series (i.e. nu for Student-T). If length(nu) < n_series, the first element of nu will be replicated n_series times. Defaults to 3.
mu: vector of location parameters for the series. If length(mu) < n_series, the first element of mu will be replicated n_series times. Defaults to small random values between -0.5 and 0.5 on the link scale.
prop_missing: numeric stating the proportion of observations that are missing. Should be between 0 and 0.8, inclusive.
prop_train: numeric stating the proportion of data to use for training. Should be between 0.2 and 1.

Value:

A list object containing the outputs needed for mvgam, including 'data_train' and 'data_test', as well as additional information about the simulated seasonality and trend dependencies.

Examples:

# Simulate series with observations bounded at 0 and 1 (Beta responses)
sim_data <- sim_mvgam(family = betar(), trend_model = 'GP', prop_trend = 0.6)
plot_mvgam_series(data = sim_data$data_train, series = 'all')

# Now simulate series with overdispersed discrete observations
sim_data <- sim_mvgam(family = nb(), trend_model = 'GP', prop_trend = 0.6, phi = 10)
plot_mvgam_series(data = sim_data$data_train, series = 'all')
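
To make the argument interplay concrete, a minimal sketch (not from the original reference page; all argument values are illustrative) of simulating factor-driven, overdispersed counts with missing observations:

library(mvgam)

# Four Negative Binomial series whose trends load on two shared
# latent dynamic factors, with 10% of observations missing
sim <- sim_mvgam(
  T = 120,
  n_series = 4,
  seasonality = 'hierarchical',
  use_lv = TRUE,
  n_lv = 2,
  trend_model = 'RW',
  family = nb(),
  phi = 10,
  prop_missing = 0.1
)

# The returned list contains the training and test splits described above
names(sim)
head(sim$data_train)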

Summary for a fitted mvgam object — summary.mvgam
(https://nicholasjclark.github.io/mvgam/reference/summary.mvgam.html)

These functions take a fitted mvgam object and return various useful summaries.

Usage:

# S3 method for mvgam
summary(object, include_betas = TRUE, digits = 2, ...)

# S3 method for mvgam_prefit
summary(object, ...)

# S3 method for mvgam
coef(object, summarise = TRUE, ...)

Arguments:

object: a list object returned from mvgam.
include_betas: logical. Print a summary that includes posterior summaries of all linear predictor beta coefficients (including spline coefficients)? Defaults to TRUE, but use FALSE for a more concise summary.
digits: the number of significant digits for printing out the summary; defaults to 2.
...: ignored.
summarise: logical. Summaries of coefficients will be returned if TRUE. Otherwise the full posterior distribution will be returned.

Value:

For summary.mvgam and summary.mvgam_prefit, a list is printed on-screen showing the summaries for the model. For coef.mvgam, either a matrix of posterior coefficient distributions (if summarise == FALSE) or a data.frame of coefficient summaries.

Details:

summary.mvgam and summary.mvgam_prefit return brief summaries of the model's call, along with posterior intervals for some of the key parameters in the model. Note that some smooths have extra penalties on the null space, so summaries for the rho parameters may include more penalty terms than the number of smooths in the original model formula. Approximate p-values for smooth terms are also returned, with the methods used for their calculation following those used for the mgcv equivalents (see summary.gam for details). The Estimated Degrees of Freedom (edf) for smooth terms is computed using edf.type = 1, as described in the documentation for jagam. Experiments suggest these p-values tend to be more conservative than those that might be returned from an equivalent model fit with summary.gam using method = 'REML'. coef.mvgam returns either summaries or full posterior estimates for the GAM component coefficients.

Author: Nicholas J Clark
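
A quick sketch of how these methods fit together (illustrative; assumes mod is any fitted mvgam object, such as the one created in the update.mvgam examples below):

# Concise on-screen summary that omits the many beta coefficients
summary(mod, include_betas = FALSE)

# Posterior summaries of the GAM coefficients as a data.frame
coef(mod)

# Or the full posterior draws as a matrix, for custom post-processing
beta_draws <- coef(mod, summarise = FALSE)
dim(beta_draws)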

Update an existing mvgam object — update.mvgam
(https://nicholasjclark.github.io/mvgam/reference/update.mvgam.html)

This function allows a previously fitted mvgam model to be updated.

Usage:

# S3 method for mvgam
update(
  object,
  formula,
  trend_formula,
  data,
  newdata,
  trend_model,
  trend_map,
  use_lv,
  n_lv,
  family,
  priors,
  lfo = FALSE,
  ...
)

Arguments:

object: a list object returned from mvgam. See mvgam().
formula: optional new formula object. Note that mvgam does not currently support dynamic formula updates such as removal of specific terms with - term. When updating, the entire formula needs to be supplied.
trend_formula: an optional character string specifying the GAM process model formula. If supplied, a linear predictor will be modelled for the latent trends to capture process model evolution separately from the observation model. Should not have a response variable specified on the left-hand side of the formula (i.e. a valid option would be ~ season + s(year)). Also note that you should not use the identifier series in this formula to specify effects that vary across time series. Instead you should use trend. This will ensure that models in which a trend_map is supplied will still work consistently (i.e. by allowing effects to vary across process models, even when some time series share the same underlying process model). This feature is only currently available for RW(), AR() and VAR() trend models. In nmix() family models, the trend_formula is used to set up a linear predictor for the underlying latent abundance.
data: a dataframe or list containing the model response variable and covariates required by the GAM formula and the optional trend_formula. Should include columns: series (a factor index of the series IDs; the number of levels should be identical to the number of unique series labels, i.e. n_series = length(levels(data$series))) and time (numeric or integer index of the time point for each observation). Any other variables included in the linear predictor of formula must also be present.
newdata: optional dataframe or list of test data containing at least series and time, in addition to any other variables included in the linear predictor of formula. If included, the observations in variable y will be set to NA when fitting the model so that posterior simulations can be obtained.
trend_model: character or function specifying the time series dynamics for the latent trend. Options are: 'None' (no latent trend component, i.e. the GAM component is all that contributes to the linear predictor and the observation process is the only source of error, similar to what is estimated by gam), 'RW' or RW(), 'AR1' or AR(p = 1), 'AR2' or AR(p = 2), 'AR3' or AR(p = 3), 'VAR1' or VAR() (only available in Stan), 'PWlogistic' or 'PWlinear' via PW() (only available in Stan), and 'GP' or GP() (Gaussian Process with squared exponential kernel; only available in Stan). For all trend types apart from GP() and PW(), moving average and/or correlated process error terms can also be estimated (for example, RW(cor = TRUE) will set up a multivariate Random Walk if n_series > 1). See mvgam_trends for more details.
trend_map: optional data.frame specifying which series should depend on which latent trends. Useful for allowing multiple series to depend on the same latent trend process, but with different observation processes. If supplied, a latent factor model is set up by setting use_lv = TRUE and using the mapping to set up the shared trends. Needs to have column names series and trend, with integer values in the trend column to state which trend each series should depend on. The series column should have a single unique entry for each series in the data (names must perfectly match the factor levels of the series variable in data). See examples for details.
use_lv: logical. If TRUE, use dynamic factors to estimate the series' latent trends in a reduced dimension format. Only available for RW(), AR() and GP() trend models. Defaults to FALSE.
n_lv: an integer specifying the number of latent dynamic factors to use if use_lv == TRUE. Cannot be > n_series. Defaults arbitrarily to min(2, floor(n_series / 2)).
family: family specifying the exponential observation family for the series. Currently supported families are: nb() for count data, poisson() for count data, gaussian() for real-valued data, betar() for proportional data on (0,1), lognormal() for non-negative real-valued data, student_t() for real-valued data, Gamma() for non-negative real-valued data, and nmix() for count data with imperfect detection modeled via a State-Space N-Mixture model, in which the latent states are Poisson (capturing the 'true' latent abundance) and the observation process is Binomial to account for imperfect detection. See mvgam_families for an example of how to use this family. Note that only nb() and poisson() are available if using the JAGS backend. Default is poisson(). See mvgam_families for more details.
priors: an optional data.frame with prior definitions (in JAGS or Stan syntax). If using Stan, this can also be an object of class brmsprior (see prior for details). See get_mvgam_priors and 'Details' for more information on changing default prior distributions.
lfo: logical indicating whether this is part of a call to lfo_cv.mvgam. Returns a lighter version of the model with no residuals and fewer monitored parameters to speed up post-processing. But other downstream functions will not work properly, so users should always leave this set as FALSE.
...: other arguments to be passed to mvgam.
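
To make the trend_map specification concrete, a minimal sketch (illustrative; the series names are hypothetical and must match the factor levels in your own data) of a mapping in which two observed series share one latent trend:

# Series 1 and 2 share latent trend 1; series 3 gets its own trend
trend_map <- data.frame(
  series = c('series_1', 'series_2', 'series_3'),
  trend = c(1, 1, 2)
)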

Examples:

if (FALSE) {
  # Simulate some data and fit a Poisson AR1 model
  simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1')
  mod <- mvgam(y ~ s(season, bs = 'cc'),
               trend_model = 'AR1',
               data = simdat$data_train)
  summary(mod)

  # Update to an AR2 model
  updated_mod <- update(mod, trend_model = 'AR2')
  summary(updated_mod)

  # Now update to a Negative Binomial AR1
  updated_mod <- update(mod, family = nb())
  summary(updated_mod)
}

================================================
FILE: docs/sitemap.xml
================================================
https://nicholasjclark.github.io/mvgam/404.html
https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html
https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html
https://nicholasjclark.github.io/mvgam/articles/index.html
https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html
https://nicholasjclark.github.io/mvgam/articles/nmixtures.html
https://nicholasjclark.github.io/mvgam/articles/shared_states.html
https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html
https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html
https://nicholasjclark.github.io/mvgam/authors.html
https://nicholasjclark.github.io/mvgam/index.html
https://nicholasjclark.github.io/mvgam/LICENSE-text.html
https://nicholasjclark.github.io/mvgam/LICENSE.html
https://nicholasjclark.github.io/mvgam/reference/add_tweedie_lines.html
https://nicholasjclark.github.io/mvgam/reference/all_neon_tick_data.html
https://nicholasjclark.github.io/mvgam/reference/code.html
https://nicholasjclark.github.io/mvgam/reference/conditional_effects.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/dynamic.html
https://nicholasjclark.github.io/mvgam/reference/evaluate_mvgams.html
https://nicholasjclark.github.io/mvgam/reference/fitted.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/forecast.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/formula.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/get_monitor_pars.html
https://nicholasjclark.github.io/mvgam/reference/get_mvgam_priors.html
https://nicholasjclark.github.io/mvgam/reference/GP.html
https://nicholasjclark.github.io/mvgam/reference/hindcast.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/index-mvgam.html
https://nicholasjclark.github.io/mvgam/reference/index.html
https://nicholasjclark.github.io/mvgam/reference/lfo_cv.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/logLik.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/loo.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/lv_correlations.html
https://nicholasjclark.github.io/mvgam/reference/mcmc_plot.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/model.frame.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/monotonic.html
https://nicholasjclark.github.io/mvgam/reference/mvgam-class.html
https://nicholasjclark.github.io/mvgam/reference/mvgam.html
https://nicholasjclark.github.io/mvgam/reference/mvgam_diagnostics.html
https://nicholasjclark.github.io/mvgam/reference/mvgam_draws.html
https://nicholasjclark.github.io/mvgam/reference/mvgam_families.html
https://nicholasjclark.github.io/mvgam/reference/mvgam_forecast-class.html
https://nicholasjclark.github.io/mvgam/reference/mvgam_formulae.html
https://nicholasjclark.github.io/mvgam/reference/mvgam_marginaleffects.html
https://nicholasjclark.github.io/mvgam/reference/mvgam_trends.html
https://nicholasjclark.github.io/mvgam/reference/pairs.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_fc.html
https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_init.html
https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_online.html
https://nicholasjclark.github.io/mvgam/reference/pfilter_mvgam_smooth.html
https://nicholasjclark.github.io/mvgam/reference/piecewise_trends.html
https://nicholasjclark.github.io/mvgam/reference/pipe.html
https://nicholasjclark.github.io/mvgam/reference/plot.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/plot.mvgam_lfo.html
https://nicholasjclark.github.io/mvgam/reference/plot_effects.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_factors.html
https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_forecasts.html
https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_pterms.html
https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_randomeffects.html
https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_resids.html
https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_series.html
https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_smooth.html
https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_trend.html
https://nicholasjclark.github.io/mvgam/reference/plot_mvgam_uncertainty.html
https://nicholasjclark.github.io/mvgam/reference/portal_data.html
https://nicholasjclark.github.io/mvgam/reference/posterior_epred.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/posterior_linpred.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/posterior_predict.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/ppc.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/predict.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/print.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/residuals.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/RW.html
https://nicholasjclark.github.io/mvgam/reference/score.mvgam_forecast.html
https://nicholasjclark.github.io/mvgam/reference/series_to_mvgam.html
https://nicholasjclark.github.io/mvgam/reference/sim_mvgam.html
https://nicholasjclark.github.io/mvgam/reference/summary.mvgam.html
https://nicholasjclark.github.io/mvgam/reference/update.mvgam.html

================================================
FILE: index.Rmd
================================================
---
output: github_document
always_allow_html: true
---

# mvgam

> **M**ulti**V**ariate (Dynamic) **G**eneralized **A**dditive **M**odels

The `mvgam` 📦 fits Bayesian Dynamic Generalized Additive Models (DGAMs) that can include highly flexible nonlinear predictor effects, latent variables and multivariate time series models. The package does this by relying on functionalities from the impressive [`brms`](https://paulbuerkner.com/brms/){target="_blank"} and [`mgcv`](https://cran.r-project.org/package=mgcv){target="_blank"} packages. Parameters are estimated using the probabilistic programming language [`Stan`](https://mc-stan.org/), giving users access to the most advanced Bayesian inference algorithms available.
This allows `mvgam` to fit a very wide range of models, including: * [Multivariate State-Space Time Series Models](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html){target="_blank"} * [Continuous-Time Autoregressive Time Series Models](https://nicholasjclark.github.io/mvgam/reference/RW.html#ref-examples){target="_blank"} * [Shared Signal Time Series Models](https://nicholasjclark.github.io/mvgam/articles/shared_states.html){target="_blank"} * [Dynamic Factor Models](https://nicholasjclark.github.io/mvgam/reference/lv_correlations.html){target="_blank"} * [Hierarchical N-mixture Models](https://nicholasjclark.github.io/mvgam/articles/nmixtures.html){target="_blank"} * [Hierarchical Generalized Additive Models](https://www.youtube.com/watch?v=2POK_FVwCHk){target="_blank"} * [Joint Species Distribution Models](https://nicholasjclark.github.io/mvgam/reference/jsdgam.html){target="_blank"} ## Installation Install the stable version from `CRAN` using: `install.packages('mvgam')`, or install the development version from `GitHub` using: `devtools::install_github("nicholasjclark/mvgam")`. You will also need a working version of `Stan` installed (along with either `rstan` and/or `cmdstanr`). Please refer to installation links for `Stan` with `rstan` [here](https://mc-stan.org/users/interfaces/rstan){target="_blank"}, or for `Stan` with `cmdstanr` [here](https://mc-stan.org/cmdstanr/){target="_blank"}. ## Introductory seminar
## Cheatsheet [![`mvgam` usage cheatsheet](https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.png)](https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.pdf) ## Getting started `mvgam` was originally designed to analyse and forecast non-negative integer-valued data (counts). These data are traditionally challenging to analyse with existing time-series analysis packages. But further development of `mvgam` has resulted in support for a growing number of observation families that extend to other types of data. Currently, the package can handle data for the following families: * `gaussian()` for real-valued data * `student_t()` for heavy-tailed real-valued data * `lognormal()` for non-negative real-valued data * `Gamma()` for non-negative real-valued data * `betar()` for proportional data on `(0,1)` * `bernoulli()` for binary data * `poisson()` for count data * `nb()` for overdispersed count data * `binomial()` for count data with known number of trials * `beta_binomial()` for overdispersed count data with known number of trials * `nmix()` for count data with imperfect detection (unknown number of trials) See `?mvgam_families` for more information. Below is a simple example for simulating and modelling proportional data with `Beta` observations over a set of seasonal series with independent Gaussian Process dynamic trends: ```{r, include = FALSE} library(mvgam) ``` ```{r} set.seed(100) data <- sim_mvgam( family = betar(), T = 80, trend_model = GP(), prop_trend = 0.5, seasonality = 'shared' ) ``` Plot the series to see how they evolve over time ```{r, eval = FALSE} plot_mvgam_series( data = data$data_train, series = 'all' ) ``` ![Visualizing multivariate proportional time series using the mvgam R package #rstats](man/figures/README-beta_sim-1.png) Fit a State-Space GAM to these series that uses a hierarchical cyclic seasonal smooth term to capture variation in seasonality among series. The model also includes series-specific latent Gaussian Processes with squared exponential covariance functions to capture temporal dynamics ```{r, eval = FALSE} mod <- mvgam( y ~ s(season, bs = 'cc', k = 7) + s(season, by = series, m = 1, k = 5), trend_model = GP(), data = data$data_train, newdata = data$data_test, family = betar() ) ``` Plot the estimated posterior hindcast and forecast distributions for each series ```{r eval = FALSE} library(patchwork) fc <- forecast(mod) wrap_plots( plot(fc, series = 1), plot(fc, series = 2), plot(fc, series = 3), ncol = 2 ) ``` ![Forecasting multivariate time series with Dynamic Generalized Additive Models](man/figures/README-beta_fc-1.png) Various `S3` functions can be used to inspect parameter estimates, plot smooth functions and residuals, and evaluate models through posterior predictive checks or forecast comparisons. Please see [the package documentation](https://nicholasjclark.github.io/mvgam/reference/index.html) for more detailed examples. ## Vignettes You can set `build_vignettes = TRUE` when installing but be aware this will slow down the installation drastically. Instead, you can always access the vignette htmls online at [https://nicholasjclark.github.io/mvgam/articles/](https://nicholasjclark.github.io/mvgam/articles/) ## Citing `mvgam` and related software When using any software please make sure to appropriately acknowledge the hard work that developers and maintainers put into making these packages available. 
Citations are currently the best way to formally acknowledge this work (but feel free to ⭐ [the repo](https://github.com/nicholasjclark/mvgam) as well), so we highly encourage you to cite any packages that you rely on for your research. When using `mvgam`, please cite the following: > Clark, N.J. and Wells, K. (2023). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. *Methods in Ecology and Evolution*. DOI: https://doi.org/10.1111/2041-210X.13974 As `mvgam` acts as an interface to `Stan`, please additionally cite: > Carpenter B., Gelman A., Hoffman M. D., Lee D., Goodrich B., Betancourt M., Brubaker M., Guo J., Li P., and Riddell A. (2017). Stan: A probabilistic programming language. *Journal of Statistical Software*. 76(1). DOI: https://doi.org/10.18637/jss.v076.i01 `mvgam` relies on several other `R` packages and, of course, on `R` itself. Use `how_to_cite()` after you have fitted a model to simplify the process of finding appropriate citations for your software setup. ## Other resources A number of case studies and step-by-step webinars have been compiled to highlight how GAMs and DGAMs can be useful for analysing multivariate data: * [Time series in R and Stan using the `mvgam` package](https://www.youtube.com/playlist?list=PLzFHNoUxkCvsFIg6zqogylUfPpaxau_a3){target="_blank"} * [Ecological Forecasting with Dynamic Generalized Additive Models](https://www.youtube.com/watch?v=0zZopLlomsQ){target="_blank"} * [State-Space Vector Autoregressions in `mvgam`](https://ecogambler.netlify.app/blog/vector-autoregressions/){target="_blank"} * [How to interpret and report nonlinear effects from Generalized Additive Models](https://ecogambler.netlify.app/blog/interpreting-gams/){target="_blank"} * [Phylogenetic smoothing using mgcv](https://ecogambler.netlify.app/blog/phylogenetic-smooths-mgcv/){target="_blank"} * [Distributed lags (and hierarchical distributed lags) using mgcv and mvgam](https://ecogambler.netlify.app/blog/distributed-lags-mgcv/){target="_blank"} * [Incorporating time-varying seasonality in forecast models](https://ecogambler.netlify.app/blog/time-varying-seasonality/){target="_blank"} ## Getting help If you encounter a clear bug, please file an issue with a minimal reproducible example on [GitHub](https://github.com/nicholasjclark/mvgam/issues). Please also feel free to use the [`mvgam` Discussion Board](https://github.com/nicholasjclark/mvgam/discussions) to hunt for or post other discussion topics related to the package, and do check out the [`mvgam` changelog](https://nicholasjclark.github.io/mvgam/news/index.html) for any updates about recent upgrades that the package has incorporated. ## Interested in contributing? I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please reach out if you are interested (n.clark'at'uq.edu.au). Other contributions are also very welcome, but please see [The Contributor Instructions](https://github.com/nicholasjclark/mvgam/blob/master/.github/CONTRIBUTING.md) for general guidelines. Note that by participating in this project you agree to abide by the terms of its [Contributor Code of Conduct](https://dplyr.tidyverse.org/CODE_OF_CONDUCT). 
================================================ FILE: index.md ================================================ # mvgam > **M**ulti**V**ariate (Dynamic) **G**eneralized **A**dditive **M**odels The `mvgam` 📦 fits Bayesian Dynamic Generalized Additive Models (DGAMs) that can include highly flexible nonlinear predictor effects, latent variables and multivariate time series models. The package does this by relying on functionalities from the impressive brms and mgcv packages. Parameters are estimated using the probabilistic programming language [`Stan`](https://mc-stan.org/), giving users access to the most advanced Bayesian inference algorithms available. This allows `mvgam` to fit a very wide range of models, including: - Multivariate State-Space Time Series Models - Continuous-Time Autoregressive Time Series Models - Shared Signal Time Series Models - Dynamic Factor Models - Hierarchical N-mixture Models - Hierarchical Generalized Additive Models - Joint Species Distribution Models ## Installation Install the stable version from `CRAN` using: `install.packages('mvgam')`, or install the development version from `GitHub` using: `devtools::install_github("nicholasjclark/mvgam")`. You will also need a working version of `Stan` installed (along with either `rstan` and/or `cmdstanr`). Please refer to installation links for `Stan` with `rstan` here, or for `Stan` with `cmdstanr` here. ## Introductory seminar
## Cheatsheet [![`mvgam` usage cheatsheet](https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.png)](https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.pdf) ## Getting started `mvgam` was originally designed to analyse and forecast non-negative integer-valued data (counts). These data are traditionally challenging to analyse with existing time-series analysis packages. But further development of `mvgam` has resulted in support for a growing number of observation families that extend to other types of data. Currently, the package can handle data for the following families: - `gaussian()` for real-valued data - `student_t()` for heavy-tailed real-valued data - `lognormal()` for non-negative real-valued data - `Gamma()` for non-negative real-valued data - `betar()` for proportional data on `(0,1)` - `bernoulli()` for binary data - `poisson()` for count data - `nb()` for overdispersed count data - `binomial()` for count data with known number of trials - `beta_binomial()` for overdispersed count data with known number of trials - `nmix()` for count data with imperfect detection (unknown number of trials) See `?mvgam_families` for more information. Below is a simple example for simulating and modelling proportional data with `Beta` observations over a set of seasonal series with independent Gaussian Process dynamic trends: set.seed(100) data <- sim_mvgam( family = betar(), T = 80, trend_model = GP(), prop_trend = 0.5, seasonality = 'shared' ) Plot the series to see how they evolve over time plot_mvgam_series( data = data$data_train, series = 'all' )
Visualizing multivariate proportional time series using the mvgam R package #rstats
Fit a State-Space GAM to these series that uses a hierarchical cyclic seasonal smooth term to capture variation in seasonality among series. The model also includes series-specific latent Gaussian Processes with squared exponential covariance functions to capture temporal dynamics mod <- mvgam( y ~ s(season, bs = 'cc', k = 7) + s(season, by = series, m = 1, k = 5), trend_model = GP(), data = data$data_train, newdata = data$data_test, family = betar() ) Plot the estimated posterior hindcast and forecast distributions for each series library(patchwork) fc <- forecast(mod) wrap_plots( plot(fc, series = 1), plot(fc, series = 2), plot(fc, series = 3), ncol = 2 )
Forecasting multivariate time series with Dynamic Generalized Additive Models
Various `S3` functions can be used to inspect parameter estimates, plot smooth functions and residuals, and evaluate models through posterior predictive checks or forecast comparisons. Please see [the package documentation](https://nicholasjclark.github.io/mvgam/reference/index.html) for more detailed examples. ## Vignettes You can set `build_vignettes = TRUE` when installing but be aware this will slow down the installation drastically. Instead, you can always access the vignette htmls online at ## Citing `mvgam` and related software When using any software please make sure to appropriately acknowledge the hard work that developers and maintainers put into making these packages available. Citations are currently the best way to formally acknowledge this work (but feel free to ⭐ [the repo](https://github.com/nicholasjclark/mvgam) as well), so we highly encourage you to cite any packages that you rely on for your research. When using `mvgam`, please cite the following: > Clark, N.J. and Wells, K. (2023). Dynamic Generalized Additive Models > (DGAMs) for forecasting discrete ecological time series. *Methods in > Ecology and Evolution*. DOI: As `mvgam` acts as an interface to `Stan`, please additionally cite: > Carpenter B., Gelman A., Hoffman M. D., Lee D., Goodrich B., > Betancourt M., Brubaker M., Guo J., Li P., and Riddell A. (2017). > Stan: A probabilistic programming language. *Journal of Statistical > Software*. 76(1). DOI: `mvgam` relies on several other `R` packages and, of course, on `R` itself. Use `how_to_cite()` after you have fitted a model to simplify the process of finding appropriate citations for your software setup. ## Other resources A number of case studies and step-by-step webinars have been compiled to highlight how GAMs and DGAMs can be useful for analysing multivariate data: - Time series in R and Stan using the mvgam package - Ecological Forecasting with Dynamic Generalized Additive Models - State-Space Vector Autoregressions in mvgam - How to interpret and report nonlinear effects from Generalized Additive Models - Phylogenetic smoothing using mgcv - Distributed lags (and hierarchical distributed lags) using mgcv and mvgam - Incorporating time-varying seasonality in forecast models ## Getting help If you encounter a clear bug, please file an issue with a minimal reproducible example on [GitHub](https://github.com/nicholasjclark/mvgam/issues). Please also feel free to use the [`mvgam` Discussion Board](https://github.com/nicholasjclark/mvgam/discussions) to hunt for or post other discussion topics related to the package, and do check out the [`mvgam` changelog](https://nicholasjclark.github.io/mvgam/news/index.html) for any updates about recent upgrades that the package has incorporated. ## Interested in contributing? I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please reach out if you are interested (n.clark’at’uq.edu.au). Other contributions are also very welcome, but please see [The Contributor Instructions](https://github.com/nicholasjclark/mvgam/blob/master/.github/CONTRIBUTING.md) for general guidelines. Note that by participating in this project you agree to abide by the terms of its [Contributor Code of Conduct](https://dplyr.tidyverse.org/CODE_OF_CONDUCT). 
================================================ FILE: inst/CITATION ================================================ citHeader("To cite mvgam in publications use:") bibentry( bibtype = "Article", title = "Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series", author = c(person(given = "Nicholas J", family = "Clark"), person(given = "Konstans", family = "Wells")), journal = "Methods in Ecology and Evolution", year = "2023", volume = "14", number = "", pages = "771-784", doi = "10.1111/2041-210X.13974", textVersion = paste( "Clark & Wells (2023).", "Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series.", "Methods in Ecology and Evolution, 14, 771-784.", "doi.org/10.1111/2041-210X.13974" ), encoding = "UTF-8" ) ================================================ FILE: inst/doc/data_in_mvgam.R ================================================ params <- list(EVAL = TRUE) ## ----echo = FALSE---------------------------------------------------------------- knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, eval = if (isTRUE(exists("params"))) params$EVAL else FALSE ) ## ----setup, include=FALSE-------------------------------------------------------- knitr::opts_chunk$set( echo = TRUE, dpi = 100, fig.asp = 0.8, fig.width = 6, out.width = "60%", fig.align = "center" ) library(mvgam) library(ggplot2) theme_set(theme_bw(base_size = 12, base_family = "serif")) ## -------------------------------------------------------------------------------- simdat <- sim_mvgam( n_series = 4, T = 24, prop_missing = 0.2 ) head(simdat$data_train, 16) ## -------------------------------------------------------------------------------- class(simdat$data_train$series) levels(simdat$data_train$series) ## -------------------------------------------------------------------------------- all( levels(simdat$data_train$series) %in% unique(simdat$data_train$series) ) ## -------------------------------------------------------------------------------- summary(glm( y ~ series + time, data = simdat$data_train, family = poisson() )) ## -------------------------------------------------------------------------------- summary(mgcv::gam( y ~ series + s(time, by = series), data = simdat$data_train, family = poisson() )) ## -------------------------------------------------------------------------------- gauss_dat <- data.frame( outcome = rnorm(10), series = factor("series1", levels = "series1"), time = 1:10 ) gauss_dat ## -------------------------------------------------------------------------------- mgcv::gam(outcome ~ time, family = betar(), data = gauss_dat) ## ----error=TRUE------------------------------------------------------------------ try({ mvgam(outcome ~ time, family = betar(), data = gauss_dat) }) ## -------------------------------------------------------------------------------- # A function to ensure all timepoints within a sequence are identical all_times_avail <- function(time, min_time, max_time) { identical( as.numeric(sort(time)), as.numeric(seq.int(from = min_time, to = max_time)) ) } # Get min and max times from the data min_time <- min(simdat$data_train$time) max_time <- max(simdat$data_train$time) # Check that all times are recorded for each series data.frame( series = simdat$data_train$series, time = simdat$data_train$time ) %>% dplyr::group_by(series) %>% dplyr::summarise( all_there = all_times_avail( time, min_time, max_time ) ) -> checked_times if (any(checked_times$all_there == FALSE)) { warning( "One or 
more series in is missing observations for one or more timepoints" ) } else { cat("All series have observations at all timepoints :)") } ## -------------------------------------------------------------------------------- bad_times <- data.frame( time = seq(1, 16, by = 2), series = factor("series_1"), outcome = rnorm(8) ) bad_times ## ----error = TRUE---------------------------------------------------------------- try({ get_mvgam_priors(outcome ~ 1, data = bad_times, family = gaussian()) }) ## -------------------------------------------------------------------------------- bad_times %>% dplyr::right_join(expand.grid( time = seq( min(bad_times$time), max(bad_times$time) ), series = factor(unique(bad_times$series), levels = levels(bad_times$series)) )) %>% dplyr::arrange(time) -> good_times good_times ## ----error = TRUE---------------------------------------------------------------- try({ get_mvgam_priors(outcome ~ 1, data = good_times, family = gaussian()) }) ## -------------------------------------------------------------------------------- bad_levels <- data.frame( time = 1:8, series = factor( "series_1", levels = c( "series_1", "series_2" ) ), outcome = rnorm(8) ) levels(bad_levels$series) ## ----error = TRUE---------------------------------------------------------------- try({ get_mvgam_priors(outcome ~ 1, data = bad_levels, family = gaussian()) }) ## -------------------------------------------------------------------------------- setdiff(levels(bad_levels$series), unique(bad_levels$series)) ## -------------------------------------------------------------------------------- bad_levels %>% dplyr::mutate(series = droplevels(series)) -> good_levels levels(good_levels$series) ## ----error = TRUE---------------------------------------------------------------- try({ get_mvgam_priors( outcome ~ 1, data = good_levels, family = gaussian() ) }) ## -------------------------------------------------------------------------------- miss_dat <- data.frame( outcome = rnorm(10), cov = c(NA, rnorm(9)), series = factor("series1", levels = "series1"), time = 1:10 ) miss_dat ## ----error = TRUE---------------------------------------------------------------- try({ get_mvgam_priors( outcome ~ cov, data = miss_dat, family = gaussian() ) }) ## -------------------------------------------------------------------------------- miss_dat <- list( outcome = rnorm(10), series = factor("series1", levels = "series1"), time = 1:10 ) miss_dat$cov <- matrix(rnorm(50), ncol = 5, nrow = 10) miss_dat$cov[2, 3] <- NA ## ----error=TRUE------------------------------------------------------------------ try({ get_mvgam_priors( outcome ~ cov, data = miss_dat, family = gaussian() ) }) ## ----fig.alt = "Plotting time series features for GAM models in mvgam"----------- plot_mvgam_series( data = simdat$data_train, y = "y", series = "all" ) ## ----fig.alt = "Plotting time series features for GAM models in mvgam"----------- plot_mvgam_series( data = simdat$data_train, y = "y", series = 1 ) ## ----fig.alt = "Plotting time series features for GAM models in mvgam"----------- plot_mvgam_series( data = simdat$data_train, newdata = simdat$data_test, y = "y", series = 1 ) ## -------------------------------------------------------------------------------- data("all_neon_tick_data") str(dplyr::ungroup(all_neon_tick_data)) ## -------------------------------------------------------------------------------- plotIDs <- c( "SCBI_013", "SCBI_002", "SERC_001", "SERC_005", "SERC_006", "SERC_012", "BLAN_012", "BLAN_005" ) ## 
-------------------------------------------------------------------------------- model_dat <- all_neon_tick_data %>% dplyr::ungroup() %>% dplyr::mutate(target = ixodes_scapularis) %>% dplyr::filter(plotID %in% plotIDs) %>% dplyr::select(Year, epiWeek, plotID, target) %>% dplyr::mutate(epiWeek = as.numeric(epiWeek)) ## -------------------------------------------------------------------------------- model_dat %>% # Create all possible combos of plotID, Year and epiWeek; # missing outcomes will be filled in as NA dplyr::full_join(expand.grid( plotID = unique(model_dat$plotID), Year = unique(model_dat$Year), epiWeek = seq(1, 52) )) %>% # left_join back to original data so plotID and siteID will # match up, in case you need the siteID for anything else later on dplyr::left_join( all_neon_tick_data %>% dplyr::select(siteID, plotID) %>% dplyr::distinct() ) -> model_dat ## -------------------------------------------------------------------------------- model_dat %>% dplyr::mutate( series = plotID, y = target ) %>% dplyr::mutate( siteID = factor(siteID), series = factor(series) ) %>% dplyr::select(-target, -plotID) %>% dplyr::arrange(Year, epiWeek, series) -> model_dat ## -------------------------------------------------------------------------------- model_dat %>% dplyr::ungroup() %>% dplyr::group_by(series) %>% dplyr::arrange(Year, epiWeek) %>% dplyr::mutate(time = seq(1, dplyr::n())) %>% dplyr::ungroup() -> model_dat ## -------------------------------------------------------------------------------- levels(model_dat$series) ## ----error=TRUE------------------------------------------------------------------ try({ get_mvgam_priors( y ~ 1, data = model_dat, family = poisson() ) }) ## -------------------------------------------------------------------------------- testmod <- mvgam( y ~ s(epiWeek, by = series, bs = "cc") + s(series, bs = "re"), trend_model = AR(), data = model_dat, backend = "cmdstanr", run_model = FALSE ) ## -------------------------------------------------------------------------------- str(testmod$model_data) ## -------------------------------------------------------------------------------- stancode(testmod) ================================================ FILE: inst/doc/data_in_mvgam.Rmd ================================================ --- title: "Formatting data for use in mvgam" author: "Nicholas J Clark" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: yes vignette: > %\VignetteIndexEntry{Formatting data for use in mvgam} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} params: EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true") --- ```{r, echo = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, eval = if (isTRUE(exists("params"))) params$EVAL else FALSE ) ``` ```{r setup, include=FALSE} knitr::opts_chunk$set( echo = TRUE, dpi = 100, fig.asp = 0.8, fig.width = 6, out.width = "60%", fig.align = "center" ) library(mvgam) library(ggplot2) theme_set(theme_bw(base_size = 12, base_family = "serif")) ``` This vignette gives an example of how to take raw data and format it for use in `mvgam`. This is not an exhaustive example, as data can be recorded and stored in a variety of ways, which requires different approaches to wrangle the data into the necessary format for `mvgam`. 
For full details on the basic `mvgam` functionality, please see [the introductory vignette](https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html) and [the growing set of walk-through video tutorials on `mvgam` applications](https://www.youtube.com/playlist?list=PLzFHNoUxkCvsFIg6zqogylUfPpaxau_a3&si=lyg7qUrMLbD-tHCB). ## Required *tidy* data format Manipulating the data into a 'long' format (i.e. *tidy* format) is necessary for modelling in `mvgam`. By 'long' format, we mean that each `series x time` observation needs to have its own entry in the `dataframe` or `list` object that we wish to pass as data to the two primary modelling functions, `mvgam()` and `jsdgam()`. A simple example can be viewed by simulating data using the `sim_mvgam()` function. See `?sim_mvgam` for more details ```{r} simdat <- sim_mvgam( n_series = 4, T = 24, prop_missing = 0.2 ) head(simdat$data_train, 16) ``` ### `series` as a `factor` variable Notice how we have four different time series in these simulated data, and we have identified the series-level indicator as a `factor` variable. ```{r} class(simdat$data_train$series) levels(simdat$data_train$series) ``` It is important that the number of levels matches the number of unique series in the data to ensure indexing across series works properly in the underlying modelling functions. Several of the main workhorse functions in the package (including `mvgam()` and `get_mvgam_priors()`) will give an error if this is not the case, but it may be worth checking anyway: ```{r} all(levels(simdat$data_train$series) %in% unique(simdat$data_train$series)) ``` Note that you can technically supply data that does not have a `series` indicator, and the package will generally assume that you are only using a single time series. There are exceptions to this, for example if you have grouped data and would like to estimate hierarchical dependencies (see an example of hierarchical process error correlations in the `?AR` documentation) or if you would like to set up a Joint Species Distribution Model (JSDM) using a Zero-Mean Multivariate Gaussian distribution for the latent residuals (see examples in the `?ZMVN` documentation). ### A single outcome variable You may also have noticed that we do not spread the `numeric / integer`-classed outcome variable into different columns. Rather, there is only a single column for the outcome variable, labelled `y` in these simulated data (though the outcome does not have to be labelled `y`). This is another important requirement in `mvgam`, but it shouldn't be too unfamiliar to `R` users who frequently use modelling packages such as `lme4`, `mgcv`, `brms` or the many other regression modelling packages out there. The advantage of this format is that it is now very easy to specify effects that vary among time series: ```{r} summary(glm( y ~ series + time, data = simdat$data_train, family = poisson() )) ``` ```{r} summary(mgcv::gam( y ~ series + s(time, by = series), data = simdat$data_train, family = poisson() )) ``` Depending on the observation families you plan to use when building models, there may be some restrictions that need to be satisfied within the outcome variable. For example, a Beta regression can only handle proportional data, so values `>= 1` or `<= 0` are not allowed. Likewise, a Poisson regression can only handle non-negative integers. 
Most regression functions in `R` will assume the user knows all of this and so will not issue any warnings or errors if you choose the wrong distribution, but often this ends up leading to some unhelpful error from an optimizer that is difficult to interpret and diagnose. `mvgam` will attempt to provide some errors if you do something that is simply not allowed. For example, we can simulate data from a zero-centred Gaussian distribution (ensuring that some of our values will be `<= 0`) and attempt a Beta regression in `mvgam` using the `betar` family: ```{r} gauss_dat <- data.frame( outcome = rnorm(10), series = factor("series1", levels = "series1" ), time = 1:10 ) gauss_dat ``` A call to `gam()` using the `mgcv` package leads to a model that actually fits (though it does give an unhelpful warning message): ```{r} mgcv::gam(outcome ~ time, family = betar(), data = gauss_dat ) ``` But the same call to `mvgam()` gives us something more useful: ```{r error=TRUE} mvgam(outcome ~ time, family = betar(), data = gauss_dat ) ``` Please see `?mvgam_families` for more information on the types of responses that the package can handle and their restrictions. ### A `time` variable The other requirement for most models that can be fit in `mvgam` is a `numeric / integer`-classed variable labelled `time`. This ensures the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models. If you plan to use any of the autoregressive dynamic trend functions available in `mvgam` (see `?mvgam_trends` for details of available dynamic processes), you will need to ensure your time series are entered with a fixed sampling interval (i.e. the time between timesteps 1 and 2 should be the same as the time between timesteps 2 and 3, etc...). But note that you can have missing observations for some (or all) series. `mvgam()` will check this for you, but again it is useful to ensure you have no missing timepoint x series combinations in your data. You can generally do this with a simple `dplyr` call: ```{r} # A function to ensure all timepoints within a sequence are identical all_times_avail <- function(time, min_time, max_time) { identical( as.numeric(sort(time)), as.numeric(seq.int(from = min_time, to = max_time)) ) } # Get min and max times from the data min_time <- min(simdat$data_train$time) max_time <- max(simdat$data_train$time) # Check that all times are recorded for each series data.frame( series = simdat$data_train$series, time = simdat$data_train$time ) %>% dplyr::group_by(series) %>% dplyr::summarise(all_there = all_times_avail( time, min_time, max_time )) -> checked_times if (any(checked_times$all_there == FALSE)) { warning("One or more series is missing observations for one or more timepoints") } else { cat("All series have observations at all timepoints :)") } ``` Note that models which use dynamic components will assume that smaller values of `time` are *older* (i.e. `time = 1` came *before* `time = 2`, etc...) ### Irregular sampling intervals? Most `mvgam` dynamic trend models expect `time` to be measured in discrete, evenly-spaced intervals (i.e. one measurement per week, or one per year, for example; though missing values are allowed). But please note that irregularly sampled time intervals are allowed, in which case the `CAR()` trend model (continuous time autoregressive) is appropriate. You can see an example of this kind of model in the **Examples** section in `?CAR`. 
You can also use `trend_model = 'None'` (the default in `mvgam()`) and instead use a Gaussian Process to model temporal variation for irregularly-sampled time series. See the `?brms::gp` for details. But to reiterate the point from above, if you do not have time series data (or don't want to estimate latent temporal dynamics) but you would like to estimate correlated latent residuals among multivariate outcomes, you can set up models that use `trend_model = ZMVN(...)` without the need for a `time` variable (see `?ZMVN` for details). ## Checking data with `get_mvgam_priors()` The `get_mvgam_priors()` function is designed to return information about the parameters in a model whose prior distributions can be modified by the user. But in doing so, it will perform a series of checks to ensure the data are formatted properly. It can therefore be very useful to new users for ensuring there isn't anything strange going on in the data setup. For example, we can replicate the steps taken above (to check factor levels and timepoint x series combinations) with a single call to `get_mvgam_priors()`. Here we first simulate some data in which some of the timepoints in the `time` variable are not included in the data: ```{r} bad_times <- data.frame( time = seq(1, 16, by = 2), series = factor("series_1"), outcome = rnorm(8) ) bad_times ``` Next we call `get_mvgam_priors()` by simply specifying an intercept-only model, which is enough to trigger all the checks: ```{r error = TRUE} get_mvgam_priors(outcome ~ 1, data = bad_times, family = gaussian() ) ``` This error is useful as it tells us where the problem is. There are many ways to fill in missing timepoints, so the correct way will have to be left up to the user. But if you don't have any covariates, it should be pretty easy using `expand.grid()`: ```{r} bad_times %>% dplyr::right_join(expand.grid( time = seq( min(bad_times$time), max(bad_times$time) ), series = factor(unique(bad_times$series), levels = levels(bad_times$series) ) )) %>% dplyr::arrange(time) -> good_times good_times ``` Now the call to `get_mvgam_priors()`, using our filled in data, should work: ```{r error = TRUE} get_mvgam_priors(outcome ~ 1, data = good_times, family = gaussian() ) ``` This function should also pick up on misaligned factor levels for the `series` variable. We can check this by again simulating, this time adding an additional factor level that is not included in the data: ```{r} bad_levels <- data.frame( time = 1:8, series = factor("series_1", levels = c( "series_1", "series_2" ) ), outcome = rnorm(8) ) levels(bad_levels$series) ``` Another call to `get_mvgam_priors()` brings up a useful error: ```{r error = TRUE} get_mvgam_priors(outcome ~ 1, data = bad_levels, family = gaussian() ) ``` Following the message's advice tells us there is a level for `series_2` in the `series` variable, but there are no observations for this series in the data: ```{r} setdiff(levels(bad_levels$series), unique(bad_levels$series)) ``` Re-assigning the levels fixes the issue: ```{r} bad_levels %>% dplyr::mutate(series = droplevels(series)) -> good_levels levels(good_levels$series) ``` ```{r error = TRUE} get_mvgam_priors( outcome ~ 1, data = good_levels, family = gaussian() ) ``` ### Covariates with no `NA`s Covariates can be used in models just as you would when using `mgcv` (see `?formula.gam` for details of the formula syntax). But although the outcome variable can have `NA`s, covariates cannot. 
Most regression software will silently drop any rows in the model matrix that have `NA`s, which is not helpful when debugging. Both the `mvgam()` and `get_mvgam_priors()` functions will run some simple checks for you, and hopefully will return useful errors if they find missing values: ```{r} miss_dat <- data.frame( outcome = rnorm(10), cov = c(NA, rnorm(9)), series = factor("series1", levels = "series1" ), time = 1:10 ) miss_dat ``` ```{r error = TRUE} get_mvgam_priors( outcome ~ cov, data = miss_dat, family = gaussian() ) ``` Just like with the `mgcv` package, `mvgam` can also accept data as a `list` object. This is useful if you want to set up linear functional predictors or even distributed lag predictors. The checks run by `mvgam` should still work on these data. Here we change the `cov` predictor to be a `matrix`: ```{r} miss_dat <- list( outcome = rnorm(10), series = factor("series1", levels = "series1" ), time = 1:10 ) miss_dat$cov <- matrix(rnorm(50), ncol = 5, nrow = 10) miss_dat$cov[2, 3] <- NA ``` A call to `get_mvgam_priors()` returns the same error: ```{r error=TRUE} get_mvgam_priors( outcome ~ cov, data = miss_dat, family = gaussian() ) ``` ## Plotting with `plot_mvgam_series()` Plotting the data is a useful way to ensure everything looks ok, once you've gone through the above checks on factor levels and timepoint x series combinations. The `plot_mvgam_series()` function will take supplied data and plot either a series of line plots (if you choose `series = 'all'`) or a set of plots to describe the distribution for a single time series. For example, to plot all of the time series in our data, and highlight a single series in each plot, we can use: ```{r, fig.alt = "Plotting time series features for GAM models in mvgam"} plot_mvgam_series( data = simdat$data_train, y = "y", series = "all" ) ``` Or we can look more closely at the distribution for the first time series: ```{r, fig.alt = "Plotting time series features for GAM models in mvgam"} plot_mvgam_series( data = simdat$data_train, y = "y", series = 1 ) ``` If you have split your data into training and testing folds (i.e. for forecast evaluation), you can include the test data in your plots: ```{r, fig.alt = "Plotting time series features for GAM models in mvgam"} plot_mvgam_series( data = simdat$data_train, newdata = simdat$data_test, y = "y", series = 1 ) ``` ## Example with NEON tick data To give one example of how data can be reformatted for `mvgam` modelling, we will use observations from the National Ecological Observatory Network (NEON) tick drag cloth samples. *Ixodes scapularis* is a widespread tick species capable of transmitting a diversity of parasites to animals and humans, many of which are zoonotic. Due to the medical and ecological importance of this tick species, a common goal is to understand factors that influence their abundances. The NEON field team carries out standardised [long-term monitoring of tick abundances as well as other important indicators of ecological change](https://www.neonscience.org/data-collection/ticks){target="_blank"}. Nymphal abundance of *I. scapularis* is routinely recorded across NEON plots using a field sampling method called drag cloth sampling, which is a common method for sampling ticks in the landscape. Field researchers sample ticks by dragging a large cloth behind themselves through terrain that is suspected of harboring ticks, usually working in a grid-like pattern. The sites have been sampled since 2014, resulting in a rich dataset of nymph abundance time series. 
These tick time series show strong seasonality and incorporate many of the challenging features associated with ecological data including overdispersion, high proportions of missingness and irregular sampling in time, making them useful for exploring the utility of dynamic GAMs. We begin by loading NEON tick data for the years 2014 - 2021, which were downloaded from NEON and prepared as described in [Clark & Wells 2022](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13974){target="_blank"}. You can read a bit about the data using the call `?all_neon_tick_data` ```{r} data("all_neon_tick_data") str(dplyr::ungroup(all_neon_tick_data)) ``` For this exercise, we will use the `epiWeek` variable as an index of seasonality, and we will only work with observations from a few sampling plots (labelled in the `plotID` column): ```{r} plotIDs <- c( "SCBI_013", "SCBI_002", "SERC_001", "SERC_005", "SERC_006", "SERC_012", "BLAN_012", "BLAN_005" ) ``` Now we can select the target species we want (*I. scapularis*), filter to the correct plot IDs and convert the `epiWeek` variable from `character` to `numeric`: ```{r} model_dat <- all_neon_tick_data %>% dplyr::ungroup() %>% dplyr::mutate(target = ixodes_scapularis) %>% dplyr::filter(plotID %in% plotIDs) %>% dplyr::select(Year, epiWeek, plotID, target) %>% dplyr::mutate(epiWeek = as.numeric(epiWeek)) ``` Now is the tricky part: we need to fill in missing observations with `NA`s. The tick data are sparse in that field observers do not go out and sample in each possible `epiWeek`. So there are many particular weeks in which observations are not included in the data. But we can use `expand.grid()` again to take care of this: ```{r} model_dat %>% # Create all possible combos of plotID, Year and epiWeek; # missing outcomes will be filled in as NA dplyr::full_join(expand.grid( plotID = unique(model_dat$plotID), Year = unique(model_dat$Year), epiWeek = seq(1, 52) )) %>% # left_join back to original data so plotID and siteID will # match up, in case you need the siteID for anything else later on dplyr::left_join(all_neon_tick_data %>% dplyr::select(siteID, plotID) %>% dplyr::distinct()) -> model_dat ``` Create the `series` variable needed for `mvgam` modelling: ```{r} model_dat %>% dplyr::mutate( series = plotID, y = target ) %>% dplyr::mutate( siteID = factor(siteID), series = factor(series) ) %>% dplyr::select(-target, -plotID) %>% dplyr::arrange(Year, epiWeek, series) -> model_dat ``` Now create the `time` variable, which needs to track `Year` and `epiWeek` for each unique series. The `n` function from `dplyr` is often useful if generating a `time` index for grouped dataframes: ```{r} model_dat %>% dplyr::ungroup() %>% dplyr::group_by(series) %>% dplyr::arrange(Year, epiWeek) %>% dplyr::mutate(time = seq(1, dplyr::n())) %>% dplyr::ungroup() -> model_dat ``` Check factor levels for the `series`: ```{r} levels(model_dat$series) ``` This looks good, as does a more rigorous check using `get_mvgam_priors()`: ```{r error=TRUE} get_mvgam_priors( y ~ 1, data = model_dat, family = poisson() ) ``` We can also set up a model in `mvgam()` but use `run_model = FALSE` to further ensure all of the necessary steps for creating the modelling code and objects will run. 
It is recommended that you use the `cmdstanr` backend if possible, as the auto-formatting options available in this package are very useful for checking the package-generated `Stan` code for any inefficiencies that can be fixed to lead to sampling performance improvements: ```{r} testmod <- mvgam( y ~ s(epiWeek, by = series, bs = "cc") + s(series, bs = "re"), trend_model = AR(), data = model_dat, backend = "cmdstanr", run_model = FALSE ) ``` This call runs without issue, and the resulting object now contains the model code and data objects that are needed to initiate sampling: ```{r} str(testmod$model_data) ``` ```{r} stancode(testmod) ``` ## Further reading The following papers and resources offer useful material about Dynamic GAMs and how they can be applied in practice: Clark, Nicholas J. and Wells, K. [Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series](https://doi.org/10.1111/2041-210X.13974). *Methods in Ecology and Evolution*. (2023): 14, 771-784. Clark, Nicholas J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ*. (2025): 13:e18929 de Sousa, Heitor C., et al. [Severe fire regimes decrease resilience of ectothermic populations](https://doi.org/10.1111/1365-2656.14188). *Journal of Animal Ecology* (2024): 93(11), 1656-1669. Hannaford, Naomi E., et al. [A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant.](https://doi.org/10.1016/j.csda.2022.107659) *Computational Statistics & Data Analysis* (2023): 179, 107659. Karunarathna, K.A.N.K., et al. [Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models](https://doi.org/10.1016/j.ecolmodel.2024.110648). *Ecological Modelling* (2024): 490, 110648. Zhu, L., et al. [Responses of a widespread pest insect to extreme high temperatures are stage-dependent and divergent among seasonal cohorts](https://doi.org/10.1111/1365-2435.14711). *Functional Ecology* (2025): 39, 165–180. https://doi.org/10.1111/1365-2435.14711 ## Interested in contributing? I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au) ================================================ FILE: inst/doc/data_in_mvgam.html ================================================ Formatting data for use in mvgam

Nicholas J Clark

2026-01-19

This vignette gives an example of how to take raw data and format it for use in mvgam. This is not an exhaustive example, as data can be recorded and stored in a variety of ways, which requires different approaches to wrangle the data into the necessary format for mvgam. For full details on the basic mvgam functionality, please see the introductory vignette and the growing set of walk-through video tutorials on mvgam applications.

Required tidy data format

Manipulating the data into a ‘long’ format (i.e. tidy format) is necessary for modelling in mvgam. By ‘long’ format, we mean that each series x time observation needs to have its own entry in the dataframe or list object that we wish to pass as data to the two primary modelling functions, mvgam() and jsdgam(). A simple example can be viewed by simulating data using the sim_mvgam() function. See ?sim_mvgam for more details

simdat <- sim_mvgam(
  n_series = 4, 
  T = 24, 
  prop_missing = 0.2
)
head(simdat$data_train, 16)
#>     y season year   series time
#> 1   3      1    1 series_1    1
#> 2   2      1    1 series_2    1
#> 3   2      1    1 series_3    1
#> 4   7      1    1 series_4    1
#> 5   1      2    1 series_1    2
#> 6   3      2    1 series_2    2
#> 7   3      2    1 series_3    2
#> 8   1      2    1 series_4    2
#> 9   1      3    1 series_1    3
#> 10  4      3    1 series_2    3
#> 11  4      3    1 series_3    3
#> 12 NA      3    1 series_4    3
#> 13 NA      4    1 series_1    4
#> 14  2      4    1 series_2    4
#> 15  2      4    1 series_3    4
#> 16  5      4    1 series_4    4

series as a factor variable

Notice how we have four different time series in these simulated data, and we have identified the series-level indicator as a factor variable.

class(simdat$data_train$series)
#> [1] "factor"
levels(simdat$data_train$series)
#> [1] "series_1" "series_2" "series_3" "series_4"

It is important that the number of levels matches the number of unique series in the data to ensure indexing across series works properly in the underlying modelling functions. Several of the main workhorse functions in the package (including mvgam() and get_mvgam_priors()) will give an error if this is not the case, but it may be worth checking anyway:

all(levels(simdat$data_train$series) %in% 
      unique(simdat$data_train$series))
#> [1] TRUE

Note that you can technically supply data that does not have a series indicator, and the package will generally assume that you are only using a single time series. There are exceptions to this, for example if you have grouped data and would like to estimate hierarchical dependencies (see an example of hierarchical process error correlations in the ?AR documentation) or if you would like to set up a Joint Species Distribution Model (JSDM) using a Zero-Mean Multivariate Gaussian distribution for the latent residuals (see examples in the ?ZMVN documentation).
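
To make the ZMVN idea above concrete, here is a minimal, hypothetical sketch (the site and species grouping factors are invented purely for illustration; see ?ZMVN for the package's canonical arguments and worked examples):

# Hypothetical sketch: correlated latent residuals among grouped outcomes,
# with no `time` variable required. run_model = FALSE only generates the
# model code and data objects without sampling
jsdm_dat <- data.frame(
  y = rpois(40, lambda = 3),
  site = factor(rep(paste0("site_", 1:10), each = 4)),
  species = factor(rep(paste0("sp_", 1:4), times = 10))
)
mod <- mvgam(
  y ~ species,
  trend_model = ZMVN(unit = site, subgr = species),
  family = poisson(),
  data = jsdm_dat,
  run_model = FALSE
)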

A single outcome variable

You may also have noticed that we do not spread the numeric / integer-classed outcome variable into different columns. Rather, there is only a single column for the outcome variable, labelled y in these simulated data (though the outcome does not have to be labelled y). This is another important requirement in mvgam, but it shouldn’t be too unfamiliar to R users who frequently use modelling packages such as lme4, mgcv, brms or the many other regression modelling packages out there. The advantage of this format is that it is now very easy to specify effects that vary among time series:

summary(glm(
  y ~ series + time,
  data = simdat$data_train,
  family = poisson()
))
#> 
#> Call:
#> glm(formula = y ~ series + time, family = poisson(), data = simdat$data_train)
#> 
#> Coefficients:
#>                Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)    -0.31987    0.37647  -0.850 0.395515    
#> seriesseries_2  1.28070    0.37645   3.402 0.000669 ***
#> seriesseries_3  1.18080    0.38064   3.102 0.001921 ** 
#> seriesseries_4  1.17583    0.38161   3.081 0.002061 ** 
#> time           -0.01996    0.01888  -1.057 0.290507    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 115.3  on 59  degrees of freedom
#> Residual deviance:  96.7  on 55  degrees of freedom
#>   (12 observations deleted due to missingness)
#> AIC: 214.96
#> 
#> Number of Fisher Scoring iterations: 5
summary(mgcv::gam(
  y ~ series + s(time, by = series),
  data = simdat$data_train,
  family = poisson()
))
#> 
#> Family: poisson 
#> Link function: log 
#> 
#> Formula:
#> y ~ series + s(time, by = series)
#> 
#> Parametric coefficients:
#>                Estimate Std. Error z value Pr(>|z|)   
#> (Intercept)     -0.8004     0.4355  -1.838  0.06608 . 
#> seriesseries_2   0.9043     0.5742   1.575  0.11526   
#> seriesseries_3   1.4777     0.4741   3.117  0.00183 **
#> seriesseries_4   1.3673     0.4806   2.845  0.00445 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>                          edf Ref.df Chi.sq p-value  
#> s(time):seriesseries_1 1.000  1.000  4.784  0.0287 *
#> s(time):seriesseries_2 5.767  6.810 15.826  0.0213 *
#> s(time):seriesseries_3 1.000  1.000  0.214  0.6433  
#> s(time):seriesseries_4 3.589  4.434 10.772  0.0395 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.541   Deviance explained = 60.4%
#> UBRE = 0.27372  Scale est. = 1         n = 60

Depending on the observation families you plan to use when building models, there may be some restrictions that need to be satisfied within the outcome variable. For example, a Beta regression can only handle proportional data, so values >= 1 or <= 0 are not allowed. Likewise, a Poisson regression can only handle non-negative integers. Most regression functions in R will assume the user knows all of this and so will not issue any warnings or errors if you choose the wrong distribution, but often this ends up leading to some unhelpful error from an optimizer that is difficult to interpret and diagnose. mvgam will attempt to provide some errors if you do something that is simply not allowed. For example, we can simulate data from a zero-centred Gaussian distribution (ensuring that some of our values will be <= 0) and attempt a Beta regression in mvgam using the betar family:

gauss_dat <- data.frame(
  outcome = rnorm(10),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
gauss_dat
#>        outcome  series time
#> 1  -0.57990666 series1    1
#> 2  -0.86642679 series1    2
#> 3   0.20127362 series1    3
#> 4   1.36763744 series1    4
#> 5  -0.03516434 series1    5
#> 6   0.23979092 series1    6
#> 7   0.01013158 series1    7
#> 8  -0.54771525 series1    8
#> 9  -0.48140890 series1    9
#> 10 -1.20075974 series1   10

A call to gam() using the mgcv package leads to a model that actually fits (though it does give an unhelpful warning message):

mgcv::gam(outcome ~ time,
  family = betar(),
  data = gauss_dat
)
#> 
#> Family: Beta regression(0.124) 
#> Link function: logit 
#> 
#> Formula:
#> outcome ~ time
#> Total model degrees of freedom 2 
#> 
#> REML score: -180.7085

But the same call to mvgam() gives us something more useful:

mvgam(outcome ~ time,
  family = betar(),
  data = gauss_dat
)
#> Error: Values <= 0 not allowed for beta responses

Please see ?mvgam_families for more information on the types of responses that the package can handle and their restrictions.

A time variable

The other requirement for most models that can be fit in mvgam is a numeric / integer-classed variable labelled time. This ensures the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models. If you plan to use any of the autoregressive dynamic trend functions available in mvgam (see ?mvgam_trends for details of available dynamic processes), you will need to ensure your time series are entered with a fixed sampling interval (i.e. the time between timesteps 1 and 2 should be the same as the time between timesteps 2 and 3, etc…). But note that you can have missing observations for some (or all) series. mvgam() will check this for you, but again it is useful to ensure you have no missing timepoint x series combinations in your data. You can generally do this with a simple dplyr call:

# A function to ensure all timepoints within a sequence are identical
all_times_avail <- function(time, min_time, max_time) {
  identical(
    as.numeric(sort(time)),
    as.numeric(seq.int(from = min_time, to = max_time))
  )
}

# Get min and max times from the data
min_time <- min(simdat$data_train$time)
max_time <- max(simdat$data_train$time)

# Check that all times are recorded for each series
data.frame(
  series = simdat$data_train$series,
  time = simdat$data_train$time
) %>%
  dplyr::group_by(series) %>%
  dplyr::summarise(all_there = all_times_avail(
    time,
    min_time,
    max_time
  )) -> checked_times
if (any(checked_times$all_there == FALSE)) {
  warning("One or more series is missing observations for one or more timepoints")
} else {
  cat("All series have observations at all timepoints :)")
}
#> All series have observations at all timepoints :)

Note that models which use dynamic components will assume that smaller values of time are older (i.e. time = 1 came before time = 2, etc…)
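
As a quick, optional check of that ordering assumption, you can sort the data yourself before inspecting it; a small sketch reusing the dplyr conventions already shown above:

# Optional: arrange rows so that time increases within each series;
# mvgam() uses the time variable to order observations internally, but an
# explicit sort makes visual checks easier
simdat$data_train %>%
  dplyr::arrange(series, time) %>%
  head(8)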

Irregular sampling intervals?

Most mvgam dynamic trend models expect time to be measured in discrete, evenly-spaced intervals (i.e. one measurement per week, or one per year, for example; though missing values are allowed). But please note that irregularly sampled time intervals are allowed, in which case the CAR() trend model (continuous time autoregressive) is appropriate. You can see an example of this kind of model in the Examples section in ?CAR. You can also use trend_model = 'None' (the default in mvgam()) and instead use a Gaussian Process to model temporal variation for irregularly-sampled time series. See ?brms::gp for details. But to reiterate the point from above, if you do not have time series data (or don’t want to estimate latent temporal dynamics) but you would like to estimate correlated latent residuals among multivariate outcomes, you can set up models that use trend_model = ZMVN(...) without the need for a time variable (see ?ZMVN for details).
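
Below is a minimal, hypothetical sketch of the CAR() option for irregularly sampled series (the data values and sampling times are invented; see the Examples section in ?CAR for the package's worked example):

# Hypothetical sketch: a continuous-time AR(1) trend, where time can take
# non-integer values with uneven gaps between observations
irreg_dat <- data.frame(
  y = rnorm(8),
  time = c(1, 2.5, 3, 6, 7.2, 10, 11.5, 15),
  series = factor("series_1")
)
mod <- mvgam(
  y ~ 1,
  trend_model = CAR(p = 1),
  family = gaussian(),
  data = irreg_dat,
  run_model = FALSE
)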

Checking data with get_mvgam_priors()

The get_mvgam_priors() function is designed to return information about the parameters in a model whose prior distributions can be modified by the user. But in doing so, it will perform a series of checks to ensure the data are formatted properly. It can therefore be very useful to new users for ensuring there isn’t anything strange going on in the data setup. For example, we can replicate the steps taken above (to check factor levels and timepoint x series combinations) with a single call to get_mvgam_priors(). Here we first simulate some data in which some of the timepoints in the time variable are not included in the data:

bad_times <- data.frame(
  time = seq(1, 16, by = 2),
  series = factor("series_1"),
  outcome = rnorm(8)
)
bad_times
#>   time   series    outcome
#> 1    1 series_1  1.6357848
#> 2    3 series_1 -0.3858940
#> 3    5 series_1  1.7655861
#> 4    7 series_1 -1.4477319
#> 5    9 series_1 -1.0557525
#> 6   11 series_1  0.4308398
#> 7   13 series_1  1.9072537
#> 8   15 series_1  0.1525545

Next we call get_mvgam_priors() by simply specifying an intercept-only model, which is enough to trigger all the checks:

get_mvgam_priors(outcome ~ 1,
  data = bad_times,
  family = gaussian()
)
#> Error: One or more series in data is missing observations for one or more timepoints

This error is useful as it tells us where the problem is. There are many ways to fill in missing timepoints, so the correct approach is left up to the user. But if you don't have any covariates, it should be pretty easy using expand.grid():

bad_times %>%
  dplyr::right_join(expand.grid(
    time = seq(
      min(bad_times$time),
      max(bad_times$time)
    ),
    series = factor(unique(bad_times$series),
      levels = levels(bad_times$series)
    )
  )) %>%
  dplyr::arrange(time) -> good_times
good_times
#>    time   series    outcome
#> 1     1 series_1  1.6357848
#> 2     2 series_1         NA
#> 3     3 series_1 -0.3858940
#> 4     4 series_1         NA
#> 5     5 series_1  1.7655861
#> 6     6 series_1         NA
#> 7     7 series_1 -1.4477319
#> 8     8 series_1         NA
#> 9     9 series_1 -1.0557525
#> 10   10 series_1         NA
#> 11   11 series_1  0.4308398
#> 12   12 series_1         NA
#> 13   13 series_1  1.9072537
#> 14   14 series_1         NA
#> 15   15 series_1  0.1525545
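
If you use tidyr, the same fill-in can be written more compactly with complete() (a sketch, assuming integer time steps of size 1):

# A tidyr alternative: complete() adds a row for every time x series
# combination, filling the missing outcomes with NA
bad_times %>%
  tidyr::complete(
    time = tidyr::full_seq(time, period = 1),
    series
  ) %>%
  dplyr::arrange(time, series) -> good_times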

Now the call to get_mvgam_priors(), using our filled-in data, should work:

get_mvgam_priors(outcome ~ 1,
  data = good_times,
  family = gaussian()
)
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                   prior                   example_change
#> 1 (Intercept) ~ student_t(3, 0.3, 2.5);      (Intercept) ~ normal(0, 1);
#> 2  sigma_obs ~ inv_gamma(1.418, 0.452); sigma_obs ~ normal(-0.76, 0.83);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA
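
As an aside, the prior and example_change columns above show the format needed if you do want to modify a prior when fitting. A minimal sketch (not run; uses the prior() helper from brms):

# A sketch only: supply a modified Intercept prior via the priors argument
mod <- mvgam(
  outcome ~ 1,
  data = good_times,
  family = gaussian(),
  priors = brms::prior(normal(0, 1), class = Intercept),
  run_model = FALSE
)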

This function should also pick up on misaligned factor levels for the series variable. We can check this by again simulating, this time adding an additional factor level that is not included in the data:

bad_levels <- data.frame(
  time = 1:8,
  series = factor("series_1",
    levels = c(
      "series_1",
      "series_2"
    )
  ),
  outcome = rnorm(8)
)

levels(bad_levels$series)
#> [1] "series_1" "series_2"

Another call to get_mvgam_priors() brings up a useful error:

get_mvgam_priors(outcome ~ 1,
  data = bad_levels,
  family = gaussian()
)
#> Error: Mismatch between factor levels of "series" and unique values of "series"
#> Use
#>   `setdiff(levels(data$series), unique(data$series))` 
#> and
#>   `intersect(levels(data$series), unique(data$series))`
#> for guidance

Following the message’s advice tells us there is a level for series_2 in the series variable, but there are no observations for this series in the data:

setdiff(levels(bad_levels$series), 
        unique(bad_levels$series))
#> [1] "series_2"

Re-assigning the levels fixes the issue:

bad_levels %>%
  dplyr::mutate(series = droplevels(series)) -> good_levels
levels(good_levels$series)
#> [1] "series_1"
get_mvgam_priors(
  outcome ~ 1,
  data = good_levels,
  family = gaussian()
)
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                  prior                  example_change
#> 1  (Intercept) ~ student_t(3, 0, 2.5);     (Intercept) ~ normal(0, 1);
#> 2 sigma_obs ~ inv_gamma(1.418, 0.452); sigma_obs ~ normal(0.46, 0.96);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA

Covariates with no NAs

Covariates can be used in models just as you would when using mgcv (see ?formula.gam for details of the formula syntax). But although the outcome variable can have NAs, covariates cannot. Most regression software will silently drop any rows in the model matrix that have NAs, which is not helpful when debugging. Both the mvgam() and get_mvgam_priors() functions will run some simple checks for you, and will hopefully return useful errors if missing values are found:

miss_dat <- data.frame(
  outcome = rnorm(10),
  cov = c(NA, rnorm(9)),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
miss_dat
#>       outcome          cov  series time
#> 1  -0.5965054           NA series1    1
#> 2   0.2126416  0.154650377 series1    2
#> 3   0.9601485  1.553717403 series1    3
#> 4  -0.8857684 -0.507988552 series1    4
#> 5  -0.4037936  0.245187700 series1    5
#> 6  -0.4738641 -0.009847922 series1    6
#> 7  -1.2390329  0.342620485 series1    7
#> 8   1.9631220 -0.642393988 series1    8
#> 9  -1.6783068 -1.335488789 series1    9
#> 10 -1.3909946 -0.254555529 series1   10
get_mvgam_priors(
  outcome ~ cov,
  data = miss_dat,
  family = gaussian()
)
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2                                  cov            1     cov fixed effect
#> 3 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                    prior                   example_change
#> 1 (Intercept) ~ student_t(3, -0.5, 2.5);      (Intercept) ~ normal(0, 1);
#> 2              cov ~ student_t(3, 0, 2);              cov ~ normal(0, 1);
#> 3   sigma_obs ~ inv_gamma(1.418, 0.452); sigma_obs ~ normal(-0.43, 0.49);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA
#> 3             NA             NA

Just like with the mgcv package, mvgam can also accept data as a list object. This is useful if you want to set up linear functional predictors or even distributed lag predictors (a sketch of how a lag matrix might be built appears at the end of this section). The checks run by mvgam should still work on these data. Here we change the cov predictor to be a matrix:

miss_dat <- list(
  outcome = rnorm(10),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
miss_dat$cov <- matrix(rnorm(50), ncol = 5, nrow = 10)
miss_dat$cov[2, 3] <- NA

A call to get_mvgam_priors() runs the same checks on the matrix, treating each of its columns as a separate fixed effect:

get_mvgam_priors(
  outcome ~ cov,
  data = miss_dat,
  family = gaussian()
)
#>                             param_name param_length           param_info
#> 1                          (Intercept)            1          (Intercept)
#> 2                                 cov1            1    cov1 fixed effect
#> 3                                 cov2            1    cov2 fixed effect
#> 4                                 cov3            1    cov3 fixed effect
#> 5                                 cov4            1    cov4 fixed effect
#> 6                                 cov5            1    cov5 fixed effect
#> 7 vector<lower=0>[n_series] sigma_obs;            1 observation error sd
#>                                   prior                   example_change
#> 1 (Intercept) ~ student_t(3, 0.2, 2.5);      (Intercept) ~ normal(0, 1);
#> 2            cov1 ~ student_t(3, 0, 2);             cov1 ~ normal(0, 1);
#> 3            cov2 ~ student_t(3, 0, 2);             cov2 ~ normal(0, 1);
#> 4            cov3 ~ student_t(3, 0, 2);             cov3 ~ normal(0, 1);
#> 5            cov4 ~ student_t(3, 0, 2);             cov4 ~ normal(0, 1);
#> 6            cov5 ~ student_t(3, 0, 2);             cov5 ~ normal(0, 1);
#> 7  sigma_obs ~ inv_gamma(1.418, 0.452); sigma_obs ~ normal(-0.86, 0.33);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA
#> 3             NA             NA
#> 4             NA             NA
#> 5             NA             NA
#> 6             NA             NA
#> 7             NA             NA
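
To make the earlier mention of distributed lag predictors a little more concrete, below is a sketch of how a lag matrix could be built (lagard() is an illustrative helper, not an mvgam function; it follows the style used in mgcv's ?linear.functional.terms examples):

# Build a matrix whose columns hold a covariate at lags 0 to n_lag - 1;
# rows with incomplete lag histories are left as NA
lagard <- function(x, n_lag = 4) {
  n <- length(x)
  X <- matrix(NA, nrow = n, ncol = n_lag)
  for (i in 1:n_lag) {
    X[i:n, i] <- x[1:(n - i + 1)]
  }
  X
}
lag_cov <- lagard(rnorm(20), n_lag = 4)
dim(lag_cov)

Supplying matrices such as this one to a smooth term (see ?mgcv::linear.functional.terms) is how distributed lag effects are typically set up, though the NA rows at the start of each series would first need to be handled.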

Plotting with plot_mvgam_series()

Plotting the data is a useful way to ensure everything looks ok once you've gone through the above checks on factor levels and timepoint x series combinations. The plot_mvgam_series() function will take supplied data and plot either a set of line plots (if you choose series = 'all') or a set of plots to describe the distribution for a single time series. For example, to plot all of the time series in our data, and highlight a single series in each plot, we can use:

plot_mvgam_series(
  data = simdat$data_train,
  y = "y",
  series = "all"
)

Plotting time series features for GAM models in mvgam

Or we can look more closely at the distribution for the first time series:

plot_mvgam_series(
  data = simdat$data_train,
  y = "y",
  series = 1
)

Plotting time series features for GAM models in mvgam

If you have split your data into training and testing folds (i.e. for forecast evaluation), you can include the test data in your plots:

plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  y = "y",
  series = 1
)

Plotting time series features for GAM models in mvgam

Example with NEON tick data

To give one example of how data can be reformatted for mvgam modelling, we will use observations from the National Ecological Observatory Network (NEON) tick drag cloth samples. Ixodes scapularis is a widespread tick species capable of transmitting a diversity of parasites to animals and humans, many of which are zoonotic. Due to the medical and ecological importance of this tick species, a common goal is to understand factors that influence their abundances. The NEON field team carries out standardised long-term monitoring of tick abundances as well as other important indicators of ecological change. Nymphal abundance of I. scapularis is routinely recorded across NEON plots using a field sampling method called drag cloth sampling, which is a common method for sampling ticks in the landscape. Field researchers sample ticks by dragging a large cloth behind themselves through terrain that is suspected of harboring ticks, usually working in a grid-like pattern. The sites have been sampled since 2014, resulting in a rich dataset of nymph abundance time series. These tick time series show strong seasonality and incorporate many of the challenging features associated with ecological data including overdispersion, high proportions of missingness and irregular sampling in time, making them useful for exploring the utility of dynamic GAMs.

We begin by loading NEON tick data for the years 2014 - 2021, which were downloaded from NEON and prepared as described in Clark & Wells 2022. You can read a bit about the data using the call ?all_neon_tick_data

data("all_neon_tick_data")
str(dplyr::ungroup(all_neon_tick_data))
#> tibble [3,505 × 24] (S3: tbl_df/tbl/data.frame)
#>  $ Year                : num [1:3505] 2015 2015 2015 2015 2015 ...
#>  $ epiWeek             : chr [1:3505] "37" "38" "39" "40" ...
#>  $ yearWeek            : chr [1:3505] "201537" "201538" "201539" "201540" ...
#>  $ plotID              : chr [1:3505] "BLAN_005" "BLAN_005" "BLAN_005" "BLAN_005" ...
#>  $ siteID              : chr [1:3505] "BLAN" "BLAN" "BLAN" "BLAN" ...
#>  $ nlcdClass           : chr [1:3505] "deciduousForest" "deciduousForest" "deciduousForest" "deciduousForest" ...
#>  $ decimalLatitude     : num [1:3505] 39.1 39.1 39.1 39.1 39.1 ...
#>  $ decimalLongitude    : num [1:3505] -78 -78 -78 -78 -78 ...
#>  $ elevation           : num [1:3505] 168 168 168 168 168 ...
#>  $ totalSampledArea    : num [1:3505] 162 NA NA NA 162 NA NA NA NA 164 ...
#>  $ amblyomma_americanum: num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ ixodes_scapularis   : num [1:3505] 2 NA NA NA 0 NA NA NA NA 0 ...
#>  $ time                : Date[1:3505], format: "2015-09-13" "2015-09-20" ...
#>  $ RHMin_precent       : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ RHMin_variance      : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ RHMax_precent       : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ RHMax_variance      : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMin_degC     : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMin_variance : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMax_degC     : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ airTempMax_variance : num [1:3505] NA NA NA NA NA NA NA NA NA NA ...
#>  $ soi                 : num [1:3505] -18.4 -17.9 -23.5 -28.4 -25.9 ...
#>  $ cum_sdd             : num [1:3505] 173 173 173 173 173 ...
#>  $ cum_gdd             : num [1:3505] 1129 1129 1129 1129 1129 ...

For this exercise, we will use the epiWeek variable as an index of seasonality, and we will only work with observations from a few sampling plots (labelled in the plotID column):

plotIDs <- c(
  "SCBI_013", "SCBI_002",
  "SERC_001", "SERC_005",
  "SERC_006", "SERC_012",
  "BLAN_012", "BLAN_005"
)

Now we can select the target species we want (I. scapularis), filter to the correct plot IDs and convert the epiWeek variable from character to numeric:

model_dat <- all_neon_tick_data %>%
  dplyr::ungroup() %>%
  dplyr::mutate(target = ixodes_scapularis) %>%
  dplyr::filter(plotID %in% plotIDs) %>%
  dplyr::select(Year, epiWeek, plotID, target) %>%
  dplyr::mutate(epiWeek = as.numeric(epiWeek))

Now for the tricky part: we need to fill in missing observations with NAs. The tick data are sparse in the sense that field observers do not go out and sample in every possible epiWeek, so many weeks are simply absent from the data. But we can use expand.grid() again to take care of this:

model_dat %>%
  # Create all possible combos of plotID, Year and epiWeek;
  # missing outcomes will be filled in as NA
  dplyr::full_join(expand.grid(
    plotID = unique(model_dat$plotID),
    Year = unique(model_dat$Year),
    epiWeek = seq(1, 52)
  )) %>%
  # left_join back to original data so plotID and siteID will
  # match up, in case you need the siteID for anything else later on
  dplyr::left_join(all_neon_tick_data %>%
    dplyr::select(siteID, plotID) %>%
    dplyr::distinct()) -> model_dat
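
It can be worth confirming that the expansion worked as intended; for example, each plotID x Year combination should now contain a row for every one of the 52 epiWeeks (a quick sketch):

# Count distinct epiWeeks per plot-year; each should be (at least) 52
model_dat %>%
  dplyr::group_by(plotID, Year) %>%
  dplyr::summarise(n_weeks = dplyr::n_distinct(epiWeek), .groups = "drop") %>%
  dplyr::slice_head(n = 5)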

Create the series variable needed for mvgam modelling:

model_dat %>%
  dplyr::mutate(
    series = plotID,
    y = target
  ) %>%
  dplyr::mutate(
    siteID = factor(siteID),
    series = factor(series)
  ) %>%
  dplyr::select(-target, -plotID) %>%
  dplyr::arrange(Year, epiWeek, series) -> model_dat

Now create the time variable, which needs to track Year and epiWeek for each unique series. The n() function from dplyr is often useful when generating a time index for grouped dataframes:

model_dat %>%
  dplyr::ungroup() %>%
  dplyr::group_by(series) %>%
  dplyr::arrange(Year, epiWeek) %>%
  dplyr::mutate(time = seq(1, dplyr::n())) %>%
  dplyr::ungroup() -> model_dat
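
A quick check that the new index is aligned across series never hurts; every series should share the same minimum, maximum and number of timepoints:

model_dat %>%
  dplyr::group_by(series) %>%
  dplyr::summarise(
    min_time = min(time),
    max_time = max(time),
    n_times = dplyr::n_distinct(time)
  )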

Check factor levels for the series:

levels(model_dat$series)
#> [1] "BLAN_005" "BLAN_012" "SCBI_002" "SCBI_013" "SERC_001" "SERC_005" "SERC_006"
#> [8] "SERC_012"

This looks good, as does a more rigorous check using get_mvgam_priors():

get_mvgam_priors(
  y ~ 1,
  data = model_dat,
  family = poisson()
)
#>    param_name param_length  param_info                                  prior
#> 1 (Intercept)            1 (Intercept) (Intercept) ~ student_t(3, -2.3, 2.5);
#>                example_change new_lowerbound new_upperbound
#> 1 (Intercept) ~ normal(0, 1);             NA             NA

We can also set up a model in mvgam() but use run_model = FALSE, which further ensures that all of the steps needed to create the model code and data objects will run without error. It is recommended that you use the cmdstanr backend if possible, as its auto-formatting options are very useful for checking the package-generated Stan code for any inefficiencies that can be fixed to improve sampling performance:

testmod <- mvgam(
  y ~ s(epiWeek, by = series, bs = "cc") +
    s(series, bs = "re"),
  trend_model = AR(),
  data = model_dat,
  backend = "cmdstanr",
  run_model = FALSE
)

This call runs without issue, and the resulting object now contains the model code and data objects that are needed to initiate sampling:

str(testmod$model_data)
#> List of 25
#>  $ y           : num [1:416, 1:8] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
#>  $ n           : int 416
#>  $ X           : num [1:3328, 1:73] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:3328] "1" "2" "3" "4" ...
#>   .. ..$ : chr [1:73] "X.Intercept." "V2" "V3" "V4" ...
#>  $ S1          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ zero        : num [1:73] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ S2          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S3          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S4          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S5          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S6          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S7          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ S8          : num [1:8, 1:8] 1.037 -0.416 0.419 0.117 0.188 ...
#>  $ p_coefs     : Named num 0
#>   ..- attr(*, "names")= chr "(Intercept)"
#>  $ p_taus      : num 1.02
#>  $ ytimes      : int [1:416, 1:8] 1 9 17 25 33 41 49 57 65 73 ...
#>  $ n_series    : int 8
#>  $ sp          : Named num [1:9] 0.368 0.368 0.368 0.368 0.368 ...
#>   ..- attr(*, "names")= chr [1:9] "s(epiWeek):seriesBLAN_005" "s(epiWeek):seriesBLAN_012" "s(epiWeek):seriesSCBI_002" "s(epiWeek):seriesSCBI_013" ...
#>  $ y_observed  : num [1:416, 1:8] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ total_obs   : int 3328
#>  $ num_basis   : int 73
#>  $ n_sp        : num 9
#>  $ n_nonmissing: int 400
#>  $ obs_ind     : int [1:400] 89 93 98 101 115 118 121 124 127 130 ...
#>  $ flat_ys     : num [1:400] 2 0 0 0 0 0 0 25 36 14 ...
#>  $ flat_xs     : num [1:400, 1:73] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:400] "705" "737" "777" "801" ...
#>   .. ..$ : chr [1:73] "X.Intercept." "V2" "V3" "V4" ...
#>  - attr(*, "trend_model")= chr "AR1"
stancode(testmod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[8, 8] S1; // mgcv smooth penalty matrix S1
#>   matrix[8, 8] S2; // mgcv smooth penalty matrix S2
#>   matrix[8, 8] S3; // mgcv smooth penalty matrix S3
#>   matrix[8, 8] S4; // mgcv smooth penalty matrix S4
#>   matrix[8, 8] S5; // mgcv smooth penalty matrix S5
#>   matrix[8, 8] S6; // mgcv smooth penalty matrix S6
#>   matrix[8, 8] S7; // mgcv smooth penalty matrix S7
#>   matrix[8, 8] S8; // mgcv smooth penalty matrix S8
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#>   
#>   // latent trend AR1 terms
#>   vector<lower=-1, upper=1>[n_series] ar1;
#>   
#>   // latent trend variance parameters
#>   vector<lower=0>[n_series] sigma;
#>   
#>   // latent trends
#>   matrix[n, n_series] trend;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 65] = b_raw[1 : 65];
#>   b[66 : 73] = mu_raw[1] + b_raw[66 : 73] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ inv_gamma(1.418, 0.452);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ std_normal();
#>   
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, -2.3, 2.5);
#>   
#>   // prior for s(epiWeek):seriesBLAN_005...
#>   b_raw[2 : 9] ~ multi_normal_prec(zero[2 : 9], S1[1 : 8, 1 : 8] * lambda[1]);
#>   
#>   // prior for s(epiWeek):seriesBLAN_012...
#>   b_raw[10 : 17] ~ multi_normal_prec(zero[10 : 17],
#>                                      S2[1 : 8, 1 : 8] * lambda[2]);
#>   
#>   // prior for s(epiWeek):seriesSCBI_002...
#>   b_raw[18 : 25] ~ multi_normal_prec(zero[18 : 25],
#>                                      S3[1 : 8, 1 : 8] * lambda[3]);
#>   
#>   // prior for s(epiWeek):seriesSCBI_013...
#>   b_raw[26 : 33] ~ multi_normal_prec(zero[26 : 33],
#>                                      S4[1 : 8, 1 : 8] * lambda[4]);
#>   
#>   // prior for s(epiWeek):seriesSERC_001...
#>   b_raw[34 : 41] ~ multi_normal_prec(zero[34 : 41],
#>                                      S5[1 : 8, 1 : 8] * lambda[5]);
#>   
#>   // prior for s(epiWeek):seriesSERC_005...
#>   b_raw[42 : 49] ~ multi_normal_prec(zero[42 : 49],
#>                                      S6[1 : 8, 1 : 8] * lambda[6]);
#>   
#>   // prior for s(epiWeek):seriesSERC_006...
#>   b_raw[50 : 57] ~ multi_normal_prec(zero[50 : 57],
#>                                      S7[1 : 8, 1 : 8] * lambda[7]);
#>   
#>   // prior for s(epiWeek):seriesSERC_012...
#>   b_raw[58 : 65] ~ multi_normal_prec(zero[58 : 65],
#>                                      S8[1 : 8, 1 : 8] * lambda[8]);
#>   
#>   // prior (non-centred) for s(series)...
#>   b_raw[66 : 73] ~ std_normal();
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for latent trend variance parameters
#>   sigma ~ inv_gamma(1.418, 0.452);
#>   
#>   // trend estimates
#>   trend[1, 1 : n_series] ~ normal(0, sigma);
#>   for (s in 1 : n_series) {
#>     trend[2 : n, s] ~ normal(ar1[s] * trend[1 : (n - 1), s], sigma[s]);
#>   }
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_series] tau;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   for (s in 1 : n_series) {
#>     tau[s] = pow(sigma[s], -2.0);
#>   }
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }
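
If the generated code and data objects look sensible, the model can then be fitted by repeating the same call without run_model = FALSE (a sketch only; sampling is not run here and may take several minutes):

fullmod <- mvgam(
  y ~ s(epiWeek, by = series, bs = "cc") +
    s(series, bs = "re"),
  trend_model = AR(),
  data = model_dat,
  backend = "cmdstanr"
)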

Further reading

The following papers and resources offer useful material about Dynamic GAMs and how they can be applied in practice:

Clark, Nicholas J. and Wells, K. Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. (2023): 14, 771-784.

Clark, Nicholas J., et al. Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ. (2025): 13:e18929

de Sousa, Heitor C., et al. Severe fire regimes decrease resilience of ectothermic populations. Journal of Animal Ecology (2024): 93(11), 1656-1669.

Hannaford, Naomi E., et al. A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant. Computational Statistics & Data Analysis (2023): 179, 107659.

Karunarathna, K.A.N.K., et al. Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models. Ecological Modelling (2024): 490, 110648.

Zhu, L., et al. Responses of a widespread pest insect to extreme high temperatures are stage-dependent and divergent among seasonal cohorts. Functional Ecology (2025): 39, 165–180. https://doi.org/10.1111/1365-2435.14711

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au)

================================================
FILE: inst/doc/forecast_evaluation.R
================================================
params <- list(EVAL = TRUE)

## ----echo = FALSE------------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)

## ----setup, include=FALSE----------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))

## ----------------------------------------------------------------------------
set.seed(1)
simdat <- sim_mvgam(
  T = 100,
  n_series = 3,
  mu = 2,
  trend_model = GP(),
  prop_trend = 0.75,
  family = poisson(),
  prop_missing = 0.10
)

## ----------------------------------------------------------------------------
str(simdat)

## ----fig.alt = "Plotting time series features for GAM models in mvgam"-------
plot_mvgam_series(data = simdat$data_train, series = "all")

## ----fig.alt = "Plotting time series features for GAM models in mvgam"-------
plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  series = 1
)

## ----include=FALSE------------------------------------------------------------
mod1 <- mvgam(
  y ~ s(season, bs = "cc", k = 8) +
    s(time, by = series, k = 20),
  knots = list(season = c(0.5, 12.5)),
  trend_model = "None",
  data = simdat$data_train,
  newdata = simdat$data_test
)

## ----eval=FALSE---------------------------------------------------------------
# mod1 <- mvgam(
#   y ~ s(season, bs = "cc", k = 8) +
#     s(time, by = series, bs = "cr", k = 20),
#   knots = list(season = c(0.5, 12.5)),
#   trend_model = "None",
#   data = simdat$data_train,
#   silent = 2
# )

## ----------------------------------------------------------------------------
summary(mod1, include_betas = FALSE)

## ----fig.alt = "Plotting GAM smooth functions using mvgam"-------------------
conditional_effects(mod1, type = "link")

## ----include=FALSE, message=FALSE----------------------------------------------
mod2 <- mvgam(
  y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  silent = 1
)

## ----eval=FALSE---------------------------------------------------------------
# mod2 <- mvgam(y ~ 1,
#   trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
#   trend_knots = list(season = c(0.5, 12.5)),
#   trend_model = AR(cor = TRUE),
#   noncentred = TRUE,
#   data = simdat$data_train,
#   silent = 1
# )

## ----------------------------------------------------------------------------
summary(mod2, include_betas = FALSE)

## ----fig.alt = "Summarising latent Gaussian Process parameters in mvgam"-----
mcmc_plot(mod2, variable = "ar", regex = TRUE, type = "areas")

## ----fig.alt = "Summarising latent Gaussian Process parameters in mvgam"-----
mcmc_plot(mod2, variable = "sigma", regex = TRUE, type = "areas")

## ----fig.alt = "Plotting latent Gaussian Process effects in mvgam and marginaleffects"----
conditional_effects(mod2, type = "link")

## ----------------------------------------------------------------------------
fc_mod1 <- forecast(mod1, newdata = simdat$data_test)
fc_mod2 <- forecast(mod2, newdata = simdat$data_test)

## ----------------------------------------------------------------------------
str(fc_mod1)

## ----------------------------------------------------------------------------
plot(fc_mod1, series = 1)
plot(fc_mod2, series = 1)

plot(fc_mod1, series = 2)
plot(fc_mod2, series = 2)

## ----include=FALSE------------------------------------------------------------
mod2 <- mvgam(
  y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  newdata = simdat$data_test,
  silent = 2
)

## ----eval=FALSE---------------------------------------------------------------
# mod2 <- mvgam(y ~ 1,
#   trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
#   trend_knots = list(season = c(0.5, 12.5)),
#   trend_model = AR(cor = TRUE),
#   noncentred = TRUE,
#   data = simdat$data_train,
#   newdata = simdat$data_test,
#   silent = 2
# )

## ----------------------------------------------------------------------------
fc_mod2 <- forecast(mod2)

## ----warning=FALSE, fig.alt = "Plotting posterior forecast distributions using mvgam and R"----
plot(fc_mod2, series = 1)

## ----warning=FALSE------------------------------------------------------------
crps_mod1 <- score(fc_mod1, score = "crps")
str(crps_mod1)
crps_mod1$series_1

## ----warning=FALSE------------------------------------------------------------
crps_mod1 <- score(fc_mod1, score = "crps", interval_width = 0.6)
crps_mod1$series_1

## ----------------------------------------------------------------------------
link_mod1 <- forecast(mod1, newdata = simdat$data_test, type = "link")
score(link_mod1, score = "elpd")$series_1

## ----------------------------------------------------------------------------
energy_mod2 <- score(fc_mod2, score = "energy")
str(energy_mod2)

## ----------------------------------------------------------------------------
energy_mod2$all_series

## ----------------------------------------------------------------------------
crps_mod1 <- score(fc_mod1, score = "crps")
crps_mod2 <- score(fc_mod2, score = "crps")

diff_scores <- crps_mod2$series_1$score - crps_mod1$series_1$score
plot(diff_scores, pch = 16, cex = 1.25, col = "darkred",
     ylim = c(-1 * max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = "l", xlab = "Forecast horizon",
     ylab = expression(CRPS[AR1] ~ -~ CRPS[spline]))
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ", ar1_better, " of ", length(diff_scores),
  " evaluations", "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))

diff_scores <- crps_mod2$series_2$score - crps_mod1$series_2$score
plot(diff_scores, pch = 16, cex = 1.25, col = "darkred",
     ylim = c(-1 * max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = "l", xlab = "Forecast horizon",
     ylab = expression(CRPS[AR1] ~ -~ CRPS[spline]))
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ", ar1_better, " of ", length(diff_scores),
  " evaluations", "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))

diff_scores <- crps_mod2$series_3$score - crps_mod1$series_3$score
plot(diff_scores, pch = 16, cex = 1.25, col = "darkred",
     ylim = c(-1 * max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = "l", xlab = "Forecast horizon",
     ylab = expression(CRPS[AR1] ~ -~ CRPS[spline]))
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ", ar1_better, " of ", length(diff_scores),
  " evaluations", "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))


================================================
FILE: inst/doc/forecast_evaluation.Rmd
================================================
---
title: "Forecasting and forecast evaluation in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Forecasting and forecast evaluation in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

The purpose of this vignette is to show how the `mvgam` package can be used to produce probabilistic forecasts and to evaluate those forecasts using a variety of proper scoring rules.

## Simulating discrete time series

We begin by simulating some data to show how forecasts are computed and evaluated in `mvgam`. The `sim_mvgam()` function can be used to simulate series that come from a variety of response distributions as well as seasonal patterns and/or dynamic temporal patterns. Here we simulate a collection of three count-valued time series. These series all share the same seasonal pattern but have different temporal dynamics. By setting `trend_model = GP()` and `prop_trend = 0.75`, we are generating time series that have smooth underlying temporal trends (evolving as Gaussian Processes with squared exponential kernel) and moderate seasonal patterns. The observations are Poisson-distributed and we allow 10% of observations to be missing.

```{r}
set.seed(1)
simdat <- sim_mvgam(
  T = 100,
  n_series = 3,
  mu = 2,
  trend_model = GP(),
  prop_trend = 0.75,
  family = poisson(),
  prop_missing = 0.10
)
```

The returned object is a `list` containing training and testing data (`sim_mvgam()` automatically splits the data into these folds for us) together with some other information about the data generating process that was used to simulate the data

```{r}
str(simdat)
```

Each series in this case has a shared seasonal pattern. The resulting time series are similar to what we might encounter when dealing with count-valued data that can take small counts:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(data = simdat$data_train, series = "all")
```

For individual series, we can plot the training and testing data, as well as some more specific features of the observed data:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  series = 1
)
```

### Modelling dynamics with splines

The first model we will fit uses a shared cyclic spline to capture the repeated seasonality, as well as series-specific splines of time to capture the long-term dynamics. We allow the temporal splines to be fairly complex so they can capture as much of the temporal variation as possible:

```{r include=FALSE}
mod1 <- mvgam(
  y ~ s(season, bs = "cc", k = 8) +
    s(time, by = series, k = 20),
  knots = list(season = c(0.5, 12.5)),
  trend_model = "None",
  data = simdat$data_train,
  newdata = simdat$data_test
)
```

```{r eval=FALSE}
mod1 <- mvgam(
  y ~ s(season, bs = "cc", k = 8) +
    s(time, by = series, bs = "cr", k = 20),
  knots = list(season = c(0.5, 12.5)),
  trend_model = "None",
  data = simdat$data_train,
  silent = 2
)
```

The model fits without issue:

```{r}
summary(mod1, include_betas = FALSE)
```

And we can plot the conditional effects of the splines (on the link scale) to see that they are estimated to be highly nonlinear

```{r, fig.alt = "Plotting GAM smooth functions using mvgam"}
conditional_effects(mod1, type = "link")
```

### Modelling dynamics with a correlated AR1

Before showing how to produce and evaluate forecasts, we will fit a second model to these data so the two models can be compared. This model is equivalent to the above, except we now use a correlated AR(1) process to model series-specific dynamics. See `?AR` for more details.

```{r include=FALSE, message=FALSE}
mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  silent = 1
)
```

```{r eval=FALSE}
mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  silent = 1
)
```

The summary for this model now contains information on the autoregressive and process error parameters for each time series:

```{r}
summary(mod2, include_betas = FALSE)
```

We can plot the posteriors for these parameters, and for any other parameter for that matter, using `bayesplot` routines. First the autoregressive parameters:

```{r, fig.alt = "Summarising latent Gaussian Process parameters in mvgam"}
mcmc_plot(mod2, variable = "ar", regex = TRUE, type = "areas")
```

And now the variance ($\sigma$) parameters:

```{r, fig.alt = "Summarising latent Gaussian Process parameters in mvgam"}
mcmc_plot(mod2, variable = "sigma", regex = TRUE, type = "areas")
```

We can again plot the conditional seasonal effect:

```{r, fig.alt = "Plotting latent Gaussian Process effects in mvgam and marginaleffects"}
conditional_effects(mod2, type = "link")
```

The estimates for the seasonal component are fairly similar for the two models, but below we will see if they produce similar forecasts

## Forecasting with the `forecast()` function

Probabilistic forecasts can be computed in two main ways in `mvgam`. The first is to take a model that was fit only to training data (as we did above in the two example models) and produce temporal predictions from the posterior predictive distribution by feeding `newdata` to the `forecast()` function. It is crucial that any `newdata` fed to the `forecast()` function follows on sequentially from the data that was used to fit the model (this is not internally checked by the package because it might be a headache to do so when data are not supplied in a specific time-order). When calling the `forecast()` function, you have the option to generate different kinds of predictions (i.e. predicting on the link scale, response scale or to produce expectations; see `?forecast.mvgam` for details). We will use the default and produce forecasts on the response scale, which is the most common way to evaluate forecast distributions

```{r}
fc_mod1 <- forecast(mod1, newdata = simdat$data_test)
fc_mod2 <- forecast(mod2, newdata = simdat$data_test)
```

The objects we have created are of class `mvgam_forecast`, which contain information on hindcast distributions, forecast distributions and true observations for each series in the data:

```{r}
str(fc_mod1)
```

We can plot the forecasts for some series from each model using the `S3 plot` method for objects of this class:

```{r}
plot(fc_mod1, series = 1)
plot(fc_mod2, series = 1)

plot(fc_mod1, series = 2)
plot(fc_mod2, series = 2)
```

Clearly the two models do not produce equivalent forecasts. We will come back to scoring these forecasts in a moment.

## Forecasting with `newdata` in `mvgam()`

The second way we can produce forecasts in `mvgam` is to feed the testing data directly to the `mvgam()` function as `newdata`. This will include the testing data as missing observations so that they are automatically predicted from the posterior predictive distribution using the `generated quantities` block in `Stan`. As an example, we can refit `mod2` but include the testing data for automatic forecasts:

```{r include=FALSE}
mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  newdata = simdat$data_test,
  silent = 2
)
```

```{r eval=FALSE}
mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  newdata = simdat$data_test,
  silent = 2
)
```

Because the model already contains a forecast distribution, we do not need to feed `newdata` to the `forecast()` function:

```{r}
fc_mod2 <- forecast(mod2)
```

The forecasts will be nearly identical to those calculated previously:

```{r warning=FALSE, fig.alt = "Plotting posterior forecast distributions using mvgam and R"}
plot(fc_mod2, series = 1)
```

## Scoring forecast distributions

A primary purpose of the `mvgam_forecast` class is to readily allow forecast evaluations for each series in the data, using a variety of possible scoring functions. See `?mvgam::score.mvgam_forecast` to view the types of scores that are available. A useful scoring metric is the Continuous Rank Probability Score (CRPS). A CRPS value is similar to what we might get if we calculated a weighted absolute error using the full forecast distribution.

```{r warning=FALSE}
crps_mod1 <- score(fc_mod1, score = "crps")
str(crps_mod1)
crps_mod1$series_1
```

The returned list contains a `data.frame` for each series in the data that shows the CRPS score for each evaluation in the testing data, along with some other useful information about the fit of the forecast distribution. In particular, we are given a logical value (1s and 0s) telling us whether the true value was within a pre-specified credible interval (i.e. the coverage of the forecast distribution). The default interval width is 0.9, so we would hope that the values in the `in_interval` column take a 1 approximately 90% of the time. This value can be changed if you wish to compute different coverages, say using a 60% interval:

```{r warning=FALSE}
crps_mod1 <- score(fc_mod1, score = "crps", interval_width = 0.6)
crps_mod1$series_1
```

We can also compare forecasts against out of sample observations using the [Expected Log Predictive Density (ELPD; also known as the log score)](https://link.springer.com/article/10.1007/s11222-016-9696-4){target="_blank"}. The ELPD is a strictly proper scoring rule that can be applied to any distributional forecast, but to compute it we need predictions on the link scale rather than on the outcome scale. This is where it is advantageous to change the type of prediction we can get using the `forecast()` function:

```{r}
link_mod1 <- forecast(mod1, newdata = simdat$data_test, type = "link")
score(link_mod1, score = "elpd")$series_1
```

Finally, when we have multiple time series it may also make sense to use a multivariate proper scoring rule. `mvgam` offers two such options: the Energy score and the Variogram score. The first penalizes forecast distributions that are less well calibrated against the truth, while the second penalizes forecasts that do not capture the observed true correlation structure. Which score to use depends on your goals, but both are very easy to compute:

```{r}
energy_mod2 <- score(fc_mod2, score = "energy")
str(energy_mod2)
```

The returned object still provides information on interval coverage for each individual series, but there is only a single score per horizon now (which is provided in the `all_series` slot):

```{r}
energy_mod2$all_series
```

You can use your score(s) of choice to compare different models. For example, we can compute and plot the difference in CRPS scores for each series in data. Here, a negative value means the AR(1) model (`mod2`) is better, while a positive value means the spline model (`mod1`) is better.

```{r}
crps_mod1 <- score(fc_mod1, score = "crps")
crps_mod2 <- score(fc_mod2, score = "crps")

diff_scores <- crps_mod2$series_1$score - crps_mod1$series_1$score
plot(diff_scores, pch = 16, cex = 1.25, col = "darkred",
     ylim = c(-1 * max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = "l", xlab = "Forecast horizon",
     ylab = expression(CRPS[AR1] ~ -~ CRPS[spline]))
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ", ar1_better, " of ", length(diff_scores),
  " evaluations", "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))

diff_scores <- crps_mod2$series_2$score - crps_mod1$series_2$score
plot(diff_scores, pch = 16, cex = 1.25, col = "darkred",
     ylim = c(-1 * max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = "l", xlab = "Forecast horizon",
     ylab = expression(CRPS[AR1] ~ -~ CRPS[spline]))
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ", ar1_better, " of ", length(diff_scores),
  " evaluations", "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))

diff_scores <- crps_mod2$series_3$score - crps_mod1$series_3$score
plot(diff_scores, pch = 16, cex = 1.25, col = "darkred",
     ylim = c(-1 * max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = "l", xlab = "Forecast horizon",
     ylab = expression(CRPS[AR1] ~ -~ CRPS[spline]))
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ", ar1_better, " of ", length(diff_scores),
  " evaluations", "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))
```

The correlated AR(1) model consistently gives better forecasts, and the difference between scores tends to grow as the forecast horizon increases. This is not unexpected given the way that splines linearly extrapolate outside the range of training data

## Further reading

The following papers and resources offer useful material about Bayesian forecasting and proper scoring rules:

Clark N.J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ* 13:e18929 (2025) https://doi.org/10.7717/peerj.18929

Hyndman, Rob J., and George Athanasopoulos. [Forecasting: principles and practice](https://otexts.com/fpp3/distaccuracy.html). *OTexts*, (2018).

Gneiting, Tilmann, and Adrian E. Raftery. [Strictly proper scoring rules, prediction, and estimation](https://www.tandfonline.com/doi/abs/10.1198/016214506000001437) *Journal of the American statistical Association* 102.477 (2007) 359-378.

Simonis, Juniper L., et al. [Evaluating probabilistic ecological forecasts](https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecy.3431) *Ecology* 102.8 (2021) e03431.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au)


================================================
FILE: inst/doc/forecast_evaluation.html
================================================

Forecasting and forecast evaluation in mvgam

Nicholas J Clark

2026-01-19

The purpose of this vignette is to show how the mvgam package can be used to produce probabilistic forecasts and to evaluate those forecasts using a variety of proper scoring rules.

Simulating discrete time series

We begin by simulating some data to show how forecasts are computed and evaluated in mvgam. The sim_mvgam() function can be used to simulate series that come from a variety of response distributions as well as seasonal patterns and/or dynamic temporal patterns. Here we simulate a collection of three count-valued time series. These series all share the same seasonal pattern but have different temporal dynamics. By setting trend_model = GP() and prop_trend = 0.75, we are generating time series that have smooth underlying temporal trends (evolving as Gaussian Processes with squared exponential kernel) and moderate seasonal patterns. The observations are Poisson-distributed and we allow 10% of observations to be missing.

set.seed(1)
simdat <- sim_mvgam(
  T = 100,
  n_series = 3,
  mu = 2,
  trend_model = GP(),
  prop_trend = 0.75,
  family = poisson(),
  prop_missing = 0.10
)

The returned object is a list containing training and testing data (sim_mvgam() automatically splits the data into these folds for us) together with some other information about the data generating process that was used to simulate the data

str(simdat)
#> List of 6
#>  $ data_train        :'data.frame':  225 obs. of  5 variables:
#>   ..$ y     : int [1:225] 6 NA 11 2 5 20 7 8 NA 11 ...
#>   ..$ season: int [1:225] 1 1 1 2 2 2 3 3 3 4 ...
#>   ..$ year  : int [1:225] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ series: Factor w/ 3 levels "series_1","series_2",..: 1 2 3 1 2 3 1 2 3 1 ...
#>   ..$ time  : int [1:225] 1 1 1 2 2 2 3 3 3 4 ...
#>  $ data_test         :'data.frame':  75 obs. of  5 variables:
#>   ..$ y     : int [1:75] 4 23 8 3 NA 3 1 20 8 3 ...
#>   ..$ season: int [1:75] 4 4 4 5 5 5 6 6 6 7 ...
#>   ..$ year  : int [1:75] 7 7 7 7 7 7 7 7 7 7 ...
#>   ..$ series: Factor w/ 3 levels "series_1","series_2",..: 1 2 3 1 2 3 1 2 3 1 ...
#>   ..$ time  : int [1:75] 76 76 76 77 77 77 78 78 78 79 ...
#>  $ true_corrs        : num [1:3, 1:3] 1 0.0861 0.1161 0.0861 1 ...
#>  $ true_trends       : num [1:100, 1:3] -0.851 -0.758 -0.664 -0.571 -0.48 ...
#>  $ global_seasonality: num [1:100] -0.966 -0.197 0.771 1.083 0.37 ...
#>  $ trend_params      :List of 2
#>   ..$ alpha: num [1:3] 0.883 0.936 1.036
#>   ..$ rho  : num [1:3] 7.54 4.01 7.49

Each series in this case has a shared seasonal pattern. The resulting time series are similar to what we might encounter when dealing with count-valued data that can take small counts:

plot_mvgam_series(
  data = simdat$data_train,
  series = "all"
)

Plotting time series features for GAM models in mvgam

For individual series, we can plot the training and testing data, as well as some more specific features of the observed data:

plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  series = 1
)

Plotting time series features for GAM models in mvgam

Modelling dynamics with splines

The first model we will fit uses a shared cyclic spline to capture the repeated seasonality, as well as series-specific splines of time to capture the long-term dynamics. We allow the temporal splines to be fairly complex so they can capture as much of the temporal variation as possible:

mod1 <- mvgam(
  y ~ s(season, bs = "cc", k = 8) +
    s(time, by = series, bs = "cr", k = 20),
  knots = list(season = c(0.5, 12.5)),
  trend_model = "None",
  data = simdat$data_train,
  silent = 2
)

The model fits without issue:

summary(mod1, include_betas = FALSE)
#> GAM formula:
#> y ~ s(season, bs = "cc", k = 8) + s(time, by = series, k = 20)
#> <environment: 0x0000022e41d5a728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 100 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)  1.9 1.9     2    1   979
#> 
#> Approximate significance of GAM smooths:
#>                          edf Ref.df Chi.sq  p-value    
#> s(season)              3.418      6  22.91  < 2e-16 ***
#> s(time):seriesseries_1 8.763     19  30.85 4.25e-06 ***
#> s(time):seriesseries_2 9.635     19  41.49  < 2e-16 ***
#> s(time):seriesseries_3 6.676     19  56.28   0.0862 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✖ 51 of 2000 iterations saturated the maximum tree depth of 10 (2.55%)
#>     Try a larger max_treedepth to avoid saturation
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

And we can plot the conditional effects of the splines (on the link scale) to see that they are estimated to be highly nonlinear

conditional_effects(mod1, type = "link")

Plotting GAM smooth functions using mvgam

Modelling dynamics with a correlated AR1

Before showing how to produce and evaluate forecasts, we will fit a second model to these data so the two models can be compared. This model is equivalent to the above, except we now use a correlated AR(1) process to model series-specific dynamics. See ?AR for more details.

mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  silent = 1
)

The summary for this model now contains information on the autoregressive and process error parameters for each time series:

summary(mod2, include_betas = FALSE)
#> GAM observation formula:
#> y ~ 1
#> <environment: 0x0000022e41d5a728>
#> 
#> GAM process formula:
#> ~s(season, bs = "cc", k = 8) - 1
#> <environment: 0x0000022e41d5a728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR(cor = TRUE)
#> 
#> N process models:
#> 3 
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)  1.8   2   2.4 1.01   512
#> 
#> standard deviation:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.23 0.32  0.44 1.01   318
#> sigma[2] 0.30 0.43  0.58 1.01   498
#> sigma[3] 0.18 0.25  0.36 1.01   329
#> 
#> autoregressive coef 1:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.75 0.89  0.99 1.01   438
#> ar1[2] 0.66 0.83  0.96 1.02   379
#> ar1[3] 0.87 0.96  1.00 1.01   478
#> 
#> Approximate significance of GAM process smooths:
#>             edf Ref.df Chi.sq  p-value    
#> s(season) 1.737      6  23.81 1.22e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

We can plot the posteriors for these parameters, and for any other parameter for that matter, using bayesplot routines. First the autoregressive parameters:

mcmc_plot(mod2, variable = "ar", regex = TRUE, type = "areas")

Summarising latent Gaussian Process parameters in mvgam

And now the variance (\(\sigma\)) parameters:

mcmc_plot(mod2, variable = "sigma", regex = TRUE, type = "areas")

Summarising latent Gaussian Process parameters in mvgam

We can again plot the conditional seasonal effect:

conditional_effects(mod2, type = "link")

Plotting latent Gaussian Process effects in mvgam and marginaleffects

The estimates for the seasonal component are fairly similar for the two models, but below we will see if they produce similar forecasts

Forecasting with the forecast() function

Probabilistic forecasts can be computed in two main ways in mvgam. The first is to take a model that was fit only to training data (as we did above in the two example models) and produce temporal predictions from the posterior predictive distribution by feeding newdata to the forecast() function. It is crucial that any newdata fed to the forecast() function follows on sequentially from the data that was used to fit the model (this is not internally checked by the package because it might be a headache to do so when data are not supplied in a specific time-order). When calling the forecast() function, you have the option to generate different kinds of predictions (i.e. predicting on the link scale, response scale or to produce expectations; see ?forecast.mvgam for details). We will use the default and produce forecasts on the response scale, which is the most common way to evaluate forecast distributions

fc_mod1 <- forecast(mod1, newdata = simdat$data_test)
fc_mod2 <- forecast(mod2, newdata = simdat$data_test)

The objects we have created are of class mvgam_forecast, which contain information on hindcast distributions, forecast distributions and true observations for each series in the data:

str(fc_mod1)
#> List of 16
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc", k = 8) + s(time, by = series, k = 20)
#>   .. ..- attr(*, ".Environment")=<environment: 0x0000022e41d5a728> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ family_pars       : NULL
#>  $ trend_model       : chr "None"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : Factor w/ 3 levels "series_1","series_2",..: 1 2 3
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:75] 6 2 7 11 8 6 9 11 7 4 ...
#>   ..$ series_2: int [1:75] NA 5 8 2 1 NA 2 4 0 2 ...
#>   ..$ series_3: int [1:75] 11 20 NA 36 44 34 57 50 26 28 ...
#>  $ train_times       :List of 3
#>   ..$ series_1: int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ series_2: int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ series_3: int [1:75] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations :List of 3
#>   ..$ series_1: int [1:25] 4 3 1 3 1 NA NA 7 9 8 ...
#>   ..$ series_2: int [1:25] 23 NA 20 20 14 7 6 6 6 1 ...
#>   ..$ series_3: int [1:25] 8 3 8 3 NA 1 1 9 8 NA ...
#>  $ test_times        :List of 3
#>   ..$ series_1: int [1:25] 76 77 78 79 80 81 82 83 84 85 ...
#>   ..$ series_2: int [1:25] 76 77 78 79 80 81 82 83 84 85 ...
#>   ..$ series_3: int [1:25] 76 77 78 79 80 81 82 83 84 85 ...
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:2000, 1:75] 3 3 1 0 2 3 5 5 2 1 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>   ..$ series_2: num [1:2000, 1:75] 3 4 2 5 7 8 7 2 7 10 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,2]" "ypred[2,2]" "ypred[3,2]" "ypred[4,2]" ...
#>   ..$ series_3: num [1:2000, 1:75] 11 28 13 12 14 20 12 7 31 27 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:75] "ypred[1,3]" "ypred[2,3]" "ypred[3,3]" "ypred[4,3]" ...
#>  $ forecasts         :List of 3
#>   ..$ series_1: num [1:2000, 1:25] 4 1 1 5 5 6 2 3 2 0 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:25] "ypred[76,1]" "ypred[77,1]" "ypred[78,1]" "ypred[79,1]" ...
#>   ..$ series_2: num [1:2000, 1:25] 26 33 21 34 12 28 33 16 23 39 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:25] "ypred[76,2]" "ypred[77,2]" "ypred[78,2]" "ypred[79,2]" ...
#>   ..$ series_3: num [1:2000, 1:25] 10 5 3 7 4 2 8 11 3 10 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:25] "ypred[76,3]" "ypred[77,3]" "ypred[78,3]" "ypred[79,3]" ...
#>  - attr(*, "class")= chr "mvgam_forecast"

We can plot the forecasts for some series from each model using the S3 plot method for objects of this class:

plot(fc_mod1, series = 1)

plot(fc_mod2, series = 1)


plot(fc_mod1, series = 2)

plot(fc_mod2, series = 2)

Clearly the two models do not produce equivalent forecasts. We will come back to scoring these forecasts in a moment.

Forecasting with newdata in mvgam()

The second way we can produce forecasts in mvgam is to feed the testing data directly to the mvgam() function as newdata. This will include the testing data as missing observations so that they are automatically predicted from the posterior predictive distribution using the generated quantities block in Stan. As an example, we can refit mod2 but include the testing data for automatic forecasts:

mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  newdata = simdat$data_test,
  silent = 2
)

Because the model already contains a forecast distribution, we do not need to feed newdata to the forecast() function:

fc_mod2 <- forecast(mod2)

The forecasts will be nearly identical to those calculated previously:

plot(fc_mod2, series = 1)

Plotting posterior forecast distributions using mvgam and R

Scoring forecast distributions

A primary purpose of the mvgam_forecast class is to readily allow forecast evaluations for each series in the data, using a variety of possible scoring functions. See ?mvgam::score.mvgam_forecast to view the types of scores that are available. A useful scoring metric is the Continuous Ranked Probability Score (CRPS). A CRPS value can be thought of as a weighted absolute error computed over the full forecast distribution, so it rewards forecasts that are both accurate and appropriately sharp.
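
For reference, the CRPS for a forecast distribution with cumulative distribution function \(F\) and a realised observation \(y\) is defined as:

\[\text{CRPS}(F, y) = \int_{-\infty}^{\infty} \left(F(x) - \mathbb{1}\{x \geq y\}\right)^2 dx\]

Smaller values are better, and the score collapses to the absolute error when the forecast is a single point. Computing the CRPS for each series is then straightforward: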

crps_mod1 <- score(fc_mod1, score = "crps")
str(crps_mod1)
#> List of 4
#>  $ series_1  :'data.frame':  25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 0.993 0.817 0.334 0.998 0.277 ...
#>   ..$ in_interval   : num [1:25] 1 1 1 1 1 NA NA 0 0 0 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "crps" "crps" "crps" "crps" ...
#>  $ series_2  :'data.frame':  25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 2.01 NA 6.55 14.69 17.43 ...
#>   ..$ in_interval   : num [1:25] 1 NA 1 1 1 0 0 0 0 0 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "crps" "crps" "crps" "crps" ...
#>  $ series_3  :'data.frame':  25 obs. of  5 variables:
#>   ..$ score         : num [1:25] 3.487 0.463 4.064 0.5 NA ...
#>   ..$ in_interval   : num [1:25] 0 1 0 1 NA 1 1 0 0 NA ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type    : chr [1:25] "crps" "crps" "crps" "crps" ...
#>  $ all_series:'data.frame':  25 obs. of  3 variables:
#>   ..$ score       : num [1:25] 6.49 NA 10.95 16.18 NA ...
#>   ..$ eval_horizon: int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type  : chr [1:25] "sum_crps" "sum_crps" "sum_crps" "sum_crps" ...
crps_mod1$series_1
#>         score in_interval interval_width eval_horizon score_type
#> 1   0.9932195           1            0.9            1       crps
#> 2   0.8173380           1            0.9            2       crps
#> 3   0.3338443           1            0.9            3       crps
#> 4   0.9980710           1            0.9            4       crps
#> 5   0.2773030           1            0.9            5       crps
#> 6          NA          NA            0.9            6       crps
#> 7          NA          NA            0.9            7       crps
#> 8   6.1295615           0            0.9            8       crps
#> 9   8.2855480           0            0.9            9       crps
#> 10  7.4110365           0            0.9           10       crps
#> 11 21.3898007           0            0.9           11       crps
#> 12 35.2857677           0            0.9           12       crps
#> 13 37.2882082           0            0.9           13       crps
#> 14 36.4251945           0            0.9           14       crps
#> 15 39.3858395           0            0.9           15       crps
#> 16 42.3677532           0            0.9           16       crps
#> 17 42.5461592           0            0.9           17       crps
#> 18 12.7316780           0            0.9           18       crps
#> 19 13.7700235           0            0.9           19       crps
#> 20  9.7282697           0            0.9           20       crps
#> 21  4.7711443           0            0.9           21       crps
#> 22  4.8054445           0            0.9           22       crps
#> 23  2.7825032           0            0.9           23       crps
#> 24  0.8591737           1            0.9           24       crps
#> 25  3.7808390           0            0.9           25       crps

The returned list contains a data.frame for each series in the data that shows the CRPS score for each evaluation in the testing data, along with some other useful information about the fit of the forecast distribution. In particular, we are given a binary indicator (1s and 0s) telling us whether the true value was within a pre-specified credible interval (i.e. the coverage of the forecast distribution). The default interval width is 0.9, so we would hope that the values in the in_interval column take a 1 approximately 90% of the time. This width can be changed if you wish to compute different coverages, say using a 60% interval:

crps_mod1 <- score(fc_mod1, score = "crps", interval_width = 0.6)
crps_mod1$series_1
#>         score in_interval interval_width eval_horizon score_type
#> 1   0.9932195           1            0.6            1       crps
#> 2   0.8173380           1            0.6            2       crps
#> 3   0.3338443           1            0.6            3       crps
#> 4   0.9980710           1            0.6            4       crps
#> 5   0.2773030           1            0.6            5       crps
#> 6          NA          NA            0.6            6       crps
#> 7          NA          NA            0.6            7       crps
#> 8   6.1295615           0            0.6            8       crps
#> 9   8.2855480           0            0.6            9       crps
#> 10  7.4110365           0            0.6           10       crps
#> 11 21.3898007           0            0.6           11       crps
#> 12 35.2857677           0            0.6           12       crps
#> 13 37.2882082           0            0.6           13       crps
#> 14 36.4251945           0            0.6           14       crps
#> 15 39.3858395           0            0.6           15       crps
#> 16 42.3677532           0            0.6           16       crps
#> 17 42.5461592           0            0.6           17       crps
#> 18 12.7316780           0            0.6           18       crps
#> 19 13.7700235           0            0.6           19       crps
#> 20  9.7282697           0            0.6           20       crps
#> 21  4.7711443           0            0.6           21       crps
#> 22  4.8054445           0            0.6           22       crps
#> 23  2.7825032           0            0.6           23       crps
#> 24  0.8591737           0            0.6           24       crps
#> 25  3.7808390           0            0.6           25       crps
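
As a quick calibration check, we can compute the empirical coverage of these intervals by averaging the in_interval indicator (a minimal sketch; NA observations are ignored):

# Proportion of test observations falling inside the 60% interval
mean(crps_mod1$series_1$in_interval, na.rm = TRUE)

If the forecast intervals were well calibrated, this proportion would be close to 0.6.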

We can also compare forecasts against out-of-sample observations using the Expected Log Predictive Density (ELPD; also known as the log score). The ELPD is a strictly proper scoring rule that can be applied to any distributional forecast, but computing it requires predictions on the link scale rather than on the outcome scale.
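
In essence, the pointwise ELPD for a new observation \(y_t\) is computed by averaging the likelihood over the \(S\) posterior draws before taking the log:

\[\text{elpd}_t = \log\left(\frac{1}{S}\sum_{s=1}^{S} p(y_t \mid \theta^{(s)})\right)\]

This is why link-scale predictions are needed: the observation density \(p(y_t \mid \theta^{(s)})\) must be evaluated using draws of the linear predictor together with the model's observation family. This is where it is advantageous to change the type of prediction returned by the forecast() function: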

link_mod1 <- forecast(mod1, newdata = simdat$data_test, type = "link")
score(link_mod1, score = "elpd")$series_1
#>         score eval_horizon score_type
#> 1   -2.155156            1       elpd
#> 2   -1.956860            2       elpd
#> 3   -1.242909            3       elpd
#> 4   -2.208022            4       elpd
#> 5   -1.218081            5       elpd
#> 6          NA            6       elpd
#> 7          NA            7       elpd
#> 8   -7.113590            8       elpd
#> 9   -8.499192            9       elpd
#> 10  -7.975085           10       elpd
#> 11 -18.627673           11       elpd
#> 12 -30.187736           12       elpd
#> 13 -28.528770           13       elpd
#> 14 -27.474431           14       elpd
#> 15 -27.138400           15       elpd
#> 16 -24.018949           16       elpd
#> 17 -28.766709           17       elpd
#> 18  -9.455606           18       elpd
#> 19 -10.169118           19       elpd
#> 20  -7.741233           20       elpd
#> 21  -6.998068           21       elpd
#> 22  -7.030657           22       elpd
#> 23  -5.715523           23       elpd
#> 24  -3.015423           24       elpd
#> 25  -6.271717           25       elpd
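
Because larger (less negative) values indicate better predictions, a simple way to summarise overall performance for a series is to sum the scores across horizons (a sketch, ignoring the missing values):

sum(score(link_mod1, score = "elpd")$series_1$score, na.rm = TRUE)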

Finally, when we have multiple time series it may also make sense to use a multivariate proper scoring rule. mvgam offers two such options: the Energy score and the Variogram score. The first penalizes forecast distributions that are poorly calibrated against the truth, while the second penalizes forecasts that fail to capture the observed correlation structure across series. Which score to use depends on your goals, but both are easy to compute:

energy_mod2 <- score(fc_mod2, score = "energy")
str(energy_mod2)
#> List of 4
#>  $ series_1  :'data.frame':  25 obs. of  3 variables:
#>   ..$ in_interval   : num [1:25] 1 1 1 1 1 NA NA 1 1 1 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ series_2  :'data.frame':  25 obs. of  3 variables:
#>   ..$ in_interval   : num [1:25] 1 NA 1 1 1 1 1 1 1 1 ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ series_3  :'data.frame':  25 obs. of  3 variables:
#>   ..$ in_interval   : num [1:25] 1 1 1 1 NA 1 1 1 1 NA ...
#>   ..$ interval_width: num [1:25] 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
#>   ..$ eval_horizon  : int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ all_series:'data.frame':  25 obs. of  3 variables:
#>   ..$ score       : num [1:25] 4.74 NA 5.03 5.36 NA ...
#>   ..$ eval_horizon: int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ score_type  : chr [1:25] "energy" "energy" "energy" "energy" ...

The returned object still provides information on interval coverage for each individual series, but there is only a single score per horizon now (which is provided in the all_series slot):

energy_mod2$all_series
#>        score eval_horizon score_type
#> 1   4.736450            1     energy
#> 2         NA            2     energy
#> 3   5.025547            3     energy
#> 4   5.363993            4     energy
#> 5         NA            5     energy
#> 6         NA            6     energy
#> 7         NA            7     energy
#> 8   3.918395            8     energy
#> 9   4.113319            9     energy
#> 10        NA           10     energy
#> 11 13.149358           11     energy
#> 12 22.547040           12     energy
#> 13        NA           13     energy
#> 14 21.170257           14     energy
#> 15 24.184433           15     energy
#> 16 25.110374           16     energy
#> 17 27.945911           17     energy
#> 18  6.180386           18     energy
#> 19 10.674543           19     energy
#> 20  4.093666           20     energy
#> 21  2.870332           21     energy
#> 22  3.443291           22     energy
#> 23        NA           23     energy
#> 24  8.866093           24     energy
#> 25  7.883124           25     energy
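
The Variogram score can be computed in exactly the same way; as with the Energy score, the multivariate score per horizon is returned in the all_series slot (a brief sketch):

vario_mod2 <- score(fc_mod2, score = "variogram")
vario_mod2$all_series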

You can use your score(s) of choice to compare different models. For example, we can compute and plot the difference in CRPS scores for each series in the data. Here, a negative value means the AR(1) model (mod2) is better, while a positive value means the spline model (mod1) is better.

crps_mod1 <- score(fc_mod1, score = "crps")
crps_mod2 <- score(fc_mod2, score = "crps")

diff_scores <- crps_mod2$series_1$score -
  crps_mod1$series_1$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ",
  ar1_better,
  " of ",
  length(diff_scores),
  " evaluations",
  "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))



diff_scores <- crps_mod2$series_2$score -
  crps_mod1$series_2$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ",
  ar1_better,
  " of ",
  length(diff_scores),
  " evaluations",
  "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))


diff_scores <- crps_mod2$series_3$score -
  crps_mod1$series_3$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
)
abline(h = 0, lty = "dashed", lwd = 2)
ar1_better <- length(which(diff_scores < 0))
title(main = paste0(
  "AR(1) better in ",
  ar1_better,
  " of ",
  length(diff_scores),
  " evaluations",
  "\nMean difference = ",
  round(mean(diff_scores, na.rm = TRUE), 2)
))
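
Rather than repeating this plotting code for every series, we could wrap it in a small helper function. A minimal sketch (the function name and its arguments are illustrative only, not part of mvgam):

plot_crps_diff <- function(crps_a, crps_b, series) {
  # Negative values mean model b scores better (lower CRPS) at that horizon
  diff_scores <- crps_b[[series]]$score - crps_a[[series]]$score
  lims <- max(abs(diff_scores), na.rm = TRUE)
  plot(diff_scores,
    pch = 16, cex = 1.25, col = "darkred",
    ylim = c(-lims, lims), bty = "l",
    xlab = "Forecast horizon",
    ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
  )
  abline(h = 0, lty = "dashed", lwd = 2)
}
plot_crps_diff(crps_mod1, crps_mod2, "series_3")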

The correlated AR(1) model consistently gives better forecasts, and the difference between scores tends to grow as the forecast horizon increases. This is not unexpected, given the way that splines linearly extrapolate outside the range of the training data.

Further reading

The following papers and resources offer useful material about Bayesian forecasting and proper scoring rules:

Clark N.J., et al. Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ 13:e18929 (2025). https://doi.org/10.7717/peerj.18929

Hyndman, Rob J., and George Athanasopoulos. Forecasting: Principles and Practice. OTexts (2018).

Gneiting, Tilmann, and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102.477 (2007): 359-378.

Simonis, Juniper L., et al. Evaluating probabilistic ecological forecasts. Ecology 102.8 (2021): e03431.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see the small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au).

================================================
FILE: inst/doc/mvgam_overview.R
================================================
params <- list(EVAL = TRUE)

## ----echo = FALSE----------------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)

## ----setup, include=FALSE--------------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))

## ----Access time series data-----------------------------------------------------
data("portal_data")

## ----Inspect data format and structure-------------------------------------------
head(portal_data)

## --------------------------------------------------------------------------------
dplyr::glimpse(portal_data)

## --------------------------------------------------------------------------------
data <- sim_mvgam(n_series = 4, T = 24)
head(data$data_train, 12)

## ----Wrangle data for modelling--------------------------------------------------
portal_data %>%
  # Filter the data to only contain captures of the 'PP'
  dplyr::filter(series == 'PP') %>%
  droplevels() %>%
  dplyr::mutate(count = captures) %>%
  # Add a 'year' variable
  dplyr::mutate(year = sort(rep(1:8, 12))[time]) %>%
  # Select the variables of interest to keep in the model_data
  dplyr::select(series, year, time, count, mintemp, ndvi_ma12) -> model_data

## --------------------------------------------------------------------------------
head(model_data)

## --------------------------------------------------------------------------------
dplyr::glimpse(model_data)

## ----Summarise variables---------------------------------------------------------
summary(model_data)

## --------------------------------------------------------------------------------
plot_mvgam_series(data = model_data, series = 1, y = "count")

## --------------------------------------------------------------------------------
model_data %>%
  # Create a 'year_fac' factor version of 'year'
  dplyr::mutate(year_fac = factor(year)) -> model_data

## --------------------------------------------------------------------------------
dplyr::glimpse(model_data)
levels(model_data$year_fac)

## ----model1, include=FALSE, results='hide'---------------------------------------
model1 <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data,
  parallel = FALSE
)

## ----eval=FALSE------------------------------------------------------------------
# model1 <- mvgam(
#   count ~ s(year_fac, bs = "re") - 1,
#   family = poisson(),
#   data = model_data
# )

## --------------------------------------------------------------------------------
get_mvgam_priors(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)

## --------------------------------------------------------------------------------
summary(model1)

## ----Extract coefficient posteriors----------------------------------------------
beta_post <- as.data.frame(model1, variable = "betas")
dplyr::glimpse(beta_post)

## --------------------------------------------------------------------------------
stancode(model1)

## ----Plot random effect estimates------------------------------------------------
plot(model1, type = "re")

## --------------------------------------------------------------------------------
mcmc_plot(
  object = model1,
  variable = "betas",
  type = "areas"
)

## --------------------------------------------------------------------------------
pp_check(object = model1)

## ----Plot posterior hindcasts----------------------------------------------------
plot(model1, type = "forecast")

## ----Extract posterior hindcast--------------------------------------------------
hc <- hindcast(model1)
str(hc)

## ----Extract hindcasts on the linear predictor scale-----------------------------
hc <- hindcast(model1, type = "link")
range(hc$hindcasts$PP)

## ----Plot posterior residuals----------------------------------------------------
plot(model1, type = "residuals")

## --------------------------------------------------------------------------------
model_data %>%
  dplyr::filter(time <= 70) -> data_train
model_data %>%
  dplyr::filter(time > 70) -> data_test

## ----include=FALSE, message=FALSE, warning=FALSE---------------------------------
model1b <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)

## ----eval=FALSE------------------------------------------------------------------
# model1b <- mvgam(
#   count ~ s(year_fac, bs = "re") - 1,
#   family = poisson(),
#   data = data_train,
#   newdata = data_test
# )

## ----Plotting predictions against test data--------------------------------------
plot(model1b, type = "forecast", newdata = data_test)

## ----Extract posterior forecasts-------------------------------------------------
fc <- forecast(model1b)
str(fc)

## ----model2, include=FALSE, message=FALSE, warning=FALSE-------------------------
model2 <- mvgam(
  count ~ s(year_fac, bs = "re") + ndvi_ma12 - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)

## ----eval=FALSE------------------------------------------------------------------
# model2 <- mvgam(
#   count ~ s(year_fac, bs = "re") +
#     ndvi_ma12 - 1,
#   family = poisson(),
#   data = data_train,
#   newdata = data_test
# )

## ----class.output="scroll-300"---------------------------------------------------
summary(model2)

## ----Posterior quantiles of model coefficients-----------------------------------
coef(model2)

## --------------------------------------------------------------------------------
beta_post <- as.data.frame(model2, variable = "betas")
dplyr::glimpse(beta_post)

## ----Histogram of NDVI effects---------------------------------------------------
hist(
  beta_post$ndvi_ma12,
  xlim = c(
    -1 * max(abs(beta_post$ndvi_ma12)),
    max(abs(beta_post$ndvi_ma12))
  ),
  col = "darkred",
  border = "white",
  xlab = expression(beta[NDVI]),
  ylab = "",
  yaxt = "n",
  main = "",
  lwd = 2
)
abline(v = 0, lwd = 2.5)

## ----warning=FALSE---------------------------------------------------------------
conditional_effects(model2)

## ----model3, include=FALSE, message=FALSE, warning=FALSE-------------------------
model3 <- mvgam(
  count ~ s(time, bs = "bs", k = 15) + ndvi_ma12,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)

## ----eval=FALSE------------------------------------------------------------------
# model3 <- mvgam(
#   count ~ s(time, bs = "bs", k = 15) +
#     ndvi_ma12,
#   family = poisson(),
#   data = data_train,
#   newdata = data_test
# )

## --------------------------------------------------------------------------------
summary(model3)

## ----warning=FALSE---------------------------------------------------------------
conditional_effects(model3, type = "link")

## ----class.output="scroll-300"---------------------------------------------------
stancode(model3)

## --------------------------------------------------------------------------------
plot(model3, type = "forecast", newdata = data_test)

## ----Plot extrapolated temporal functions using newdata--------------------------
plot_mvgam_smooth(
  model3,
  smooth = "s(time)",
  # pass newdata to the plot function to generate
  # predictions of the temporal smooth to the end of the
  # testing period
  newdata = data.frame(
    time = 1:max(data_test$time),
    ndvi_ma12 = 0
  )
)
abline(v = max(data_train$time), lty = "dashed", lwd = 2)

## ----model4, include=FALSE-------------------------------------------------------
model4 <- mvgam(
  count ~ s(ndvi_ma12, k = 6),
  family = poisson(),
  data = data_train,
  newdata = data_test,
  trend_model = AR(),
  parallel = FALSE
)

## ----eval=FALSE------------------------------------------------------------------
# model4 <- mvgam(
#   count ~ s(ndvi_ma12, k = 6),
#   family = poisson(),
#   data = data_train,
#   newdata = data_test,
#   trend_model = AR()
# )

## ----Summarise the mvgam autocorrelated error model, class.output="scroll-300"----
summary(model4)

## --------------------------------------------------------------------------------
plot(model4, type = "forecast", newdata = data_test)

## --------------------------------------------------------------------------------
plot(model4, type = "trend", newdata = data_test)

## --------------------------------------------------------------------------------
loo_compare(model3, model4)

## --------------------------------------------------------------------------------
fc_mod3 <- forecast(model3)
fc_mod4 <- forecast(model4)
score_mod3 <- score(fc_mod3, score = "drps")
score_mod4 <- score(fc_mod4, score = "drps")
sum(score_mod4$PP$score, na.rm = TRUE) -
  sum(score_mod3$PP$score, na.rm = TRUE)


================================================
FILE: inst/doc/mvgam_overview.Rmd
================================================
---
title: "Overview of the mvgam package"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Overview of the mvgam package}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

The purpose of this vignette is to give a general overview of the `mvgam` package and its primary functions.

## Dynamic GAMs

`mvgam` is designed to propagate unobserved temporal processes to capture latent dynamics in the observed time series. This works in a state-space format, with the temporal *trend* evolving independently of the observation process. An introduction to the package and some worked examples are also shown in this seminar: [Ecological Forecasting with Dynamic Generalized Additive Models](https://www.youtube.com/watch?v=0zZopLlomsQ){target="_blank"}. Briefly, assume $\tilde{\boldsymbol{y}}_{i,t}$ is the conditional expectation of response variable $\boldsymbol{i}$ at time $\boldsymbol{t}$.
Assuming $\boldsymbol{y_i}$ is drawn from the exponential family with an invertible link function, the linear predictor for a multivariate Dynamic GAM can be written as:

$$for~i~in~1:N_{series}~...$$
$$for~t~in~1:N_{timepoints}~...$$
$$g^{-1}(\tilde{\boldsymbol{y}}_{i,t})=\alpha_{i}+\sum\limits_{j=1}^J\boldsymbol{s}_{i,j,t}\boldsymbol{x}_{j,t}+\boldsymbol{Z}\boldsymbol{z}_{k,t}\,,$$

Here $\alpha$ are the unknown intercepts, the $\boldsymbol{s}$'s are unknown smooth functions of covariates ($\boldsymbol{x}$'s), which can potentially vary among the response series, and $\boldsymbol{z}$ are dynamic latent processes. Each smooth function $\boldsymbol{s_j}$ is composed of basis expansions whose coefficients, which must be estimated, control the functional relationship between $\boldsymbol{x}_{j}$ and $g^{-1}(\tilde{\boldsymbol{y}})$. The size of the basis expansion limits the smooth's potential complexity. A larger set of basis functions allows greater flexibility. For more information on GAMs and how they can smooth through data, see [this blogpost on how to interpret nonlinear effects from Generalized Additive Models](https://ecogambler.netlify.app/blog/interpreting-gams/){target="_blank"}.

Latent processes are captured with $\boldsymbol{Z}\boldsymbol{z}_{k,t}$, where $\boldsymbol{Z}$ is an $i~by~k$ matrix of loading coefficients (which can be fixed or a combination of fixed and freely estimated parameters) and $\boldsymbol{z}_{k,t}$ are a set of $K$ latent factors that can also include their own GAM linear predictors (see the [State-Space models vignette](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html), the [N-mixtures vignette](https://nicholasjclark.github.io/mvgam/articles/nmixtures.html) and the example in [`jsdgam`](https://nicholasjclark.github.io/mvgam/reference/jsdgam.html) to get an idea of how flexible these processes can be). Several advantages of GAMs are that they can model a diversity of response families, including discrete distributions (i.e. Poisson, Negative Binomial, Gamma) that accommodate common ecological features such as zero-inflation or overdispersion, and that they can be formulated to include hierarchical smoothing for multivariate responses.

`mvgam` supports a number of different observation families, which are summarized below:

## Supported observation families

|Distribution | Function | Support | Extra parameter(s) |
|:----------------:|:---------------:|:------------------------------------------------:|:--------------------:|
|Gaussian (identity link) | `gaussian()` | Real values in $(-\infty, \infty)$ | $\sigma$ |
|Student's T (identity link) | `student_t()` | Heavy-tailed real values in $(-\infty, \infty)$ | $\sigma$, $\nu$ |
|LogNormal (identity link) | `lognormal()` | Positive real values in $[0, \infty)$ | $\sigma$ |
|Gamma (log link) | `Gamma()` | Positive real values in $[0, \infty)$ | $\alpha$ |
|Beta (logit link) | `betar()` | Real values (proportional) in $[0,1]$ | $\phi$ |
|Bernoulli (logit link) | `bernoulli()` | Binary data in $\{0,1\}$ | - |
|Poisson (log link) | `poisson()` | Non-negative integers in $(0,1,2,...)$ | - |
|Negative Binomial2 (log link) | `nb()` | Non-negative integers in $(0,1,2,...)$ | $\phi$ |
|Binomial (logit link) | `binomial()` | Non-negative integers in $(0,1,2,...)$ | - |
|Beta-Binomial (logit link) | `beta_binomial()` | Non-negative integers in $(0,1,2,...)$ | $\phi$ |
|Poisson Binomial N-mixture (log link) | `nmix()` | Non-negative integers in $(0,1,2,...)$ | - |

For all supported observation families, any extra parameters that need to be estimated (i.e. the $\sigma$ in a Gaussian model or the $\phi$ in a Negative Binomial model) are by default estimated independently for each series. However, users can opt to force all series to share extra observation parameters using `share_obs_params = TRUE` in `mvgam()`. Note that default link functions cannot currently be changed.

## Supported temporal dynamic processes

As stated above, the latent processes can take a wide variety of forms, some of which can be multivariate to allow the different observational variables to interact or be correlated. When using the `mvgam()` function, the user chooses between different process models with the `trend_model` argument. Available process models are described in detail below.

### Correlated multivariate processes

If more than one observational unit (usually referred to as 'series') is included in `data` $(N_{series} > 1)$, use `trend_model = ZMVN()` to set up a model where the outcomes for different observational units may be correlated according to:

\begin{align*}
z_{t} & \sim \text{MVNormal}(0, \Sigma)
\end{align*}

The covariance matrix $\Sigma$ will capture potentially correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances $\sigma$ and on the strength of correlations using `Stan`'s `lkj_corr_cholesky` distribution. Note that this `trend_model` does not assume that measurements occur over *time*, as users can specify what variable in the `data` represents the unit of analysis (i.e. outcomes could be counts of different *species* across different *sites* or *regions*, for example; see [`?ZMVN`](https://nicholasjclark.github.io/mvgam/reference/ZMVN.html) for guidelines).

### Independent Random Walks

Use `trend_model = 'RW'` or `trend_model = RW()` to set up a model where each series in `data` has independent latent temporal dynamics of the form:

\begin{align*}
z_{i,t} & \sim \text{Normal}(z_{i,t-1}, \sigma_i)
\end{align*}

Process error parameters $\sigma$ are modeled independently for each series. If a moving average process is required, use `trend_model = RW(ma = TRUE)` to set up the following:

\begin{align*}
z_{i,t} & = z_{i,t-1} + \theta_i * error_{i,t-1} + error_{i,t} \\
error_{i,t} & \sim \text{Normal}(0, \sigma_i)
\end{align*}

Moving average coefficients $\theta$ are independently estimated for each series and will be forced to be stationary by default $(abs(\theta)<1)$. Only moving averages of order $q=1$ are currently allowed.

### Multivariate Random Walks

If more than one series is included in `data` $(N_{series} > 1)$, a multivariate Random Walk can be set up using `trend_model = RW(cor = TRUE)`, resulting in the following:

\begin{align*}
z_{t} & \sim \text{MVNormal}(z_{t-1}, \Sigma)
\end{align*}

Where the latent process estimate $z_t$ now takes the form of a vector. The covariance matrix $\Sigma$ will capture contemporaneously correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances $\sigma$ and on the strength of correlations using `Stan`'s `lkj_corr_cholesky` distribution. Moving average terms can also be included for multivariate random walks, in which case the moving average coefficients $\theta$ will be parameterised as an $N_{series} * N_{series}$ matrix.

### Autoregressive processes

Autoregressive models up to $p=3$, in which the autoregressive coefficients are estimated independently for each series, can be used by specifying `trend_model = 'AR1'`, `trend_model = 'AR2'`, `trend_model = 'AR3'`, or `trend_model = AR(p = 1, 2, or 3)`. For example, a univariate AR(1) model takes the form:

\begin{align*}
z_{i,t} & \sim \text{Normal}(ar1_i * z_{i,t-1}, \sigma_i)
\end{align*}

All options are the same as for Random Walks, but additional options will be available for placing priors on the autoregressive coefficients. By default, these coefficients will not be forced into stationarity, but users can impose this restriction by changing the upper and lower bounds on their priors. See `?get_mvgam_priors` for more details.

### Vector Autoregressive processes

A Vector Autoregression of order $p=1$ can be specified if $N_{series} > 1$ using `trend_model = 'VAR1'` or `trend_model = VAR()`. A VAR(1) model takes the form:

\begin{align*}
z_{t} & \sim \text{Normal}(A * z_{t-1}, \Sigma)
\end{align*}

Where $A$ is an $N_{series} * N_{series}$ matrix of autoregressive coefficients in which the diagonals capture lagged self-dependence (i.e. the effect of a process at time $t$ on its own estimate at time $t+1$), while off-diagonals capture lagged cross-dependence (i.e. the effect of a process at time $t$ on the process for another series at time $t+1$). By default, the covariance matrix $\Sigma$ will assume no process error covariance by fixing the off-diagonals to $0$. To allow for correlated errors, use `trend_model = 'VAR1cor'` or `trend_model = VAR(cor = TRUE)`. A moving average of order $q=1$ can also be included using `trend_model = VAR(ma = TRUE, cor = TRUE)`. Note that for all VAR models, stationarity of the process is enforced with a structured prior distribution that is described in detail in [Heaps 2022](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648)

Heaps, Sarah E. "[Enforcing stationarity through the prior in vector autoregressions.](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648)" *Journal of Computational and Graphical Statistics* 32.1 (2023): 74-83.
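
To make these options concrete, below is a brief, hedged sketch (not run) showing how some of the dynamics above are declared; `dat` stands in for a hypothetical long-format `data.frame` containing `y`, `time` and `series` columns:

```{r, eval=FALSE}
# Not run: declaring common latent dynamics via the trend_model argument
mvgam(y ~ 1, trend_model = RW(ma = TRUE), data = dat)   # random walk + MA(1) errors
mvgam(y ~ 1, trend_model = AR(p = 2), data = dat)       # independent AR(2) processes
mvgam(y ~ 1, trend_model = VAR(cor = TRUE), data = dat) # VAR(1) with correlated errors
```
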
### Hierarchical processes

Several of the above-mentioned `trend_model` options can be modified to account for grouping structures in `data` by setting up hierarchical latent processes. If an optional grouping variable (`gr`; which must be a `factor` in the supplied `data`) exists, users can model hierarchical residual correlation structures, where the residual correlations for a specific level of `gr` are modelled hierarchically:

\begin{align*}
\Omega_{group} & = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local}
\end{align*}

where $\Omega_{global}$ is a *global* correlation matrix, $\Omega_{group, local}$ is a *local deviation* correlation matrix and $\alpha_{cor}$ is a weighting parameter controlling how strongly the group-level correlation matrix $\Omega_{group}$ (i.e. the derived correlation matrix that will be used for each level of the grouping factor `gr`) is shrunk towards the global correlation matrix $\Omega_{global}$ (larger values of $\alpha_{cor}$ indicate a greater degree of shrinkage, i.e. a greater degree of partial pooling). This option is valuable for many types of designs where the same observational units (i.e. *financial assets* or *species*, for example) are measured in different strata (i.e. *regions*, *countries* or *experimental units*, for example). Currently hierarchical correlations can be included for `AR()`, `VAR()` or `ZMVN()` `trend_model` options.

### Gaussian Processes

The final option for modelling temporal dynamics is to use a Gaussian Process with squared exponential kernel. These are set up independently for each series (there is currently no multivariate GP option), using `trend_model = 'GP'`. The dynamics for each latent process are modelled as:

\begin{align*}
z & \sim \text{MVNormal}(0, \Sigma_{error}) \\
\Sigma_{error}[t_i, t_j] & = \alpha^2 * exp(-0.5 * ((|t_i - t_j| / \rho))^2)
\end{align*}

The latent dynamic process evolves from a complex, high-dimensional Multivariate Normal distribution whose covariance depends on $\rho$ (often called the length scale parameter), which controls how quickly the correlations between the model's errors decay as a function of time. For these models, covariance decays exponentially fast with the squared distance (in time) between the observations. The functions also depend on a parameter $\alpha$, which controls the marginal variability of the temporal function at all points; in other words it controls how much the GP term contributes to the linear predictor. `mvgam` capitalizes on some advances that allow GPs to be approximated using Hilbert space basis functions, which [considerably speed up computation at little cost to accuracy or prediction performance](https://link.springer.com/article/10.1007/s11222-022-10167-2){target="_blank"}.
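
The squared exponential kernel above is simple to compute directly, which can help build intuition for how $\alpha$ and $\rho$ shape the latent dynamics. A minimal sketch (the function here is purely illustrative, not part of `mvgam`):

```{r}
# Squared exponential covariance kernel for a vector of time points
sq_exp_kernel <- function(times, alpha, rho) {
  d <- outer(times, times, function(a, b) abs(a - b))
  alpha^2 * exp(-0.5 * (d / rho)^2)
}
# Correlations decay smoothly as observations get further apart in time
round(sq_exp_kernel(times = 1:5, alpha = 1, rho = 2), 2)
```
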
### Piecewise logistic and linear trends

Modeling growth for many types of time series is often similar to modeling population growth in natural ecosystems, where the series exhibits nonlinear growth that saturates at some particular carrying capacity. The logistic trend model available in {`mvgam`} allows for a time-varying capacity $C(t)$ as well as a non-constant growth rate. Changes in the base growth rate $k$ are incorporated by explicitly defining changepoints throughout the training period where the growth rate is allowed to vary. The changepoint vector $a$ is represented as a vector of `1`s and `0`s, and the rate of growth at time $t$ is represented as $k+a(t)^T\delta$. Potential changepoints are selected uniformly across the training period, and the number of changepoints, as well as the flexibility of the potential rate changes at these changepoints, can be controlled using `trend_model = PW()`. The full piecewise logistic growth model is then:

\begin{align*}
z_t & = \frac{C_t}{1 + \exp(-(k+a(t)^T\delta)(t-(m+a(t)^T\gamma)))}
\end{align*}

For time series that do not appear to exhibit saturating growth, a piece-wise constant rate of growth can often provide a useful trend model. The piecewise linear trend is defined as:

\begin{align*}
z_t & = (k+a(t)^T\delta)t + (m+a(t)^T\gamma)
\end{align*}

In both trend models, $m$ is an offset parameter that controls the trend intercept. Because of this parameter, it is not recommended that you include an intercept in your observation formula, as it will not be identifiable. You can read about the full description of piecewise linear and logistic trends [in this paper by Taylor and Letham](https://www.tandfonline.com/doi/abs/10.1080/00031305.2017.1380080){target="_blank"}.

Sean J. Taylor and Benjamin Letham. "[Forecasting at scale.](https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1380080)" *The American Statistician* 72.1 (2018): 37-45.

### Continuous time AR(1) processes

Most trend models in the `mvgam()` function expect time to be measured in regularly-spaced, discrete intervals (i.e. one measurement per week, or one per year, for example). But some time series are taken at irregular intervals and we'd like to model autoregressive properties of these. Setting `trend_model = CAR()` can be useful for these models, which currently only support autoregressive processes of order `1`. The evolution of the latent dynamic process follows the form:

\begin{align*}
z_{i,t} & \sim \text{Normal}(ar1_i^{distance} * z_{i,t-1}, \sigma_i)
\end{align*}

Where $distance$ is a vector of non-negative measurements of the time differences between successive observations. These models are perhaps more widely known as Ornstein–Uhlenbeck processes. See the **Examples** section in `?CAR` for an illustration of how to set these models up.

## Regression formulae

`mvgam` supports an observation model regression formula, built off the `mgcv` package, as well as an optional process model regression formula. The formulae supplied to `mvgam()` are exactly like those supplied to `glm()` except that smooth terms, `s()`, `te()`, `ti()` and `t2()`, time-varying effects using `dynamic()`, monotonically increasing (using `s(x, bs = 'moi')`) or decreasing splines (using `s(x, bs = 'mod')`; see `?smooth.construct.moi.smooth.spec` for details), as well as Gaussian Process functions using `gp()`, can be added to the right hand side (and `.` is not supported in `mvgam` formulae). See `?mvgam_formulae` for more guidance. For setting up State-Space models, the optional process model formula can be used (see [the State-Space model vignette](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html) and [the shared latent states vignette](https://nicholasjclark.github.io/mvgam/articles/shared_states.html) for guidance on using trend formulae).

## Example time series data

The 'portal_data' object contains time series of rodent captures from the Portal Project, [a long-term monitoring study based near the town of Portal, Arizona](https://portal.weecology.org/){target="_blank"}. Researchers have been operating a standardized set of baited traps within 24 experimental plots at this site since the 1970's.
Sampling follows the lunar monthly cycle, with observations occurring on average about 28 days apart. However, missing observations do occur due to difficulties accessing the site (weather events, COVID disruptions etc...). You can read about the full sampling protocol [in this preprint by Ernest et al on the Biorxiv](https://www.biorxiv.org/content/10.1101/332783v3.full){target="_blank"}.

```{r Access time series data}
data("portal_data")
```

As the data come pre-loaded with the `mvgam` package, you can read a little about it in the help page using `?portal_data`. Before working with data, it is important to inspect how the data are structured, first using `head()`:

```{r Inspect data format and structure}
head(portal_data)
```

But the `glimpse()` function in `dplyr` is also useful for understanding how variables are structured:

```{r}
dplyr::glimpse(portal_data)
```

We will focus analyses on the time series of captures for one specific rodent species, the Desert Pocket Mouse *Chaetodipus penicillatus*. This species is interesting in that it goes into a kind of "hibernation" during the colder months, leading to very low captures during the winter period.

## Manipulating data for modelling

Manipulating the data into a 'long' format is necessary for modelling in `mvgam`. By 'long' format, we mean that each `series x time` observation needs to have its own entry in the `dataframe` or `list` object that we wish to use as data for modelling. A simple example can be viewed by simulating data using the `sim_mvgam()` function. See `?sim_mvgam` for more details:

```{r}
data <- sim_mvgam(n_series = 4, T = 24)
head(data$data_train, 12)
```

Notice how we have four different time series in these simulated data, but we do not spread the outcome values into different columns. Rather, there is only a single column for the outcome variable, labelled `y` in these simulated data. We also must supply a variable labelled `time` to ensure the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models, as you can see in the [State-Space vignette](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html). Below are the steps needed to shape our `portal_data` object into the correct form.
First, we create a `time` variable, select the column representing counts of our target species (`PP`), and select appropriate variables that we can use as predictors:

```{r Wrangle data for modelling}
portal_data %>%
  # Filter the data to only contain captures of the 'PP'
  dplyr::filter(series == 'PP') %>%
  droplevels() %>%
  dplyr::mutate(count = captures) %>%
  # Add a 'year' variable
  dplyr::mutate(year = sort(rep(1:8, 12))[time]) %>%
  # Select the variables of interest to keep in the model_data
  dplyr::select(series, year, time, count, mintemp, ndvi_ma12) -> model_data
```

The data now contain six variables:

- `series`, a factor indexing which time series each observation belongs to
- `year`, the year of sampling
- `time`, the indicator of which time step each observation belongs to
- `count`, the response variable representing the number of captures of the species `PP` in each sampling observation
- `mintemp`, the monthly average minimum temperature at each time step
- `ndvi_ma12`, a 12-month moving average of the monthly Normalized Difference Vegetation Index at each time step

Now check the data structure again:

```{r}
head(model_data)
```

```{r}
dplyr::glimpse(model_data)
```

You can also summarize multiple variables, which is helpful to search for data ranges and identify missing values:

```{r Summarise variables}
summary(model_data)
```

We have some `NA`s in our response variable `count`. These observations will generally be thrown out by most modelling packages in \R. But as you will see when we work through the tutorials, `mvgam` keeps these in the data so that predictions can be automatically returned for the full dataset. The time series and some of its descriptive features can be plotted using `plot_mvgam_series()`:

```{r}
plot_mvgam_series(data = model_data, series = 1, y = "count")
```

## GLMs with temporal random effects

Our first task will be to fit a Generalized Linear Model (GLM) that can adequately capture the features of our `count` observations (integer data, lower bound at zero, missing values) while also attempting to model temporal variation. We are almost ready to fit our first model, which will be a GLM with Poisson observations, a log link function and random (hierarchical) intercepts for `year`. This will allow us to capture our prior belief that, although each year is unique, having been sampled from the same population of effects, all years are connected and thus might contain valuable information about one another. This will be done by capitalizing on the partial pooling properties of hierarchical models. Hierarchical (also known as random) effects offer many advantages when modelling data with grouping structures (i.e. multiple species, locations, years etc...). The ability to incorporate these in time series models is a huge advantage over traditional models such as ARIMA or Exponential Smoothing. But before we fit the model, we will need to convert `year` to a factor so that we can use a random effect basis in `mvgam`. See `?smooth.terms` and `?smooth.construct.re.smooth.spec` for details about the `re` basis construction that is used by both `mvgam` and `mgcv`:

```{r}
model_data %>%
  # Create a 'year_fac' factor version of 'year'
  dplyr::mutate(year_fac = factor(year)) -> model_data
```

Preview the dataset to ensure year is now a factor with a unique factor level for each year in the data:

```{r}
dplyr::glimpse(model_data)
levels(model_data$year_fac)
```

We are now ready for our first `mvgam` model. The syntax will be familiar to users who have previously built models with `mgcv`.
But for a refresher, see `?formula.gam` and the examples in `?gam`. Random effects can be specified using the `s` wrapper with the `re` basis. Note that we can also suppress the primary intercept using the usual `R` formula syntax `- 1`. `mvgam` has a number of possible observation families that can be used, see `?mvgam_families` for more information. We will use `Stan` as the fitting engine, which deploys Hamiltonian Monte Carlo (HMC) for full Bayesian inference. By default, 4 HMC chains will be run using a warmup of 500 iterations and collecting 500 posterior samples from each chain. The package will also aim to use the `Cmdstan` backend when possible, so it is recommended that users have an up-to-date installation of `Cmdstan` and the associated `cmdstanr` interface on their machines (note that you can set the backend yourself using the `backend` argument: see `?mvgam` for details). Interested users should consult the [`Stan` user's guide](https://mc-stan.org/docs/stan-users-guide/index.html){target="_blank"} for more information about the software and the enormous variety of models that can be tackled with HMC.

```{r model1, include=FALSE, results='hide'}
model1 <- mvgam(count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data,
  parallel = FALSE
)
```

```{r eval=FALSE}
model1 <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)
```

The model can be described mathematically for each timepoint $t$ as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = \beta_{year[year_t]} \\
\beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year})
\end{align*}

Where the $\beta_{year}$ effects are drawn from a *population* distribution that is parameterized by a common mean $(\mu_{year})$ and variance $(\sigma_{year})$. Priors on most of the model parameters can be interrogated and changed using similar functionality to the options available in `brms`. For example, the default priors on $(\mu_{year})$ and $(\sigma_{year})$ can be viewed using the following code:

```{r}
get_mvgam_priors(count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)
```

See examples in `?get_mvgam_priors` to find out different ways that priors can be altered. Once the model has finished, the first step is to inspect the `summary()` to ensure no major diagnostic warnings have been produced and to quickly summarise posterior distributions for key parameters:

```{r}
summary(model1)
```

The diagnostic messages at the bottom of the summary show that the HMC sampler did not encounter any problems or difficult posterior spaces. This is a good sign. Posterior distributions for model parameters can be extracted in any way that an object of class `brmsfit` can (see `?mvgam::mvgam_draws` for details). For example, we can extract the coefficients related to the GAM linear predictor (i.e. the $\beta$'s) into a `data.frame` using:

```{r Extract coefficient posteriors}
beta_post <- as.data.frame(model1, variable = "betas")
dplyr::glimpse(beta_post)
```

With any model fitted in `mvgam`, the underlying `Stan` code can be viewed using the `stancode()` function:

```{r}
stancode(model1)
```

### Plotting effects and residuals

Now for interrogating the model. We can get some sense of the variation in yearly intercepts from the summary above, but it is easier to understand them using targeted plots. Plot posterior distributions of the temporal random effects using `plot.mvgam()` with `type = 're'`. See `?plot.mvgam` for more details about the types of plots that can be produced from fitted `mvgam` objects:

```{r Plot random effect estimates}
plot(model1, type = "re")
```

### `bayesplot` support

We can also capitalize on most of the useful MCMC plotting functions from the `bayesplot` package to visualize posterior distributions and diagnostics (see `?mvgam::mcmc_plot.mvgam` for details):

```{r}
mcmc_plot(
  object = model1,
  variable = "betas",
  type = "areas"
)
```

We can also use the wide range of posterior checking functions available in `bayesplot` (see `?mvgam::pp_check.mvgam` for details):

```{r}
pp_check(object = model1)
```

There is clearly some variation in these yearly intercept estimates. But how do these translate into time-varying predictions? To understand this, we can plot posterior hindcasts from this model for the training period using `plot.mvgam()` with `type = 'forecast'`:

```{r Plot posterior hindcasts}
plot(model1, type = "forecast")
```

If you wish to extract these hindcasts for other downstream analyses, the `hindcast()` function can be used. This will return a list object of class `mvgam_forecast`. In the `hindcasts` slot, a matrix of posterior retrodictions will be returned for each series in the data (only one series in our example):

```{r Extract posterior hindcast}
hc <- hindcast(model1)
str(hc)
```

You can also extract these hindcasts on the linear predictor scale, which in this case is the log scale (our Poisson GLM used a log link function). Sometimes this can be useful for asking more targeted questions about drivers of variation:

```{r Extract hindcasts on the linear predictor scale}
hc <- hindcast(model1, type = "link")
range(hc$hindcasts$PP)
```

In any regression analysis, a key question is whether the residuals show any patterns that can be indicative of un-modelled sources of variation. For GLMs, we can use a modified residual called the [Dunn-Smyth, or randomized quantile, residual](https://www.jstor.org/stable/1390802){target="_blank"}. Inspect Dunn-Smyth residuals from the model using `plot.mvgam()` with `type = 'residuals'`:

```{r Plot posterior residuals}
plot(model1, type = "residuals")
```

## Automatic forecasting for new data

These temporal random effects do not have a sense of "time". Because of this, each yearly random intercept is not restricted in some way to be similar to the previous yearly intercept. This drawback becomes evident when we predict for a new year. To do this, we can repeat the exercise above but this time will split the data into training and testing sets before re-running the model. We can then supply the test set as `newdata`. For splitting, we will make use of the `filter()` function from `dplyr`:

```{r}
model_data %>%
  dplyr::filter(time <= 70) -> data_train
model_data %>%
  dplyr::filter(time > 70) -> data_test
```

```{r include=FALSE, message=FALSE, warning=FALSE}
model1b <- mvgam(count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)
```

```{r eval=FALSE}
model1b <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test
)
```

We can view the test data in the forecast plot to see that the predictions do not capture the temporal variation in the test set:

```{r Plotting predictions against test data}
plot(model1b, type = "forecast", newdata = data_test)
```

As with the `hindcast()` function, we can use the `forecast()` function to automatically extract the posterior distributions for these predictions.

As with the `hindcast()` function, we can use the `forecast()` function to automatically extract the posterior distributions for these predictions. This also returns an object of class `mvgam_forecast`, but now it will contain both the hindcasts and forecasts for each series in the data:

```{r Extract posterior forecasts}
fc <- forecast(model1b)
str(fc)
```

## Adding predictors as "fixed" effects

Any users familiar with GLMs will know that we nearly always wish to include predictor variables that may explain some of the variation in our observations. Predictors are easily incorporated into GLMs / GAMs. Here, we will update the model from above by including a parametric (fixed) effect of `ndvi_ma12` as a linear predictor:

```{r model2, include=FALSE, message=FALSE, warning=FALSE}
model2 <- mvgam(
  count ~ s(year_fac, bs = "re") + ndvi_ma12 - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)
```

```{r eval=FALSE}
model2 <- mvgam(
  count ~ s(year_fac, bs = "re") + ndvi_ma12 - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test
)
```

The model can be described mathematically as follows:
\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = \beta_{year[year_t]} + \beta_{ndvi} * \boldsymbol{ndvi}_t \\
\beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}) \\
\beta_{ndvi} & \sim \text{Normal}(0, 1)
\end{align*}

Where the $\beta_{year}$ effects are the same as before but we now have another predictor $(\beta_{ndvi})$ that applies to the `ndvi_ma12` value at each timepoint $t$. Inspect the summary of this model

```{r, class.output="scroll-300"}
summary(model2)
```

Rather than printing the summary each time, we can also quickly look at the posterior empirical quantiles for the fixed effect of `ndvi` (and other linear predictor coefficients) using `coef`:

```{r Posterior quantiles of model coefficients}
coef(model2)
```

Look at the estimated effect of `ndvi` using a histogram. This can be done by first extracting the posterior coefficients:

```{r}
beta_post <- as.data.frame(model2, variable = "betas")
dplyr::glimpse(beta_post)
```

The posterior distribution for the effect of `ndvi_ma12` is stored in the `ndvi_ma12` column. A quick histogram confirms our inference that `log(counts)` respond positively to increases in `ndvi`:

```{r Histogram of NDVI effects}
hist(beta_post$ndvi_ma12,
  xlim = c(
    -1 * max(abs(beta_post$ndvi_ma12)),
    max(abs(beta_post$ndvi_ma12))
  ),
  col = "darkred",
  border = "white",
  xlab = expression(beta[NDVI]),
  ylab = "",
  yaxt = "n",
  main = "",
  lwd = 2
)
abline(v = 0, lwd = 2.5)
```

### `marginaleffects` support

Given our model used a nonlinear link function (log link in this example), it can still be difficult to fully understand what relationship our model is estimating between a predictor and the response. Fortunately, the `marginaleffects` package makes this relatively straightforward. Objects of class `mvgam` can be used with `marginaleffects` to inspect contrasts, scenario-based predictions, conditional and marginal effects, all on the outcome scale. Like `brms`, `mvgam` has the simple `conditional_effects()` function to make quick and informative plots for main effects, which relies on `marginaleffects` support. This will likely be your go-to function for quickly understanding patterns from fitted `mvgam` models

```{r warning=FALSE}
conditional_effects(model2)
```
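
Because the model uses a log link, the `ndvi_ma12` coefficient can also be read as a log rate ratio. A minimal sketch converting the draws to the outcome scale, assuming `beta_post` was extracted as above:

```{r eval=FALSE}
# Multiplicative change in expected counts per one-unit increase in ndvi_ma12
rate_ratio <- exp(beta_post$ndvi_ma12)
quantile(rate_ratio, probs = c(0.025, 0.5, 0.975))
```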

## Adding predictors as smooths

Smooth functions, using penalized splines, are a major feature of `mvgam`. Nonlinear splines are commonly viewed as variations of random effects in which the coefficients that control the shape of the spline are drawn from a joint, penalized distribution. This strategy is very often used in ecological time series analysis to capture smooth temporal variation in the processes we seek to study. When we construct smoothing splines, the workhorse package `mgcv` will calculate a set of basis functions that will collectively control the shape and complexity of the resulting spline. It is often helpful to visualize these basis functions to get a better sense of how splines work (see the short sketch after the model summary below, which plots a set of six basis functions representing possible variation in the effect of `time` on our outcome). In addition to constructing the basis functions, `mgcv` also creates a penalty matrix $S$, which contains **known** coefficients that work to constrain the wiggliness of the resulting smooth function. When fitting a GAM to data, we must estimate the smoothing parameters ($\lambda$) that will penalize these matrices, resulting in constrained basis coefficients and smoother functions that are less likely to overfit the data. This is the key to fitting GAMs in a Bayesian framework, as we can jointly estimate the $\lambda$'s using informative priors to prevent overfitting and expand the complexity of models we can tackle. To see this in practice, we can now fit a model that replaces the yearly random effects with a smooth function of `time`. We will need a reasonably complex function (large `k`) to try and accommodate the temporal variation in our observations. Following some [useful advice by Gavin Simpson](https://fromthebottomoftheheap.net/2020/06/03/extrapolating-with-gams/){target="_blank"}, we will use a b-spline basis for the temporal smooth. Because we no longer have intercepts for each year, we also retain the primary intercept term in this model (there is no `-1` in the formula now):

```{r model3, include=FALSE, message=FALSE, warning=FALSE}
model3 <- mvgam(
  count ~ s(time, bs = "bs", k = 15) + ndvi_ma12,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)
```

```{r eval=FALSE}
model3 <- mvgam(
  count ~ s(time, bs = "bs", k = 15) + ndvi_ma12,
  family = poisson(),
  data = data_train,
  newdata = data_test
)
```

The model can be described mathematically as follows:
\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = f(\boldsymbol{time})_t + \beta_{ndvi} * \boldsymbol{ndvi}_t \\
f(\boldsymbol{time}) & = \sum_{k=1}^{K}b * \beta_{smooth} \\
\beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \\
\beta_{ndvi} & \sim \text{Normal}(0, 1)
\end{align*}

Where the smooth function $f(\boldsymbol{time})$ is built by summing across a set of weighted basis functions. The basis functions $(b)$ are constructed using a b-spline basis in `mgcv`. The weights $(\beta_{smooth})$ are drawn from a penalized multivariate normal distribution in which the precision matrix $(\Omega)$ is multiplied by a smoothing penalty $(\lambda)$. If $\lambda$ becomes large, this acts to *squeeze* the covariances among the weights $(\beta_{smooth})$, leading to a less wiggly spline. Note that sometimes there are multiple smoothing penalties that contribute to the covariance matrix, but I am only showing one here for simplicity. View the summary as before

```{r}
summary(model3)
```

The summary above now contains posterior estimates for the smoothing parameters as well as the basis coefficients for the nonlinear effect of `time`.
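
To see what those basis functions look like, here is a minimal sketch that constructs and plots a small set of b-spline basis functions directly with `mgcv`. This is illustrative only and separate from the fitted model; six basis functions are used so the plot stays readable:

```{r eval=FALSE}
# Build six b-spline basis functions over the training times and plot them
library(mgcv)
basis <- smoothCon(
  s(time, bs = "bs", k = 6),
  data = data.frame(time = 1:70)
)[[1]]
matplot(basis$X,
  type = "l", lty = 1,
  xlab = "time", ylab = "Basis function value"
)
```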

We can visualize `conditional_effects` as before:

```{r warning=FALSE}
conditional_effects(model3, type = "link")
```

Inspect the underlying `Stan` code to gain some idea of how the spline is being penalized:

```{r, class.output="scroll-300"}
stancode(model3)
```

The line below `// prior for s(time)...` shows how the spline basis coefficients are drawn from a zero-centred multivariate normal distribution. The penalty matrix $S$ is multiplied by two different smoothing parameters (the $\lambda$'s) to enforce smoothness and reduce overfitting

## Latent dynamics in `mvgam`

Forecasts from the above model are not ideal:

```{r}
plot(model3, type = "forecast", newdata = data_test)
```

Why is this happening? The forecasts are driven almost entirely by variation in the temporal spline, which is extrapolating linearly *forever* beyond the edge of the training data. Any slight wiggles near the end of the training set will result in wildly different forecasts. To visualize this, we can plot the extrapolated temporal function into the out-of-sample test set. Here is the extrapolated function for `model3`, which used 15 basis functions:

```{r Plot extrapolated temporal functions using newdata}
plot_mvgam_smooth(
  model3,
  smooth = "s(time)",
  # pass newdata to the plot function to generate
  # predictions of the temporal smooth to the end of the
  # testing period
  newdata = data.frame(
    time = 1:max(data_test$time),
    ndvi_ma12 = 0
  )
)
abline(v = max(data_train$time), lty = "dashed", lwd = 2)
```

This model is not doing well. Clearly we need to somehow account for the strong temporal autocorrelation when modelling these data without using a smooth function of `time`. Now onto another prominent feature of `mvgam`: the ability to include (possibly latent) autocorrelated residuals in regression models. To do so, we use the `trend_model` argument (see `?mvgam_trends` for details of the different dynamic trend models that are supported). This model will use a separate sub-model for latent residuals that evolve as an AR1 process (i.e. the error in the current time point is a function of the error in the previous time point, plus some stochastic noise). We also include a smooth function of `ndvi_ma12` in this model, rather than the parametric term that was used above, to showcase that `mvgam` can include combinations of smooths and dynamic components:

```{r model4, include=FALSE}
model4 <- mvgam(count ~ s(ndvi_ma12, k = 6),
  family = poisson(),
  data = data_train,
  newdata = data_test,
  trend_model = AR(),
  parallel = FALSE
)
```

```{r eval=FALSE}
model4 <- mvgam(
  count ~ s(ndvi_ma12, k = 6),
  family = poisson(),
  data = data_train,
  newdata = data_test,
  trend_model = AR()
)
```

The model can be described mathematically as follows:
\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = f(\boldsymbol{ndvi})_t + z_t \\
z_t & \sim \text{Normal}(ar1 * z_{t-1}, \sigma_{error}) \\
ar1 & \sim \text{Normal}(0, 1)[-1, 1] \\
\sigma_{error} & \sim \text{Exponential}(2) \\
f(\boldsymbol{ndvi}) & = \sum_{k=1}^{K}b * \beta_{smooth} \\
\beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1})
\end{align*}

Here the term $z_t$ captures autocorrelated latent residuals, which are modelled using an AR1 process. You may also notice that this model is estimating autocorrelated errors for the full time period, even though some of these time points have missing observations. This is useful for getting more realistic estimates of the residual autocorrelation parameters.
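
If the AR1 notation feels abstract, a quick simulation can help build intuition for how such a latent process behaves. A minimal sketch with arbitrarily chosen parameter values:

```{r eval=FALSE}
# Simulate a latent AR1 process: z[t] = ar1 * z[t - 1] + noise
set.seed(1)
ar1 <- 0.7
sigma_error <- 0.5
z <- numeric(80)
for (t in 2:80) {
  z[t] <- ar1 * z[t - 1] + rnorm(1, mean = 0, sd = sigma_error)
}
plot(z, type = "l", xlab = "Time", ylab = expression(z[t]))
```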

Summarise the model to see how it now returns posterior summaries for the latent AR1 process:

```{r Summarise the mvgam autocorrelated error model, class.output="scroll-300"}
summary(model4)
```

View posterior hindcasts / forecasts and compare against the out-of-sample test data

```{r}
plot(model4, type = "forecast", newdata = data_test)
```

The trend is evolving as an AR1 process, which we can also view:

```{r}
plot(model4, type = "trend", newdata = data_test)
```

In-sample model performance can be interrogated using leave-one-out cross-validation utilities from the `loo` package (a higher value is preferred for this metric):

```{r}
loo_compare(model3, model4)
```

The higher expected log predictive density (ELPD) value for the dynamic model suggests it provides a better fit to the in-sample data.

Though it should be obvious that this model provides better forecasts, we can quantify forecast performance for models 3 and 4 using the `forecast()` and `score()` functions. Here we will compare models based on their Discrete Ranked Probability Scores (a lower value is preferred for this metric)

```{r}
fc_mod3 <- forecast(model3)
fc_mod4 <- forecast(model4)
score_mod3 <- score(fc_mod3, score = "drps")
score_mod4 <- score(fc_mod4, score = "drps")
sum(score_mod4$PP$score, na.rm = TRUE) -
  sum(score_mod3$PP$score, na.rm = TRUE)
```

A strongly negative value here suggests the score for the dynamic model (model 4) is much smaller than the score for the model with a smooth function of time (model 3)
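
The summed difference hides where the models diverge, so it can also be useful to inspect the scores at each forecast horizon. A minimal sketch, assuming the `score_mod3` and `score_mod4` objects from above:

```{r eval=FALSE}
# Compare DRPS at each forecast horizon for the two models
plot(score_mod3$PP$score,
  type = "b", xlab = "Forecast horizon", ylab = "DRPS"
)
lines(score_mod4$PP$score, type = "b", col = "darkred")
legend("topleft",
  legend = c("model3", "model4"),
  col = c("black", "darkred"), lty = 1
)
```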

## Further reading

The following papers and resources offer useful material about Dynamic GAMs and how they can be applied in practice:

Clark, Nicholas J., and Wells, K. [Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series](https://doi.org/10.1111/2041-210X.13974). *Methods in Ecology and Evolution* (2023): 14, 771-784.

Clark, Nicholas J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ* (2025): 13:e18929.

de Sousa, Heitor C., et al. [Severe fire regimes decrease resilience of ectothermic populations](https://doi.org/10.1111/1365-2656.14188). *Journal of Animal Ecology* (2024): 93(11), 1656-1669.

Hannaford, Naomi E., et al. [A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant](https://doi.org/10.1016/j.csda.2022.107659). *Computational Statistics & Data Analysis* (2023): 179, 107659.

Karunarathna, K.A.N.K., et al. [Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models](https://doi.org/10.1016/j.ecolmodel.2024.110648). *Ecological Modelling* (2024): 490, 110648.

Zhu, L., et al. [Responses of a widespread pest insect to extreme high temperatures are stage-dependent and divergent among seasonal cohorts](https://doi.org/10.1111/1365-2435.14711). *Functional Ecology* (2025): 39, 165-180.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au)

================================================
FILE: inst/doc/mvgam_overview.html
================================================
Overview of the mvgam package


Nicholas J Clark

2026-01-19

The purpose of this vignette is to give a general overview of the mvgam package and its primary functions.

Dynamic GAMs

mvgam is designed to propagate unobserved temporal processes to capture latent dynamics in the observed time series. This works in a state-space format, with the temporal trend evolving independently of the observation process. An introduction to the package and some worked examples are also shown in this seminar: Ecological Forecasting with Dynamic Generalized Additive Models. Briefly, assume \(\tilde{\boldsymbol{y}}_{i,t}\) is the conditional expectation of response variable \(\boldsymbol{i}\) at time \(\boldsymbol{t}\). Assuming \(\boldsymbol{y_i}\) is drawn from an exponential-family distribution with an invertible link function, the linear predictor for a multivariate Dynamic GAM can be written as:

\[for~i~in~1:N_{series}~...\] \[for~t~in~1:N_{timepoints}~...\]

\[g^{-1}(\tilde{\boldsymbol{y}}_{i,t})=\alpha_{i}+\sum\limits_{j=1}^J\boldsymbol{s}_{i,j,t}\boldsymbol{x}_{j,t}+\boldsymbol{Z}\boldsymbol{z}_{k,t}\,,\] Here \(\alpha\) are the unknown intercepts, the \(\boldsymbol{s}\)’s are unknown smooth functions of covariates (\(\boldsymbol{x}\)’s), which can potentially vary among the response series, and \(\boldsymbol{z}\) are dynamic latent processes. Each smooth function \(\boldsymbol{s_j}\) is composed of basis expansions whose coefficients, which must be estimated, control the functional relationship between \(\boldsymbol{x}_{j}\) and \(g^{-1}(\tilde{\boldsymbol{y}})\). The size of the basis expansion limits the smooth’s potential complexity. A larger set of basis functions allows greater flexibility. For more information on GAMs and how they can smooth through data, see this blogpost on how to interpret nonlinear effects from Generalized Additive Models. Latent processes are captured with \(\boldsymbol{Z}\boldsymbol{z}_{k,t}\), where \(\boldsymbol{Z}\) is an \(i~by~k\) matrix of loading coefficients (which can be fixed or a combination of fixed and freely estimated parameters) and \(\boldsymbol{z}_{k,t}\) are a set of \(K\) latent factors that can also include their own GAM linear predictors (see the State-Space models vignette, the N-mixtures vignette and the example in jsdgam to get an idea of how flexible these processes can be).

Several advantages of GAMs are that they can model a diversity of response families, including discrete distributions (i.e. Poisson, Negative Binomial, Gamma) that accommodate common ecological features such as zero-inflation or overdispersion, and that they can be formulated to include hierarchical smoothing for multivariate responses. mvgam supports a number of different observation families, which are summarized below:

Supported observation families

| Distribution | Function | Support | Extra parameter(s) |
|---|---|---|---|
| Gaussian (identity link) | gaussian() | Real values in \((-\infty, \infty)\) | \(\sigma\) |
| Student's T (identity link) | student_t() | Heavy-tailed real values in \((-\infty, \infty)\) | \(\sigma\), \(\nu\) |
| LogNormal (identity link) | lognormal() | Positive real values in \((0, \infty)\) | \(\sigma\) |
| Gamma (log link) | Gamma() | Positive real values in \((0, \infty)\) | \(\alpha\) |
| Beta (logit link) | betar() | Real values (proportional) in \([0,1]\) | \(\phi\) |
| Bernoulli (logit link) | bernoulli() | Binary data in \(\{0,1\}\) | - |
| Poisson (log link) | poisson() | Non-negative integers in \((0,1,2,...)\) | - |
| Negative Binomial2 (log link) | nb() | Non-negative integers in \((0,1,2,...)\) | \(\phi\) |
| Binomial (logit link) | binomial() | Non-negative integers in \((0,1,2,...)\) | - |
| Beta-Binomial (logit link) | beta_binomial() | Non-negative integers in \((0,1,2,...)\) | \(\phi\) |
| Poisson Binomial N-mixture (log link) | nmix() | Non-negative integers in \((0,1,2,...)\) | - |

For all supported observation families, any extra parameters that need to be estimated (i.e. the \(\sigma\) in a Gaussian model or the \(\phi\) in a Negative Binomial model) are by default estimated independently for each series. However, users can opt to force all series to share extra observation parameters using share_obs_params = TRUE in mvgam(). Note that default link functions cannot currently be changed.
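
For example, a minimal call sketch that forces all series to share the Negative Binomial overdispersion parameter \(\phi\) (the data object dat and outcome y are hypothetical; any long-format data with multiple series would work):

model <- mvgam(
  y ~ s(time, k = 10),
  family = nb(),
  data = dat,
  share_obs_params = TRUE
)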

Supported temporal dynamic processes

As stated above, the latent processes can take a wide variety of forms, some of which can be multivariate to allow the different observational variables to interact or be correlated. When using the mvgam() function, the user chooses between different process models with the trend_model argument. Available process models are described in detail below.

Correlated multivariate processes

If more than one observational unit (usually referred to as ‘series’) is included in data \((N_{series} > 1)\), use trend_model = ZMVN() to set up a model where the outcomes for different observational units may be correlated according to:

\[\begin{align*} z_{t} & \sim \text{MVNormal}(0, \Sigma) \end{align*}\]

The covariance matrix \(\Sigma\) will capture potentially correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances \(\sigma\) and on the strength of correlations using Stan’s lkj_corr_cholesky distribution. Note that this trend_model does not assume that measurements occur over time, as users can specify what variable in the data represents the unit of analysis (i.e. outcomes could be counts of different species across different sites or regions, for example; see ?ZMVN for guidelines).
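
A minimal call sketch (the data object dat and outcome y are hypothetical; see ?ZMVN for the full interface):

model <- mvgam(
  y ~ s(x, k = 5),
  family = gaussian(),
  data = dat,
  trend_model = ZMVN()
)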

Independent Random Walks

Use trend_model = 'RW' or trend_model = RW() to set up a model where each series in data has independent latent temporal dynamics of the form:

\[\begin{align*} z_{i,t} & \sim \text{Normal}(z_{i,t-1}, \sigma_i) \end{align*}\]

Process error parameters \(\sigma\) are modeled independently for each series. If a moving average process is required, use trend_model = RW(ma = TRUE) to set up the following:

\[\begin{align*} z_{i,t} & = z_{i,t-1} + \theta_i * error_{i,t-1} + error_{i,t} \\ error_{i,t} & \sim \text{Normal}(0, \sigma_i) \end{align*}\]

Moving average coefficients \(\theta\) are independently estimated for each series and will be forced to be stationary by default \((abs(\theta)<1)\). Only moving averages of order \(q=1\) are currently allowed.
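
A quick simulation can illustrate the random walk with MA(1) errors written above. A minimal sketch with arbitrarily chosen parameter values:

# Simulate a random walk with MA(1) errors:
# z[t] = z[t - 1] + theta * error[t - 1] + error[t]
set.seed(1)
theta <- 0.5
sigma <- 1
error <- rnorm(100, mean = 0, sd = sigma)
z <- numeric(100)
z[1] <- error[1]
for (t in 2:100) {
  z[t] <- z[t - 1] + theta * error[t - 1] + error[t]
}
plot(z, type = "l", xlab = "Time", ylab = "z")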

Multivariate Random Walks

If more than one series is included in data \((N_{series} > 1)\), a multivariate Random Walk can be set up using trend_model = RW(cor = TRUE), resulting in the following:

\[\begin{align*} z_{t} & \sim \text{MVNormal}(z_{t-1}, \Sigma) \end{align*}\]

Where the latent process estimate \(z_t\) now takes the form of a vector. The covariance matrix \(\Sigma\) will capture contemporaneously correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances \(\sigma\) and on the strength of correlations using Stan’s lkj_corr_cholesky distribution.

Moving average terms can also be included for multivariate random walks, in which case the moving average coefficients \(\theta\) will be parameterised as an \(N_{series} * N_{series}\) matrix.

Autoregressive processes

Autoregressive models up to \(p=3\), in which the autoregressive coefficients are estimated independently for each series, can be used by specifying trend_model = 'AR1', trend_model = 'AR2', trend_model = 'AR3', or trend_model = AR(p = 1, 2, or 3). For example, a univariate AR(1) model takes the form:

\[\begin{align*} z_{i,t} & \sim \text{Normal}(ar1_i * z_{i,t-1}, \sigma_i) \end{align*}\]

All options are the same as for Random Walks, but additional options will be available for placing priors on the autoregressive coefficients. By default, these coefficients will not be forced into stationarity, but users can impose this restriction by changing the upper and lower bounds on their priors. See ?get_mvgam_priors for more details.
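
To see which priors (and bounds) are available for the autoregressive coefficients, the get_mvgam_priors() workflow shown later in this vignette applies here too. A minimal sketch (the data object dat and outcome y are hypothetical):

priors <- get_mvgam_priors(
  y ~ 1,
  family = poisson(),
  data = dat,
  trend_model = AR(p = 1)
)
# Inspect the returned data.frame, edit the prior, new_lowerbound and
# new_upperbound columns as needed, then refit with mvgam(..., priors = priors)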

Vector Autoregressive processes

A Vector Autoregression of order \(p=1\) can be specified if \(N_{series} > 1\) using trend_model = 'VAR1' or trend_model = VAR(). A VAR(1) model takes the form:

\[\begin{align*} z_{t} & \sim \text{Normal}(A * z_{t-1}, \Sigma) \end{align*}\]

Where \(A\) is an \(N_{series} * N_{series}\) matrix of autoregressive coefficients in which the diagonals capture lagged self-dependence (i.e. the effect of a process at time \(t\) on its own estimate at time \(t+1\)), while off-diagonals capture lagged cross-dependence (i.e. the effect of a process at time \(t\) on the process for another series at time \(t+1\)). By default, the covariance matrix \(\Sigma\) will assume no process error covariance by fixing the off-diagonals to \(0\). To allow for correlated errors, use trend_model = 'VAR1cor' or trend_model = VAR(cor = TRUE). A moving average of order \(q=1\) can also be included using trend_model = VAR(ma = TRUE, cor = TRUE).

Note that for all VAR models, stationarity of the process is enforced with a structured prior distribution that is described in detail in Heaps (2023):

Heaps, Sarah E. “Enforcing stationarity through the prior in vector autoregressions.” Journal of Computational and Graphical Statistics 32.1 (2023): 74-83.

Hierarchical processes

Several of the above-mentioned trend_model options can be modified to account for grouping structures in data by setting up hierarchical latent processes. If an optional grouping variable (gr; which must be a factor in the supplied data) exists, users can model hierarchical residual correlation structures, where the residual correlations for a specific level of gr are modelled hierarchically:

\[\begin{align*} \Omega_{group} & = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local} \end{align*}\]

where \(\Omega_{global}\) is a global correlation matrix, \(\Omega_{group, local}\) is a local deviation correlation matrix and \(\alpha_{cor}\) is a weighting parameter controlling how strongly the local correlation matrix \(\Omega_{group}\) (i.e. the derived correlation matrix that will be used for each level of the grouping factor gr) is shrunk towards the global correlation matrix \(\Omega_{global}\) (larger values of \(\alpha_{cor}\) indicate a greater degree of shrinkage, i.e. a greater degree of partial pooling). This option is valuable for many types of designs where the same observational units (i.e. financial assets or species, for example) are measured in different strata (i.e. regions, countries or experimental units, for example). Currently hierarchical correlations can be included for AR(), VAR() or ZMVN() trend_model options.
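
As a rough sketch, such a model might be requested by passing the grouping factor to the trend constructor. The variable names region and species are hypothetical, and the exact interface (including any required sub-grouping argument) should be checked in ?AR, ?VAR or ?ZMVN:

model <- mvgam(
  y ~ s(x, k = 5),
  family = poisson(),
  data = dat,
  trend_model = AR(gr = region, subgr = species)
)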

Gaussian Processes

The final option for modelling temporal dynamics is to use a Gaussian Process with squared exponential kernel. These are set up independently for each series (there is currently no multivariate GP option), using trend_model = 'GP'. The dynamics for each latent process are modelled as:

\[\begin{align*} z & \sim \text{MVNormal}(0, \Sigma_{error}) \\ \Sigma_{error}[t_i, t_j] & = \alpha^2 * exp(-0.5 * ((|t_i - t_j| / \rho))^2) \end{align*}\]

The latent dynamic process evolves from a complex, high-dimensional Multivariate Normal distribution which depends on \(\rho\) (often called the length scale parameter) to control how quickly the correlations between the model’s errors decay as a function of time. For these models, covariance decays exponentially fast with the squared distance (in time) between the observations. The functions also depend on a parameter \(\alpha\), which controls the marginal variability of the temporal function at all points; in other words it controls how much the GP term contributes to the linear predictor. mvgam capitalizes on some advances that allow GPs to be approximated using Hilbert space basis functions, which considerably speed up computation at little cost to accuracy or prediction performance.
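
The kernel above is easy to compute directly, which can help build intuition for how \(\alpha\) and \(\rho\) act. A minimal sketch:

# Squared exponential kernel: covariance between two time points
sq_exp_cov <- function(t_i, t_j, alpha, rho) {
  alpha^2 * exp(-0.5 * (abs(t_i - t_j) / rho)^2)
}
# Covariance decays as the time gap grows
sq_exp_cov(1, 2, alpha = 1, rho = 3)
sq_exp_cov(1, 10, alpha = 1, rho = 3)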

Continuous time AR(1) processes

Most trend models in the mvgam() function expect time to be measured in regularly-spaced, discrete intervals (i.e. one measurement per week, or one per year, for example). But some time series are taken at irregular intervals, and we may still wish to model their autoregressive properties. Using trend_model = CAR() can be helpful for setting up these models, which currently only support autoregressive processes of order 1. The evolution of the latent dynamic process follows the form:

\[\begin{align*} z_{i,t} & \sim \text{Normal}(ar1_i^{distance} * z_{i,t-1}, \sigma_i) \end{align*}\]

Where \(distance\) is a vector of non-negative measurements of the time differences between successive observations. These models are perhaps more widely known as Ornstein–Uhlenbeck processes. See the Examples section in ?CAR for an illustration of how to set these models up.
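
The role of the \(distance\) exponent is simple to illustrate: dependence on the previous observation weakens as the time gap grows. A minimal sketch:

# Continuous-time AR(1): dependence decays with the time gap
ar1 <- 0.9
distance <- c(0.5, 1, 2, 5, 10)
ar1^distance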

Regression formulae

mvgam supports an observation model regression formula, built off the mgcv package, as well as an optional process model regression formula. The formulae supplied to mvgam() are exactly like those supplied to glm() except that smooth terms, s(), te(), ti() and t2(), time-varying effects using dynamic(), monotonically increasing (using s(x, bs = 'moi')) or decreasing splines (using s(x, bs = 'mod'); see ?smooth.construct.moi.smooth.spec for details), as well as Gaussian Process functions using gp(), can be added to the right hand side (and . is not supported in mvgam formulae). See ?mvgam_formulae for more guidance.
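
For instance, several of the term types named above could appear together on the right-hand side of an observation formula (variable names hypothetical):

form <- y ~ te(x1, x2) +
  s(group, bs = "re") +
  s(x3, bs = "moi") +
  gp(time)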

For setting up State-Space models, the optional process model formula can be used (see the State-Space model vignette and the shared latent states vignette for guidance on using trend formulae).

Example time series data

The ‘portal_data’ object contains time series of rodent captures from the Portal Project, a long-term monitoring study based near the town of Portal, Arizona. Researchers have been operating a standardized set of baited traps within 24 experimental plots at this site since the 1970s. Sampling follows the lunar monthly cycle, with observations occurring on average about 28 days apart. However, missing observations do occur due to difficulties accessing the site (weather events, COVID disruptions etc…). You can read about the full sampling protocol in this preprint by Ernest et al on the Biorxiv.

data("portal_data")

As the data come pre-loaded with the mvgam package, you can read a little about it in the help page using ?portal_data. Before working with data, it is important to inspect how the data are structured, first using head():

head(portal_data)
#>   time series captures  ndvi_ma12    mintemp
#> 1    1     DM       20 -0.1721441 -0.7963381
#> 2    1     DO        2 -0.1721441 -0.7963381
#> 3    1     PB        0 -0.1721441 -0.7963381
#> 4    1     PP        0 -0.1721441 -0.7963381
#> 5    2     DM       NA -0.2373635 -1.3347160
#> 6    2     DO       NA -0.2373635 -1.3347160

But the glimpse() function in dplyr is also useful for understanding how variables are structured

dplyr::glimpse(portal_data)
#> Rows: 320
#> Columns: 5
#> $ time      <int> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, …
#> $ series    <fct> DM, DO, PB, PP, DM, DO, PB, PP, DM, DO, PB, PP, DM, DO, PB, …
#> $ captures  <int> 20, 2, 0, 0, NA, NA, NA, NA, 36, 5, 0, 0, 40, 3, 0, 1, 29, 3…
#> $ ndvi_ma12 <dbl> -0.172144125, -0.172144125, -0.172144125, -0.172144125, -0.2…
#> $ mintemp   <dbl> -0.79633807, -0.79633807, -0.79633807, -0.79633807, -1.33471…

We will focus analyses on the time series of captures for one specific rodent species, the Desert Pocket Mouse Chaetodipus penicillatus. This species is interesting in that it goes into a kind of “hibernation” during the colder months, leading to very low captures during the winter period

Manipulating data for modelling

Manipulating the data into a ‘long’ format is necessary for modelling in mvgam. By ‘long’ format, we mean that each series x time observation needs to have its own entry in the dataframe or list object that we wish to use as data for modelling. A simple example can be viewed by simulating data using the sim_mvgam() function. See ?sim_mvgam for more details

data <- sim_mvgam(n_series = 4, T = 24)
head(data$data_train, 12)
#>    y season year   series time
#> 1  1      1    1 series_1    1
#> 2  0      1    1 series_2    1
#> 3  1      1    1 series_3    1
#> 4  3      1    1 series_4    1
#> 5  1      2    1 series_1    2
#> 6  1      2    1 series_2    2
#> 7  0      2    1 series_3    2
#> 8  5      2    1 series_4    2
#> 9  0      3    1 series_1    3
#> 10 1      3    1 series_2    3
#> 11 0      3    1 series_3    3
#> 12 3      3    1 series_4    3

Notice how we have four different time series in these simulated data, but we do not spread the outcome values into different columns. Rather, there is only a single column for the outcome variable, labelled y in these simulated data. We also must supply a variable labelled time to ensure the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models, as you can see in the State-Space vignette. Below are the steps needed to shape our portal_data object into the correct form. First, we filter to our target species (PP), create a count variable from the captures column, add a year variable and select the variables we will use as predictors

portal_data %>%
  # Filter the data to only contain captures of the 'PP' species
  dplyr::filter(series == 'PP') %>%
  droplevels() %>%
  dplyr::mutate(count = captures) %>%
  # Add a 'year' variable
  dplyr::mutate(year = sort(rep(1:8, 12))[time]) %>%
  # Select the variables of interest to keep in the model_data
  dplyr::select(series, year, time, count, mintemp, ndvi_ma12) -> model_data

The data now contain six variables:
series, a factor indexing which time series each observation belongs to
year, the year of sampling
time, the indicator of which time step each observation belongs to
count, the response variable representing the number of captures of the species PP in each sampling observation
mintemp, the monthly average minimum temperature at each time step
ndvi_ma12, a 12-month moving average of the monthly Normalized Difference Vegetation Index at each time step

Now check the data structure again

head(model_data)
#>   series year time count     mintemp   ndvi_ma12
#> 1     PP    1    1     0 -0.79633807 -0.17214413
#> 2     PP    1    2    NA -1.33471597 -0.23736348
#> 3     PP    1    3     0 -1.24166462 -0.21212064
#> 4     PP    1    4     1 -1.08048145 -0.16043812
#> 5     PP    1    5     7 -0.42447625 -0.08267729
#> 6     PP    1    6     7  0.06532892 -0.03692877
dplyr::glimpse(model_data)
#> Rows: 80
#> Columns: 6
#> $ series    <fct> PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, …
#> $ year      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, …
#> $ time      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
#> $ count     <int> 0, NA, 0, 1, 7, 7, 8, 8, 4, NA, 0, 0, 0, 0, 0, 0, NA, 2, 4, …
#> $ mintemp   <dbl> -0.79633807, -1.33471597, -1.24166462, -1.08048145, -0.42447…
#> $ ndvi_ma12 <dbl> -0.172144125, -0.237363477, -0.212120638, -0.160438125, -0.0…

You can also summarize multiple variables, which is helpful to search for data ranges and identify missing values

summary(model_data)
#>  series       year           time           count           mintemp       
#>  PP:80   Min.   :1.00   Min.   : 1.00   Min.   : 0.000   Min.   :-2.0978  
#>          1st Qu.:2.00   1st Qu.:20.75   1st Qu.: 1.000   1st Qu.:-1.0808  
#>          Median :4.00   Median :40.50   Median : 5.000   Median :-0.4091  
#>          Mean   :3.85   Mean   :40.50   Mean   : 5.222   Mean   :-0.2151  
#>          3rd Qu.:5.25   3rd Qu.:60.25   3rd Qu.: 8.000   3rd Qu.: 0.6133  
#>          Max.   :7.00   Max.   :80.00   Max.   :21.000   Max.   : 1.4530  
#>                                         NA's   :17                        
#>    ndvi_ma12       
#>  Min.   :-0.66884  
#>  1st Qu.:-0.20869  
#>  Median :-0.16517  
#>  Mean   :-0.09501  
#>  3rd Qu.:-0.03440  
#>  Max.   : 0.74831  
#> 

We have some NAs in our response variable count. These observations will generally be thrown out by most modelling packages in R. But as you will see when we work through the tutorials, mvgam keeps these in the data so that predictions can be automatically returned for the full dataset. The time series and some of its descriptive features can be plotted using plot_mvgam_series():

plot_mvgam_series(data = model_data, series = 1, y = "count")

GLMs with temporal random effects

Our first task will be to fit a Generalized Linear Model (GLM) that can adequately capture the features of our count observations (integer data, lower bound at zero, missing values) while also attempting to model temporal variation. We are almost ready to fit our first model, which will be a GLM with Poisson observations, a log link function and random (hierarchical) intercepts for year. This will allow us to capture our prior belief that, although each year is unique, all years have been sampled from the same population of effects and are therefore connected, potentially containing valuable information about one another. This will be done by capitalizing on the partial pooling properties of hierarchical models. Hierarchical (also known as random) effects offer many advantages when modelling data with grouping structures (i.e. multiple species, locations, years etc…). The ability to incorporate these in time series models is a huge advantage over traditional models such as ARIMA or Exponential Smoothing. But before we fit the model, we will need to convert year to a factor so that we can use a random effect basis in mvgam. See ?smooth.terms and ?smooth.construct.re.smooth.spec for details about the re basis construction that is used by both mvgam and mgcv

model_data %>%
  # Create a 'year_fac' factor version of 'year'
  dplyr::mutate(year_fac = factor(year)) -> model_data

Preview the dataset to ensure year is now a factor with a unique factor level for each year in the data

dplyr::glimpse(model_data)
#> Rows: 80
#> Columns: 7
#> $ series    <fct> PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, …
#> $ year      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, …
#> $ time      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
#> $ count     <int> 0, NA, 0, 1, 7, 7, 8, 8, 4, NA, 0, 0, 0, 0, 0, 0, NA, 2, 4, …
#> $ mintemp   <dbl> -0.79633807, -1.33471597, -1.24166462, -1.08048145, -0.42447…
#> $ ndvi_ma12 <dbl> -0.172144125, -0.237363477, -0.212120638, -0.160438125, -0.0…
#> $ year_fac  <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, …
levels(model_data$year_fac)
#> [1] "1" "2" "3" "4" "5" "6" "7"

We are now ready for our first mvgam model. The syntax will be familiar to users who have previously built models with mgcv. But for a refresher, see ?formula.gam and the examples in ?gam. Random effects can be specified using the s wrapper with the re basis. Note that we can also suppress the primary intercept using the usual R formula syntax - 1. mvgam has a number of possible observation families that can be used, see ?mvgam_families for more information. We will use Stan as the fitting engine, which deploys Hamiltonian Monte Carlo (HMC) for full Bayesian inference. By default, 4 HMC chains will be run using a warmup of 500 iterations and collecting 500 posterior samples from each chain. The package will also aim to use the Cmdstan backend when possible, so it is recommended that users have an up-to-date installation of Cmdstan and the associated cmdstanr interface on their machines (note that you can set the backend yourself using the backend argument: see ?mvgam for details). Interested users should consult the Stan user’s guide for more information about the software and the enormous variety of models that can be tackled with HMC.

model1 <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)

The model can be described mathematically for each timepoint \(t\) as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = \beta_{year[year_t]} \\ \beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}) \end{align*}\]

Where the \(\beta_{year}\) effects are drawn from a population distribution that is parameterized by a common mean \((\mu_{year})\) and variance \((\sigma_{year})\). Priors on most of the model parameters can be interrogated and changed using similar functionality to the options available in brms. For example, the default priors on \((\mu_{year})\) and \((\sigma_{year})\) can be viewed using the following code:

get_mvgam_priors(count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)
#>                      param_name param_length           param_info
#> 1             vector[1] mu_raw;            1 s(year_fac) pop mean
#> 2 vector<lower=0>[1] sigma_raw;            1   s(year_fac) pop sd
#>                                  prior                 example_change
#> 1               mu_raw ~ std_normal();  mu_raw ~ normal(-0.88, 0.73);
#> 2 sigma_raw ~ inv_gamma(1.418, 0.452); sigma_raw ~ exponential(0.15);
#>   new_lowerbound new_upperbound
#> 1             NA             NA
#> 2             NA             NA

See examples in ?get_mvgam_priors to find out different ways that priors can be altered. Once the model has finished, the first step is to inspect the summary() to ensure no major diagnostic warnings have been produced and to quickly summarise posterior distributions for key parameters

summary(model1)
#> GAM formula:
#> count ~ s(year_fac, bs = "re") - 1
#> <environment: 0x0000018e48f5c728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 80 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM coefficient (beta) estimates:
#>                2.5% 50% 97.5% Rhat n_eff
#> s(year_fac).1 0.930 1.3   1.6    1  2517
#> s(year_fac).2 0.870 1.2   1.5    1  2716
#> s(year_fac).3 0.085 0.6   1.1    1  2154
#> s(year_fac).4 2.000 2.3   2.5    1  2367
#> s(year_fac).5 1.100 1.5   1.8    1  2517
#> s(year_fac).6 1.500 1.8   2.1    1  2511
#> s(year_fac).7 1.800 2.1   2.3    1  2228
#> 
#> GAM group-level estimates:
#>                   2.5% 50% 97.5% Rhat n_eff
#> mean(s(year_fac)) 0.87 1.5   1.9 1.02   368
#> sd(s(year_fac))   0.35 0.6   1.2 1.01   345
#> 
#> Approximate significance of GAM smooths:
#>               edf Ref.df Chi.sq p-value    
#> s(year_fac) 6.095      7  234.5  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

The diagnostic messages at the bottom of the summary show that the HMC sampler did not encounter any problems or difficult posterior spaces. This is a good sign. Posterior distributions for model parameters can be extracted in any way that an object of class brmsfit can (see ?mvgam::mvgam_draws for details). For example, we can extract the coefficients related to the GAM linear predictor (i.e. the \(\beta\)’s) into a data.frame using:

beta_post <- as.data.frame(model1, variable = "betas")
dplyr::glimpse(beta_post)
#> Rows: 2,000
#> Columns: 7
#> $ `s(year_fac).1` <dbl> 1.42562, 1.13259, 1.60469, 1.05618, 1.30829, 1.36421, …
#> $ `s(year_fac).2` <dbl> 1.360710, 1.224610, 1.352340, 1.080130, 1.495370, 1.24…
#> $ `s(year_fac).3` <dbl> 0.726486, 0.540769, 0.706619, 0.477383, 0.872224, 0.77…
#> $ `s(year_fac).4` <dbl> 2.30283, 2.09318, 2.36101, 2.18330, 2.24543, 2.51212, …
#> $ `s(year_fac).5` <dbl> 1.338800, 0.903048, 1.296670, 1.423650, 1.654660, 1.51…
#> $ `s(year_fac).6` <dbl> 1.90255, 1.88174, 1.72255, 1.94652, 2.00091, 1.78989, …
#> $ `s(year_fac).7` <dbl> 2.26354, 2.15511, 2.05374, 2.10885, 2.23140, 2.23759, …

With any model fitted in mvgam, the underlying Stan code can be viewed using the stancode() function:

stancode(model1)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 7] = mu_raw[1] + b_raw[1 : 7] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ inv_gamma(1.418, 0.452);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ std_normal();
#>   
#>   // prior (non-centred) for s(year_fac)...
#>   b_raw[1 : 7] ~ std_normal();
#>   {
#>     // likelihood functions
#>     flat_ys ~ poisson_log_glm(flat_xs, 0.0, b);
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   array[n, n_series] int ypred;
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }

Plotting effects and residuals

Now for interrogating the model. We can get some sense of the variation in yearly intercepts from the summary above, but it is easier to understand them using targeted plots. Plot posterior distributions of the temporal random effects using plot.mvgam() with type = 're'. See ?plot.mvgam for more details about the types of plots that can be produced from fitted mvgam objects

plot(model1, type = "re")

bayesplot support

We can also capitalize on most of the useful MCMC plotting functions from the bayesplot package to visualize posterior distributions and diagnostics (see ?mvgam::mcmc_plot.mvgam for details):

mcmc_plot(
  object = model1,
  variable = "betas",
  type = "areas"
)

We can also use the wide range of posterior checking functions available in bayesplot (see ?mvgam::pp_check.mvgam for details):

pp_check(object = model1)

There is clearly some variation in these yearly intercept estimates. But how do these translate into time-varying predictions? To understand this, we can plot posterior hindcasts from this model for the training period using plot.mvgam() with type = 'forecast'

plot(model1, type = "forecast")

If you wish to extract these hindcasts for other downstream analyses, the hindcast() function can be used. This will return a list object of class mvgam_forecast. In the hindcasts slot, a matrix of posterior retrodictions will be returned for each series in the data (only one series in our example):

hc <- hindcast(model1)
str(hc)
#> List of 15
#>  $ call              :Class 'formula'  language count ~ s(year_fac, bs = "re") - 1
#>   .. ..- attr(*, ".Environment")=<environment: 0x0000018e48f5c728> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ trend_model       : chr "None"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : chr "PP"
#>  $ train_observations:List of 1
#>   ..$ PP: int [1:80] 0 NA 0 1 7 7 8 8 4 NA ...
#>  $ train_times       :List of 1
#>   ..$ PP: int [1:80] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations : NULL
#>  $ test_times        : NULL
#>  $ hindcasts         :List of 1
#>   ..$ PP: num [1:2000, 1:80] 7 5 6 4 4 8 0 4 5 4 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:80] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>  $ forecasts         : NULL
#>  - attr(*, "class")= chr "mvgam_forecast"

You can also extract these hindcasts on the linear predictor scale, which in this case is the log scale (our Poisson GLM used a log link function). Sometimes this can be useful for asking more targeted questions about drivers of variation:

hc <- hindcast(model1, type = "link")
range(hc$hindcasts$PP)
#> [1] -0.306975  2.594950

In any regression analysis, a key question is whether the residuals show any patterns that can be indicative of un-modelled sources of variation. For GLMs, we can use a modified residual called the Dunn-Smyth, or randomized quantile, residual. Inspect Dunn-Smyth residuals from the model using plot.mvgam() with type = 'residuals'

plot(model1, type = "residuals")

Automatic forecasting for new data

These temporal random effects do not have a sense of “time”. Because of this, each yearly random intercept is not restricted in any way to be similar to the previous yearly intercept. This drawback becomes evident when we predict for a new year. To see this, we can repeat the exercise above, but this time we will split the data into training and testing sets before re-running the model. We can then supply the test set as newdata. For splitting, we will make use of the filter() function from dplyr

model_data %>%
  dplyr::filter(time <= 70) -> data_train
model_data %>%
  dplyr::filter(time > 70) -> data_test
model1b <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test
)

We can view the test data in the forecast plot to see that the predictions do not capture the temporal variation in the test set

plot(model1b, type = "forecast", newdata = data_test)

As with the hindcast() function, we can use the forecast() function to automatically extract the posterior distributions for these predictions. This also returns an object of class mvgam_forecast, but now it will contain both the hindcasts and forecasts for each series in the data:

fc <- forecast(model1b)
str(fc)
#> List of 16
#>  $ call              :Class 'formula'  language count ~ s(year_fac, bs = "re") - 1
#>   .. ..- attr(*, ".Environment")=<environment: 0x0000018e48f5c728> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ family_pars       : NULL
#>  $ trend_model       : chr "None"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : Factor w/ 1 level "PP": 1
#>  $ train_observations:List of 1
#>   ..$ PP: int [1:70] 0 NA 0 1 7 7 8 8 4 NA ...
#>  $ train_times       :List of 1
#>   ..$ PP: int [1:70] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations :List of 1
#>   ..$ PP: int [1:10] NA 4 11 8 5 2 5 8 14 14
#>  $ test_times        :List of 1
#>   ..$ PP: int [1:10] 71 72 73 74 75 76 77 78 79 80
#>  $ hindcasts         :List of 1
#>   ..$ PP: num [1:2000, 1:70] 3 2 4 2 4 1 1 2 3 4 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:70] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>  $ forecasts         :List of 1
#>   ..$ PP: num [1:2000, 1:10] 6 7 6 1 5 4 2 5 7 6 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:10] "ypred[71,1]" "ypred[72,1]" "ypred[73,1]" "ypred[74,1]" ...
#>  - attr(*, "class")= chr "mvgam_forecast"

Adding predictors as “fixed” effects

Any users familiar with GLMs will know that we nearly always wish to include predictor variables that may explain some of the variation in our observations. Predictors are easily incorporated into GLMs / GAMs. Here, we will update the model from above by including a parametric (fixed) effect of ndvi_ma12 as a linear predictor:

model2 <- mvgam(
  count ~ s(year_fac, bs = "re") +
    ndvi_ma12 - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test
)

The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = \beta_{year[year_t]} + \beta_{ndvi} * \boldsymbol{ndvi}_t \\ \beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}) \\ \beta_{ndvi} & \sim \text{Normal}(0, 1) \end{align*}\]

Where the \(\beta_{year}\) effects are the same as before but we now have another predictor \((\beta_{ndvi})\) that applies to the ndvi_ma12 value at each timepoint \(t\). Inspect the summary of this model

summary(model2)
#> GAM formula:
#> count ~ ndvi_ma12 + s(year_fac, bs = "re") - 1
#> <environment: 0x0000018e48f5c728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 80 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM coefficient (beta) estimates:
#>                 2.5%   50% 97.5% Rhat n_eff
#> ndvi_ma12     -0.390 0.045   0.5    1  1595
#> s(year_fac).1  0.900 1.300   1.6    1  2451
#> s(year_fac).2  0.870 1.200   1.5    1  2633
#> s(year_fac).3  0.083 0.590   1.0    1  2163
#> s(year_fac).4  2.000 2.300   2.5    1  1831
#> s(year_fac).5  1.100 1.500   1.8    1  2202
#> s(year_fac).6  1.600 1.800   2.1    1  3045
#> s(year_fac).7 -0.310 1.400   2.8    1  1313
#> 
#> GAM group-level estimates:
#>                   2.5% 50% 97.5% Rhat n_eff
#> mean(s(year_fac)) 0.72 1.3   1.8 1.01   485
#> sd(s(year_fac))   0.33 0.6   1.3 1.00   507
#> 
#> Approximate significance of GAM smooths:
#>               edf Ref.df Chi.sq p-value    
#> s(year_fac) 5.261      7  177.6  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

Rather than printing the summary each time, we can also quickly look at the posterior empirical quantiles for the fixed effect of ndvi (and other linear predictor coefficients) using coef:

coef(model2)
#>                      2.5%       50%     97.5% Rhat n_eff
#> ndvi_ma12     -0.39239295 0.0454631 0.5002432    1  1595
#> s(year_fac).1  0.90208663 1.2598350 1.5620722    1  2451
#> s(year_fac).2  0.87452045 1.2055800 1.5018923    1  2633
#> s(year_fac).3  0.08311501 0.5945600 1.0277267    1  2163
#> s(year_fac).4  2.01673475 2.2694600 2.4863652    1  1831
#> s(year_fac).5  1.06976925 1.4577300 1.7888065    1  2202
#> s(year_fac).6  1.57686075 1.8460300 2.1110565    1  3045
#> s(year_fac).7 -0.30962118 1.3544650 2.7908662    1  1313

Look at the estimated effect of ndvi using a histogram. This can be done by first extracting the posterior coefficients:

beta_post <- as.data.frame(model2, variable = "betas")
dplyr::glimpse(beta_post)
#> Rows: 2,000
#> Columns: 8
#> $ ndvi_ma12       <dbl> -0.59960500, 0.45922600, 0.55956400, 0.39627800, 0.178…
#> $ `s(year_fac).1` <dbl> 1.121480, 1.560650, 1.280340, 1.279720, 1.355980, 1.29…
#> $ `s(year_fac).2` <dbl> 1.10624, 1.37323, 1.15708, 1.02976, 1.20075, 1.17569, …
#> $ `s(year_fac).3` <dbl> 0.7412040, 0.8188330, 0.6476260, 0.4650750, 0.8031380,…
#> $ `s(year_fac).4` <dbl> 2.06531, 2.35775, 2.48328, 2.38348, 2.29324, 2.29980, …
#> $ `s(year_fac).5` <dbl> 1.80219, 1.27466, 1.17318, 1.19170, 1.29867, 1.46982, …
#> $ `s(year_fac).6` <dbl> 1.81798, 1.93282, 1.71232, 1.74702, 2.08540, 1.70745, …
#> $ `s(year_fac).7` <dbl> 1.530030, 2.455950, 1.508500, 1.704790, 2.151070, 1.56…

The posterior distribution for the effect of ndvi_ma12 is stored in the ndvi_ma12 column. A quick histogram confirms our inference that log(counts) respond positively to increases in ndvi:

hist(beta_post$ndvi_ma12,
  xlim = c(
    -1 * max(abs(beta_post$ndvi_ma12)),
    max(abs(beta_post$ndvi_ma12))
  ),
  col = "darkred",
  border = "white",
  xlab = expression(beta[NDVI]),
  ylab = "",
  yaxt = "n",
  main = "",
  lwd = 2
)
abline(v = 0, lwd = 2.5)

marginaleffects support

Given our model used a nonlinear link function (log link in this example), it can still be difficult to fully understand what relationship our model is estimating between a predictor and the response. Fortunately, the marginaleffects package makes this relatively straightforward. Objects of class mvgam can be used with marginaleffects to inspect contrasts, scenario-based predictions, conditional and marginal effects, all on the outcome scale. Like brms, mvgam has the simple conditional_effects() function to make quick and informative plots for main effects, which relies on marginaleffects support. This will likely be your go-to function for quickly understanding patterns from fitted mvgam models

conditional_effects(model2)

Adding predictors as smooths

Smooth functions, using penalized splines, are a major feature of mvgam. Nonlinear splines are commonly viewed as variations of random effects in which the coefficients that control the shape of the spline are drawn from a joint, penalized distribution. This strategy is very often used in ecological time series analysis to capture smooth temporal variation in the processes we seek to study. When we construct smoothing splines, the workhorse package mgcv will calculate a set of basis functions that will collectively control the shape and complexity of the resulting spline. It is often helpful to visualize these basis functions to get a better sense of how splines work. We’ll create a set of 6 basis functions to represent possible variation in the effect of time on our outcome. In addition to constructing the basis functions, mgcv also creates a penalty matrix \(S\), which contains known coefficients that work to constrain the wiggliness of the resulting smooth function. When fitting a GAM to data, we must estimate the smoothing parameters (\(\lambda\)) that will penalize these matrices, resulting in constrained basis coefficients and smoother functions that are less likely to overfit the data. This is the key to fitting GAMs in a Bayesian framework, as we can jointly estimate the \(\lambda\)’s using informative priors to prevent overfitting and expand the complexity of models we can tackle. To see this in practice, we can now fit a model that replaces the yearly random effects with a smooth function of time. We will need a reasonably complex function (large k) to try and accommodate the temporal variation in our observations. Following some useful advice by Gavin Simpson, we will use a b-spline basis for the temporal smooth. Because we no longer have intercepts for each year, we also retain the primary intercept term in this model (there is no -1 in the formula now):

model3 <- mvgam(
  count ~ s(time, bs = "bs", k = 15) +
    ndvi_ma12,
  family = poisson(),
  data = data_train,
  newdata = data_test
)

The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = f(\boldsymbol{time})_t + \beta_{ndvi} * \boldsymbol{ndvi}_t \\ f(\boldsymbol{time}) & = \sum_{k=1}^{K}b_k * \beta_{smooth,k} \\ \beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \\ \beta_{ndvi} & \sim \text{Normal}(0, 1) \end{align*}\]

Where the smooth function \(f(\boldsymbol{time})\) is built by summing across a set of weighted basis functions. The basis functions \((b)\) are constructed from a B-spline basis in mgcv, matching the bs = "bs" argument in the formula above. The weights \((\beta_{smooth})\) are drawn from a penalized multivariate normal distribution in which the precision matrix \((\Omega)\) is multiplied by a smoothing penalty \((\lambda)\). If \(\lambda\) becomes large, this acts to squeeze the covariances among the weights \((\beta_{smooth})\), leading to a less wiggly spline. Note that sometimes there are multiple smoothing penalties that contribute to the covariance matrix, but I am only showing one here for simplicity.
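
To make the basis construction concrete, here is a minimal sketch (not part of the fitted model) that uses mgcv’s smoothCon() to build the same B-spline smooth and inspect its ingredients; smoothCon() is a standard mgcv function, and data_train is the training data created earlier:

library(mgcv)

# Build (without fitting) the same B-spline smooth used in model3
sm <- smoothCon(
  s(time, bs = "bs", k = 15),
  data = data_train
)[[1]]

# One column per basis function, evaluated at the training times
dim(sm$X)

# The penalty matrix (or matrices) that the lambdas will multiply
length(sm$S)

# Visualize the basis functions
matplot(
  data_train$time, sm$X,
  type = "l", lty = 1,
  xlab = "Time", ylab = "Basis function value"
)

With that intuition in hand, view the summary as before: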

summary(model3)
#> GAM formula:
#> count ~ s(time, bs = "bs", k = 15) + ndvi_ma12
#> <environment: 0x0000018e48f5c728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 80 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM coefficient (beta) estimates:
#>               2.5%   50% 97.5% Rhat n_eff
#> (Intercept)   0.82  1.10   1.3 1.00   853
#> ndvi_ma12     0.37  1.90   3.5 1.00  1085
#> s(time).1    -9.80 -5.80  -2.6 1.00   348
#> s(time).2     1.20  3.40   6.0 1.00   387
#> s(time).3   -10.00 -6.20  -3.1 1.00   318
#> s(time).4    -1.60  0.78   3.3 1.00   265
#> s(time).5    -3.00 -0.42   2.0 1.00   246
#> s(time).6    -6.40 -3.80  -1.1 1.00   380
#> s(time).7    -1.90  0.56   3.0 1.00   225
#> s(time).8    -2.40 -0.12   2.1 1.01   222
#> s(time).9    -0.55  2.10   4.7 1.00   233
#> s(time).10   -5.70 -3.30  -1.0 1.00   294
#> s(time).11   -2.50  0.57   4.0 1.00   374
#> s(time).12   -6.80 -5.00  -3.2 1.00   491
#> s(time).13    1.90  5.00   8.5 1.00   235
#> s(time).14  -11.00 -3.10   4.0 1.00   241
#> 
#> Approximate significance of GAM smooths:
#>           edf Ref.df Chi.sq  p-value    
#> s(time) 11.76     14  102.7 7.98e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

The summary above now contains posterior estimates for the smoothing parameters as well as the basis coefficients for the nonlinear effect of time. We can visualize conditional_effects as before:

conditional_effects(model3, type = "link")

Inspect the underlying Stan code to gain some idea of how the spline is being penalized:

stancode(model3)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[14, 28] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 1.4, 2.5);
#>   
#>   // prior for ndvi_ma12...
#>   b_raw[2] ~ student_t(3, 0, 2);
#>   
#>   // prior for s(time)...
#>   b_raw[3 : 16] ~ multi_normal_prec(zero[3 : 16],
#>                                     S1[1 : 14, 1 : 14] * lambda[1]
#>                                     + S1[1 : 14, 15 : 28] * lambda[2]);
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   {
#>     // likelihood functions
#>     flat_ys ~ poisson_log_glm(flat_xs, 0.0, b);
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }

The line below // prior for s(time)... shows how the spline basis coefficients are drawn from a zero-centred multivariate normal distribution. The penalty matrix S1 is multiplied by two different smoothing parameters (the \(\lambda\)’s) to form the prior precision matrix, which acts to enforce smoothness and reduce overfitting.
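
As an illustrative sketch of this penalization (using a made-up first-difference penalty, not the S1 matrix from the model), we can draw basis coefficients from such a prior and see how a larger \(\lambda\) yields flatter, smoother coefficient sequences; this assumes the MASS package is available:

# Illustrative only: prior precision built from a weighted sum of
# penalty matrices, mirroring the multi_normal_prec() call above
set.seed(2)
K <- 8
S_wiggle <- crossprod(diff(diag(K))) # penalizes differences between
                                     # neighbouring coefficients
S_null <- diag(K)                    # penalizes the leftover null space
draw_coefs <- function(lambda) {
  Omega <- lambda * S_wiggle + 1 * S_null
  MASS::mvrnorm(1, mu = rep(0, K), Sigma = solve(Omega))
}

# Small lambda allows wiggly coefficient sequences; large lambda
# shrinks neighbouring coefficients towards each other
matplot(
  cbind(draw_coefs(0.5), draw_coefs(100)),
  type = "b", pch = 16,
  xlab = "Coefficient index", ylab = "Drawn coefficient value"
)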

Latent dynamics in mvgam

Forecasts from the above model are not ideal:

plot(model3, type = "forecast", newdata = data_test)

Why is this happening? The forecasts are driven almost entirely by variation in the temporal spline, which is extrapolating linearly forever beyond the edge of the training data. Any slight wiggles near the end of the training set will result in wildly different forecasts. To visualize this, we can plot the extrapolated temporal function into the out-of-sample test set. Here is the extrapolated function for model3, which used 15 basis functions:

plot_mvgam_smooth(
  model3,
  smooth = "s(time)",
  # pass newdata to the plot function to generate
  # predictions of the temporal smooth to the end of the
  # testing period
  newdata = data.frame(
    time = 1:max(data_test$time),
    ndvi_ma12 = 0
  )
)
abline(v = max(data_train$time), lty = "dashed", lwd = 2)

This model is not doing well. Clearly we need to somehow account for the strong temporal autocorrelation when modelling these data without using a smooth function of time. Now onto another prominent feature of mvgam: the ability to include (possibly latent) autocorrelated residuals in regression models. To do so, we use the trend_model argument (see ?mvgam_trends for details of different dynamic trend models that are supported). This model will use a separate sub-model for latent residuals that evolve as an AR1 process (i.e. the error in the current time point is a function of the error in the previous time point, plus some stochastic noise). We also include a smooth function of ndvi_ma12 in this model, rather than the parametric term that was used above, to showcase that mvgam can include combinations of smooths and dynamic components:

model4 <- mvgam(
  count ~ s(ndvi_ma12, k = 6),
  family = poisson(),
  data = data_train,
  newdata = data_test,
  trend_model = AR()
)

The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = f(\boldsymbol{ndvi})_t + z_t \\ z_t & \sim \text{Normal}(ar1 * z_{t-1}, \sigma_{error}) \\ ar1 & \sim \text{Normal}(0, 1)[-1, 1] \\ \sigma_{error} & \sim \text{Exponential}(2) \\ f(\boldsymbol{ndvi}) & = \sum_{k=1}^{K}b_k * \beta_{smooth,k} \\ \beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \end{align*}\]
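
To build intuition for the latent process \(z_t\) defined above, here is a quick illustrative simulation; the ar1 and sigma values are made up for this sketch and are not estimates from the model:

# Illustrative only: simulate an AR1 process like z_t above
set.seed(1)
n_time <- 80
ar1 <- 0.8    # hypothetical autoregressive coefficient
sigma <- 0.5  # hypothetical process error SD
z <- numeric(n_time)
for (t in 2:n_time) {
  z[t] <- ar1 * z[t - 1] + rnorm(1, mean = 0, sd = sigma)
}
plot(z, type = "l", xlab = "Time", ylab = expression(z[t]))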

In the model above, the term \(z_t\) captures autocorrelated latent residuals, which are modelled using an AR1 process. Notice also that this model estimates autocorrelated errors for the full time period, even though some of these time points have missing observations. This is useful for getting more realistic estimates of the residual autocorrelation parameters. Summarise the model to see how it now returns posterior summaries for the latent AR1 process:

summary(model4)
#> GAM formula:
#> count ~ s(ndvi_ma12, k = 6)
#> <environment: 0x0000018e48f5c728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR()
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 80 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM coefficient (beta) estimates:
#>                  2.5%     50% 97.5% Rhat n_eff
#> (Intercept)    -0.910  0.7700 1.700 1.04   121
#> s(ndvi_ma12).1 -0.140  0.0047 0.190 1.00   830
#> s(ndvi_ma12).2 -0.210 -0.0049 0.200 1.00   685
#> s(ndvi_ma12).3 -0.085 -0.0014 0.069 1.00   810
#> s(ndvi_ma12).4 -0.430  0.0250 0.600 1.00   750
#> s(ndvi_ma12).5 -0.260  0.0640 0.490 1.00   921
#> 
#> Approximate significance of GAM smooths:
#>                edf Ref.df Chi.sq p-value
#> s(ndvi_ma12) 1.285      5  0.357   0.999
#> 
#> standard deviation:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.58 0.81   1.1 1.01   345
#> 
#> precision parameter:
#>        2.5% 50% 97.5% Rhat n_eff
#> tau[1] 0.78 1.5     3    1   358
#> 
#> autoregressive coef 1:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.61 0.82  0.97 1.01   436
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

View posterior hindcasts / forecasts and compare them against the out-of-sample test data:

plot(model4, type = "forecast", newdata = data_test)

The trend is evolving as an AR1 process, which we can also view:

plot(model4, type = "trend", newdata = data_test)

In-sample model performance can be interrogated using leave-one-out cross-validation utilities from the loo package (a higher value is preferred for this metric):

loo_compare(model3, model4)
#>        elpd_diff se_diff
#> model3    0.0       0.0 
#> model4 -891.7     153.7

The higher estimated log predictive density (ELPD) value for model3 suggests the smooth-of-time model provides a better fit to the in-sample data, though in-sample fit alone tells us little about forecast performance.

The forecast plots above already suggest that the dynamic model (model4) provides far better forecasts, and we can quantify this for models 3 and 4 using the forecast and score functions. Here we will compare models based on their Discrete Rank Probability Scores (a lower value is preferred for this metric):

fc_mod3 <- forecast(model3)
fc_mod4 <- forecast(model4)
score_mod3 <- score(fc_mod3, score = "drps")
score_mod4 <- score(fc_mod4, score = "drps")
sum(score_mod4$PP$score, na.rm = TRUE) - 
  sum(score_mod3$PP$score, na.rm = TRUE)
#> [1] -619.4987

A strongly negative value here confirms that the score for the dynamic model (model 4) is much smaller (i.e. better) than the score for the model with a smooth function of time (model 3).
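
For reference, here is an illustrative sketch of how a discrete rank probability score can be computed for a single observation from a sample of forecast draws; this shows the definition of the metric, not mvgam's internal implementation, and the drps_single() helper and its inputs are made up:

# Illustrative only: DRPS for one observed count y, given forecast
# draws; sums squared differences between the forecast CDF and the
# step function of the observation over a grid of possible counts
drps_single <- function(y, draws, upper = 500) {
  zs <- 0:upper
  Fz <- ecdf(draws)(zs)
  sum((Fz - as.numeric(y <= zs))^2)
}
drps_single(y = 12, draws = rpois(2000, lambda = 10))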

Further reading

The following papers and resources offer useful material about Dynamic GAMs and how they can be applied in practice:

Clark, Nicholas J. and Wells, K. Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. (2023): 14, 771-784.

Clark, Nicholas J., et al. Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ. (2025): 13:e18929

de Sousa, Heitor C., et al. Severe fire regimes decrease resilience of ectothermic populations. Journal of Animal Ecology (2024): 93(11), 1656-1669.

Hannaford, Naomi E., et al. A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant. Computational Statistics & Data Analysis (2023): 179, 107659.

Karunarathna, K.A.N.K., et al. Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models. Ecological Modelling (2024): 490, 110648.

Zhu, L., et al. Responses of a widespread pest insect to extreme high temperatures are stage-dependent and divergent among seasonal cohorts. Functional Ecology (2025): 39, 165–180. https://doi.org/10.1111/1365-2435.14711

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au)

================================================
FILE: inst/doc/nmixtures.R
================================================

params <- list(EVAL = TRUE)

## ----echo = FALSE----------------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)

## ----setup, include=FALSE--------------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
library(dplyr)

# A custom ggplot2 theme
theme_set(
  theme_classic(base_size = 12, base_family = "serif") +
    theme(
      axis.line.x.bottom = element_line(colour = "black", size = 1),
      axis.line.y.left = element_line(colour = "black", size = 1)
    )
)
options(
  ggplot2.discrete.colour = c("#A25050", "#00008b", "darkred", "#010048"),
  ggplot2.discrete.fill = c("#A25050", "#00008b", "darkred", "#010048")
)

## --------------------------------------------------------------------------------
set.seed(999)
# Simulate observations for species 1, which shows a declining trend and 0.7 detection probability
data.frame(
  site = 1,
  # five replicates per year; six years
  replicate = rep(1:5, 6),
  time = sort(rep(1:6, 5)),
  species = "sp_1",
  # true abundance declines nonlinearly
  truth = c(
    rep(28, 5), rep(26, 5), rep(23, 5),
    rep(16, 5), rep(14, 5), rep(14, 5)
  ),
  # observations are taken with detection prob = 0.7
  obs = c(
    rbinom(5, 28, 0.7), rbinom(5, 26, 0.7), rbinom(5, 23, 0.7),
    rbinom(5, 15, 0.7), rbinom(5, 14, 0.7), rbinom(5, 14, 0.7)
  )
) %>%
  # add 'series' information, which is an identifier of site, replicate and species
  dplyr::mutate(
    series = paste0("site_", site, "_", species, "_rep_", replicate),
    time = as.numeric(time),
    # add a 'cap' variable that defines the maximum latent N to
    # marginalize over when estimating latent abundance; in other words
    # how large do we realistically think the true abundance could be?
    cap = 100
  ) %>%
  dplyr::select(-replicate) -> testdat

# Now add another species that has a different temporal trend and a smaller
# detection probability (0.45 for this species)
testdat <- testdat %>%
  dplyr::bind_rows(
    data.frame(
      site = 1,
      replicate = rep(1:5, 6),
      time = sort(rep(1:6, 5)),
      species = "sp_2",
      truth = c(
        rep(4, 5), rep(7, 5), rep(15, 5),
        rep(16, 5), rep(19, 5), rep(18, 5)
      ),
      obs = c(
        rbinom(5, 4, 0.45), rbinom(5, 7, 0.45), rbinom(5, 15, 0.45),
        rbinom(5, 16, 0.45), rbinom(5, 19, 0.45), rbinom(5, 18, 0.45)
      )
    ) %>%
      dplyr::mutate(
        series = paste0("site_", site, "_", species, "_rep_", replicate),
        time = as.numeric(time),
        cap = 50
      ) %>%
      dplyr::select(-replicate)
  )

## --------------------------------------------------------------------------------
testdat$species <- factor(testdat$species, levels = unique(testdat$species))
testdat$series <- factor(testdat$series, levels = unique(testdat$series))

## --------------------------------------------------------------------------------
dplyr::glimpse(testdat)
head(testdat, 12)

## --------------------------------------------------------------------------------
testdat %>%
  # each unique combination of site*species is a separate process
  dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>%
  dplyr::select(trend, series) %>%
  dplyr::distinct() -> trend_map
trend_map

## ----include = FALSE, results='hide'----------------------------------------------
mod <- mvgam(
  # the observation formula sets up linear predictors for
  # detection probability on the logit scale
  formula = obs ~ species - 1,

  # the trend_formula sets up the linear predictors for
  # the latent abundance processes on the log scale
  trend_formula = ~ s(time, by = trend, k = 4) + species,

  # the trend_map takes care of the mapping
  trend_map = trend_map,

  # nmix() family and data
  family = nmix(),
  data = testdat,

  # priors can be set in the usual way
  priors = c(
    prior(std_normal(), class = b),
    prior(normal(1, 1.5), class = Intercept_trend)
  ),
  samples = 1000
)

## ----eval = FALSE-----------------------------------------------------------------
# mod <- mvgam(
#   # the observation formula sets up linear predictors for
#   # detection probability on the logit scale
#   formula = obs ~ species - 1,
#
#   # the trend_formula sets up the linear predictors for
#   # the latent abundance processes on the log scale
#   trend_formula = ~ s(time, by = trend, k = 4) + species,
#
#   # the trend_map takes care of the mapping
#   trend_map = trend_map,
#
#   # nmix() family and data
#   family = nmix(),
#   data = testdat,
#
#   # priors can be set in the usual way
#   priors = c(
#     prior(std_normal(), class = b),
#     prior(normal(1, 1.5), class = Intercept_trend)
#   ),
#   samples = 1000
# )

## --------------------------------------------------------------------------------
code(mod)

## --------------------------------------------------------------------------------
summary(mod)

## --------------------------------------------------------------------------------
loo(mod)

## --------------------------------------------------------------------------------
plot(mod, type = "smooths", trend_effects = TRUE)

## --------------------------------------------------------------------------------
marginaleffects::plot_predictions(
  mod,
  condition = "species",
  type = "detection"
) +
  ylab("Pr(detection)") +
  ylim(c(0, 1)) +
  theme_classic() +
  theme(legend.position = "none")

## --------------------------------------------------------------------------------
hc <- hindcast(mod, type = "latent_N")
# Function to plot latent abundance estimates vs truth
plot_latentN <- function(hindcasts, data, species = "sp_1") {
  all_series <- unique(
    data %>%
      dplyr::filter(species == !!species) %>%
      dplyr::pull(series)
  )

  # Grab the first replicate that represents this series
  # so we can get the true simulated values
  series <- as.numeric(all_series[1])
  truths <- data %>%
    dplyr::arrange(time, series) %>%
    dplyr::filter(series == !!levels(data$series)[series]) %>%
    dplyr::pull(truth)

  # In case some replicates have missing observations,
  # pull out predictions for ALL replicates and average over them
  hcs <- do.call(
    rbind,
    lapply(all_series, function(x) {
      ind <- which(names(hindcasts$hindcasts) %in% as.character(x))
      hindcasts$hindcasts[[ind]]
    })
  )

  # Calculate posterior empirical quantiles of predictions
  pred_quantiles <- data.frame(t(apply(hcs, 2, function(x) {
    quantile(
      x,
      probs = c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95)
    )
  })))
  pred_quantiles$time <- 1:NROW(pred_quantiles)
  pred_quantiles$truth <- truths

  # Grab observations
  data %>%
    dplyr::filter(series %in% all_series) %>%
    dplyr::select(time, obs) -> observations

  # Plot
  ggplot(pred_quantiles, aes(x = time, group = 1)) +
    geom_ribbon(aes(ymin = X5., ymax = X95.), fill = "#DCBCBC") +
    geom_ribbon(aes(ymin = X30., ymax = X70.), fill = "#B97C7C") +
    geom_line(aes(x = time, y = truth), colour = "black", linewidth = 1) +
    geom_point(
      aes(x = time, y = truth),
      shape = 21, colour = "white", fill = "black", size = 2.5
    ) +
    geom_jitter(
      data = observations,
      aes(x = time, y = obs),
      width = 0.06, shape = 21, fill = "darkred", colour = "white", size = 2.5
    ) +
    labs(
      y = "Latent abundance (N)",
      x = "Time",
      title = species
    )
}

## --------------------------------------------------------------------------------
plot_latentN(hc, testdat, species = "sp_1")
plot_latentN(hc, testdat, species = "sp_2")

## --------------------------------------------------------------------------------
# Data link
load(url(
  "https://github.com/doserjef/spAbundance/raw/main/data/dataNMixSim.rda"
))
data.one.sp <- dataNMixSim

# Pull out observations for one species
data.one.sp$y <- data.one.sp$y[1, , ]

# Abundance covariates that don't change across repeat sampling observations
abund.cov <- dataNMixSim$abund.covs[, 1]
abund.factor <- as.factor(dataNMixSim$abund.covs[, 2])

# Detection covariates that can change across repeat sampling observations
# Note that `NA`s are not allowed for covariates in mvgam, so we randomly
# impute them here
det.cov <- dataNMixSim$det.covs$det.cov.1[, ]
det.cov[is.na(det.cov)] <- rnorm(length(which(is.na(det.cov))))
det.cov2 <- dataNMixSim$det.covs$det.cov.2
det.cov2[is.na(det.cov2)] <- rnorm(length(which(is.na(det.cov2))))

## --------------------------------------------------------------------------------
mod_data <- do.call(
  rbind,
  lapply(1:NROW(data.one.sp$y), function(x) {
    data.frame(
      y = data.one.sp$y[x, ],
      abund_cov = abund.cov[x],
      abund_fac = abund.factor[x],
      det_cov = det.cov[x, ],
      det_cov2 = det.cov2[x, ],
      replicate = 1:NCOL(data.one.sp$y),
      site = paste0("site", x)
    )
  })
) %>%
  dplyr::mutate(
    species = "sp_1",
    series = as.factor(paste0(site, "_", species, "_", replicate))
  ) %>%
  dplyr::mutate(
    site = factor(site, levels = unique(site)),
    species = factor(species, levels = unique(species)),
    time = 1,
    cap = max(data.one.sp$y, na.rm = TRUE) + 20
  )

## --------------------------------------------------------------------------------
NROW(mod_data)
dplyr::glimpse(mod_data)
head(mod_data)

## --------------------------------------------------------------------------------
mod_data %>%
  # each unique combination of site*species is a separate process
  dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>%
  dplyr::select(trend, series) %>%
  dplyr::distinct() -> trend_map

trend_map %>%
  dplyr::arrange(trend) %>%
  head(12)

## ----include = FALSE, results='hide'----------------------------------------------
mod <- mvgam(
  # effects of covariates on detection probability;
  # here we use penalized splines for both continuous covariates
  formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3),

  # effects of the covariates on latent abundance;
  # here we use a penalized spline for the continuous covariate and
  # hierarchical intercepts for the factor covariate
  trend_formula = ~ s(abund_cov, k = 3) + s(abund_fac, bs = "re"),

  # link multiple observations to each site
  trend_map = trend_map,

  # nmix() family and supplied data
  family = nmix(),
  data = mod_data,

  # standard normal priors on key regression parameters
  priors = c(
    prior(std_normal(), class = "b"),
    prior(std_normal(), class = "Intercept"),
    prior(std_normal(), class = "Intercept_trend"),
    prior(std_normal(), class = "sigma_raw_trend")
  ),

  # use Stan's variational inference for quicker results
  algorithm = "meanfield",

  # no need to compute "series-level" residuals
  residuals = FALSE,
  samples = 1000
)

## ----eval=FALSE------------------------------------------------------------------
# mod <- mvgam(
#   # effects of covariates on detection probability;
#   # here we use penalized splines for both continuous covariates
#   formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3),
#
#   # effects of the covariates on latent abundance;
#   # here we use a penalized spline for the continuous covariate and
#   # hierarchical intercepts for the factor covariate
#   trend_formula = ~ s(abund_cov, k = 3) +
#     s(abund_fac, bs = "re"),
#
#   # link multiple observations to each site
#   trend_map = trend_map,
#
#   # nmix() family and supplied data
#   family = nmix(),
#   data = mod_data,
#
#   # standard normal priors on key regression parameters
#   priors = c(
#     prior(std_normal(), class = "b"),
#     prior(std_normal(), class = "Intercept"),
#     prior(std_normal(), class = "Intercept_trend"),
#     prior(std_normal(), class = "sigma_raw_trend")
#   ),
#
#   # use Stan's variational inference for quicker results
#   algorithm = "meanfield",
#
#   # no need to compute "series-level" residuals
#   residuals = FALSE,
#   samples = 1000
# )

## --------------------------------------------------------------------------------
summary(mod, include_betas = FALSE)

## --------------------------------------------------------------------------------
marginaleffects::avg_predictions(mod, type = "detection")

## --------------------------------------------------------------------------------
abund_plots <- plot(
  conditional_effects(
    mod,
    type = "link",
    effects = c("abund_cov", "abund_fac")
  ),
  plot = FALSE
)

## --------------------------------------------------------------------------------
abund_plots[[1]] + ylab("Expected latent abundance")

## --------------------------------------------------------------------------------
abund_plots[[2]] + ylab("Expected latent abundance")

## --------------------------------------------------------------------------------
det_plots <- plot(
  conditional_effects(
    mod,
    type = "detection",
    effects = c("det_cov", "det_cov2")
  ),
  plot = FALSE
)

## --------------------------------------------------------------------------------
det_plots[[1]] + ylab("Pr(detection)")
det_plots[[2]] + ylab("Pr(detection)")

## --------------------------------------------------------------------------------
fivenum_round <- function(x) round(fivenum(x, na.rm = TRUE), 2)
marginaleffects::plot_predictions(
  mod,
  newdata = marginaleffects::datagrid(
    det_cov = unique,
    det_cov2 = fivenum_round
  ),
  by = c("det_cov", "det_cov2"),
  type = "detection"
) +
  theme_classic() +
  ylab("Pr(detection)")

================================================
FILE: inst/doc/nmixtures.Rmd
================================================

---
title: "N-mixtures in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{N-mixtures in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
library(dplyr)

# A custom ggplot2 theme
theme_set(theme_classic(base_size = 12, base_family = "serif") +
  theme(
    axis.line.x.bottom = element_line(colour = "black", size = 1),
    axis.line.y.left = element_line(colour = "black", size = 1)
  ))
options(
  ggplot2.discrete.colour = c("#A25050", "#00008b", "darkred", "#010048"),
  ggplot2.discrete.fill = c("#A25050", "#00008b", "darkred", "#010048")
)
```

The purpose of this vignette is to show how the `mvgam` package can be used to fit and interrogate N-mixture models for population abundance counts made with imperfect detection.

## N-mixture models

An N-mixture model is a fairly recent addition to the ecological modeller's toolkit that is designed to make inferences about variation in the abundance of species when observations are imperfect ([Royle 2004](https://onlinelibrary.wiley.com/doi/10.1111/j.0006-341X.2004.00142.x){target="_blank"}). Briefly, assume $\boldsymbol{Y_{i,r}}$ is the number of individuals recorded at site $i$ during replicate sampling observation $r$ (recorded as a non-negative integer). If multiple replicate surveys are done within a short enough period to satisfy the assumption that the population remained closed (i.e. there was no substantial change in true population size between replicate surveys), we can account for the fact that observations aren't perfect. This is done by assuming that these replicate observations are Binomial random variables that are parameterized by the true "latent" abundance $N$ and a detection probability $p$:

\begin{align*}
\boldsymbol{Y_{i,r}} & \sim \text{Binomial}(N_i, p_r) \\
N_{i} & \sim \text{Poisson}(\lambda_i)
\end{align*}

Using a set of linear predictors, we can estimate effects of covariates $\boldsymbol{X}$ on the expected latent abundance (with a log link for $\lambda$) and, jointly, effects of possibly different covariates (call them $\boldsymbol{Q}$) on detection probability (with a logit link for $p$):

\begin{align*}
log(\lambda) & = \beta \boldsymbol{X} \\
logit(p) & = \gamma \boldsymbol{Q}
\end{align*}

`mvgam` can handle this type of model because it is designed to propagate unobserved temporal processes that evolve independently of the observation process in a State-space format. This setup adapts well to N-mixture models because they can be thought of as State-space models in which the latent state is a discrete variable representing the "true" but unknown population size.
This is very convenient because we can incorporate any of the package's diverse effect types (i.e. multidimensional splines, time-varying effects, monotonic effects, random effects etc...) into the linear predictors. All that is required for this to work is a marginalization trick that allows `Stan`'s sampling algorithms to handle discrete parameters (see more about how this method of "integrating out" discrete parameters works in [this nice blog post by Maxwell Joseph](https://mbjoseph.github.io/posts/2020-04-28-a-step-by-step-guide-to-marginalizing-over-discrete-parameters-for-ecologists-using-stan/){target="_blank"}).

The family `nmix()` is used to set up N-mixture models in `mvgam`, but we still need to do a little bit of data wrangling to ensure the data are set up in the correct format (this is especially true when we have more than one replicate survey per time period). The most important aspects are: (1) how we set up the observation `series` and `trend_map` arguments to ensure replicate surveys are mapped to the correct latent abundance model and (2) the inclusion of a `cap` variable that defines the maximum possible integer value to use for each observation when estimating latent abundance. The two examples below give a reasonable overview of how this can be done.

## Example 1: a two-species system with nonlinear trends

First we will use a simple simulation in which multiple replicate observations are taken at each timepoint for two different species. The simulation produces observations at a single site over six years, with five replicate surveys per year. Each species is simulated to have different nonlinear temporal trends and different detection probabilities. For now, detection probability is fixed (i.e. it does not change over time or in association with any covariates). Notice that we add the `cap` variable, which does not need to be static, to define the maximum possible value that we think the latent abundance could be for each timepoint. This simply needs to be large enough that we get a reasonable idea of which latent N values are most likely, without adding too much computational cost:

```{r}
set.seed(999)
# Simulate observations for species 1, which shows a declining trend and 0.7 detection probability
data.frame(
  site = 1,
  # five replicates per year; six years
  replicate = rep(1:5, 6),
  time = sort(rep(1:6, 5)),
  species = "sp_1",
  # true abundance declines nonlinearly
  truth = c(
    rep(28, 5), rep(26, 5), rep(23, 5),
    rep(16, 5), rep(14, 5), rep(14, 5)
  ),
  # observations are taken with detection prob = 0.7
  obs = c(
    rbinom(5, 28, 0.7), rbinom(5, 26, 0.7), rbinom(5, 23, 0.7),
    rbinom(5, 15, 0.7), rbinom(5, 14, 0.7), rbinom(5, 14, 0.7)
  )
) %>%
  # add 'series' information, which is an identifier of site, replicate and species
  dplyr::mutate(
    series = paste0("site_", site, "_", species, "_rep_", replicate),
    time = as.numeric(time),
    # add a 'cap' variable that defines the maximum latent N to
    # marginalize over when estimating latent abundance; in other words
    # how large do we realistically think the true abundance could be?
    cap = 100
  ) %>%
  dplyr::select(-replicate) -> testdat

# Now add another species that has a different temporal trend and a smaller
# detection probability (0.45 for this species)
testdat <- testdat %>%
  dplyr::bind_rows(data.frame(
    site = 1,
    replicate = rep(1:5, 6),
    time = sort(rep(1:6, 5)),
    species = "sp_2",
    truth = c(
      rep(4, 5), rep(7, 5), rep(15, 5),
      rep(16, 5), rep(19, 5), rep(18, 5)
    ),
    obs = c(
      rbinom(5, 4, 0.45), rbinom(5, 7, 0.45), rbinom(5, 15, 0.45),
      rbinom(5, 16, 0.45), rbinom(5, 19, 0.45), rbinom(5, 18, 0.45)
    )
  ) %>%
    dplyr::mutate(
      series = paste0("site_", site, "_", species, "_rep_", replicate),
      time = as.numeric(time),
      cap = 50
    ) %>%
    dplyr::select(-replicate))
```

This data format isn't too difficult to set up, but it does differ from the traditional multidimensional array setup that is commonly used for fitting N-mixture models in other software packages. Next we ensure that species and series IDs are included as factor variables, in case we'd like to allow certain effects to vary by species

```{r}
testdat$species <- factor(testdat$species,
  levels = unique(testdat$species)
)
testdat$series <- factor(testdat$series,
  levels = unique(testdat$series)
)
```

Preview the dataset to get an idea of how it is structured:

```{r}
dplyr::glimpse(testdat)
head(testdat, 12)
```

### Setting up the `trend_map`

Finally, we need to set up the `trend_map` object. This is crucial for allowing multiple observations to be linked to the same latent process model (see more information about this argument in the [Shared latent states vignette](https://nicholasjclark.github.io/mvgam/articles/shared_states.html){target="_blank"}). In this case, the mapping operates by species and site to state that each set of replicate observations from the same time point should all share the exact same latent abundance model:

```{r}
testdat %>%
  # each unique combination of site*species is a separate process
  dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>%
  dplyr::select(trend, series) %>%
  dplyr::distinct() -> trend_map
trend_map
```

Notice how all of the replicates for species 1 in site 1 share the same process (i.e. the same `trend`). This will ensure that all replicates are Binomial draws of the same latent N.

### Modelling with the `nmix()` family

Now we are ready to fit a model using `mvgam()`. This model will allow each species to have different detection probabilities and different temporal trends.
We will use `Cmdstan` as the backend, which by default will use Hamiltonian Monte Carlo for full Bayesian inference

```{r include = FALSE, results='hide'}
mod <- mvgam(
  # the observation formula sets up linear predictors for
  # detection probability on the logit scale
  formula = obs ~ species - 1,

  # the trend_formula sets up the linear predictors for
  # the latent abundance processes on the log scale
  trend_formula = ~ s(time, by = trend, k = 4) + species,

  # the trend_map takes care of the mapping
  trend_map = trend_map,

  # nmix() family and data
  family = nmix(),
  data = testdat,

  # priors can be set in the usual way
  priors = c(
    prior(std_normal(), class = b),
    prior(normal(1, 1.5), class = Intercept_trend)
  ),
  samples = 1000
)
```

```{r eval = FALSE}
mod <- mvgam(
  # the observation formula sets up linear predictors for
  # detection probability on the logit scale
  formula = obs ~ species - 1,

  # the trend_formula sets up the linear predictors for
  # the latent abundance processes on the log scale
  trend_formula = ~ s(time, by = trend, k = 4) + species,

  # the trend_map takes care of the mapping
  trend_map = trend_map,

  # nmix() family and data
  family = nmix(),
  data = testdat,

  # priors can be set in the usual way
  priors = c(
    prior(std_normal(), class = b),
    prior(normal(1, 1.5), class = Intercept_trend)
  ),
  samples = 1000
)
```

View the automatically-generated `Stan` code to get a sense of how the marginalization over latent N works

```{r}
code(mod)
```

The posterior summary of this model shows that it has converged nicely

```{r}
summary(mod)
```

`loo()` functionality works just as it does for all `mvgam` models to aid in model comparison / selection (though note that Pareto K values often give warnings for mixture models so these may not be too helpful)

```{r}
loo(mod)
```

Plot the estimated smooths of time from each species' latent abundance process (on the log scale)

```{r}
plot(mod, type = "smooths", trend_effects = TRUE)
```

`marginaleffects` support allows for more useful prediction-based interrogations on different scales (though note that at the time of writing this Vignette, you must have the development version of `marginaleffects` installed for `nmix()` models to be supported; use `remotes::install_github('vincentarelbundock/marginaleffects')` to install). Objects that use family `nmix()` have a few additional prediction scales that can be used (i.e. `link`, `response`, `detection` or `latent_N`). For example, here are the estimated detection probabilities per species, which show that the model has done a nice job of estimating these parameters:

```{r}
marginaleffects::plot_predictions(mod,
  condition = "species",
  type = "detection"
) +
  ylab("Pr(detection)") +
  ylim(c(0, 1)) +
  theme_classic() +
  theme(legend.position = "none")
```

A common goal in N-mixture modelling is to estimate the true latent abundance. The model has automatically generated predictions for the unknown latent abundance that are conditional on the observations. We can extract these and produce decent plots using a small function

```{r}
hc <- hindcast(mod, type = "latent_N")

# Function to plot latent abundance estimates vs truth
plot_latentN <- function(hindcasts, data, species = "sp_1") {
  all_series <- unique(data %>%
    dplyr::filter(species == !!species) %>%
    dplyr::pull(series))

  # Grab the first replicate that represents this series
  # so we can get the true simulated values
  series <- as.numeric(all_series[1])
  truths <- data %>%
    dplyr::arrange(time, series) %>%
    dplyr::filter(series == !!levels(data$series)[series]) %>%
    dplyr::pull(truth)

  # In case some replicates have missing observations,
  # pull out predictions for ALL replicates and average over them
  hcs <- do.call(rbind, lapply(all_series, function(x) {
    ind <- which(names(hindcasts$hindcasts) %in% as.character(x))
    hindcasts$hindcasts[[ind]]
  }))

  # Calculate posterior empirical quantiles of predictions
  pred_quantiles <- data.frame(t(apply(hcs, 2, function(x) {
    quantile(x, probs = c(
      0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95
    ))
  })))
  pred_quantiles$time <- 1:NROW(pred_quantiles)
  pred_quantiles$truth <- truths

  # Grab observations
  data %>%
    dplyr::filter(series %in% all_series) %>%
    dplyr::select(time, obs) -> observations

  # Plot
  ggplot(pred_quantiles, aes(x = time, group = 1)) +
    geom_ribbon(aes(ymin = X5., ymax = X95.), fill = "#DCBCBC") +
    geom_ribbon(aes(ymin = X30., ymax = X70.), fill = "#B97C7C") +
    geom_line(aes(x = time, y = truth),
      colour = "black", linewidth = 1
    ) +
    geom_point(aes(x = time, y = truth),
      shape = 21, colour = "white", fill = "black", size = 2.5
    ) +
    geom_jitter(
      data = observations,
      aes(x = time, y = obs),
      width = 0.06, shape = 21, fill = "darkred", colour = "white", size = 2.5
    ) +
    labs(
      y = "Latent abundance (N)",
      x = "Time",
      title = species
    )
}
```

Latent abundance plots vs the simulated truths for each species are shown below. Here, the red points show the imperfect observations, the black line shows the true latent abundance, and the ribbons show credible intervals of our estimates:

```{r}
plot_latentN(hc, testdat, species = "sp_1")
plot_latentN(hc, testdat, species = "sp_2")
```

We can see that estimates for both species have correctly captured the true temporal variation and magnitudes in abundance

## Example 2: a larger survey with possible nonlinear effects

Now for another example with a larger dataset. We will use data from [Jeff Doser's simulation example from the wonderful `spAbundance` package](https://doserlab.com/files/spabundance-web/articles/nmixturemodels){target="_blank"}. The simulated data include one continuous site-level covariate, one factor site-level covariate and two continuous sample-level covariates. This example will allow us to examine how we can include possibly nonlinear effects in the latent process and detection probability models.

Download the data and grab observations / covariate measurements for one species

```{r}
# Data link
load(url("https://github.com/doserjef/spAbundance/raw/main/data/dataNMixSim.rda"))
data.one.sp <- dataNMixSim

# Pull out observations for one species
data.one.sp$y <- data.one.sp$y[1, , ]

# Abundance covariates that don't change across repeat sampling observations
abund.cov <- dataNMixSim$abund.covs[, 1]
abund.factor <- as.factor(dataNMixSim$abund.covs[, 2])

# Detection covariates that can change across repeat sampling observations
# Note that `NA`s are not allowed for covariates in mvgam, so we randomly
# impute them here
det.cov <- dataNMixSim$det.covs$det.cov.1[, ]
det.cov[is.na(det.cov)] <- rnorm(length(which(is.na(det.cov))))
det.cov2 <- dataNMixSim$det.covs$det.cov.2
det.cov2[is.na(det.cov2)] <- rnorm(length(which(is.na(det.cov2))))
```

Next we wrangle into the appropriate 'long' data format, adding indicators of `time` and `series` for working in `mvgam`. We also add the `cap` variable to represent the maximum latent N to marginalize over for each observation

```{r}
mod_data <- do.call(
  rbind,
  lapply(1:NROW(data.one.sp$y), function(x) {
    data.frame(
      y = data.one.sp$y[x, ],
      abund_cov = abund.cov[x],
      abund_fac = abund.factor[x],
      det_cov = det.cov[x, ],
      det_cov2 = det.cov2[x, ],
      replicate = 1:NCOL(data.one.sp$y),
      site = paste0("site", x)
    )
  })
) %>%
  dplyr::mutate(
    species = "sp_1",
    series = as.factor(paste0(site, "_", species, "_", replicate))
  ) %>%
  dplyr::mutate(
    site = factor(site, levels = unique(site)),
    species = factor(species, levels = unique(species)),
    time = 1,
    cap = max(data.one.sp$y, na.rm = TRUE) + 20
  )
```

The data include observations for 225 sites with three replicates per site, though some observations are missing

```{r}
NROW(mod_data)
dplyr::glimpse(mod_data)
head(mod_data)
```

The final step for data preparation is of course the `trend_map`, which sets up the mapping between observation replicates and the latent abundance models. This is done in the same way as in the example above

```{r}
mod_data %>%
  # each unique combination of site*species is a separate process
  dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>%
  dplyr::select(trend, series) %>%
  dplyr::distinct() -> trend_map

trend_map %>%
  dplyr::arrange(trend) %>%
  head(12)
```

Now we are ready to fit a model using `mvgam()`. Here we will use penalized splines for each of the continuous covariate effects to detect possible nonlinear associations. We also showcase how `mvgam` can make use of the different approximation algorithms available in `Stan` by using the meanfield variational Bayes approximator (this reduces computation time from around 90 seconds to around 12 seconds for this example)

```{r include = FALSE, results='hide'}
mod <- mvgam(
  # effects of covariates on detection probability;
  # here we use penalized splines for both continuous covariates
  formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3),

  # effects of the covariates on latent abundance;
  # here we use a penalized spline for the continuous covariate and
  # hierarchical intercepts for the factor covariate
  trend_formula = ~ s(abund_cov, k = 3) + s(abund_fac, bs = "re"),

  # link multiple observations to each site
  trend_map = trend_map,

  # nmix() family and supplied data
  family = nmix(),
  data = mod_data,

  # standard normal priors on key regression parameters
  priors = c(
    prior(std_normal(), class = "b"),
    prior(std_normal(), class = "Intercept"),
    prior(std_normal(), class = "Intercept_trend"),
    prior(std_normal(), class = "sigma_raw_trend")
  ),

  # use Stan's variational inference for quicker results
  algorithm = "meanfield",

  # no need to compute "series-level" residuals
  residuals = FALSE,
  samples = 1000
)
```

```{r eval=FALSE}
mod <- mvgam(
  # effects of covariates on detection probability;
  # here we use penalized splines for both continuous covariates
  formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3),

  # effects of the covariates on latent abundance;
  # here we use a penalized spline for the continuous covariate and
  # hierarchical intercepts for the factor covariate
  trend_formula = ~ s(abund_cov, k = 3) +
    s(abund_fac, bs = "re"),

  # link multiple observations to each site
  trend_map = trend_map,

  # nmix() family and supplied data
  family = nmix(),
  data = mod_data,

  # standard normal priors on key regression parameters
  priors = c(
    prior(std_normal(), class = "b"),
    prior(std_normal(), class = "Intercept"),
    prior(std_normal(), class = "Intercept_trend"),
    prior(std_normal(), class = "sigma_raw_trend")
  ),

  # use Stan's variational inference for quicker results
  algorithm = "meanfield",

  # no need to compute "series-level" residuals
  residuals = FALSE,
  samples = 1000
)
```

Inspect the model summary but don't bother looking at estimates for all individual spline coefficients. Notice how we no longer receive information on convergence because we did not use MCMC sampling for this model

```{r}
summary(mod, include_betas = FALSE)
```

Again we can make use of `marginaleffects` support for interrogating the model through targeted predictions.
First, we can inspect the estimated average detection probability

```{r}
marginaleffects::avg_predictions(mod, type = "detection")
```

Next investigate estimated effects of covariates on latent abundance using the `conditional_effects()` function and specifying `type = 'link'`; this will return plots on the expectation scale

```{r}
abund_plots <- plot(
  conditional_effects(mod,
    type = "link",
    effects = c("abund_cov", "abund_fac")
  ),
  plot = FALSE
)
```

The effect of the continuous covariate on expected latent abundance

```{r}
abund_plots[[1]] + ylab("Expected latent abundance")
```

The effect of the factor covariate on expected latent abundance, estimated as a hierarchical random effect

```{r}
abund_plots[[2]] + ylab("Expected latent abundance")
```

Now we can investigate estimated effects of covariates on detection probability using `type = 'detection'`

```{r}
det_plots <- plot(
  conditional_effects(mod,
    type = "detection",
    effects = c("det_cov", "det_cov2")
  ),
  plot = FALSE
)
```

The covariate smooths were estimated to be somewhat nonlinear on the logit scale according to the model summary (based on their approximate significances). But inspecting conditional effects of each covariate on the probability scale is more intuitive and useful

```{r}
det_plots[[1]] + ylab("Pr(detection)")
det_plots[[2]] + ylab("Pr(detection)")
```

More targeted predictions are also easy with `marginaleffects` support. For example, we can ask: How does detection probability change as we change *both* detection covariates?

```{r}
fivenum_round <- function(x) round(fivenum(x, na.rm = TRUE), 2)

marginaleffects::plot_predictions(mod,
  newdata = marginaleffects::datagrid(
    det_cov = unique,
    det_cov2 = fivenum_round
  ),
  by = c("det_cov", "det_cov2"),
  type = "detection"
) +
  theme_classic() +
  ylab("Pr(detection)")
```

The model has found support for some important covariate effects, but of course we'd want to interrogate how well the model predicts and think about possible spatial effects to capture unmodelled variation in latent abundance (which can easily be incorporated into both linear predictors using spatial smooths).

## Further reading

The following papers and resources offer useful material about N-mixture models for ecological population dynamics investigations:

Guélat, Jérôme, and Kéry, Marc. "[Effects of Spatial Autocorrelation and Imperfect Detection on Species Distribution Models.](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12983)" *Methods in Ecology and Evolution* 9 (2018): 1614–25.

Kéry, Marc, and Royle, Andrew J. "[Applied hierarchical modeling in ecology: Analysis of distribution, abundance and species richness in R and BUGS: Volume 2: Dynamic and advanced models](https://shop.elsevier.com/books/applied-hierarchical-modeling-in-ecology-analysis-of-distribution-abundance-and-species-richness-in-r-and-bugs/kery/978-0-12-809585-0)". London, UK: Academic Press (2020).

Royle, Andrew J. "[N‐mixture models for estimating population size from spatially replicated counts.](https://onlinelibrary.wiley.com/doi/full/10.1111/j.0006-341X.2004.00142.x)" *Biometrics* 60.1 (2004): 108-115.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au)

================================================
FILE: inst/doc/nmixtures.html
================================================

N-mixtures in mvgam

Nicholas J Clark

2026-01-19

The purpose of this vignette is to show how the mvgam package can be used to fit and interrogate N-mixture models for population abundance counts made with imperfect detection.

N-mixture models

An N-mixture model is a fairly recent addition to the ecological modeller’s toolkit that is designed to make inferences about variation in the abundance of species when observations are imperfect (Royle 2004). Briefly, assume \(\boldsymbol{Y_{i,r}}\) is the number of individuals recorded at site \(i\) during replicate sampling observation \(r\) (recorded as a non-negative integer). If multiple replicate surveys are done within a short enough period to satisfy the assumption that the population remained closed (i.e. there was no substantial change in true population size between replicate surveys), we can account for the fact that observations aren’t perfect. This is done by assuming that these replicate observations are Binomial random variables that are parameterized by the true “latent” abundance \(N\) and a detection probability \(p\):

\[\begin{align*} \boldsymbol{Y_{i,r}} & \sim \text{Binomial}(N_i, p_r) \\ N_{i} & \sim \text{Poisson}(\lambda_i) \end{align*}\]

Using a set of linear predictors, we can estimate effects of covariates \(\boldsymbol{X}\) on the expected latent abundance (with a log link for \(\lambda\)) and, jointly, effects of possibly different covariates (call them \(\boldsymbol{Q}\)) on detection probability (with a logit link for \(p\)):

\[\begin{align*} log(\lambda) & = \beta \boldsymbol{X} \\ logit(p) & = \gamma \boldsymbol{Q}\end{align*}\]

mvgam can handle this type of model because it is designed to propagate unobserved temporal processes that evolve independently of the observation process in a State-space format. This setup adapts well to N-mixture models because they can be thought of as State-space models in which the latent state is a discrete variable representing the “true” but unknown population size. This is very convenient because we can incorporate any of the package’s diverse effect types (i.e. multidimensional splines, time-varying effects, monotonic effects, random effects etc…) into the linear predictors. All that is required for this to work is a marginalization trick that allows Stan’s sampling algorithms to handle discrete parameters (see more about how this method of “integrating out” discrete parameters works in this nice blog post by Maxwell Joseph).

The family nmix() is used to set up N-mixture models in mvgam, but we still need to do a little bit of data wrangling to ensure the data are set up in the correct format (this is especially true when we have more than one replicate survey per time period). The most important aspects are: (1) how we set up the observation series and trend_map arguments to ensure replicate surveys are mapped to the correct latent abundance model and (2) the inclusion of a cap variable that defines the maximum possible integer value to use for each observation when estimating latent abundance. The two examples below give a reasonable overview of how this can be done.
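
To make the marginalization idea concrete, here is an illustrative sketch of the likelihood computation for a single site, written in plain R; it follows the definition given above rather than mvgam's actual Stan implementation, and the nmix_loglik() helper and its inputs are made up:

# Illustrative only: marginalize over the possible latent abundances N,
# from the largest observed count up to the user-supplied cap
nmix_loglik <- function(y, lambda, p, cap) {
  Ns <- max(y):cap
  terms <- sapply(Ns, function(N) {
    dpois(N, lambda, log = TRUE) +
      sum(dbinom(y, size = N, prob = p, log = TRUE))
  })
  # log-sum-exp over the latent abundances
  m <- max(terms)
  m + log(sum(exp(terms - m)))
}
nmix_loglik(y = c(18, 21, 17), lambda = 25, p = 0.7, cap = 100)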

Example 2: a larger survey with possible nonlinear effects

Now for another example with a larger dataset. We will use data from Jeff Doser’s simulation example from the wonderful spAbundance package. The simulated data include one continuous site-level covariate, one factor site-level covariate and two continuous sample-level covariates. This example will allow us to examine how we can include possibly nonlinear effects in the latent process and detection probability models.

Download the data and grab observations / covariate measurements for one species

# Data link
load(url("https://github.com/doserjef/spAbundance/raw/main/data/dataNMixSim.rda"))
data.one.sp <- dataNMixSim

# Pull out observations for one species
data.one.sp$y <- data.one.sp$y[1, , ]

# Abundance covariates that don't change across repeat sampling observations
abund.cov <- dataNMixSim$abund.covs[, 1]
abund.factor <- as.factor(dataNMixSim$abund.covs[, 2])

# Detection covariates that can change across repeat sampling observations
# Note that `NA`s are not allowed for covariates in mvgam, so we randomly
# impute them here
det.cov <- dataNMixSim$det.covs$det.cov.1[, ]
det.cov[is.na(det.cov)] <- rnorm(length(which(is.na(det.cov))))
det.cov2 <- dataNMixSim$det.covs$det.cov.2
det.cov2[is.na(det.cov2)] <- rnorm(length(which(is.na(det.cov2))))

Next we wrangle into the appropriate ‘long’ data format, adding indicators of time and series for working in mvgam. We also add the cap variable to represent the maximum latent N to marginalize over for each observation

mod_data <- do.call(
  rbind,
  lapply(1:NROW(data.one.sp$y), function(x) {
    data.frame(
      y = data.one.sp$y[x, ],
      abund_cov = abund.cov[x],
      abund_fac = abund.factor[x],
      det_cov = det.cov[x, ],
      det_cov2 = det.cov2[x, ],
      replicate = 1:NCOL(data.one.sp$y),
      site = paste0("site", x)
    )
  })
) %>%
  dplyr::mutate(
    species = "sp_1",
    series = as.factor(paste0(site, "_", species, "_", replicate))
  ) %>%
  dplyr::mutate(
    site = factor(site, levels = unique(site)),
    species = factor(species, levels = unique(species)),
    time = 1,
    cap = max(data.one.sp$y, na.rm = TRUE) + 20
  )

The data include observations for 225 sites with three replicates per site, though some observations are missing

NROW(mod_data)
#> [1] 675
dplyr::glimpse(mod_data)
#> Rows: 675
#> Columns: 11
#> $ y         <int> 1, NA, NA, NA, 2, 2, NA, 1, NA, NA, 0, 1, 0, 0, 0, 0, NA, NA…
#> $ abund_cov <dbl> -0.3734384, -0.3734384, -0.3734384, 0.7064305, 0.7064305, 0.…
#> $ abund_fac <fct> 3, 3, 3, 4, 4, 4, 9, 9, 9, 2, 2, 2, 3, 3, 3, 2, 2, 2, 1, 1, …
#> $ det_cov   <dbl> -1.28279990, 1.11996398, -1.26741746, -1.29426683, 0.1954808…
#> $ det_cov2  <dbl> 2.03047314, 1.61128100, 0.06661865, -0.94290689, 1.04555361,…
#> $ replicate <int> 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, …
#> $ site      <fct> site1, site1, site1, site2, site2, site2, site3, site3, site…
#> $ species   <fct> sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, …
#> $ series    <fct> site1_sp_1_1, site1_sp_1_2, site1_sp_1_3, site2_sp_1_1, site…
#> $ time      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ cap       <dbl> 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, …
head(mod_data)
#>    y  abund_cov abund_fac    det_cov    det_cov2 replicate  site species
#> 1  1 -0.3734384         3 -1.2827999  2.03047314         1 site1    sp_1
#> 2 NA -0.3734384         3  1.1199640  1.61128100         2 site1    sp_1
#> 3 NA -0.3734384         3 -1.2674175  0.06661865         3 site1    sp_1
#> 4 NA  0.7064305         4 -1.2942668 -0.94290689         1 site2    sp_1
#> 5  2  0.7064305         4  0.1954809  1.04555361         2 site2    sp_1
#> 6  2  0.7064305         4  0.9673034  1.91971178         3 site2    sp_1
#>         series time cap
#> 1 site1_sp_1_1    1  33
#> 2 site1_sp_1_2    1  33
#> 3 site1_sp_1_3    1  33
#> 4 site2_sp_1_1    1  33
#> 5 site2_sp_1_2    1  33
#> 6 site2_sp_1_3    1  33

The final step for data preparation is of course the trend_map, which sets up the mapping between observation replicates and the latent abundance models. This is done in the same way as in the example above

mod_data %>%
  # each unique combination of site*species is a separate process
  dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>%
  dplyr::select(trend, series) %>%
  dplyr::distinct() -> trend_map

trend_map %>%
  dplyr::arrange(trend) %>%
  head(12)
#>    trend         series
#> 1      1 site100_sp_1_1
#> 2      1 site100_sp_1_2
#> 3      1 site100_sp_1_3
#> 4      2 site101_sp_1_1
#> 5      2 site101_sp_1_2
#> 6      2 site101_sp_1_3
#> 7      3 site102_sp_1_1
#> 8      3 site102_sp_1_2
#> 9      3 site102_sp_1_3
#> 10     4 site103_sp_1_1
#> 11     4 site103_sp_1_2
#> 12     4 site103_sp_1_3
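
Because each site has three replicates, every trend in this trend_map should map to exactly three series. A quick check (again a sketch, not in the original workflow) using dplyr:

# Every latent process should be linked to exactly three replicate series
trend_map %>%
  dplyr::count(trend) %>%
  dplyr::summarise(all_three_replicates = all(n == 3))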

Now we are ready to fit a model using mvgam(). Here we will use penalized splines for each of the continuous covariate effects to detect possible nonlinear associations. We also showcase how mvgam can make use of the different approximation algorithms available in Stan by using the meanfield variational Bayes approximator, which reduces computation time from around 90 seconds to around 12 seconds for this example:

mod <- mvgam(
  # effects of covariates on detection probability;
  # here we use penalized splines for both continuous covariates
  formula = y ~ s(det_cov, k = 4) + s(det_cov2, k = 4),

  # effects of the covariates on latent abundance;
  # here we use a penalized spline for the continuous covariate and
  # hierarchical intercepts for the factor covariate
  trend_formula = ~ s(abund_cov, k = 4) +
    s(abund_fac, bs = "re"),

  # link multiple observations to each site
  trend_map = trend_map,

  # nmix() family and supplied data
  family = nmix(),
  data = mod_data,

  # standard normal priors on key regression parameters
  priors = c(
    prior(std_normal(), class = "b"),
    prior(std_normal(), class = "Intercept"),
    prior(std_normal(), class = "Intercept_trend"),
    prior(std_normal(), class = "sigma_raw_trend")
  ),

  # use Stan's variational inference for quicker results
  algorithm = "meanfield",

  # no need to compute "series-level" residuals
  residuals = FALSE,
  samples = 1000
)

Inspect the model summary, but don’t bother looking at estimates for all of the individual spline coefficients. Notice that we no longer receive convergence diagnostics because we did not use MCMC sampling for this model:

summary(mod, include_betas = FALSE)
#> GAM observation formula:
#> y ~ s(det_cov, k = 3) + s(det_cov2, k = 3)
#> <environment: 0x000001a57291b728>
#> 
#> GAM process formula:
#> ~s(abund_cov, k = 3) + s(abund_fac, bs = "re")
#> <environment: 0x000001a57291b728>
#> 
#> Family:
#> nmix
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> None
#> 
#> N process models:
#> 225 
#> 
#> N series:
#> 675 
#> 
#> N timepoints:
#> 1 
#> 
#> Status:
#> Fitted using Stan 
#> 1 chains, each with iter = 1000; warmup = ; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> GAM observation model coefficient (beta) estimates:
#>              2.5%  50% 97.5% Rhat n.eff
#> (Intercept) 0.076 0.37  0.67  NaN   NaN
#> 
#> Approximate significance of GAM observation smooths:
#>               edf Ref.df Chi.sq  p-value    
#> s(det_cov)  1.041      2  176.9 0.000177 ***
#> s(det_cov2) 1.011      2  548.1  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> GAM process model coefficient (beta) estimates:
#>                      2.5%  50% 97.5% Rhat n.eff
#> (Intercept)_trend -0.0083 0.14  0.29  NaN   NaN
#> 
#> GAM process model group-level estimates:
#>                           2.5%   50%  97.5% Rhat n.eff
#> mean(s(abund_fac))_trend -0.44 -0.26 -0.076  NaN   NaN
#> sd(s(abund_fac))_trend    0.29  0.42  0.620  NaN   NaN
#> 
#> Approximate significance of GAM process smooths:
#>               edf Ref.df Chi.sq p-value  
#> s(abund_cov) 1.42      2  1.622  0.2656  
#> s(abund_fac) 8.85     10 14.771  0.0918 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Posterior approximation used: no diagnostics to compute
#> 
#> Use how_to_cite() to get started describing this model
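
Because variational approximations can underestimate posterior uncertainty, it is still worth checking predictive adequacy directly. A minimal sketch, assuming the brms-style type and ndraws arguments documented in ?pp_check.mvgam:

# Overlay posterior predictive densities on the observed distribution
# as a quick adequacy check of the variational fit
pp_check(mod, type = "dens_overlay", ndraws = 50)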

Again, we can make use of marginaleffects support for interrogating the model through targeted predictions. First, we can inspect the estimated average detection probability:

marginaleffects::avg_predictions(mod, type = "detection")
#> 
#>  Estimate 2.5 % 97.5 %
#>     0.575 0.515  0.636
#> 
#> Type: detection
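
We can do the same for the latent abundances. This sketch assumes the 'latent_N' prediction type, which is documented for N-mixture models in ?predict.mvgam:

# Average expected latent abundance across all sites (a sketch; see
# ?predict.mvgam for the "latent_N" prediction type)
marginaleffects::avg_predictions(mod, type = "latent_N")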

Next, investigate the estimated effects of covariates on latent abundance using the conditional_effects() function, specifying type = 'link'; this will return plots on the expectation scale:

abund_plots <- plot(
  conditional_effects(mod,
    type = "link",
    effects = c(
      "abund_cov",
      "abund_fac"
    )
  ),
  plot = FALSE
)

The effect of the continuous covariate on expected latent abundance

abund_plots[[1]] +
  ylab("Expected latent abundance")

The effect of the factor covariate on expected latent abundance, estimated as a hierarchical random effect

abund_plots[[2]] +
  ylab("Expected latent abundance")

Now we can investigate estimated effects of covariates on detection probability using type = 'detection'

det_plots <- plot(
  conditional_effects(mod,
    type = "detection",
    effects = c(
      "det_cov",
      "det_cov2"
    )
  ),
  plot = FALSE
)

The covariate smooths were estimated to be somewhat nonlinear on the logit scale according to the model summary (based on their approximate significances), but inspecting the conditional effects of each covariate on the probability scale is more intuitive and useful:

det_plots[[1]] +
  ylab("Pr(detection)")

det_plots[[2]] +
  ylab("Pr(detection)")

More targeted predictions are also easy with marginaleffects support. For example, we can ask: How does detection probability change as we change both detection covariates?

fivenum_round <- function(x) round(fivenum(x, na.rm = TRUE), 2)

marginaleffects::plot_predictions(mod,
  newdata = marginaleffects::datagrid(
    det_cov = unique,
    det_cov2 = fivenum_round
  ),
  by = c("det_cov", "det_cov2"),
  type = "detection"
) +
  theme_classic() +
  ylab("Pr(detection)")

The model has found support for some important covariate effects, but of course we’d want to interrogate how well the model predicts and think about possible spatial effects to capture unmodelled variation in latent abundance (which can easily be incorporated into both linear predictors using spatial smooths).
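
For instance, if site coordinates were available, a low-rank spatial smooth could be added to the process formula. The sketch below is purely illustrative: lat and lon are hypothetical columns that would first need to be added to mod_data.

# Hypothetical sketch: add a Gaussian Process smooth of site coordinates
# to the latent abundance model ('lat' and 'lon' are assumed covariates,
# not columns in the current mod_data)
mod_spatial <- mvgam(
  formula = y ~ s(det_cov, k = 4) + s(det_cov2, k = 4),
  trend_formula = ~ s(abund_cov, k = 4) +
    s(abund_fac, bs = "re") +
    s(lat, lon, bs = "gp", k = 30),
  trend_map = trend_map,
  family = nmix(),
  data = mod_data,
  algorithm = "meanfield"
)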

Further reading

The following papers and resources offer useful material about N-mixture models for investigating ecological population dynamics:

Guélat, Jérôme, and Marc Kéry. “Effects of Spatial Autocorrelation and Imperfect Detection on Species Distribution Models.” Methods in Ecology and Evolution 9 (2018): 1614–25.

Kéry, Marc, and Andrew J. Royle. Applied Hierarchical Modeling in Ecology: Analysis of Distribution, Abundance and Species Richness in R and BUGS. Volume 2: Dynamic and Advanced Models. London, UK: Academic Press (2020).

Royle, Andrew J. “N‐mixture models for estimating population size from spatially replicated counts.” Biometrics 60.1 (2004): 108-115.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au).

================================================ FILE: inst/doc/shared_states.R ================================================

params <- list(EVAL = TRUE)

## ----echo = FALSE-------------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)

## ----setup, include=FALSE------------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))

## ------------------------------------------------------------------------------
set.seed(122)
simdat <- sim_mvgam(
  trend_model = AR(),
  prop_trend = 0.6,
  mu = c(0, 1, 2),
  family = poisson()
)
trend_map <- data.frame(
  series = unique(simdat$data_train$series),
  trend = c(1, 1, 2)
)
trend_map

## ------------------------------------------------------------------------------
all.equal(levels(trend_map$series), levels(simdat$data_train$series))

## ------------------------------------------------------------------------------
fake_mod <- mvgam(
  y ~
    # observation model formula, which has a
    # different intercept per series
    series - 1,

  # process model formula, which has a shared seasonal smooth
  # (each latent process model shares the SAME smooth)
  trend_formula = ~ s(season, bs = "cc", k = 6),

  # AR1 dynamics (each latent process model has DIFFERENT)
  # dynamics; processes are estimated using the noncentred
  # parameterisation for improved efficiency
  trend_model = AR(),
  noncentred = TRUE,

  # supplied trend_map
  trend_map = trend_map,

  # data and observation family
  family = poisson(),
  data = simdat$data_train,
  run_model = FALSE
)

## ------------------------------------------------------------------------------
stancode(fake_mod)

## ------------------------------------------------------------------------------
fake_mod$model_data$Z

## ----full_mod, include = FALSE, results='hide'----------------------------------
full_mod <- mvgam(
  y ~ series - 1,
  trend_formula = ~ s(season, bs = "cc", k = 6),
  trend_model = AR(),
  noncentred = TRUE,
  trend_map = trend_map,
  family = poisson(),
  data = simdat$data_train,
  silent = 2
)

## ----eval=FALSE------------------------------------------------------------------
# full_mod <- mvgam(
#   y ~ series - 1,
#   trend_formula = ~ s(season, bs = "cc", k = 6),
#   trend_model = AR(),
#   noncentred = TRUE,
#   trend_map = trend_map,
#   family = poisson(),
#   data = simdat$data_train,
#   silent = 2
# )

## ------------------------------------------------------------------------------
summary(full_mod)

## ------------------------------------------------------------------------------
plot(full_mod, type = "trend", series = 1)
plot(full_mod, type = "trend", series = 2)
plot(full_mod, type = "trend", series = 3)

## ------------------------------------------------------------------------------
set.seed(123)
# simulate a nonlinear relationship using the mgcv function gamSim
signal_dat <- mgcv::gamSim(n = 100, eg = 1, scale = 1)

# productivity is one of the variables in the simulated data
productivity <- signal_dat$x2

# simulate the true signal, which already has a nonlinear relationship
# with productivity; we will add in a fairly strong AR1 process to
# contribute to the signal
true_signal <- as.vector(
  scale(signal_dat$y) +
    arima.sim(100, model = list(ar = 0.8, sd = 0.1))
)

## ------------------------------------------------------------------------------
plot(
  true_signal,
  type = "l",
  bty = "l",
  lwd = 2,
  ylab = "True signal",
  xlab = "Time"
)

## ------------------------------------------------------------------------------
# Function to simulate a monotonic response to a covariate
sim_monotonic <- function(x, a = 2.2, b = 2) {
  out <- exp(a * x) / (6 + exp(b * x)) * -1
  return(2.5 * as.vector(scale(out)))
}

# Simulated temperature covariate
temperature <- runif(100, -2, 2)

# Simulate the three series
sim_series <- function(n_series = 3, true_signal) {
  temp_effects <- mgcv::gamSim(n = 100, eg = 7, scale = 0.05)
  alphas <- rnorm(n_series, sd = 2)

  do.call(
    rbind,
    lapply(seq_len(n_series), function(series) {
      data.frame(
        observed = rnorm(
          length(true_signal),
          mean = alphas[series] +
            sim_monotonic(
              temperature,
              runif(1, 2.2, 3),
              runif(1, 2.2, 3)
            ) +
            true_signal,
          sd = runif(1, 1, 2)
        ),
        series = paste0("sensor_", series),
        time = 1:length(true_signal),
        temperature = temperature,
        productivity = productivity,
        true_signal = true_signal
      )
    })
  )
}
model_dat <- sim_series(true_signal = true_signal) %>%
  dplyr::mutate(series = factor(series))

## ------------------------------------------------------------------------------
plot_mvgam_series(
  data = model_dat,
  y = "observed",
  series = "all"
)

## ------------------------------------------------------------------------------
plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_1"),
  pch = 16,
  bty = "l",
  ylab = "Sensor 1",
  xlab = "Temperature"
)
plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_2"),
  pch = 16,
  bty = "l",
  ylab = "Sensor 2",
  xlab = "Temperature"
)
plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_3"),
  pch = 16,
  bty = "l",
  ylab = "Sensor 3",
  xlab = "Temperature"
)

## ----sensor_mod, include = FALSE, results='hide'---------------------------------
mod <- mvgam(
  # formula for observations, allowing for different
  # intercepts and smooth effects of temperature
  formula = observed ~ series +
    s(temperature, k = 10) +
    s(series, temperature, bs = "sz", k = 8),

  # formula for the latent signal, which can depend
  # nonlinearly on productivity
  trend_formula = ~ s(productivity, k = 8) - 1,

  # in addition to productivity effects, the signal is
  # assumed to exhibit temporal autocorrelation
  trend_model = AR(),
  noncentred = TRUE,

  # trend_map forces all sensors to track the same
  # latent signal
  trend_map = data.frame(
    series = unique(model_dat$series),
    trend = c(1, 1, 1)
  ),

  # informative priors on process error
  # and observation error will help with convergence
  priors = c(
    prior(normal(2, 0.5), class = sigma),
    prior(normal(1, 0.5), class = sigma_obs)
  ),

  # Gaussian observations
  family = gaussian(),
  burnin = 600,
  control = list(adapt_delta = 0.95),
  data = model_dat,
  silent = 2
)

## ----eval=FALSE--------------------------------------------------------------------
# mod <- mvgam(
#   formula =
#     # formula for observations, allowing for different
#     # intercepts and hierarchical smooth effects of temperature
#     observed ~ series +
#       s(temperature, k = 10) +
#       s(series, temperature, bs = "sz", k = 8),
#   trend_formula =
#     # formula for the latent signal, which can depend
#     # nonlinearly on productivity
#     ~ s(productivity, k = 8) - 1,
#   trend_model =
#     # in addition to productivity effects, the signal is
#     # assumed to exhibit temporal autocorrelation
#     AR(),
#   noncentred = TRUE,
#   trend_map =
#     # trend_map forces all sensors to track the same
#     # latent signal
#     data.frame(
#       series = unique(model_dat$series),
#       trend = c(1, 1, 1)
#     ),
#
#   # informative priors on process error
#   # and observation error will help with convergence
#   priors = c(
#     prior(normal(2, 0.5), class = sigma),
#     prior(normal(1, 0.5), class = sigma_obs)
#   ),
#
#   # Gaussian observations
#   family = gaussian(),
#   data = model_dat,
#   silent = 2
# )

## ------------------------------------------------------------------------------
summary(mod, include_betas = FALSE)

## ------------------------------------------------------------------------------
conditional_effects(mod, type = "link")

## ------------------------------------------------------------------------------
plot_predictions(
  mod,
  condition = c("temperature", "series", "series"),
  points = 0.5
) +
  theme(legend.position = "none")

## ------------------------------------------------------------------------------
plot(mod, type = "trend") +
  ggplot2::geom_point(
    data = data.frame(time = 1:100, y = true_signal),
    mapping = ggplot2::aes(x = time, y = y)
  )

================================================ FILE: inst/doc/shared_states.Rmd ================================================

---
title: "Shared latent states in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Shared latent states in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

This vignette gives an example of how `mvgam` can be used to estimate models where multiple observed time series share the same latent process model. For full details on the basic `mvgam` functionality, please see [the introductory vignette](https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html).

## The `trend_map` argument

The `trend_map` argument in the `mvgam()` function is an optional `data.frame` that can be used to specify which series should depend on which latent process models (called "trends" in `mvgam`). This can be particularly useful if we wish to force multiple observed time series to depend on the same latent trend process, but with different observation processes. If this argument is supplied, a latent factor model is set up by setting `use_lv = TRUE` and using the supplied `trend_map` to set up the shared trends. Users familiar with the `MARSS` family of packages will recognize this as a way of specifying the $Z$ matrix. This `data.frame` needs to have column names `series` and `trend`, with integer values in the `trend` column to state which trend each series should depend on. The `series` column should have a single unique entry for each time series in the data, with names that perfectly match the factor levels of the `series` variable in `data`.

For example, if we were to simulate a collection of three integer-valued time series (using `sim_mvgam`), the following `trend_map` would force the first two series to share the same latent trend process:

```{r}
set.seed(122)
simdat <- sim_mvgam(
  trend_model = AR(),
  prop_trend = 0.6,
  mu = c(0, 1, 2),
  family = poisson()
)
trend_map <- data.frame(
  series = unique(simdat$data_train$series),
  trend = c(1, 1, 2)
)
trend_map
```

We can see that the factor levels in `trend_map` match those in the data:

```{r}
all.equal(levels(trend_map$series), levels(simdat$data_train$series))
```

### Checking `trend_map` with `run_model = FALSE`

Supplying this `trend_map` to the `mvgam` function for a simple model, but setting `run_model = FALSE`, allows us to inspect the constructed `Stan` code and the data objects that would be used to condition the model. Here we will set up a model in which each series has a different observation process (with only a different intercept per series in this case), and the two latent dynamic process models evolve as independent AR1 processes that also contain a shared nonlinear smooth function to capture repeated seasonality. This model is not too complicated but it does show how we can learn shared and independent effects for collections of time series in the `mvgam` framework:

```{r}
fake_mod <- mvgam(
  y ~
    # observation model formula, which has a
    # different intercept per series
    series - 1,

  # process model formula, which has a shared seasonal smooth
  # (each latent process model shares the SAME smooth)
  trend_formula = ~ s(season, bs = "cc", k = 6),

  # AR1 dynamics (each latent process model has DIFFERENT)
  # dynamics; processes are estimated using the noncentred
  # parameterisation for improved efficiency
  trend_model = AR(),
  noncentred = TRUE,

  # supplied trend_map
  trend_map = trend_map,

  # data and observation family
  family = poisson(),
  data = simdat$data_train,
  run_model = FALSE
)
```

Inspecting the `Stan` code shows how this model is a dynamic factor model in which the loadings are constructed to reflect the supplied `trend_map`:

```{r}
stancode(fake_mod)
```

Notice the line that states "lv_coefs = Z;". This uses the supplied $Z$ matrix to construct the loading coefficients. The supplied matrix now looks exactly like what you'd use if you were to create a similar model in the `MARSS` package:

```{r}
fake_mod$model_data$Z
```

### Fitting and inspecting the model

Though this model doesn't perfectly match the data-generating process (which allowed each series to have different underlying dynamics), we can still fit it to show what the resulting inferences look like:

```{r full_mod, include = FALSE, results='hide'}
full_mod <- mvgam(
  y ~ series - 1,
  trend_formula = ~ s(season, bs = "cc", k = 6),
  trend_model = AR(),
  noncentred = TRUE,
  trend_map = trend_map,
  family = poisson(),
  data = simdat$data_train,
  silent = 2
)
```

```{r eval=FALSE}
full_mod <- mvgam(
  y ~ series - 1,
  trend_formula = ~ s(season, bs = "cc", k = 6),
  trend_model = AR(),
  noncentred = TRUE,
  trend_map = trend_map,
  family = poisson(),
  data = simdat$data_train,
  silent = 2
)
```

The summary of this model is informative as it shows that only two latent process models have been estimated, even though we have three observed time series. The model converges well:

```{r}
summary(full_mod)
```

Both series 1 and 2 share the exact same latent process estimates, while the estimates for series 3 are different:

```{r}
plot(full_mod, type = "trend", series = 1)
plot(full_mod, type = "trend", series = 2)
plot(full_mod, type = "trend", series = 3)
```

However, forecasts for series 1 and 2 will differ because they have different intercepts in the observation model.

## Example: signal detection

Now we will explore a more complicated example. Here we simulate a true hidden signal that we are trying to track. This signal depends nonlinearly on some covariate (called `productivity`, which represents a measure of how productive the landscape is). The signal also demonstrates a fairly large amount of temporal autocorrelation:

```{r}
set.seed(123)
# simulate a nonlinear relationship using the mgcv function gamSim
signal_dat <- mgcv::gamSim(n = 100, eg = 1, scale = 1)

# productivity is one of the variables in the simulated data
productivity <- signal_dat$x2

# simulate the true signal, which already has a nonlinear relationship
# with productivity; we will add in a fairly strong AR1 process to
# contribute to the signal
true_signal <- as.vector(
  scale(signal_dat$y) +
    arima.sim(100, model = list(ar = 0.8, sd = 0.1))
)
```

Plot the signal to inspect its evolution over time:

```{r}
plot(
  true_signal,
  type = "l",
  bty = "l",
  lwd = 2,
  ylab = "True signal",
  xlab = "Time"
)
```

Next we simulate three sensors that are trying to track the same hidden signal. All of these sensors have different observation errors that can depend nonlinearly on a second external covariate, called `temperature` in this example. Again, this makes use of `gamSim`:

```{r}
# Function to simulate a monotonic response to a covariate
sim_monotonic <- function(x, a = 2.2, b = 2) {
  out <- exp(a * x) / (6 + exp(b * x)) * -1
  return(2.5 * as.vector(scale(out)))
}

# Simulated temperature covariate
temperature <- runif(100, -2, 2)

# Simulate the three series
sim_series <- function(n_series = 3, true_signal) {
  temp_effects <- mgcv::gamSim(n = 100, eg = 7, scale = 0.05)
  alphas <- rnorm(n_series, sd = 2)

  do.call(rbind, lapply(seq_len(n_series), function(series) {
    data.frame(
      observed = rnorm(length(true_signal),
        mean = alphas[series] +
          sim_monotonic(
            temperature,
            runif(1, 2.2, 3),
            runif(1, 2.2, 3)
          ) +
          true_signal,
        sd = runif(1, 1, 2)
      ),
      series = paste0("sensor_", series),
      time = 1:length(true_signal),
      temperature = temperature,
      productivity = productivity,
      true_signal = true_signal
    )
  }))
}
model_dat <- sim_series(true_signal = true_signal) %>%
  dplyr::mutate(series = factor(series))
```

Plot the sensor observations:

```{r}
plot_mvgam_series(
  data = model_dat,
  y = "observed",
  series = "all"
)
```

And now plot the observed relationships between the three sensors and the `temperature` covariate:

```{r}
plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_1"),
  pch = 16, bty = "l",
  ylab = "Sensor 1",
  xlab = "Temperature"
)
plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_2"),
  pch = 16, bty = "l",
  ylab = "Sensor 2",
  xlab = "Temperature"
)
plot(
  observed ~ temperature,
  data = model_dat %>% dplyr::filter(series == "sensor_3"),
  pch = 16, bty = "l",
  ylab = "Sensor 3",
  xlab = "Temperature"
)
```

### The shared signal model

Now we can formulate and fit a model that allows each sensor's observation error to depend nonlinearly on `temperature` while allowing the true signal to depend nonlinearly on `productivity`. By fixing all of the values in the `trend` column to `1` in the `trend_map`, we are assuming that all observation sensors are tracking the same latent signal. We use informative priors on the two variance components (process error and observation error), which reflect our prior belief that the observation error is smaller overall than the true process error:

```{r sensor_mod, include = FALSE, results='hide'}
mod <- mvgam(
  formula =
    # formula for observations, allowing for different
    # intercepts and smooth effects of temperature
    observed ~ series +
      s(temperature, k = 10) +
      s(series, temperature, bs = "sz", k = 8),
  trend_formula =
    # formula for the latent signal, which can depend
    # nonlinearly on productivity
    ~ s(productivity, k = 8) - 1,
  trend_model =
    # in addition to productivity effects, the signal is
    # assumed to exhibit temporal autocorrelation
    AR(),
  noncentred = TRUE,
  trend_map =
    # trend_map forces all sensors to track the same
    # latent signal
    data.frame(
      series = unique(model_dat$series),
      trend = c(1, 1, 1)
    ),

  # informative priors on process error
  # and observation error will help with convergence
  priors = c(
    prior(normal(2, 0.5), class = sigma),
    prior(normal(1, 0.5), class = sigma_obs)
  ),

  # Gaussian observations
  family = gaussian(),
  burnin = 600,
  control = list(adapt_delta = 0.95),
  data = model_dat,
  silent = 2
)
```

```{r eval=FALSE}
mod <- mvgam(
  formula =
    # formula for observations, allowing for different
    # intercepts and hierarchical smooth effects of temperature
    observed ~ series +
      s(temperature, k = 10) +
      s(series, temperature, bs = "sz", k = 8),
  trend_formula =
    # formula for the latent signal, which can depend
    # nonlinearly on productivity
    ~ s(productivity, k = 8) - 1,
  trend_model =
    # in addition to productivity effects, the signal is
    # assumed to exhibit temporal autocorrelation
    AR(),
  noncentred = TRUE,
  trend_map =
    # trend_map forces all sensors to track the same
    # latent signal
    data.frame(
      series = unique(model_dat$series),
      trend = c(1, 1, 1)
    ),

  # informative priors on process error
  # and observation error will help with convergence
  priors = c(
    prior(normal(2, 0.5), class = sigma),
    prior(normal(1, 0.5), class = sigma_obs)
  ),

  # Gaussian observations
  family = gaussian(),
  data = model_dat,
  silent = 2
)
```

View a reduced version of the model summary because there will be many spline coefficients in this model:

```{r}
summary(mod, include_betas = FALSE)
```

### Inspecting effects on both process and observation models

Don't pay much attention to the approximate *p*-values of the smooth terms. The calculation of these values is incredibly sensitive to the estimates of the smoothing parameters, so I don't tend to find them very meaningful. What are meaningful, however, are prediction-based plots of the smooth functions. All main effects can be quickly plotted with `conditional_effects`:

```{r}
conditional_effects(mod, type = "link")
```

`conditional_effects` is simply a wrapper around the more flexible `plot_predictions` function from the `marginaleffects` package. We can get more useful plots of these effects using this function for further customisation:

```{r}
plot_predictions(
  mod,
  condition = c("temperature", "series", "series"),
  points = 0.5
) +
  theme(legend.position = "none")
```

We have successfully estimated effects, some of them nonlinear, that impact the hidden process AND the observations. All in a single joint model. But there can always be challenges with these models, particularly when estimating both process and observation error at the same time.

### Recovering the hidden signal

A final but very key question is whether we can successfully recover the true hidden signal. The `trend` slot in the returned model parameters has the estimates for this signal, which we can easily plot using the `mvgam` S3 method for `plot`. We can also overlay the true values for the hidden signal, which shows that our model has done a good job of recovering it:

```{r}
plot(mod, type = "trend") +
  ggplot2::geom_point(
    data = data.frame(time = 1:100, y = true_signal),
    mapping = ggplot2::aes(x = time, y = y)
  )
```

## Further reading

The following papers and resources offer a lot of useful material about other types of State-Space models and how they can be applied in practice:

Auger‐Méthé, Marie, et al. ["A guide to state–space modeling of ecological time series."](https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecm.1470) *Ecological Monographs* 91.4 (2021): e01470.

Clark, Nicholas J., et al. ["Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability."](https://peerj.com/articles/18929/) *PeerJ* 13 (2025): e18929.

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. ["MARSS: multivariate autoregressive state-space models for analyzing time-series data."](https://journal.r-project.org/articles/RJ-2012-002/) *R Journal* 4.1 (2012): 11.

Ward, Eric J., et al. ["Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico."](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2664.2009.01745.x) *Journal of Applied Ecology* 47.1 (2010): 47-56.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please reach out if you are interested (n.clark'at'uq.edu.au).

================================================ FILE: inst/doc/shared_states.html ================================================

Shared latent states in mvgam

Nicholas J Clark

2026-01-19

This vignette gives an example of how mvgam can be used to estimate models where multiple observed time series share the same latent process model. For full details on the basic mvgam functionality, please see the introductory vignette.

The trend_map argument

The trend_map argument in the mvgam() function is an optional data.frame that can be used to specify which series should depend on which latent process models (called “trends” in mvgam). This can be particularly useful if we wish to force multiple observed time series to depend on the same latent trend process, but with different observation processes. If this argument is supplied, a latent factor model is set up by setting use_lv = TRUE and using the supplied trend_map to set up the shared trends. Users familiar with the MARSS family of packages will recognize this as a way of specifying the \(Z\) matrix. This data.frame needs to have column names series and trend, with integer values in the trend column to state which trend each series should depend on. The series column should have a single unique entry for each time series in the data, with names that perfectly match the factor levels of the series variable in data. For example, if we were to simulate a collection of three integer-valued time series (using sim_mvgam), the following trend_map would force the first two series to share the same latent trend process:

set.seed(122)
simdat <- sim_mvgam(
  trend_model = AR(),
  prop_trend = 0.6,
  mu = c(0, 1, 2),
  family = poisson()
)
trend_map <- data.frame(
  series = unique(simdat$data_train$series),
  trend = c(1, 1, 2)
)
trend_map
#>     series trend
#> 1 series_1     1
#> 2 series_2     1
#> 3 series_3     2

We can see that the factor levels in trend_map match those in the data:

all.equal(levels(trend_map$series), 
          levels(simdat$data_train$series))
#> [1] TRUE
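
If the levels did not match, mvgam could not align the loadings with the data, so a slightly safer construction (an equivalent sketch, not part of the original vignette) is to build the series column directly from the factor levels:

# Build trend_map from the factor levels themselves so the ordering
# always matches the data (equivalent to the construction above)
trend_map <- data.frame(
  series = factor(
    levels(simdat$data_train$series),
    levels = levels(simdat$data_train$series)
  ),
  trend = c(1, 1, 2)
)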

Checking trend_map with run_model = FALSE

Supplying this trend_map to the mvgam function for a simple model, but setting run_model = FALSE, allows us to inspect the constructed Stan code and the data objects that would be used to condition the model. Here we will set up a model in which each series has a different observation process (with only a different intercept per series in this case), and the two latent dynamic process models evolve as independent AR1 processes that also contain a shared nonlinear smooth function to capture repeated seasonality. This model is not too complicated but it does show how we can learn shared and independent effects for collections of time series in the mvgam framework:

fake_mod <- mvgam(
  y ~
    # observation model formula, which has a
    # different intercept per series
    series - 1,

  # process model formula, which has a shared seasonal smooth
  # (each latent process model shares the SAME smooth)
  trend_formula = ~ s(season, bs = "cc", k = 6),

  # AR1 dynamics (each latent process model has DIFFERENT)
  # dynamics; processes are estimated using the noncentred
  # parameterisation for improved efficiency
  trend_model = AR(),
  noncentred = TRUE,

  # supplied trend_map
  trend_map = trend_map,

  # data and observation family
  family = poisson(),
  data = simdat$data_train,
  run_model = FALSE
)

Inspecting the Stan code shows how this model is a dynamic factor model in which the loadings are constructed to reflect the supplied trend_map:

stancode(fake_mod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp_trend; // number of trend smoothing parameters
#>   int<lower=0> n_lv; // number of dynamic factors
#>   int<lower=0> n_series; // number of series
#>   matrix[n_series, n_lv] Z; // matrix mapping series to latent states
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   int<lower=0> num_basis_trend; // number of trend basis coefficients
#>   vector[num_basis_trend] zero_trend; // prior locations for trend basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   matrix[n * n_lv, num_basis_trend] X_trend; // trend model design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   array[n, n_lv] int ytimes_trend;
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   matrix[4, 4] S_trend1; // mgcv smooth penalty matrix S_trend1
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> transformed data {
#>   
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   vector[num_basis_trend] b_raw_trend;
#>   
#>   // latent state SD terms
#>   vector<lower=0>[n_lv] sigma;
#>   
#>   // latent state AR1 terms
#>   vector<lower=-1, upper=1>[n_lv] ar1;
#>   
#>   // raw latent states
#>   matrix[n, n_lv] LV_raw;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp_trend] lambda_trend;
#> }
#> transformed parameters {
#>   // raw latent states
#>   vector[n * n_lv] trend_mus;
#>   matrix[n, n_series] trend;
#>   
#>   // basis coefficients
#>   vector[num_basis] b;
#>   
#>   // latent states
#>   matrix[n, n_lv] LV;
#>   vector[num_basis_trend] b_trend;
#>   
#>   // observation model basis coefficients
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#>   
#>   // process model basis coefficients
#>   b_trend[1 : num_basis_trend] = b_raw_trend[1 : num_basis_trend];
#>   
#>   // latent process linear predictors
#>   trend_mus = X_trend * b_trend;
#>   LV = LV_raw .* rep_matrix(sigma', rows(LV_raw));
#>   for (j in 1 : n_lv) {
#>     LV[1, j] += trend_mus[ytimes_trend[1, j]];
#>     for (i in 2 : n) {
#>       LV[i, j] += trend_mus[ytimes_trend[i, j]]
#>                   + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]);
#>     }
#>   }
#>   
#>   // derived latent states
#>   for (i in 1 : n) {
#>     for (s in 1 : n_series) {
#>       trend[i, s] = dot_product(Z[s,  : ], LV[i,  : ]);
#>     }
#>   }
#> }
#> model {
#>   // prior for seriesseries_1...
#>   b_raw[1] ~ student_t(3, 0, 2);
#>   
#>   // prior for seriesseries_2...
#>   b_raw[2] ~ student_t(3, 0, 2);
#>   
#>   // prior for seriesseries_3...
#>   b_raw[3] ~ student_t(3, 0, 2);
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for latent state SD parameters
#>   sigma ~ inv_gamma(1.418, 0.452);
#>   to_vector(LV_raw) ~ std_normal();
#>   
#>   // dynamic process models
#>   
#>   // prior for (Intercept)_trend...
#>   b_raw_trend[1] ~ student_t(3, 0, 2);
#>   
#>   // prior for s(season)_trend...
#>   b_raw_trend[2 : 5] ~ multi_normal_prec(zero_trend[2 : 5],
#>                                          S_trend1[1 : 4, 1 : 4]
#>                                          * lambda_trend[1]);
#>   lambda_trend ~ normal(5, 30);
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp_trend] rho_trend;
#>   vector[n_lv] penalty;
#>   array[n, n_series] int ypred;
#>   penalty = 1.0 / (sigma .* sigma);
#>   rho_trend = log(lambda_trend);
#>   
#>   matrix[n_series, n_lv] lv_coefs = Z;
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }

Notice the line that states “lv_coefs = Z;”. This uses the supplied \(Z\) matrix to construct the loading coefficients. The supplied matrix now looks exactly like what you’d use if you were to create a similar model in the MARSS package:

fake_mod$model_data$Z
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    1    0
#> [3,]    0    1
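
To see what this loading matrix does, note that each series’ trend is just the dot product of its row of Z with the latent states, mirroring the dot_product() line in the generated Stan code. A small illustration with made-up latent states:

# Illustration only: map two fake timepoints of the two latent states
# through Z; trend[i, s] = dot_product(Z[s, ], LV[i, ])
Z <- fake_mod$model_data$Z
LV <- matrix(rnorm(4), nrow = 2, ncol = 2) # 2 timepoints x 2 latent states
LV %*% t(Z) # rows = timepoints, columns = the three series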

Fitting and inspecting the model

Though this model doesn’t perfectly match the data-generating process (which allowed each series to have different underlying dynamics), we can still fit it to show what the resulting inferences look like:

full_mod <- mvgam(
  y ~ series - 1,
  trend_formula = ~ s(season, bs = "cc", k = 6),
  trend_model = AR(),
  noncentred = TRUE,
  trend_map = trend_map,
  family = poisson(),
  data = simdat$data_train,
  silent = 2
)

The summary of this model is informative as it shows that only two latent process models have been estimated, even though we have three observed time series. The model converges well:

summary(full_mod)
#> GAM observation formula:
#> y ~ series - 1
#> <environment: 0x000002063451d728>
#> 
#> GAM process formula:
#> ~s(season, bs = "cc", k = 6)
#> <environment: 0x000002063451d728>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR()
#> 
#> N process models:
#> 2 
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> GAM observation model coefficient (beta) estimates:
#>                 2.5%   50% 97.5% Rhat n_eff
#> seriesseries_1 -2.70 -0.63   1.4 1.01   700
#> seriesseries_2 -1.70  0.31   2.3 1.01   702
#> seriesseries_3 -0.76  1.30   3.3 1.01   703
#> 
#> standard deviation:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.37 0.53  0.71    1   701
#> sigma[2] 0.49 0.61  0.78    1   693
#> 
#> autoregressive coef 1:
#>         2.5%    50% 97.5% Rhat n_eff
#> ar1[1] -0.59 -0.240  0.17 1.00   462
#> ar1[2] -0.25  0.048  0.35 1.01   349
#> 
#> GAM process model coefficient (beta) estimates:
#>                     2.5%    50% 97.5% Rhat n_eff
#> (Intercept)_trend -1.300  0.730 2.800 1.01   694
#> s(season).1_trend -0.310 -0.061 0.190 1.00   947
#> s(season).2_trend -0.062  0.230 0.510 1.00   753
#> s(season).3_trend -0.480 -0.200 0.097 1.00   620
#> s(season).4_trend  0.350  0.690 0.960 1.00   584
#> 
#> Approximate significance of GAM process smooths:
#>             edf Ref.df Chi.sq  p-value    
#> s(season) 2.324      4   24.2 8.27e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

Both series 1 and 2 share the exact same latent process estimates, while the estimates for series 3 are different:

plot(full_mod, type = "trend", series = 1)

plot(full_mod, type = "trend", series = 2)

plot(full_mod, type = "trend", series = 3)

However, forecasts for series 1 and 2 will differ because they have different intercepts in the observation model.
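
We can verify this by forecasting ahead with the test split that sim_mvgam() returned (a sketch; see ?forecast.mvgam):

# Forecasts for series 1 and 2 share the same latent trend but differ
# through their series-specific observation intercepts
fc <- forecast(full_mod, newdata = simdat$data_test)
plot(fc, series = 1)
plot(fc, series = 2)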

Example: signal detection

Now we will explore a more complicated example. Here we simulate a true hidden signal that we are trying to track. This signal depends nonlinearly on some covariate (called productivity, which represents a measure of how productive the landscape is). The signal also demonstrates a fairly large amount of temporal autocorrelation:

set.seed(123)
# simulate a nonlinear relationship using the mgcv function gamSim
signal_dat <- mgcv::gamSim(n = 100, eg = 1, scale = 1)
#> Gu & Wahba 4 term additive model

# productivity is one of the variables in the simulated data
productivity <- signal_dat$x2

# simulate the true signal, which already has a nonlinear relationship
# with productivity; we will add in a fairly strong AR1 process to
# contribute to the signal
true_signal <- as.vector(scale(signal_dat$y) +
  arima.sim(100, model = list(ar = 0.8, sd = 0.1)))

Plot the signal to inspect its evolution over time:

plot(
  true_signal,
  type = "l",
  bty = "l", lwd = 2,
  ylab = "True signal",
  xlab = "Time"
)

Next we simulate three sensors that are trying to track the same hidden signal. All of these sensors have different observation errors that can depend nonlinearly on a second external covariate, called temperature in this example. Again, this makes use of gamSim:

# Function to simulate a monotonic response to a covariate
sim_monotonic <- function(x, a = 2.2, b = 2) {
  out <- exp(a * x) / (6 + exp(b * x)) * -1
  return(2.5 * as.vector(scale(out)))
}

# Simulated temperature covariate
temperature <- runif(100, -2, 2)

# Simulate the three series
sim_series <- function(n_series = 3, true_signal) {
  temp_effects <- mgcv::gamSim(n = 100, eg = 7, scale = 0.05)
  alphas <- rnorm(n_series, sd = 2)

  do.call(rbind, lapply(seq_len(n_series), function(series) {
    data.frame(
      observed = rnorm(length(true_signal),
        mean = alphas[series] +
          sim_monotonic(temperature, 
                            runif(1, 2.2, 3),
                            runif(1, 2.2, 3)) +
          true_signal,
        sd = runif(1, 1, 2)
      ),
      series = paste0("sensor_", series),
      time = 1:length(true_signal),
      temperature = temperature,
      productivity = productivity,
      true_signal = true_signal
    )
  }))
}
model_dat <- sim_series(true_signal = true_signal) %>%
  dplyr::mutate(series = factor(series))
#> Gu & Wahba 4 term additive model, correlated predictors

Plot the sensor observations

plot_mvgam_series(
  data = model_dat, y = "observed",
  series = "all"
)

And now plot the observed relationships between the three sensors and the temperature covariate:

plot(
  observed ~ temperature,
  data = model_dat %>%
    dplyr::filter(series == "sensor_1"),
  pch = 16, bty = "l",
  ylab = "Sensor 1",
  xlab = "Temperature"
)

plot(
  observed ~ temperature,
  data = model_dat %>%
    dplyr::filter(series == "sensor_2"),
  pch = 16, bty = "l",
  ylab = "Sensor 2",
  xlab = "Temperature"
)

plot(
  observed ~ temperature,
  data = model_dat %>%
    dplyr::filter(series == "sensor_3"),
  pch = 16, bty = "l",
  ylab = "Sensor 3",
  xlab = "Temperature"
)

The shared signal model

Now we can formulate and fit a model that allows each sensor’s observation error to depend nonlinearly on temperature while allowing the true signal to depend nonlinearly on productivity. By fixing all of the values in the trend column to 1 in the trend_map, we are assuming that all observation sensors are tracking the same latent signal. We use informative priors on the two variance components (process error and observation error), which reflect our prior belief that the observation error is smaller overall than the true process error:

mod <- mvgam(
  formula =
  # formula for observations, allowing for different
  # intercepts and hierarchical smooth effects of temperature
    observed ~ series +
      s(temperature, k = 10) +
      s(series, temperature, bs = "sz", k = 8),
  trend_formula =
  # formula for the latent signal, which can depend
  # nonlinearly on productivity
    ~ s(productivity, k = 8) - 1,
  trend_model =
  # in addition to productivity effects, the signal is
  # assumed to exhibit temporal autocorrelation
    AR(),
  noncentred = TRUE,
  trend_map =
  # trend_map forces all sensors to track the same
  # latent signal
    data.frame(
      series = unique(model_dat$series),
      trend = c(1, 1, 1)
    ),

  # informative priors on process error
  # and observation error will help with convergence
  priors = c(
    prior(normal(2, 0.5), class = sigma),
    prior(normal(1, 0.5), class = sigma_obs)
  ),

  # Gaussian observations
  family = gaussian(),
  data = model_dat,
  silent = 2
)

View a reduced version of the model summary because there will be many spline coefficients in this model:

summary(mod, include_betas = FALSE)
#> GAM observation formula:
#> observed ~ series + s(temperature, k = 10) + s(series, temperature, 
#>     bs = "sz", k = 8)
#> <environment: 0x000002063451d728>
#> 
#> GAM process formula:
#> ~s(productivity, k = 8) - 1
#> <environment: 0x000002063451d728>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> AR()
#> 
#> N process models:
#> 1 
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 100 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1100; warmup = 600; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation error parameter estimates:
#>              2.5% 50% 97.5% Rhat n_eff
#> sigma_obs[1]  1.3 1.5   1.8    1  1791
#> sigma_obs[2]  1.2 1.4   1.6    1  2040
#> sigma_obs[3]  1.3 1.5   1.8    1  1607
#> 
#> GAM observation model coefficient (beta) estimates:
#>                 2.5%   50% 97.5% Rhat n_eff
#> (Intercept)     0.11  1.20  4.10 1.01   464
#> seriessensor_2 -2.40 -1.60 -0.65 1.00   831
#> seriessensor_3 -0.59  0.49  1.60 1.00  1316
#> 
#> Approximate significance of GAM observation smooths:
#>                         edf Ref.df   Chi.sq p-value    
#> s(temperature)        5.335      9 1446.722  <2e-16 ***
#> s(series,temperature) 2.545     16    1.042       1    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> standard deviation:
#>          2.5% 50% 97.5% Rhat n_eff
#> sigma[1] 0.89 1.2   1.5    1   576
#> 
#> autoregressive coef 1:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.56 0.77  0.97 1.01   378
#> 
#> Approximate significance of GAM process smooths:
#>                   edf Ref.df Chi.sq p-value
#> s(productivity) 1.865      7  41.97   0.228
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

Inspecting effects on both process and observation models

Don’t pay much attention to the approximate p-values of the smooth terms. The calculation of these values is incredibly sensitive to the estimates of the smoothing parameters, so I don’t tend to find them very meaningful. What are meaningful, however, are prediction-based plots of the smooth functions. All main effects can be quickly plotted with conditional_effects:

conditional_effects(mod, type = "link")

conditional_effects is simply a wrapper around the more flexible plot_predictions function from the marginaleffects package. We can get more useful plots of these effects by using that function directly for further customisation:

plot_predictions(
  mod,
  condition = c("temperature", "series", "series"),
  points = 0.5
) +
  theme(legend.position = "none")

We have successfully estimated effects, some of them nonlinear, that impact the hidden process AND the observations. All in a single joint model. But there can always be challenges with these models, particularly when estimating both process and observation error at the same time.
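
One way to probe this (a sketch using the as.data.frame() method for mvgam objects, which also appears in the time-varying effects vignette) is to extract the posterior draws for both variance components and check that they are well separated rather than trading off against each other:

# regex = TRUE matches both the process error (sigma) and the
# observation errors (sigma_obs)
variance_draws <- as.data.frame(mod, variable = "sigma", regex = TRUE)
head(variance_draws, 3)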

Recovering the hidden signal

A final but very key question is whether we can successfully recover the true hidden signal. The trend slot in the returned model parameters has the estimates for this signal, which we can easily plot using the mvgam S3 method for plot. We can also overlay the true values for the hidden signal, which shows that our model has done a good job of recovering it:

plot(mod, 
     type = "trend") +
  ggplot2::geom_point(data = data.frame(time = 1:100,
                                        y = true_signal),
                      mapping = ggplot2::aes(x = time,
                                             y = y))

Further reading

The following papers and resources offer a lot of useful material about other types of State-Space models and how they can be applied in practice:

Auger‐Méthé, Marie, et al. “A guide to state–space modeling of ecological time series.” Ecological Monographs 91.4 (2021): e01470.

Clark, Nicholas J., et al. “Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability.” PeerJ 13 (2025): e18929.

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. “MARSS: multivariate autoregressive state-space models for analyzing time-series data.” R Journal 4.1 (2012): 11.

Ward, Eric J., et al. “Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico.” Journal of Applied Ecology 47.1 (2010): 47-56.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please reach out if you are interested (n.clark’at’uq.edu.au).

================================================ FILE: inst/doc/time_varying_effects.R ================================================ params <- list(EVAL = TRUE) ## ----echo = FALSE---------------------------------------------------------------- knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, eval = if (isTRUE(exists("params"))) params$EVAL else FALSE ) ## ----setup, include=FALSE-------------------------------------------------------- knitr::opts_chunk$set( echo = TRUE, dpi = 100, fig.asp = 0.8, fig.width = 6, out.width = "60%", fig.align = "center" ) library(mvgam) library(ggplot2) theme_set(theme_bw(base_size = 12, base_family = "serif")) ## -------------------------------------------------------------------------------- set.seed(1111) N <- 200 beta_temp <- mvgam:::sim_gp(rnorm(1), alpha_gp = 0.75, rho_gp = 10, h = N) + 0.5 ## ----fig.alt = "Simulating time-varying effects in mvgam and R"------------------ plot( beta_temp, type = "l", lwd = 3, bty = "l", xlab = "Time", ylab = "Coefficient", col = "darkred" ) box(bty = "l", lwd = 2) ## -------------------------------------------------------------------------------- temp <- rnorm(N, sd = 1) ## ----fig.alt = "Simulating time-varying effects in mvgam and R"------------------ out <- rnorm(N, mean = 4 + beta_temp * temp, sd = 0.25) time <- seq_along(temp) plot( out, type = "l", lwd = 3, bty = "l", xlab = "Time", ylab = "Outcome", col = "darkred" ) box(bty = "l", lwd = 2) ## -------------------------------------------------------------------------------- data <- data.frame(out, temp, time) data_train <- data[1:190, ] data_test <- data[191:200, ] ## ----include=FALSE--------------------------------------------------------------- mod <- mvgam( out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40), family = gaussian(), data = data_train, silent = 2 ) ## ----eval=FALSE------------------------------------------------------------------ # mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40), # family = gaussian(), # data = data_train, # silent = 2 # ) ## -------------------------------------------------------------------------------- summary(mod, include_betas = FALSE) ## -------------------------------------------------------------------------------- plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 190, lty = "dashed", lwd = 2) lines(beta_temp, lwd = 2.5, col = "white") lines(beta_temp, lwd = 2) ## -------------------------------------------------------------------------------- require(marginaleffects) range_round <- function(x) { round(range(x, na.rm = TRUE), 2) } plot_predictions( mod, newdata = datagrid( time = unique, temp = range_round ), by = c("time", "temp", "temp"), type = "link" ) ## -------------------------------------------------------------------------------- fc <- forecast(mod, newdata = data_test) plot(fc) ## ----include=FALSE--------------------------------------------------------------- mod <- mvgam( out ~ dynamic(temp, k = 40), family = gaussian(), data = data_train, silent = 2 ) ## ----eval=FALSE------------------------------------------------------------------ # mod <- mvgam(out ~ dynamic(temp, k = 40), # family = gaussian(), # data = data_train, # silent = 2 # ) ## -------------------------------------------------------------------------------- summary(mod, include_betas = FALSE) ## -------------------------------------------------------------------------------- plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 190, lty = "dashed", lwd = 2) 
lines(beta_temp, lwd = 2.5, col = "white")
lines(beta_temp, lwd = 2)

## --------------------------------------------------------------------------------
load(url("https://github.com/atsa-es/MARSS/raw/master/data/SalmonSurvCUI.rda"))
dplyr::glimpse(SalmonSurvCUI)

## --------------------------------------------------------------------------------
SalmonSurvCUI %>%
  # create a time variable
  dplyr::mutate(time = dplyr::row_number()) %>%
  # create a series variable
  dplyr::mutate(series = as.factor("salmon")) %>%
  # z-score the covariate CUI.apr
  dplyr::mutate(CUI.apr = as.vector(scale(CUI.apr))) %>%
  # convert logit-transformed survival back to proportional
  dplyr::mutate(survival = plogis(logit.s)) -> model_data

## --------------------------------------------------------------------------------
dplyr::glimpse(model_data)

## --------------------------------------------------------------------------------
plot_mvgam_series(data = model_data, y = "survival")

## ----include = FALSE--------------------------------------------------------------
mod0 <- mvgam(
  formula = survival ~ 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)

## ----eval = FALSE-----------------------------------------------------------------
# mod0 <- mvgam(
#   formula = survival ~ 1,
#   trend_model = AR(),
#   noncentred = TRUE,
#   priors = prior(normal(-3.5, 0.5), class = Intercept),
#   family = betar(),
#   data = model_data,
#   silent = 2
# )

## --------------------------------------------------------------------------------
summary(mod0)

## --------------------------------------------------------------------------------
plot(mod0, type = "trend")

## ----include=FALSE----------------------------------------------------------------
mod1 <- mvgam(
  formula = survival ~ 1,
  trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  control = list(adapt_delta = 0.99),
  silent = 2
)

## ----eval=FALSE-------------------------------------------------------------------
# mod1 <- mvgam(
#   formula = survival ~ 1,
#   trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
#   trend_model = AR(),
#   noncentred = TRUE,
#   priors = prior(normal(-3.5, 0.5), class = Intercept),
#   family = betar(),
#   data = model_data,
#   silent = 2
# )

## --------------------------------------------------------------------------------
summary(mod1, include_betas = FALSE)

## --------------------------------------------------------------------------------
plot(mod1, type = "trend")

## --------------------------------------------------------------------------------
plot(mod1, type = "forecast")

## --------------------------------------------------------------------------------
# Extract estimates of the process error 'sigma' for each model
mod0_sigma <- as.data.frame(mod0, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod0")
mod1_sigma <- as.data.frame(mod1, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod1")
sigmas <- rbind(mod0_sigma, mod1_sigma)

# Plot using ggplot2
require(ggplot2)
ggplot(sigmas, aes(y = `sigma[1]`, fill = model)) +
  geom_density(alpha = 0.3, colour = NA) +
  coord_flip()

## --------------------------------------------------------------------------------
plot(mod1, type = "smooths", trend_effects = TRUE)

## --------------------------------------------------------------------------------
loo_compare(mod0, mod1)

## ----include=FALSE----------------------------------------------------------------
lfo_mod0 <- lfo_cv(mod0, min_t = 30)
lfo_mod1 <- lfo_cv(mod1, min_t = 30)

## ----eval=FALSE-------------------------------------------------------------------
# lfo_mod0 <- lfo_cv(mod0, min_t = 30)
# lfo_mod1 <- lfo_cv(mod1, min_t = 30)

## --------------------------------------------------------------------------------
sum(lfo_mod0$elpds)
sum(lfo_mod1$elpds)

## ----fig.alt = "Comparing forecast skill for dynamic beta regression models in mvgam and R"----
plot(
  x = 1:length(lfo_mod0$elpds) + 30,
  y = lfo_mod0$elpds - lfo_mod1$elpds,
  ylab = "ELPDmod0 - ELPDmod1",
  xlab = "Evaluation time point",
  pch = 16,
  col = "darkred",
  bty = "l"
)
abline(h = 0, lty = "dashed")

================================================
FILE: inst/doc/time_varying_effects.Rmd
================================================
---
title: "Time-varying effects in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Time-varying effects in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

The purpose of this vignette is to show how the `mvgam` package can be used to estimate and forecast regression coefficients that vary through time.

## Time-varying effects

Dynamic fixed-effect coefficients (often referred to as dynamic linear models) can be readily incorporated into GAMs / DGAMs. In `mvgam`, the `dynamic()` formula wrapper offers a convenient interface to set these up. The plan is to incorporate a range of dynamic options (such as random walk, AR1 etc...) but for the moment only low-rank Gaussian Process (GP) smooths are allowed (making use either of the `gp` basis in `mgcv` or of Hilbert space approximate GPs). These are advantageous over splines or random walk effects for several reasons. First, GPs will force the time-varying effect to be smooth. This often makes sense in reality, where we would not expect a regression coefficient to change rapidly from one time point to the next. Second, GPs provide information on the 'global' dynamics of a time-varying effect through their length-scale parameters. This means we can use them to provide accurate forecasts of how an effect is expected to change in the future, something that we couldn't do well if we used splines to estimate the effect. An example below illustrates.

### Simulating time-varying effects

Simulate a time-varying coefficient using a squared exponential Gaussian Process function with length scale $\rho$=10.
We will do this using an internal function from `mvgam` (the `sim_gp` function):

```{r}
set.seed(1111)
N <- 200
beta_temp <- mvgam:::sim_gp(rnorm(1),
  alpha_gp = 0.75,
  rho_gp = 10,
  h = N
) + 0.5
```

A plot of the time-varying coefficient shows that it changes smoothly through time:

```{r, fig.alt = "Simulating time-varying effects in mvgam and R"}
plot(beta_temp,
  type = "l", lwd = 3,
  bty = "l", xlab = "Time", ylab = "Coefficient",
  col = "darkred"
)
box(bty = "l", lwd = 2)
```

Next we need to simulate the values of the covariate, which we will call `temp` (to represent $temperature$). In this case we just use a standard normal distribution to simulate this covariate:

```{r}
temp <- rnorm(N, sd = 1)
```

Finally, simulate the outcome variable, which is a Gaussian observation process (with observation error) over the time-varying effect of $temperature$:

```{r, fig.alt = "Simulating time-varying effects in mvgam and R"}
out <- rnorm(N,
  mean = 4 + beta_temp * temp,
  sd = 0.25
)
time <- seq_along(temp)
plot(out,
  type = "l", lwd = 3,
  bty = "l", xlab = "Time", ylab = "Outcome",
  col = "darkred"
)
box(bty = "l", lwd = 2)
```

Gather the data into a `data.frame` for fitting models, and split the data into training and testing folds:

```{r}
data <- data.frame(out, temp, time)
data_train <- data[1:190, ]
data_test <- data[191:200, ]
```

### The `dynamic()` function

Time-varying coefficients can be fairly easily set up using the `s()` or `gp()` wrapper functions in `mvgam` formulae by fitting a nonlinear effect of `time` and using the covariate of interest as the numeric `by` variable (see `?mgcv::s` or `?brms::gp` for more details). The `dynamic()` formula wrapper offers a way to automate this process, and will eventually allow for a broader variety of time-varying effects (such as random walk or AR processes). Depending on the arguments that are specified to `dynamic()`, it will either set up a low-rank GP smooth function using `s()` with `bs = 'gp'` and a fixed value of the length scale parameter $\rho$, or it will set up a Hilbert space approximate GP using the `gp()` function with `c = 5/4` so that $\rho$ is estimated (see `?dynamic` for more details). In this first example we will use the `s()` option, and will deliberately mis-specify the $\rho$ parameter because, in practice, it is never known. This call to `dynamic()` will set up the following smooth: `s(time, by = temp, bs = "gp", m = c(-2, 8, 2), k = 40)`

```{r, include=FALSE}
mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)
```

```{r, eval=FALSE}
mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)
```

Inspect the model summary, which shows how the `dynamic()` wrapper was used to construct a low-rank Gaussian Process smooth function:

```{r}
summary(mod, include_betas = FALSE)
```

Because this model used a spline with a `gp` basis, its smooths can be visualised just like any other `gam`. We can plot the estimates for the in-sample and out-of-sample periods to see how the Gaussian Process function produces sensible smooth forecasts. Here we supply the full dataset to the `newdata` argument in `plot_mvgam_smooth()` to inspect posterior forecasts of the time-varying smooth function.
Overlay the true simulated function to see that the model has adequately estimated its dynamics in both the training and testing data partitions:

```{r}
plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = "dashed", lwd = 2)
lines(beta_temp, lwd = 2.5, col = "white")
lines(beta_temp, lwd = 2)
```

We can also use `plot_predictions()` from the `marginaleffects` package to visualise how the estimated effect of $temperature$ changes over time at different covariate values:

```{r}
require(marginaleffects)
range_round <- function(x) {
  round(range(x, na.rm = TRUE), 2)
}
plot_predictions(mod,
  newdata = datagrid(
    time = unique,
    temp = range_round
  ),
  by = c("time", "temp", "temp"),
  type = "link"
)
```

This results in sensible forecasts of the observations as well:

```{r}
fc <- forecast(mod, newdata = data_test)
plot(fc)
```

The syntax is very similar if we wish to estimate the parameters of the underlying Gaussian Process, this time using a Hilbert space approximation. We simply omit the `rho` argument in `dynamic()` to make this happen. This will set up a call similar to `gp(time, by = 'temp', c = 5/4, k = 40)`.

```{r include=FALSE}
mod <- mvgam(out ~ dynamic(temp, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)
```

```{r eval=FALSE}
mod <- mvgam(out ~ dynamic(temp, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)
```

This model summary now contains estimates for the marginal deviation and length scale parameters of the underlying Gaussian Process function:

```{r}
summary(mod, include_betas = FALSE)
```

Effects for `gp()` terms can also be plotted as smooths:

```{r}
plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = "dashed", lwd = 2)
lines(beta_temp, lwd = 2.5, col = "white")
lines(beta_temp, lwd = 2)
```

## Salmon survival example

Here we will use openly available data on marine survival of Chinook salmon to illustrate how time-varying effects can be used to improve ecological time series models. [Scheuerell and Williams (2005)](https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-2419.2005.00346.x) used a dynamic linear model to examine the relationship between marine survival of Chinook salmon and an index of ocean upwelling strength along the west coast of the USA. The authors hypothesized that stronger upwelling in April should create better growing conditions for phytoplankton, which would then translate into more zooplankton and provide better foraging opportunities for juvenile salmon entering the ocean. Survival is measured as a proportion over 42 years (1964–2005) and the data are available in the `MARSS` package:

```{r}
load(url("https://github.com/atsa-es/MARSS/raw/master/data/SalmonSurvCUI.rda"))
dplyr::glimpse(SalmonSurvCUI)
```

First we need to prepare the data for modelling. The variable `CUI.apr` will be standardized to make it easier for the sampler to estimate underlying GP parameters for the time-varying effect. We also need to convert the survival back to a proportion, as in its current form it has been logit-transformed (this is because most time series packages cannot handle proportional data).
As usual, we also need to create a `time` indicator and a `series` indicator for working in `mvgam`:

```{r}
SalmonSurvCUI %>%
  # create a time variable
  dplyr::mutate(time = dplyr::row_number()) %>%
  # create a series variable
  dplyr::mutate(series = as.factor("salmon")) %>%
  # z-score the covariate CUI.apr
  dplyr::mutate(CUI.apr = as.vector(scale(CUI.apr))) %>%
  # convert logit-transformed survival back to proportional
  dplyr::mutate(survival = plogis(logit.s)) -> model_data
```

Inspect the data:

```{r}
dplyr::glimpse(model_data)
```

Plot features of the outcome variable, which shows that it is a proportional variable with particular restrictions that we want to model:

```{r}
plot_mvgam_series(data = model_data, y = "survival")
```

### A State-Space Beta regression

`mvgam` can easily handle data that are bounded at 0 and 1 with a Beta observation model (using the `mgcv` function `betar()`, see `?mgcv::betar` for details). First we will fit a simple State-Space model that uses an AR1 dynamic process model with no predictors and a Beta observation model:

```{r include = FALSE}
mod0 <- mvgam(
  formula = survival ~ 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)
```

```{r eval = FALSE}
mod0 <- mvgam(
  formula = survival ~ 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)
```

The summary of this model shows good behaviour of the Hamiltonian Monte Carlo sampler and provides useful summaries of the Beta observation model parameters:

```{r}
summary(mod0)
```

A plot of the underlying dynamic component shows how it has easily handled the temporal evolution of the time series:

```{r}
plot(mod0, type = "trend")
```

### Including time-varying upwelling effects

Now we can increase the complexity of our model by constructing and fitting a State-Space model with a time-varying effect of the coastal upwelling index in addition to the autoregressive dynamics. We again use a Beta observation model to capture the restrictions of our proportional observations, but this time will include a `dynamic()` effect of `CUI.apr` in the latent process model.
We do not specify the $\rho$ parameter, instead opting to estimate it using a Hilbert space approximate GP:

```{r include=FALSE}
mod1 <- mvgam(
  formula = survival ~ 1,
  trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  control = list(adapt_delta = 0.99),
  silent = 2
)
```

```{r eval=FALSE}
mod1 <- mvgam(
  formula = survival ~ 1,
  trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)
```

The summary for this model now includes estimates for the time-varying GP parameters:

```{r}
summary(mod1, include_betas = FALSE)
```

The estimates for the underlying dynamic process, and for the hindcasts, haven't changed much:

```{r}
plot(mod1, type = "trend")
```

```{r}
plot(mod1, type = "forecast")
```

But the process error parameter $\sigma$ is slightly smaller for this model than for the first model:

```{r}
# Extract estimates of the process error 'sigma' for each model
mod0_sigma <- as.data.frame(mod0, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod0")
mod1_sigma <- as.data.frame(mod1, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod1")
sigmas <- rbind(mod0_sigma, mod1_sigma)

# Plot using ggplot2
require(ggplot2)
ggplot(sigmas, aes(y = `sigma[1]`, fill = model)) +
  geom_density(alpha = 0.3, colour = NA) +
  coord_flip()
```

Why does the process error not need to be as flexible in the second model? Because the estimates of this dynamic process are now informed partly by the time-varying effect of upwelling, which we can visualise on the link scale using `plot()`:

```{r}
plot(mod1, type = "smooths", trend_effects = TRUE)
```

### Comparing model predictive performances

A key question when fitting multiple time series models is whether one of them provides better predictions than the other. There are several options in `mvgam` for exploring this quantitatively. First, we can compare models based on in-sample approximate leave-one-out cross-validation as implemented in the popular `loo` package:

```{r}
loo_compare(mod0, mod1)
```

The model with the larger Expected Log Predictive Density (ELPD) is favoured by this comparison. But this metric only compares in-sample performance, and we are hoping to use our models to produce reasonable forecasts. Luckily, `mvgam` also has routines for comparing models using approximate leave-future-out cross-validation. Here we refit both models to a reduced training set (starting at time point 30) and produce approximate 1-step ahead forecasts. These forecasts are used to estimate forecast ELPD before expanding the training set one time point at a time. We use Pareto-smoothed importance sampling to reweight posterior predictions, acting as a kind of particle filter so that we don't need to refit the model too often (you can read more about how this process works in Bürkner et al. 2020).
```{r include=FALSE}
lfo_mod0 <- lfo_cv(mod0, min_t = 30)
lfo_mod1 <- lfo_cv(mod1, min_t = 30)
```

```{r eval=FALSE}
lfo_mod0 <- lfo_cv(mod0, min_t = 30)
lfo_mod1 <- lfo_cv(mod1, min_t = 30)
```

The model with the time-varying upwelling effect tends to provide better 1-step ahead forecasts, with a higher total forecast ELPD:

```{r}
sum(lfo_mod0$elpds)
sum(lfo_mod1$elpds)
```

We can also plot the ELPDs for each model as a contrast. Here, values less than zero suggest the time-varying predictor model (Mod1) gives better 1-step ahead forecasts:

```{r, fig.alt = "Comparing forecast skill for dynamic beta regression models in mvgam and R"}
plot(
  x = 1:length(lfo_mod0$elpds) + 30,
  y = lfo_mod0$elpds - lfo_mod1$elpds,
  ylab = "ELPDmod0 - ELPDmod1",
  xlab = "Evaluation time point",
  pch = 16,
  col = "darkred",
  bty = "l"
)
abline(h = 0, lty = "dashed")
```

A useful exercise to further expand this model would be to think about what kinds of predictors might impact measurement error, which could easily be implemented into the observation formula in `mvgam()`. But for now, we will leave the model as-is.

## Further reading

The following papers and resources offer a lot of useful material about dynamic linear models and how they can be applied / evaluated in practice:

Bürkner, PC, Gabry, J and Vehtari, A [Approximate leave-future-out cross-validation for Bayesian time series models](https://www.tandfonline.com/doi/full/10.1080/00949655.2020.1783262). *Journal of Statistical Computation and Simulation*. 90:14 (2020) 2499-2523.

Herrero, Asier, et al. [From the individual to the landscape and back: time-varying effects of climate and herbivory on tree sapling growth at distribution limits](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2745.12527). *Journal of Ecology* 104.2 (2016): 430-442.

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. "[MARSS: multivariate autoregressive state-space models for analyzing time-series data.](https://journal.r-project.org/articles/RJ-2012-002/)" *R Journal* 4.1 (2012): 11.

Scheuerell, Mark D., and John G. Williams. [Forecasting climate-induced changes in the survival of Snake River Spring/Summer Chinook Salmon (*Oncorhynchus tshawytscha*)](https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2419.2005.00346.x). *Fisheries Oceanography* 14 (2005): 448-457.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au)

================================================
FILE: inst/doc/time_varying_effects.html
================================================

Time-varying effects in mvgam

Nicholas J Clark

2026-01-19

The purpose of this vignette is to show how the mvgam package can be used to estimate and forecast regression coefficients that vary through time.

Time-varying effects

Dynamic fixed-effect coefficients (often referred to as dynamic linear models) can be readily incorporated into GAMs / DGAMs. In mvgam, the dynamic() formula wrapper offers a convenient interface to set these up. The plan is to incorporate a range of dynamic options (such as random walk, AR1 etc...) but for the moment only low-rank Gaussian Process (GP) smooths are allowed (making use either of the gp basis in mgcv or of Hilbert space approximate GPs). These are advantageous over splines or random walk effects for several reasons. First, GPs will force the time-varying effect to be smooth. This often makes sense in reality, where we would not expect a regression coefficient to change rapidly from one time point to the next. Second, GPs provide information on the 'global' dynamics of a time-varying effect through their length-scale parameters. This means we can use them to provide accurate forecasts of how an effect is expected to change in the future, something that we couldn't do well if we used splines to estimate the effect. An example below illustrates.

Simulating time-varying effects

Simulate a time-varying coefficient using a squared exponential Gaussian Process function with length scale \(\rho\)=10. We will do this using an internal function from mvgam (the sim_gp function):

set.seed(1111)
N <- 200
beta_temp <- mvgam:::sim_gp(rnorm(1),
  alpha_gp = 0.75,
  rho_gp = 10,
  h = N
) + 0.5
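Here alpha_gp and rho_gp correspond to the marginal deviation \(\alpha\) and length scale \(\rho\) of the squared exponential kernel. For orientation (this is the standard textbook form of the kernel, not mvgam-specific notation), that covariance function is:

\[ k(t, t') = \alpha^2 \exp\left(-\frac{(t - t')^2}{2\rho^2}\right) \]

Larger values of \(\rho\) yield smoother, more slowly changing coefficients.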

A plot of the time-varying coefficient shows that it changes smoothly through time:

plot(beta_temp,
  type = "l", lwd = 3,
  bty = "l", xlab = "Time", ylab = "Coefficient",
  col = "darkred"
)
box(bty = "l", lwd = 2)

[Figure: Simulating time-varying effects in mvgam and R]

Next we need to simulate the values of the covariate, which we will call temp (to represent \(temperature\)). In this case we just use a standard normal distribution to simulate this covariate:

temp <- rnorm(N, sd = 1)

Finally, simulate the outcome variable, which is a Gaussian observation process (with observation error) over the time-varying effect of \(temperature\):

out <- rnorm(N,
  mean = 4 + beta_temp * temp,
  sd = 0.25
)
time <- seq_along(temp)
plot(out,
  type = "l", lwd = 3,
  bty = "l", xlab = "Time", ylab = "Outcome",
  col = "darkred"
)
box(bty = "l", lwd = 2)

[Figure: Simulating time-varying effects in mvgam and R]

Gather the data into a data.frame for fitting models, and split the data into training and testing folds.

data <- data.frame(out, temp, time)
data_train <- data[1:190, ]
data_test <- data[191:200, ]

The dynamic() function

Time-varying coefficients can be fairly easily set up using the s() or gp() wrapper functions in mvgam formulae by fitting a nonlinear effect of time and using the covariate of interest as the numeric by variable (see ?mgcv::s or ?brms::gp for more details). The dynamic() formula wrapper offers a way to automate this process, and will eventually allow for a broader variety of time-varying effects (such as random walk or AR processes). Depending on the arguments that are specified to dynamic, it will either set up a low-rank GP smooth function using s() with bs = 'gp' and a fixed value of the length scale parameter \(\rho\), or it will set up a Hilbert space approximate GP using the gp() function with c=5/4 so that \(\rho\) is estimated (see ?dynamic for more details). In this first example we will use the s() option, and will mis-specify the \(\rho\) parameter here as, in practice, it is never known. This call to dynamic() will set up the following smooth: s(time, by = temp, bs = "gp", m = c(-2, 8, 2), k = 40)

mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)
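For comparison, the text above gives the exact smooth that this wrapper constructs, so the same model could be written without dynamic() at all (a minimal sketch; mod_manual is just an illustrative name):

# Equivalent low-rank GP smooth of time, modified by the temp covariate,
# with the length scale fixed at rho = 8 via the m argument
mod_manual <- mvgam(
  out ~ s(time, by = temp, bs = "gp", m = c(-2, 8, 2), k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)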

Inspect the model summary, which shows how the dynamic() wrapper was used to construct a low-rank Gaussian Process smooth function:

summary(mod, include_betas = FALSE)
#> GAM formula:
#> out ~ s(time, by = temp, bs = "gp", m = c(-2, 8, 2), k = 40)
#> <environment: 0x0000017caa517728>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 190 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.23 0.25  0.28    1  2113
#> 
#> GAM coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    4   4   4.1    1  3264
#> 
#> Approximate significance of GAM smooths:
#>                edf Ref.df Chi.sq p-value    
#> s(time):temp 16.35     40  168.2  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

Because this model used a spline with a gp basis, its smooths can be visualised just like any other gam. We can plot the estimates for the in-sample and out-of-sample periods to see how the Gaussian Process function produces sensible smooth forecasts. Here we supply the full dataset to the newdata argument in plot_mvgam_smooth() to inspect posterior forecasts of the time-varying smooth function. Overlay the true simulated function to see that the model has adequately estimated its dynamics in both the training and testing data partitions:

plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = "dashed", lwd = 2)
lines(beta_temp, lwd = 2.5, col = "white")
lines(beta_temp, lwd = 2)

We can also use plot_predictions() from the marginaleffects package to visualise how the estimated effect of \(temperature\) changes over time at different covariate values:

require(marginaleffects)
range_round <- function(x) {
  round(range(x, na.rm = TRUE), 2)
}
plot_predictions(mod,
  newdata = datagrid(
    time = unique,
    temp = range_round
  ),
  by = c("time", "temp", "temp"),
  type = "link"
)

This results in sensible forecasts of the observations as well

fc <- forecast(mod, newdata = data_test)
plot(fc)

The syntax is very similar if we wish to estimate the parameters of the underlying Gaussian Process, this time using a Hilbert space approximation. We simply omit the rho argument in dynamic() to make this happen. This will set up a call similar to gp(time, by = 'temp', c = 5/4, k = 40).

mod <- mvgam(out ~ dynamic(temp, k = 40),
  family = gaussian(),
  data = data_train,
  silent = 2
)
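Again, dynamic() is just a convenience wrapper; a sketch of the equivalent hand-written call (matching the expanded formula printed in the summary below; mod_manual is an illustrative name):

# Hilbert space approximate GP of time, modified by temp,
# with alpha and rho estimated rather than fixed
mod_manual <- mvgam(
  out ~ gp(time, by = temp, c = 5/4, k = 40, scale = TRUE),
  family = gaussian(),
  data = data_train,
  silent = 2
)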

This model summary now contains estimates for the marginal deviation and length scale parameters of the underlying Gaussian Process function:

summary(mod, include_betas = FALSE)
#> GAM formula:
#> out ~ gp(time, by = temp, c = 5/4, k = 40, scale = TRUE)
#> <environment: 0x0000017caa517728>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 190 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.24 0.26  0.29    1  2547
#> 
#> GAM coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    4   4   4.1    1  2704
#> 
#> GAM gp term marginal deviation (alpha) and length scale (rho) estimates:
#>                      2.5%   50% 97.5% Rhat n_eff
#> alpha_gp(time):temp 0.630 0.880 1.400    1   734
#> rho_gp(time):temp   0.024 0.051 0.068    1   611
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

Effects for gp() terms can also be plotted as smooths:

plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = "dashed", lwd = 2)
lines(beta_temp, lwd = 2.5, col = "white")
lines(beta_temp, lwd = 2)

Salmon survival example

Here we will use openly available data on marine survival of Chinook salmon to illustrate how time-varying effects can be used to improve ecological time series models. Scheuerell and Williams (2005) used a dynamic linear model to examine the relationship between marine survival of Chinook salmon and an index of ocean upwelling strength along the west coast of the USA. The authors hypothesized that stronger upwelling in April should create better growing conditions for phytoplankton, which would then translate into more zooplankton and provide better foraging opportunities for juvenile salmon entering the ocean. Survival is measured as a proportion over 42 years (1964–2005) and the data are available in the MARSS package:

load(url("https://github.com/atsa-es/MARSS/raw/master/data/SalmonSurvCUI.rda"))
dplyr::glimpse(SalmonSurvCUI)
#> Rows: 42
#> Columns: 3
#> $ year    <int> 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 19…
#> $ logit.s <dbl> -3.46, -3.32, -3.58, -3.03, -3.61, -3.35, -3.93, -4.19, -4.82,…
#> $ CUI.apr <int> 57, 5, 43, 11, 47, -21, 25, -2, -1, 43, 2, 35, 0, 1, -1, 6, -7…

First we need to prepare the data for modelling. The variable CUI.apr will be standardized to make it easier for the sampler to estimate underlying GP parameters for the time-varying effect. We also need to convert the survival back to a proportion, as in its current form it has been logit-transformed (this is because most time series packages cannot handle proportional data). As usual, we also need to create a time indicator and a series indicator for working in mvgam:

SalmonSurvCUI %>%
  # create a time variable
  dplyr::mutate(time = dplyr::row_number()) %>%
  # create a series variable
  dplyr::mutate(series = as.factor("salmon")) %>%
  # z-score the covariate CUI.apr
  dplyr::mutate(CUI.apr = as.vector(scale(CUI.apr))) %>%
  # convert logit-transformed survival back to proportional
  dplyr::mutate(survival = plogis(logit.s)) -> model_data
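Since plogis() is the inverse of the logit transform, a quick sanity check (a minimal sketch) is that qlogis() maps the new survival column back to the original logit.s values:

# qlogis() inverts plogis(), so this should return TRUE
# (up to floating point error)
all.equal(qlogis(model_data$survival), model_data$logit.s)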

Inspect the data

dplyr::glimpse(model_data)
#> Rows: 42
#> Columns: 6
#> $ year     <int> 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1…
#> $ logit.s  <dbl> -3.46, -3.32, -3.58, -3.03, -3.61, -3.35, -3.93, -4.19, -4.82…
#> $ CUI.apr  <dbl> 2.37949804, 0.03330223, 1.74782994, 0.30401713, 1.92830654, -…
#> $ time     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
#> $ series   <fct> salmon, salmon, salmon, salmon, salmon, salmon, salmon, salmo…
#> $ survival <dbl> 0.030472033, 0.034891409, 0.027119717, 0.046088827, 0.0263393…

Plot features of the outcome variable, which shows that it is a proportional variable with particular restrictions that we want to model:

plot_mvgam_series(data = model_data, y = "survival")

A State-Space Beta regression

mvgam can easily handle data that are bounded at 0 and 1 with a Beta observation model (using the mgcv function betar(), see ?mgcv::betar for details). First we will fit a simple State-Space model that uses an AR1 dynamic process model with no predictors and a Beta observation model:

mod0 <- mvgam(
  formula = survival ~ 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)
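Written out, this model has approximately the following structure (a sketch using the mean-precision Beta parameterisation that matches the \(\phi\) estimates reported below, with \(z_t\) denoting the latent AR1 process):

\[
\begin{aligned}
survival_t & \sim \text{Beta}(\mu_t\phi, (1 - \mu_t)\phi) \\
\text{logit}(\mu_t) & = \alpha + z_t \\
z_t & = ar1 \cdot z_{t-1} + \epsilon_t, \quad \epsilon_t \sim \text{Normal}(0, \sigma^2)
\end{aligned}
\]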

The summary of this model shows good behaviour of the Hamiltonian Monte Carlo sampler and provides useful summaries on the Beta observation model parameters:

summary(mod0)
#> GAM formula:
#> survival ~ 1
#> <environment: 0x0000017caa517728>
#> 
#> Family:
#> beta
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> AR()
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 42 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation precision parameter estimates:
#>        2.5% 50% 97.5% Rhat n_eff
#> phi[1]   82 220   560 1.01   173
#> 
#> GAM coefficient (beta) estimates:
#>             2.5%  50% 97.5% Rhat n_eff
#> (Intercept) -4.6 -4.3    -4    1   524
#> 
#> standard deviation:
#>          2.5% 50% 97.5% Rhat n_eff
#> sigma[1] 0.11 0.4  0.65 1.01   160
#> 
#> precision parameter:
#>        2.5% 50% 97.5% Rhat n_eff
#> tau[1]  2.4 6.4    77 1.01   273
#> 
#> autoregressive coef 1:
#>         2.5%  50% 97.5% Rhat n_eff
#> ar1[1] -0.32 0.68  0.98 1.01   310
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

A plot of the underlying dynamic component shows how it has easily handled the temporal evolution of the time series:

plot(mod0, type = "trend")

Including time-varying upwelling effects

Now we can increase the complexity of our model by constructing and fitting a State-Space model with a time-varying effect of the coastal upwelling index in addition to the autoregressive dynamics. We again use a Beta observation model to capture the restrictions of our proportional observations, but this time will include a dynamic() effect of CUI.apr in the latent process model. We do not specify the \(\rho\) parameter, instead opting to estimate it using a Hilbert space approximate GP:

mod1 <- mvgam(
  formula = survival ~ 1,
  trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1,
  trend_model = AR(),
  noncentred = TRUE,
  priors = prior(normal(-3.5, 0.5), class = Intercept),
  family = betar(),
  data = model_data,
  silent = 2
)

The summary for this model now includes estimates for the time-varying GP parameters:

summary(mod1, include_betas = FALSE)
#> GAM observation formula:
#> survival ~ 1
#> <environment: 0x0000017caa517728>
#> 
#> GAM process formula:
#> ~dynamic(CUI.apr, k = 25, scale = FALSE) - 1
#> <environment: 0x0000017caa517728>
#> 
#> Family:
#> beta
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> AR()
#> 
#> N process models:
#> 1 
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 42 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation precision parameter estimates:
#>        2.5% 50% 97.5% Rhat n_eff
#> phi[1]  170 340   650 1.01   794
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5%  50% 97.5% Rhat n_eff
#> (Intercept) -4.4 -3.8  -2.9    1   682
#> 
#> standard deviation:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.17 0.32  0.51    1   661
#> 
#> autoregressive coef 1:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.57 0.93     1    1   498
#> 
#> GAM process model gp term marginal deviation (alpha) and length scale (rho) estimates:
#>                         2.5%  50% 97.5% Rhat n_eff
#> alpha_gp(time):CUI.apr 0.036 0.32   1.5 1.00   999
#> rho_gp(time):CUI.apr   1.300 5.80  43.0 1.01   607
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

The estimates for the underlying dynamic process, and for the hindcasts, haven’t changed much:

plot(mod1, type = "trend")

plot(mod1, type = "forecast")

But the process error parameter \(\sigma\) is slightly smaller for this model than for the first model:

# Extract estimates of the process error 'sigma' for each model
mod0_sigma <- as.data.frame(mod0, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod0")
mod1_sigma <- as.data.frame(mod1, variable = "sigma", regex = TRUE) %>%
  dplyr::mutate(model = "Mod1")
sigmas <- rbind(mod0_sigma, mod1_sigma)

# Plot using ggplot2
require(ggplot2)
ggplot(sigmas, aes(y = `sigma[1]`, fill = model)) +
  geom_density(alpha = 0.3, colour = NA) +
  coord_flip()
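The same comparison can be summarised numerically (a rough sketch; this pairs posterior draws across the two independently fitted models, a standard Monte Carlo approximation):

# Approximate posterior probability that the process error
# is smaller in Mod1 than in Mod0
mean(mod1_sigma$`sigma[1]` < mod0_sigma$`sigma[1]`)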

Why does the process error not need to be as flexible in the second model? Because the estimates of this dynamic process are now informed partly by the time-varying effect of upwelling, which we can visualise on the link scale using plot():

plot(mod1, type = "smooths", trend_effects = TRUE)

Comparing model predictive performances

A key question when fitting multiple time series models is whether one of them provides better predictions than the other. There are several options in mvgam for exploring this quantitatively. First, we can compare models based on in-sample approximate leave-one-out cross-validation as implemented in the popular loo package:

loo_compare(mod0, mod1)
#>      elpd_diff se_diff
#> mod0     0.0       0.0
#> mod1 -1308.3     135.3

Here the simpler model actually has the larger in-sample Expected Log Predictive Density (ELPD), so on this metric it is preferred over the model that includes the time-varying upwelling effect. But this metric only compares in-sample performance, and we are hoping to use our models to produce reasonable forecasts. Luckily, mvgam also has routines for comparing models using approximate leave-future-out cross-validation. Here we refit both models to a reduced training set (starting at time point 30) and produce approximate 1-step ahead forecasts. These forecasts are used to estimate forecast ELPD before expanding the training set one time point at a time. We use Pareto-smoothed importance sampling to reweight posterior predictions, acting as a kind of particle filter so that we don't need to refit the model too often (you can read more about how this process works in Bürkner et al. 2020).

lfo_mod0 <- lfo_cv(mod0, min_t = 30)
lfo_mod1 <- lfo_cv(mod1, min_t = 30)
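A schematic of the loop that lfo_cv() runs internally, as described above (pseudocode comments only, not the package's actual implementation):

# fit the model to observations 1:min_t
# for t in min_t, ..., T - 1:
#   use PSIS to reweight the current posterior and score a 1-step
#   ahead prediction for time t + 1 (this gives one elpd value)
#   if the Pareto k diagnostic is too high, the importance weights
#   are unreliable: refit the model to observations 1:t and continue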

The model with the time-varying upwelling effect tends to provide better 1-step ahead forecasts, with a higher total forecast ELPD:

sum(lfo_mod0$elpds)
#> [1] 35.92439
sum(lfo_mod1$elpds)
#> [1] 37.0323

We can also plot the ELPDs for each model as a contrast. Here, values less than zero suggest the time-varying predictor model (Mod1) gives better 1-step ahead forecasts:

plot(
  x = 1:length(lfo_mod0$elpds) + 30,
  y = lfo_mod0$elpds - lfo_mod1$elpds,
  ylab = "ELPDmod0 - ELPDmod1",
  xlab = "Evaluation time point",
  pch = 16,
  col = "darkred",
  bty = "l"
)
abline(h = 0, lty = "dashed")

[Figure: Comparing forecast skill for dynamic beta regression models in mvgam and R]

A useful exercise to further expand this model would be to think about what kinds of predictors might impact measurement error, which could easily be implemented into the observation formula in mvgam(). But for now, we will leave the model as-is.

Further reading

The following papers and resources offer a lot of useful material about dynamic linear models and how they can be applied / evaluated in practice:

Bürkner, PC, Gabry, J and Vehtari, A Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation. 90:14 (2020) 2499-2523.

Herrero, Asier, et al. From the individual to the landscape and back: time‐varying effects of climate and herbivory on tree sapling growth at distribution limits. Journal of Ecology 104.2 (2016): 430-442.

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. "MARSS: multivariate autoregressive state-space models for analyzing time-series data." R Journal 4.1 (2012): 11.

Scheuerell, Mark D., and John G. Williams. Forecasting climate-induced changes in the survival of Snake River Spring/Summer Chinook Salmon (Oncorhynchus tshawytscha). Fisheries Oceanography 14 (2005): 448-457.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au)

================================================
FILE: inst/doc/trend_formulas.R
================================================
params <- list(EVAL = TRUE)

## ----echo = FALSE----------------------------------------------------------------
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)

## ----setup, include=FALSE--------------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))

## --------------------------------------------------------------------------------
load(url("https://github.com/atsa-es/MARSS/raw/master/data/lakeWAplankton.rda"))

## --------------------------------------------------------------------------------
outcomes <- c("Greens", "Bluegreens", "Diatoms", "Unicells", "Other.algae")

## --------------------------------------------------------------------------------
# loop across each plankton group to create the long dataframe
plankton_data <- do.call(
  rbind,
  lapply(outcomes, function(x) {
    # create a group-specific dataframe with counts labelled 'y'
    # and the group name in the 'series' variable
    data.frame(
      year = lakeWAplanktonTrans[, "Year"],
      month = lakeWAplanktonTrans[, "Month"],
      y = lakeWAplanktonTrans[, x],
      series = x,
      temp = lakeWAplanktonTrans[, "Temp"]
    )
  })
) %>%
  # change the 'series' label to a factor
  dplyr::mutate(series = factor(series)) %>%
  # filter to only include some years in the data
  dplyr::filter(year >= 1965 & year < 1975) %>%
  dplyr::arrange(year, month) %>%
  dplyr::group_by(series) %>%
  # z-score the counts so they are approximately standard normal
  dplyr::mutate(y = as.vector(scale(y))) %>%
  # add the time indicator
  dplyr::mutate(time = dplyr::row_number()) %>%
  dplyr::ungroup()

## --------------------------------------------------------------------------------
head(plankton_data)

## --------------------------------------------------------------------------------
dplyr::glimpse(plankton_data)

## --------------------------------------------------------------------------------
plot_mvgam_series(data = plankton_data, series = "all")

## --------------------------------------------------------------------------------
plankton_data %>%
  dplyr::filter(series == "Other.algae") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = "white", size = 1.3) +
  geom_line(aes(y = y), col = "darkred", size = 1.1) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Other algae (red)")

## --------------------------------------------------------------------------------
plankton_data %>%
  dplyr::filter(series == "Diatoms") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = "white", size = 1.3) +
  geom_line(aes(y = y), col = "darkred", size = 1.1) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Diatoms (red)")

## --------------------------------------------------------------------------------
plankton_train <- plankton_data %>%
  dplyr::filter(time <= 112)
plankton_test <- plankton_data %>%
  dplyr::filter(time > 112)

## ----notrend_mod, include = FALSE, results='hide'--------------------------------
notrend_mod <- mvgam(
  y ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = series) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = "None"
)

## ----eval=FALSE------------------------------------------------------------------
# notrend_mod <- mvgam(
#   y ~
#     # tensor of temp and month to capture
#     # "global" seasonality
#     te(temp, month, k = c(4, 4)) +
#
#     # series-specific deviation tensor products
#     te(temp, month, k = c(4, 4), by = series) - 1,
#   family = gaussian(),
#   data = plankton_train,
#   newdata = plankton_test,
#   trend_model = "None"
# )

## --------------------------------------------------------------------------------
plot_mvgam_smooth(notrend_mod, smooth = 1)

## --------------------------------------------------------------------------------
plot_mvgam_smooth(notrend_mod, smooth = 2)

## --------------------------------------------------------------------------------
plot_mvgam_smooth(notrend_mod, smooth = 3)

## --------------------------------------------------------------------------------
plot(notrend_mod, type = "forecast", series = 1)

## --------------------------------------------------------------------------------
plot(notrend_mod, type = "forecast", series = 2)

## --------------------------------------------------------------------------------
plot(notrend_mod, type = "forecast", series = 3)

## --------------------------------------------------------------------------------
plot(notrend_mod, type = "residuals", series = 1)

## --------------------------------------------------------------------------------
plot(notrend_mod, type = "residuals", series = 3)

## --------------------------------------------------------------------------------
priors <- get_mvgam_priors(
  # observation formula, which has no terms in it
  y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train
)

## --------------------------------------------------------------------------------
priors[, 3]

## --------------------------------------------------------------------------------
priors[, 4]

## --------------------------------------------------------------------------------
priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)

## ----var_mod, include = FALSE, results='hide'------------------------------------
var_mod <- mvgam(
  y ~ -1,
  trend_formula = ~
    # tensor of temp and month should capture
    # seasonality
    te(temp, month, k = c(4, 4)) +
    # need to use 'trend' rather than series
    # here
    te(temp, month, k = c(4, 4), by = trend) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = VAR(),
  priors = priors,
  adapt_delta = 0.99,
  burnin = 1000
)

## ----eval=FALSE------------------------------------------------------------------
# var_mod <- mvgam(
#   # observation formula, which is empty
#   formula = y ~ -1,
#
#   # process model formula, which includes the smooth functions
#   trend_formula = ~ te(temp, month, k = c(4, 4)) +
#     te(temp, month, k = c(4, 4), by = trend) - 1,
#
#   # VAR1 model with uncorrelated process errors
#   trend_model = VAR(),
#   family = gaussian(),
#   data = plankton_train,
#   newdata = plankton_test,
#
#   # include the updated priors
#   priors = priors,
#   silent = 2
# )

## --------------------------------------------------------------------------------
summary(var_mod, include_betas = FALSE)

## --------------------------------------------------------------------------------
plot(var_mod, "smooths", trend_effects = TRUE)

## ----warning=FALSE, message=FALSE------------------------------------------------
mcmc_plot(
  var_mod,
  variable = 'A',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

## ----warning=FALSE, message=FALSE------------------------------------------------
mcmc_plot(
  var_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

## ----warning=FALSE, message=FALSE------------------------------------------------
mcmc_plot(var_mod, variable = "sigma_obs", regex = TRUE, type = "hist")

## --------------------------------------------------------------------------------
priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)

## ----varcor_mod, include = FALSE, results='hide'---------------------------------
varcor_mod <- mvgam(
  y ~ -1,
  trend_formula = ~
    # tensor of temp and month should capture
    # seasonality
    te(temp, month, k = c(4, 4)) +
    # need to use 'trend' rather than series
    # here
    te(temp, month, k = c(4, 4), by = trend) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = VAR(cor = TRUE),
  burnin = 1000,
  adapt_delta = 0.99,
  priors = priors
)

## ----eval=FALSE------------------------------------------------------------------
# varcor_mod <- mvgam(
#   # observation formula, which remains empty
#   formula = y ~ -1,
#
#   # process model formula, which includes the smooth functions
#   trend_formula = ~ te(temp, month, k = c(4, 4)) +
#     te(temp, month, k = c(4, 4), by = trend) - 1,
#
#   # VAR1 model with correlated process errors
#   trend_model = VAR(cor = TRUE),
#   family = gaussian(),
#   data = plankton_train,
#   newdata = plankton_test,
#
#   # include the updated priors
#   priors = priors,
#   silent = 2
# )

## ----warning=FALSE, message=FALSE------------------------------------------------
mcmc_plot(
  varcor_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

## --------------------------------------------------------------------------------
Sigma_post <- as.matrix(
  varcor_mod,
  variable = "Sigma",
  regex = TRUE
)
median_correlations <- cov2cor(
  matrix(apply(Sigma_post, 2, median), nrow = 5, ncol = 5)
)
rownames(median_correlations) <- colnames(median_correlations) <-
  levels(plankton_train$series)
round(median_correlations, 2)

## --------------------------------------------------------------------------------
irfs <- irf(varcor_mod, h = 12)

## --------------------------------------------------------------------------------
summary(irfs)

## --------------------------------------------------------------------------------
plot(irfs, series = 3)

## --------------------------------------------------------------------------------
fevds <- fevd(varcor_mod, h = 12)
plot(fevds)

## --------------------------------------------------------------------------------
# create forecast objects for each model
fcvar <- forecast(var_mod)
fcvarcor <- forecast(varcor_mod)

# plot the difference in variogram scores; a negative value means the VAR1cor
# model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = "variogram")$all_series$score -
  score(fcvar, score = "variogram")$all_series$score
plot(
  diff_scores,
  pch = 16,
  cex = 1.25,
  col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(variogram[VAR1cor] ~ - ~ variogram[VAR1])
)
abline(h = 0, lty = "dashed")

## --------------------------------------------------------------------------------
# plot the difference in energy scores; a negative value means the VAR1cor
# model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = "energy")$all_series$score -
  score(fcvar, score = "energy")$all_series$score
plot(
  diff_scores,
  pch = 16,
  cex = 1.25,
  col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(energy[VAR1cor] ~ - ~ energy[VAR1])
)
abline(h = 0, lty = "dashed")

## --------------------------------------------------------------------------------
description <- how_to_cite(varcor_mod)

## ----eval = FALSE-----------------------------------------------------------------
# description

## ----echo=FALSE-------------------------------------------------------------------
cat("Methods text skeleton\n")
cat(insight::format_message(description$methods_text))

## ----echo=FALSE-------------------------------------------------------------------
cat("\nPrimary references\n")
for (i in seq_along(description$citations)) {
  cat(insight::format_message(description$citations[[i]]))
  cat('\n')
}
cat("\nOther useful references\n")
for (i in seq_along(description$other_citations)) {
  cat(insight::format_message(description$other_citations[[i]]))
  cat('\n')
}

================================================
FILE: inst/doc/trend_formulas.Rmd
================================================
---
title: "State-Space models in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{State-Space models in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

The purpose of this vignette is to show how the `mvgam` package can be used to fit and interrogate State-Space models with nonlinear effects.

## State-Space Models

![Illustration of a basic State-Space model, which assumes that a latent dynamic *process* (X) can evolve independently from the way we take *observations* (Y) of that process](SS_model.svg){width=85%}
State-Space models allow us to separately make inferences about the underlying dynamic *process model* that we are interested in (i.e. the evolution of a time series or a collection of time series) and the *observation model* (i.e. the way that we survey / measure this underlying process). This is extremely useful in ecology because our observations are always imperfect / noisy measurements of the thing we are interested in measuring. It is also helpful because we often know that some covariates will impact our ability to measure accurately (i.e. we cannot take accurate counts of rodents if there is a thunderstorm happening) while other covariates might impact the underlying process (it is highly unlikely that rodent abundance responds to one storm, but instead probably responds to longer-term weather and climate variation). A State-Space model allows us to model both components in a single unified modelling framework. A major advantage of `mvgam` is that it can include nonlinear effects and random effects in BOTH model components while also capturing dynamic processes.

### Lake Washington plankton data

The data we will use to illustrate how we can fit State-Space models in `mvgam` are from a long-term monitoring study of plankton counts (cells per mL) taken from Lake Washington in Washington, USA. The data are available as part of the `MARSS` package and can be downloaded using the following:

```{r}
load(url("https://github.com/atsa-es/MARSS/raw/master/data/lakeWAplankton.rda"))
```

We will work with five different groups of plankton:

```{r}
outcomes <- c("Greens", "Bluegreens", "Diatoms", "Unicells", "Other.algae")
```

As usual, preparing the data into the correct format for `mvgam` modelling takes a little bit of wrangling in `dplyr`:

```{r}
# loop across each plankton group to create the long dataframe
plankton_data <- do.call(rbind, lapply(outcomes, function(x) {
  # create a group-specific dataframe with counts labelled 'y'
  # and the group name in the 'series' variable
  data.frame(
    year = lakeWAplanktonTrans[, "Year"],
    month = lakeWAplanktonTrans[, "Month"],
    y = lakeWAplanktonTrans[, x],
    series = x,
    temp = lakeWAplanktonTrans[, "Temp"]
  )
})) %>%
  # change the 'series' label to a factor
  dplyr::mutate(series = factor(series)) %>%
  # filter to only include some years in the data
  dplyr::filter(year >= 1965 & year < 1975) %>%
  dplyr::arrange(year, month) %>%
  dplyr::group_by(series) %>%
  # z-score the counts so they are approximately standard normal
  dplyr::mutate(y = as.vector(scale(y))) %>%
  # add the time indicator
  dplyr::mutate(time = dplyr::row_number()) %>%
  dplyr::ungroup()
```

Inspect the data structure:

```{r}
head(plankton_data)
```

```{r}
dplyr::glimpse(plankton_data)
```

Note that we have z-scored the counts in this example as that will make it easier to specify priors (though this is not completely necessary; it is often better to build a model that respects the properties of the actual outcome variables):

```{r}
plot_mvgam_series(data = plankton_data, series = "all")
```

We have some missing observations, but this isn't an issue for modelling in `mvgam`. A useful property to understand about these counts is that they tend to be highly seasonal.
Below are some plots of z-scored counts against the z-scored temperature measurements in the lake for each month:

```{r}
plankton_data %>%
  dplyr::filter(series == "Other.algae") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = "white", size = 1.3) +
  geom_line(aes(y = y), col = "darkred", size = 1.1) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Other algae (red)")
```

```{r}
plankton_data %>%
  dplyr::filter(series == "Diatoms") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y), col = "white", size = 1.3) +
  geom_line(aes(y = y), col = "darkred", size = 1.1) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Diatoms (red)")
```

We will have to try and capture this seasonality in our process model, which should be easy to do given the flexibility of GAMs. Next we will split the data into training and testing splits:

```{r}
plankton_train <- plankton_data %>%
  dplyr::filter(time <= 112)
plankton_test <- plankton_data %>%
  dplyr::filter(time > 112)
```

Now time to fit some models. This requires a bit of thinking about how we can best tackle the seasonal variation and the likely dependence structure in the data. These algae are interacting as part of a complex system within the same lake, so we certainly expect there to be some lagged cross-dependencies underlying their dynamics. But if we do not capture the seasonal variation, our multivariate dynamic model will be forced to try and capture it, which could lead to poor convergence and unstable results (we could feasibly capture cyclic dynamics with a more complex multi-species Lotka-Volterra model, but ordinary differential equation approaches are beyond the scope of `mvgam`).

### Capturing seasonality

First we will fit a model that does not include a dynamic component, just to see if it can reproduce the seasonal variation in the observations. This model introduces hierarchical multidimensional smooths, where all time series share a "global" tensor product of the `month` and `temp` variables, capturing our expectation that algal seasonality responds to temperature variation. But this response should depend on when in the year these temperatures are recorded (i.e. a response to warm temperatures in Spring should be different to a response to warm temperatures in Autumn). The model also fits series-specific deviation smooths (i.e. one tensor product per series) to capture how each algal group's seasonality differs from the overall "global" seasonality. Note that we do not include series-specific intercepts in this model because each series was z-scored to have a mean of 0.

```{r notrend_mod, include = FALSE, results='hide'}
notrend_mod <- mvgam(
  y ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = series) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = "None"
)
```

```{r eval=FALSE}
notrend_mod <- mvgam(
  y ~
    # tensor of temp and month to capture
    # "global" seasonality
    te(temp, month, k = c(4, 4)) +

    # series-specific deviation tensor products
    te(temp, month, k = c(4, 4), by = series) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = "None"
)
```

The "global" tensor product smooth function can be quickly visualized:

```{r}
plot_mvgam_smooth(notrend_mod, smooth = 1)
```

On this plot, red indicates below-average linear predictors and white indicates above-average.
We can then plot the deviation smooths for a few algal groups to see how they vary from the "global" pattern:

```{r}
plot_mvgam_smooth(notrend_mod, smooth = 2)
```

```{r}
plot_mvgam_smooth(notrend_mod, smooth = 3)
```

These multidimensional smooths have done a good job of capturing the seasonal variation in our observations:

```{r}
plot(notrend_mod, type = "forecast", series = 1)
```

```{r}
plot(notrend_mod, type = "forecast", series = 2)
```

```{r}
plot(notrend_mod, type = "forecast", series = 3)
```

This basic model gives us confidence that we can capture the seasonal variation in the observations. But the model has not captured the remaining temporal dynamics, which is obvious when we inspect Dunn-Smyth residuals for a few series:

```{r}
plot(notrend_mod, type = "residuals", series = 1)
```

```{r}
plot(notrend_mod, type = "residuals", series = 3)
```

### Multiseries dynamics

Now it is time to get into multivariate State-Space models. We will fit two models that can both incorporate lagged cross-dependencies in the latent process models. The first model assumes that the process errors operate independently from one another, while the second assumes that there may be contemporaneous correlations in the process errors. Both models include a Vector Autoregressive component for the process means, and so both can model complex community dynamics. The models can be described mathematically as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Normal}(\mu_{obs[t]}, \sigma_{obs}) \\
\mu_{obs[t]} & = process_t \\
process_t & \sim \text{MVNormal}(\mu_{process[t]}, \Sigma_{process}) \\
\mu_{process[t]} & = A * process_{t-1} + f_{global}(\boldsymbol{month},\boldsymbol{temp})_t + f_{series}(\boldsymbol{month},\boldsymbol{temp})_t \\
f_{global}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{global,k} * \beta_{global,k} \\
f_{series}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{series,k} * \beta_{series,k}
\end{align*}

Here you can see that there are no terms in the observation model apart from the underlying process model. But we could easily add covariates into the observation model if we felt that they could explain some of the systematic observation errors. We also assume independent observation processes (there is no covariance structure in the observation errors $\sigma_{obs}$). At present, `mvgam` does not support multivariate observation models. But this feature will be added in future versions. However the underlying process model is multivariate, and there is a lot going on here. This component has a Vector Autoregressive part, where the process mean at time $t$ $(\mu_{process[t]})$ is a vector that evolves as a function of where the vector-valued process model was at time $t-1$. The $A$ matrix captures these dynamics with self-dependencies on the diagonal and possibly asymmetric cross-dependencies on the off-diagonals, while also incorporating the nonlinear smooth functions that capture seasonality for each series. The contemporaneous process errors are modeled by $\Sigma_{process}$, which can be constrained so that process errors are independent (i.e. setting the off-diagonals to 0) or can be fully parameterized using a Cholesky decomposition (using `Stan`'s $LKJcorr$ distribution to place a prior on the strength of inter-species correlations).
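To make the Vector Autoregressive part of these equations more concrete, here is a toy simulation (an illustration only, not part of the original vignette; the `A` and `Sigma` values below are invented). Each timestep's process mean is the matrix product of $A$ with the previous state, and correlated errors are added using a Cholesky factor of $\Sigma$:

```{r}
# toy two-series VAR(1): the mean at time t is A %*% process[t - 1, ],
# with contemporaneously correlated MVNormal errors added at each step
set.seed(1)
A <- matrix(c(0.7, 0.1, -0.2, 0.5), nrow = 2) # autoregressive matrix
Sigma <- matrix(c(0.2, 0.05, 0.05, 0.2), nrow = 2) # process error covariance
L_Sigma <- t(chol(Sigma))
process <- matrix(0, nrow = 50, ncol = 2)
for (t in 2:50) {
  process[t, ] <- A %*% process[t - 1, ] + L_Sigma %*% rnorm(2)
}
matplot(process, type = "l", lty = 1, xlab = "Time", ylab = "Process")
```

Because the eigenvalues of this particular `A` lie inside the unit circle, the simulated processes wander but remain stationary, which is exactly the behaviour that the prior used by `mvgam` (described next) enforces for estimated VARs.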
For those that are interested in the inner-workings, `mvgam` makes use of a recent breakthrough by [Sarah Heaps to enforce stationarity of Bayesian VAR processes](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648). This is advantageous as we often don't expect forecast variance to increase without bound forever into the future, but many estimated VARs tend to behave this way.
Ok that was a lot to take in. Let's fit some models to try and inspect what is going on and what they assume. But first, we need to update `mvgam`'s default priors for the observation and process errors. By default, `mvgam` uses a fairly wide Student-T prior on these parameters to avoid being overly informative. But our observations are z-scored and so we do not expect very large process or observation errors. However, we also do not expect very small observation errors either as we know these measurements are not perfect. So let's update the priors for these parameters. In doing so, you will get to see how the formula for the latent process (i.e. trend) model is used in `mvgam`:

```{r}
priors <- get_mvgam_priors(
  # observation formula, which has no terms in it
  y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train
)
```

Get names of all parameters whose priors can be modified:

```{r}
priors[, 3]
```

And their default prior distributions:

```{r}
priors[, 4]
```

Setting priors is easy in `mvgam` as you can use `brms` routines. Here we use more informative Normal priors for both error components, but we impose a lower bound of 0.2 for the observation errors:

```{r}
priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)
```

You may have noticed something else unique about this model: there is no intercept term in the observation formula. This is because a shared intercept parameter can sometimes be unidentifiable with respect to the latent VAR process, particularly if our series have similar long-run averages (which they do in this case because they were z-scored). We will often get better convergence in these State-Space models if we drop this parameter. `mvgam` accomplishes this by fixing the coefficient for the intercept to zero. Now we can fit the first model, which assumes that process errors are contemporaneously uncorrelated:

```{r var_mod, include = FALSE, results='hide'}
var_mod <- mvgam(y ~ -1,
  trend_formula = ~
    # tensor of temp and month should capture
    # seasonality
    te(temp, month, k = c(4, 4)) +
    # need to use 'trend' rather than series
    # here
    te(temp, month, k = c(4, 4), by = trend) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = VAR(),
  priors = priors,
  adapt_delta = 0.99,
  burnin = 1000
)
```

```{r eval=FALSE}
var_mod <- mvgam(
  # observation formula, which is empty
  formula = y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,

  # include the updated priors
  priors = priors,
  silent = 2
)
```

### Inspecting SS models

This model's summary is a bit different to other `mvgam` summaries. It separates parameters based on whether they belong to the observation model or to the latent process model. This is because we may often have covariates that impact the observations but not the latent process, so we can have fairly complex models for each component. You will notice that some parameters have not fully converged, particularly for the VAR coefficients (called `A` in the output) and for the process errors (`Sigma`).
Note that we set `include_betas = FALSE` to stop the summary from printing output for all of the spline coefficients, which can be dense and hard to interpret:

```{r}
summary(var_mod, include_betas = FALSE)
```

The convergence of this model isn't fabulous (more on this in a moment). But we can again plot the smooth functions, which this time operate on the process model; these can be viewed by setting `trend_effects = TRUE` in the plotting functions:

```{r}
plot(var_mod, "smooths", trend_effects = TRUE)
```

The autoregressive coefficient matrix is of particular interest here, as it captures lagged dependencies and cross-dependencies in the latent process model. Unfortunately `bayesplot` doesn't know this is a matrix of parameters, so by default the facets are arranged in the order of the transposed VAR matrix. Using `dir = 'v'` in the `facet_args` argument will display the histograms in their correct matrix orientation:

```{r warning=FALSE, message=FALSE}
mcmc_plot(
  var_mod,
  variable = 'A',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)
```

There is a lot happening in this matrix. Each cell captures the lagged effect of the process in the column on the process in the row in the next timestep. So for example, the effect in cell [1,3] shows how an *increase* in the process for series 3 (Greens) at time $t$ is expected to impact the process for series 1 (Bluegreens) at time $t+1$. The latent process model is now capturing these effects and the smooth seasonal effects. The process error $(\Sigma)$ captures unmodelled variation in the process models. Again, we fixed the off-diagonals to 0, so the histograms for these will look like flat boxes:

```{r warning=FALSE, message=FALSE}
mcmc_plot(
  var_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)
```

The observation error estimates $(\sigma_{obs})$ represent how much the model thinks we might miss the true count when we take our imperfect measurements:

```{r warning=FALSE, message=FALSE}
mcmc_plot(var_mod, variable = "sigma_obs", regex = TRUE, type = "hist")
```

These parameters are still a bit hard to identify, especially when we are trying to estimate both process and observation error. Often we need to make some strong assumptions about which of these is more important for determining unexplained variation in our observations.

### Correlated process errors

Let's see if these estimates improve when we allow the process errors to be correlated.
Once again, we need to first update the priors for the observation errors:

```{r}
priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)
```

And now we can fit the correlated process error model:

```{r varcor_mod, include = FALSE, results='hide'}
varcor_mod <- mvgam(y ~ -1,
  trend_formula = ~
    # tensor of temp and month should capture
    # seasonality
    te(temp, month, k = c(4, 4)) +
    # need to use 'trend' rather than series
    # here
    te(temp, month, k = c(4, 4), by = trend) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = VAR(cor = TRUE),
  burnin = 1000,
  adapt_delta = 0.99,
  priors = priors
)
```

```{r eval=FALSE}
varcor_mod <- mvgam(
  # observation formula, which remains empty
  formula = y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with correlated process errors
  trend_model = VAR(cor = TRUE),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,

  # include the updated priors
  priors = priors,
  silent = 2
)
```

The $(\Sigma)$ matrix now captures any evidence of contemporaneously correlated process error:

```{r warning=FALSE, message=FALSE}
mcmc_plot(
  varcor_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)
```

This symmetric matrix tells us there is support for correlated process errors, as several of the off-diagonal entries are strongly non-zero. But it is easier to interpret these estimates if we convert the covariance matrix to a correlation matrix. Here we compute the posterior median process error correlations:

```{r}
Sigma_post <- as.matrix(
  varcor_mod,
  variable = "Sigma",
  regex = TRUE
)
median_correlations <- cov2cor(
  matrix(apply(Sigma_post, 2, median),
    nrow = 5,
    ncol = 5
  )
)
rownames(median_correlations) <-
  colnames(median_correlations) <-
  levels(plankton_train$series)

round(median_correlations, 2)
```

### Impulse response functions

Because Vector Autoregressions can capture complex lagged dependencies, it is often difficult to understand how the member time series are thought to interact with one another. A method that is commonly used to directly test for possible interactions is to compute an [Impulse Response Function](https://en.wikipedia.org/wiki/Impulse_response) (IRF). If $h$ represents the simulated forecast horizon, an IRF asks how each of the remaining series might respond over times $(t+1):h$ if a focal series is given an innovation "shock" at time $t = 0$. `mvgam` can compute Generalized and Orthogonalized IRFs from models that include latent VAR dynamics. We simply feed the fitted model to the `irf()` function and then use the S3 `plot()` function to view the estimated responses. By default, `irf()` will compute IRFs by separately imposing positive shocks of one standard deviation to each series in the VAR process. Here we compute Generalized IRFs over a horizon of 12 timesteps:

```{r}
irfs <- irf(varcor_mod, h = 12)
```

A summary of the IRFs can be computed using the `summary()` function:

```{r}
summary(irfs)
```

But it is easier to understand these responses using plots.
For example, we can plot the expected responses of the remaining series to a positive shock for series 3 (Greens) using the `plot()` function:

```{r}
plot(irfs, series = 3)
```

This series of plots makes it clear that some of the other series would be expected to show both instantaneous responses to a shock for the Greens (due to their correlated process errors) as well as delayed and nonlinear responses over time (due to the complex lagged dependence structure captured by the $A$ matrix). This hopefully makes it clear why IRFs are an important tool in the analysis of multivariate autoregressive models. You can also use these IRFs to calculate a relative contribution from each shock to the forecast error variance for a focal series. This method, known as a [Forecast Error Variance Decomposition](https://en.wikipedia.org/wiki/Variance_decomposition_of_forecast_errors) (FEVD), is useful to get an idea about the amount of information that each series contributes to the evolution of all other series in a Vector Autoregression:

```{r}
fevds <- fevd(varcor_mod, h = 12)
plot(fevds)
```

The plot above shows the median contribution to forecast error variance for each series.

### Comparing forecast scores

But which model is better? We can compute the variogram score for out of sample forecasts to get a sense of which model does a better job of capturing the dependence structure in the true evaluation set:

```{r}
# create forecast objects for each model
fcvar <- forecast(var_mod)
fcvarcor <- forecast(varcor_mod)

# plot the difference in variogram scores; a negative value means the
# VAR1cor model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = "variogram")$all_series$score -
  score(fcvar, score = "variogram")$all_series$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(variogram[VAR1cor] ~ -~ variogram[VAR1])
)
abline(h = 0, lty = "dashed")
```

And we can also compute the energy score for out of sample forecasts to get a sense of which model provides forecasts that are better calibrated:

```{r}
# plot the difference in energy scores; a negative value means the
# VAR1cor model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = "energy")$all_series$score -
  score(fcvar, score = "energy")$all_series$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(energy[VAR1cor] ~ -~ energy[VAR1])
)
abline(h = 0, lty = "dashed")
```

The models tend to provide similar forecasts, though the correlated error model does slightly better overall. We would probably need to use a more extensive rolling forecast evaluation exercise if we felt like we needed to only choose one for production. `mvgam` offers some utilities for doing this (i.e. see `?lfo_cv` for guidance). Alternatively, we could use forecasts from *both* models by creating an evenly-weighted ensemble forecast distribution. This capability is available using the `ensemble()` function in `mvgam` (see `?ensemble` for guidance).
Using `how_to_cite()` for models with VAR dynamics will give you information on how they are restricted to remain stationary:

```{r}
description <- how_to_cite(varcor_mod)
```

```{r, eval = FALSE}
description
```

```{r, echo=FALSE}
cat("Methods text skeleton\n")
cat(insight::format_message(description$methods_text))
```

```{r echo=FALSE}
cat("\nPrimary references\n")
for (i in seq_along(description$citations)) {
  cat(insight::format_message(description$citations[[i]]))
  cat('\n')
}
cat("\nOther useful references\n")
for (i in seq_along(description$other_citations)) {
  cat(insight::format_message(description$other_citations[[i]]))
  cat('\n')
}
```

More advanced hierarchical panel VAR models can also be handled by using the `gr` and `subgr` arguments in `VAR()`. These models are useful if you have data for the same set of series (`subgr`) that are measured in different regions (`gr`), such as species measured in different sampling regions or financial series measured in different countries.

## Further reading

The following papers and resources offer a lot of useful material about multivariate State-Space models and how they can be applied in practice:

Auger‐Méthé, Marie, et al. [A guide to state–space modeling of ecological time series](https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecm.1470). *Ecological Monographs* 91.4 (2021): e01470.

Clark, Nicholas J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ* 13 (2025): e18929.

Heaps, Sarah E. [Enforcing stationarity through the prior in vector autoregressions](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648). *Journal of Computational and Graphical Statistics* 32.1 (2023): 74-83.

Hannaford, Naomi E., et al. [A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant](https://doi.org/10.1016/j.csda.2022.107659). *Computational Statistics & Data Analysis* 179 (2023): 107659.

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. [MARSS: multivariate autoregressive state-space models for analyzing time-series data](https://journal.r-project.org/articles/RJ-2012-002/). *R Journal* 4.1 (2012): 11.

Karunarathna, K.A.N.K., et al. [Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models](https://doi.org/10.1016/j.ecolmodel.2024.110648). *Ecological Modelling* 490 (2024): 110648.

Ward, Eric J., et al. [Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2664.2009.01745.x). *Journal of Applied Ecology* 47.1 (2010): 47-56.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au).

================================================ FILE: inst/doc/trend_formulas.html ================================================

State-Space models in mvgam

Nicholas J Clark

2026-01-19

The purpose of this vignette is to show how the mvgam package can be used to fit and interrogate State-Space models with nonlinear effects.

State-Space Models

Illustration of a basic State-Space model, which assumes that a latent dynamic process (X) can evolve independently from the way we take observations (Y) of that process


State-Space models allow us to separately make inferences about the underlying dynamic process model that we are interested in (i.e. the evolution of a time series or a collection of time series) and the observation model (i.e. the way that we survey / measure this underlying process). This is extremely useful in ecology because our observations are always imperfect / noisy measurements of the thing we are interested in measuring. It is also helpful because we often know that some covariates will impact our ability to measure accurately (i.e. we cannot take accurate counts of rodents if there is a thunderstorm happening) while other covariates might impact the underlying process (it is highly unlikely that rodent abundance responds to one storm, but instead probably responds to longer-term weather and climate variation). A State-Space model allows us to model both components in a single unified modelling framework. A major advantage of mvgam is that it can include nonlinear effects and random effects in BOTH model components while also capturing dynamic processes.

Lake Washington plankton data

The data we will use to illustrate how we can fit State-Space models in mvgam are from a long-term monitoring study of plankton counts (cells per mL) taken from Lake Washington in Washington, USA. The data are available as part of the MARSS package and can be downloaded using the following:

load(url("https://github.com/atsa-es/MARSS/raw/master/data/lakeWAplankton.rda"))

We will work with five different groups of plankton:

outcomes <- c("Greens", "Bluegreens", "Diatoms", "Unicells", "Other.algae")

As usual, preparing the data into the correct format for mvgam modelling takes a little bit of wrangling in dplyr:

# loop across each plankton group to create the long dataframe
plankton_data <- do.call(rbind, lapply(outcomes, function(x) {
  # create a group-specific dataframe with counts labelled 'y'
  # and the group name in the 'series' variable
  data.frame(
    year = lakeWAplanktonTrans[, "Year"],
    month = lakeWAplanktonTrans[, "Month"],
    y = lakeWAplanktonTrans[, x],
    series = x,
    temp = lakeWAplanktonTrans[, "Temp"]
  )
})) %>%
  # change the 'series' label to a factor
  dplyr::mutate(series = factor(series)) %>%
  # filter to only include some years in the data
  dplyr::filter(year >= 1965 & year < 1975) %>%
  dplyr::arrange(year, month) %>%
  dplyr::group_by(series) %>%
  # z-score the counts so they are approximately standard normal
  dplyr::mutate(y = as.vector(scale(y))) %>%
  # add the time indicator
  dplyr::mutate(time = dplyr::row_number()) %>%
  dplyr::ungroup()

Inspect the data structure:

head(plankton_data)
#> # A tibble: 6 × 6
#>    year month       y series       temp  time
#>   <dbl> <dbl>   <dbl> <fct>       <dbl> <int>
#> 1  1965     1 -0.542  Greens      -1.23     1
#> 2  1965     1 -0.344  Bluegreens  -1.23     1
#> 3  1965     1 -0.0768 Diatoms     -1.23     1
#> 4  1965     1 -1.52   Unicells    -1.23     1
#> 5  1965     1 -0.491  Other.algae -1.23     1
#> 6  1965     2 NA      Greens      -1.32     2
dplyr::glimpse(plankton_data)
#> Rows: 600
#> Columns: 6
#> $ year   <dbl> 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 196…
#> $ month  <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, …
#> $ y      <dbl> -0.54241769, -0.34410776, -0.07684901, -1.52243490, -0.49055442…
#> $ series <fct> Greens, Bluegreens, Diatoms, Unicells, Other.algae, Greens, Blu…
#> $ temp   <dbl> -1.2306562, -1.2306562, -1.2306562, -1.2306562, -1.2306562, -1.…
#> $ time   <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, …

Note that we have z-scored the counts in this example as that will make it easier to specify priors (though this is not completely necessary; it is often better to build a model that respects the properties of the actual outcome variables).

plot_mvgam_series(data = plankton_data, series = "all")
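As a quick sanity check (an addition, not part of the original vignette), we can confirm that the z-scoring worked: each series should now have a mean near 0 and a standard deviation near 1:

plankton_data %>%
  dplyr::group_by(series) %>%
  dplyr::summarise(
    mean_y = mean(y, na.rm = TRUE),
    sd_y = sd(y, na.rm = TRUE)
  )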

We have some missing observations, but this isn’t an issue for modelling in mvgam. A useful property to understand about these counts is that they tend to be highly seasonal. Below are some plots of z-scored counts against the z-scored temperature measurements in the lake for each month:

plankton_data %>%
  dplyr::filter(series == "Other.algae") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y),
    col = "white",
    size = 1.3
  ) +
  geom_line(aes(y = y),
    col = "darkred",
    size = 1.1
  ) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Other algae (red)")

plankton_data %>%
  dplyr::filter(series == "Diatoms") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y),
    col = "white",
    size = 1.3
  ) +
  geom_line(aes(y = y),
    col = "darkred",
    size = 1.1
  ) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Diatoms (red)")

We will have to try and capture this seasonality in our process model, which should be easy to do given the flexibility of GAMs. Next we will split the data into training and testing splits:

plankton_train <- plankton_data %>%
  dplyr::filter(time <= 112)
plankton_test <- plankton_data %>%
  dplyr::filter(time > 112)

Now time to fit some models. This requires a bit of thinking about how we can best tackle the seasonal variation and the likely dependence structure in the data. These algae are interacting as part of a complex system within the same lake, so we certainly expect there to be some lagged cross-dependencies underlying their dynamics. But if we do not capture the seasonal variation, our multivariate dynamic model will be forced to try and capture it, which could lead to poor convergence and unstable results (we could feasibly capture cyclic dynamics with a more complex multi-species Lotka-Volterra model, but ordinary differential equation approaches are beyond the scope of mvgam).

Capturing seasonality

First we will fit a model that does not include a dynamic component, just to see if it can reproduce the seasonal variation in the observations. This model introduces hierarchical multidimensional smooths, where all time series share a “global” tensor product of the month and temp variables, capturing our expectation that algal seasonality responds to temperature variation. But this response should depend on when in the year these temperatures are recorded (i.e. a response to warm temperatures in Spring should be different to a response to warm temperatures in Autumn). The model also fits series-specific deviation smooths (i.e. one tensor product per series) to capture how each algal group’s seasonality differs from the overall “global” seasonality. Note that we do not include series-specific intercepts in this model because each series was z-scored to have a mean of 0.

notrend_mod <- mvgam(
  y ~
    # tensor of temp and month to capture
    # "global" seasonality
    te(temp, month, k = c(4, 4)) +

    # series-specific deviation tensor products
    te(temp, month, k = c(4, 4), by = series) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = "None"
)

The “global” tensor product smooth function can be quickly visualized:

plot_mvgam_smooth(notrend_mod, smooth = 1)

On this plot, red indicates below-average linear predictors and white indicates above-average. We can then plot the deviation smooths for a few algal groups to see how they vary from the “global” pattern:

plot_mvgam_smooth(notrend_mod, smooth = 2)

plot_mvgam_smooth(notrend_mod, smooth = 3)

These multidimensional smooths have done a good job of capturing the seasonal variation in our observations:

plot(notrend_mod, type = "forecast", series = 1)

plot(notrend_mod, type = "forecast", series = 2)

plot(notrend_mod, type = "forecast", series = 3)

This basic model gives us confidence that we can capture the seasonal variation in the observations. But the model has not captured the remaining temporal dynamics, which is obvious when we inspect Dunn-Smyth residuals for a few series:

plot(notrend_mod, type = "residuals", series = 1)

plot(notrend_mod, type = "residuals", series = 3)

Multiseries dynamics

Now it is time to get into multivariate State-Space models. We will fit two models that can both incorporate lagged cross-dependencies in the latent process models. The first model assumes that the process errors operate independently from one another, while the second assumes that there may be contemporaneous correlations in the process errors. Both models include a Vector Autoregressive component for the process means, and so both can model complex community dynamics. The models can be described mathematically as follows:

\[\begin{align*}
\boldsymbol{count}_t & \sim \text{Normal}(\mu_{obs[t]}, \sigma_{obs}) \\
\mu_{obs[t]} & = process_t \\
process_t & \sim \text{MVNormal}(\mu_{process[t]}, \Sigma_{process}) \\
\mu_{process[t]} & = A * process_{t-1} + f_{global}(\boldsymbol{month},\boldsymbol{temp})_t + f_{series}(\boldsymbol{month},\boldsymbol{temp})_t \\
f_{global}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{global,k} * \beta_{global,k} \\
f_{series}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{series,k} * \beta_{series,k}
\end{align*}\]

Here you can see that there are no terms in the observation model apart from the underlying process model. But we could easily add covariates into the observation model if we felt that they could explain some of the systematic observation errors. We also assume independent observation processes (there is no covariance structure in the observation errors \(\sigma_{obs}\)). At present, mvgam does not support multivariate observation models. But this feature will be added in future versions. However the underlying process model is multivariate, and there is a lot going on here. This component has a Vector Autoregressive part, where the process mean at time \(t\) \((\mu_{process[t]})\) is a vector that evolves as a function of where the vector-valued process model was at time \(t-1\). The \(A\) matrix captures these dynamics with self-dependencies on the diagonal and possibly asymmetric cross-dependencies on the off-diagonals, while also incorporating the nonlinear smooth functions that capture seasonality for each series. The contemporaneous process errors are modeled by \(\Sigma_{process}\), which can be constrained so that process errors are independent (i.e. setting the off-diagonals to 0) or can be fully parameterized using a Cholesky decomposition (using Stan’s \(LKJcorr\) distribution to place a prior on the strength of inter-species correlations). For those that are interested in the inner-workings, mvgam makes use of a recent breakthrough by Sarah Heaps to enforce stationarity of Bayesian VAR processes. This is advantageous as we often don’t expect forecast variance to increase without bound forever into the future, but many estimated VARs tend to behave this way.


Ok that was a lot to take in. Let’s fit some models to try and inspect what is going on and what they assume. But first, we need to update mvgam’s default priors for the observation and process errors. By default, mvgam uses a fairly wide Student-T prior on these parameters to avoid being overly informative. But our observations are z-scored and so we do not expect very large process or observation errors. However, we also do not expect very small observation errors either as we know these measurements are not perfect. So let’s update the priors for these parameters. In doing so, you will get to see how the formula for the latent process (i.e. trend) model is used in mvgam:

priors <- get_mvgam_priors(
  # observation formula, which has no terms in it
  y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train
)

Get names of all parameters whose priors can be modified:

priors[, 3]
#>  [1] "(Intercept)"                                                                                                                                                                                                                                                           
#>  [2] "process error sd"                                                                                                                                                                                                                                                      
#>  [3] "diagonal autocorrelation population mean"                                                                                                                                                                                                                              
#>  [4] "off-diagonal autocorrelation population mean"                                                                                                                                                                                                                          
#>  [5] "diagonal autocorrelation population variance"                                                                                                                                                                                                                          
#>  [6] "off-diagonal autocorrelation population variance"                                                                                                                                                                                                                      
#>  [7] "shape1 for diagonal autocorrelation precision"                                                                                                                                                                                                                         
#>  [8] "shape1 for off-diagonal autocorrelation precision"                                                                                                                                                                                                                     
#>  [9] "shape2 for diagonal autocorrelation precision"                                                                                                                                                                                                                         
#> [10] "shape2 for off-diagonal autocorrelation precision"                                                                                                                                                                                                                     
#> [11] "observation error sd"                                                                                                                                                                                                                                                  
#> [12] "te(temp,month) smooth parameters, te(temp,month):trendtrend1 smooth parameters, te(temp,month):trendtrend2 smooth parameters, te(temp,month):trendtrend3 smooth parameters, te(temp,month):trendtrend4 smooth parameters, te(temp,month):trendtrend5 smooth parameters"

And their default prior distributions:

priors[, 4]
#>  [1] "(Intercept) ~ student_t(3, -0.1, 2.5);"
#>  [2] "sigma ~ inv_gamma(1.418, 0.452);"      
#>  [3] "es[1] = 0;"                            
#>  [4] "es[2] = 0;"                            
#>  [5] "fs[1] = sqrt(0.455);"                  
#>  [6] "fs[2] = sqrt(0.455);"                  
#>  [7] "gs[1] = 1.365;"                        
#>  [8] "gs[2] = 1.365;"                        
#>  [9] "hs[1] = 0.071175;"                     
#> [10] "hs[2] = 0.071175;"                     
#> [11] "sigma_obs ~ inv_gamma(1.418, 0.452);"  
#> [12] "lambda_trend ~ normal(5, 30);"

Setting priors is easy in mvgam as you can use brms routines. Here we use more informative Normal priors for both error components, but we impose a lower bound of 0.2 for the observation errors:

priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)

You may have noticed something else unique about this model: there is no intercept term in the observation formula. This is because a shared intercept parameter can sometimes be unidentifiable with respect to the latent VAR process, particularly if our series have similar long-run averages (which they do in this case because they were z-scored). We will often get better convergence in these State-Space models if we drop this parameter. mvgam accomplishes this by fixing the coefficient for the intercept to zero. Now we can fit the first model, which assumes that process errors are contemporaneously uncorrelated:

var_mod <- mvgam(
  # observation formula, which is empty
  formula = y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,

  # include the updated priors
  priors = priors,
  silent = 2
)

Inspecting SS models

This model’s summary is a bit different to other mvgam summaries. It separates parameters based on whether they belong to the observation model or to the latent process model. This is because we may often have covariates that impact the observations but not the latent process, so we can have fairly complex models for each component. You will notice that some parameters have not fully converged, particularly for the VAR coefficients (called A in the output) and for the process errors (Sigma). Note that we set include_betas = FALSE to stop the summary from printing output for all of the spline coefficients, which can be dense and hard to interpret:

summary(var_mod, include_betas = FALSE)
#> GAM observation formula:
#> y ~ 1
#> <environment: 0x0000017ff1154728>
#> 
#> GAM process formula:
#> ~te(temp, month, k = c(4, 4)) + te(temp, month, k = c(4, 4), 
#>     by = trend) - 1
#> <environment: 0x0000017ff1154728>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> VAR()
#> 
#> N process models:
#> 5 
#> 
#> N series:
#> 5 
#> 
#> N timepoints:
#> 120 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1500; warmup = 1000; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.20 0.26  0.34 1.01   412
#> sigma_obs[2] 0.24 0.40  0.54 1.02   193
#> sigma_obs[3] 0.43 0.65  0.83 1.15    29
#> sigma_obs[4] 0.25 0.37  0.49 1.01   242
#> sigma_obs[5] 0.31 0.43  0.56 1.03   226
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    0   0     0  NaN   NaN
#> 
#> standard deviation:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.26 0.34  0.42 1.01   463
#> sigma[2] 0.23 0.39  0.55 1.09    35
#> sigma[3] 0.10 0.49  0.80 1.33    14
#> sigma[4] 0.33 0.45  0.59 1.02   201
#> sigma[5] 0.22 0.35  0.51 1.04   142
#> 
#> var coefficient matrix:
#>          2.5%    50% 97.5% Rhat n_eff
#> A[1,1]  0.600  0.790 0.910 1.03   163
#> A[1,2] -0.420 -0.140 0.041 1.04   111
#> A[1,3] -0.210  0.016 0.310 1.00   345
#> A[1,4] -0.055  0.056 0.220 1.01   543
#> A[1,5] -0.037  0.120 0.380 1.03   181
#> A[2,1] -0.540 -0.190 0.025 1.03   128
#> A[2,2]  0.062  0.430 0.740 1.02   242
#> A[2,3] -0.290  0.026 1.300 1.20    26
#> A[2,4] -0.110  0.110 0.370 1.03   117
#> A[2,5] -0.040  0.230 0.640 1.03   133
#> A[3,1] -0.330 -0.030 0.190 1.02   333
#> A[3,2] -0.500 -0.051 0.320 1.03   193
#> A[3,3] -0.034  0.520 0.900 1.08    64
#> A[3,4] -0.090  0.120 0.500 1.04   142
#> A[3,5] -0.290  0.027 0.380 1.01   465
#> A[4,1] -0.450 -0.120 0.086 1.06    89
#> A[4,2] -0.660 -0.180 0.130 1.06   107
#> A[4,3] -0.270  0.086 1.300 1.16    30
#> A[4,4]  0.520  0.730 0.960 1.02   212
#> A[4,5] -0.051  0.190 0.650 1.03   150
#> A[5,1] -0.110  0.053 0.270 1.00   449
#> A[5,2] -0.430 -0.110 0.130 1.01   238
#> A[5,3] -0.150  0.060 0.780 1.12    41
#> A[5,4] -0.210 -0.040 0.110 1.01   373
#> A[5,5]  0.460  0.740 0.950 1.00   370
#> 
#> Approximate significance of GAM process smooths:
#>                               edf Ref.df Chi.sq p-value
#> te(temp,month)              3.374     15 37.656   0.427
#> te(temp,month):seriestrend1 2.798     15  3.441   0.996
#> te(temp,month):seriestrend2 4.454     15 48.402   0.245
#> te(temp,month):seriestrend3 1.748     15  3.363   1.000
#> te(temp,month):seriestrend4 1.352     15  6.409   0.999
#> te(temp,month):seriestrend5 3.085     15  6.703   0.979
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✖ Rhats above 1.05 found for some parameters
#>     Use pairs() and mcmc_plot() to investigate
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

The convergence of this model isn’t fabulous (more on this in a moment). But we can again plot the smooth functions, which this time operate on the process model; these can be viewed by setting trend_effects = TRUE in the plotting functions:

plot(var_mod, "smooths", trend_effects = TRUE)
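As the summary’s MCMC diagnostics suggested, pairs() is a useful follow-up for investigating parameters with high Rhat values. Here is a small sketch (an addition to the vignette; we assume the pairs() method for mvgam objects accepts variable and regex arguments, in the same way mcmc_plot() does):

# inspect the joint posterior of one process error and its matching
# observation error, which the summary flagged as poorly converged
pairs(var_mod, variable = c("sigma[3]", "sigma_obs[3]"))

A strong negative posterior correlation between a process error and its matching observation error would indicate that the two variance components are trading off against one another, consistent with the identifiability issues discussed below.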

The autoregressive coefficient matrix is of particular interest here, as it captures lagged dependencies and cross-dependencies in the latent process model. Unfortunately bayesplot doesn’t know this is a matrix of parameters, so by default the facets are arranged in the order of the transposed VAR matrix. Using dir = 'v' in the facet_args argument will display the histograms in their correct matrix orientation:

mcmc_plot(
  var_mod,
  variable = 'A',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

There is a lot happening in this matrix. Each cell captures the lagged effect of the process in the column on the process in the row in the next timestep. So for example, the effect in cell [1,3] shows how an increase in the process for series 3 (Greens) at time \(t\) is expected to impact the process for series 1 (Bluegreens) at time \(t+1\). The latent process model is now capturing these effects and the smooth seasonal effects.
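A quick numeric companion to these histograms (an aside, not part of the original vignette): a VAR(1) process is stationary when all eigenvalues of the \(A\) matrix have modulus less than 1, so we can extract posterior draws of \(A\) and inspect the spectral radius of the posterior median matrix:

A_post <- as.matrix(var_mod, variable = "A", regex = TRUE)
A_median <- matrix(apply(A_post, 2, median), nrow = 5, ncol = 5)
# spectral radius; values below 1 are consistent with a stationary process
max(Mod(eigen(A_median)$values))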

The process error \((\Sigma)\) captures unmodelled variation in the process models. Again, we fixed the off-diagonals to 0, so the histograms for these will look like flat boxes:

mcmc_plot(
  var_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

The observation error estimates \((\sigma_{obs})\) represent how much the model thinks we might miss the true count when we take our imperfect measurements:

mcmc_plot(var_mod, variable = "sigma_obs", regex = TRUE, type = "hist")

These parameters are still a bit hard to identify, especially when we are trying to estimate both process and observation error. Often we need to make some strong assumptions about which of these is more important for determining unexplained variation in our observations.

Correlated process errors

Let’s see if these estimates improve when we allow the process errors to be correlated. Once again, we need to first update the priors for the observation errors:

priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)

And now we can fit the correlated process error model:

varcor_mod <- mvgam(
  # observation formula, which remains empty
  formula = y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with correlated process errors
  trend_model = VAR(cor = TRUE),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,

  # include the updated priors
  priors = priors,
  silent = 2
)

The \((\Sigma)\) matrix now captures any evidence of contemporaneously correlated process error:

mcmc_plot(
  varcor_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)

This symmetric matrix tells us there is support for correlated process errors, as several of the off-diagonal entries are strongly non-zero. But it is easier to interpret these estimates if we convert the covariance matrix to a correlation matrix. Here we compute the posterior median process error correlations:

Sigma_post <- as.matrix(
  varcor_mod, 
  variable = "Sigma", 
  regex = TRUE
)
median_correlations <- cov2cor(
  matrix(apply(Sigma_post, 2, median),
         nrow = 5, 
         ncol = 5
  )
)
rownames(median_correlations) <- 
  colnames(median_correlations) <- 
  levels(plankton_train$series)

round(median_correlations, 2)
#>             Bluegreens Diatoms Greens Other.algae Unicells
#> Bluegreens        1.00   -0.20  -0.04        0.17     0.48
#> Diatoms          -0.20    1.00   0.13        0.45     0.17
#> Greens           -0.04    0.13   1.00        0.30    -0.05
#> Other.algae       0.17    0.45   0.30        1.00     0.28
#> Unicells          0.48    0.17  -0.05        0.28     1.00
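Posterior medians only tell part of the story. Because Sigma_post holds the full set of posterior draws, we can also summarise the uncertainty in any single pairwise correlation (a small extension of the code above, not part of the original vignette); here for Bluegreens and Unicells:

# convert each posterior draw of Sigma to a correlation and keep entry [1, 5]
corr_draws <- apply(Sigma_post, 1, function(x) {
  cov2cor(matrix(x, nrow = 5, ncol = 5))[1, 5]
})
quantile(corr_draws, probs = c(0.025, 0.5, 0.975))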

Impulse response functions

Because Vector Autoregressions can capture complex lagged dependencies, it is often difficult to understand how the member time series are thought to interact with one another. A method that is commonly used to directly test for possible interactions is to compute an Impulse Response Function (IRF). If \(h\) represents the simulated forecast horizon, an IRF asks how each of the remaining series might respond over times \((t+1):h\) if a focal series is given an innovation “shock” at time \(t = 0\). mvgam can compute Generalized and Orthogonalized IRFs from models that include latent VAR dynamics. We simply feed the fitted model to the irf() function and then use the S3 plot() function to view the estimated responses. By default, irf() will compute IRFs by separately imposing positive shocks of one standard deviation to each series in the VAR process. Here we compute Generalized IRFs over a horizon of 12 timesteps:

irfs <- irf(varcor_mod, h = 12)

A summary of the IRFs can be computed using the summary() function:

summary(irfs)
#> # A tibble: 300 × 5
#>    shock                  horizon irfQ50 irfQ2.5 irfQ97.5
#>    <chr>                    <int>  <dbl>   <dbl>    <dbl>
#>  1 Process_1 -> Process_1       1 0.350   0.264     0.441
#>  2 Process_1 -> Process_1       2 0.297   0.227     0.374
#>  3 Process_1 -> Process_1       3 0.251   0.190     0.323
#>  4 Process_1 -> Process_1       4 0.214   0.155     0.283
#>  5 Process_1 -> Process_1       5 0.182   0.125     0.253
#>  6 Process_1 -> Process_1       6 0.155   0.0966    0.227
#>  7 Process_1 -> Process_1       7 0.132   0.0744    0.205
#>  8 Process_1 -> Process_1       8 0.113   0.0563    0.185
#>  9 Process_1 -> Process_1       9 0.0967  0.0419    0.167
#> 10 Process_1 -> Process_1      10 0.0833  0.0307    0.152
#> # ℹ 290 more rows
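Because summary(irfs) returns a tibble, standard dplyr verbs can isolate a single shock-response pathway (a small aside, not part of the original vignette); the shock labels follow the Process_i -> Process_j format shown above:

# lagged responses of Process_1 to a shock applied to Process_3
summary(irfs) %>%
  dplyr::filter(shock == "Process_3 -> Process_1")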

But it is easier to understand these responses using plots. For example, we can plot the expected responses of the remaining series to a positive shock for series 3 (Greens) using the plot() function:

plot(irfs, series = 3)

This series of plots makes it clear that some of the other series would be expected to show both instantaneous responses to a shock for the Greens (due to their correlated process errors) as well as delayed and nonlinear responses over time (due to the complex lagged dependence structure captured by the \(A\) matrix). This hopefully makes it clear why IRFs are an important tool in the analysis of multivariate autoregressive models. You can also use these IRFs to calculate a relative contribution from each shock to the forecast error variance for a focal series. This method, known as a Forecast Error Variance Decomposition (FEVD), is useful to get an idea about the amount of information that each series contributes to the evolution of all other series in a Vector Autoregression:

fevds <- fevd(varcor_mod, h = 12)
plot(fevds)

The plot above shows the median contribution to forecast error variance for each series.

Comparing forecast scores

But which model is better? We can compute the variogram score for out of sample forecasts to get a sense of which model does a better job of capturing the dependence structure in the true evaluation set:

# create forecast objects for each model
fcvar <- forecast(var_mod)
fcvarcor <- forecast(varcor_mod)

# plot the difference in variogram scores; a negative value means the VAR1cor model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = "variogram")$all_series$score -
  score(fcvar, score = "variogram")$all_series$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(variogram[VAR1cor] ~ -~ variogram[VAR1])
)
abline(h = 0, lty = "dashed")

And we can also compute the energy score for out of sample forecasts to get a sense of which model provides forecasts that are better calibrated:

# plot the difference in energy scores; a negative value means the VAR1cor model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = "energy")$all_series$score -
  score(fcvar, score = "energy")$all_series$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(energy[VAR1cor] ~ -~ energy[VAR1])
)
abline(h = 0, lty = "dashed")

The models tend to provide similar forecasts, though the correlated error model does slightly better overall. We would probably need to use a more extensive rolling forecast evaluation exercise if we felt like we needed to only choose one for production. mvgam offers some utilities for doing this (i.e. see ?lfo_cv for guidance). Alternatively, we could use forecasts from both models by creating an evenly-weighted ensemble forecast distribution. This capability is available using the ensemble() function in mvgam (see ?ensemble for guidance).
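A minimal sketch of that ensemble workflow (we assume the default behaviour of ensemble(), which draws evenly from each supplied forecast distribution; see ?ensemble for the actual arguments):

# combine the two forecast distributions into one evenly weighted ensemble
ens <- ensemble(fcvar, fcvarcor)
plot(ens, series = 1)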

Using how_to_cite() for models with VAR dynamics will give you information on how they are restricted to remain stationary:

description <- how_to_cite(varcor_mod)
description
#> Methods text skeleton
#> We used the R package mvgam (version 1.1.52; Clark & Wells, 2023) to
#>   construct, fit and interrogate the model. mvgam fits Bayesian
#>   State-Space models that can include flexible predictor effects in both
#>   the process and observation components by incorporating functionalities
#>   from the brms (Burkner 2017), mgcv (Wood 2017) and splines2 (Wang & Yan,
#>   2023) packages. To encourage stability and prevent forecast variance
#>   from increasing indefinitely, we enforced stationarity of the Vector
#>   Autoregressive process following methods described by Heaps (2023) and
#>   Clark et al. (2025). The mvgam-constructed model and observed data were
#>   passed to the probabilistic programming environment Stan (version
#>   2.36.0; Carpenter et al. 2017, Stan Development Team 2026), specifically
#>   through the cmdstanr interface (Gabry & Cesnovar, 2021). We ran 4
#>   Hamiltonian Monte Carlo chains for 1000 warmup iterations and 500
#>   sampling iterations for joint posterior estimation. Rank normalized
#>   split Rhat (Vehtari et al. 2021) and effective sample sizes were used to
#>   monitor convergence.
#> 
#> Primary references
#> Clark, NJ and Wells K (2023). Dynamic Generalized Additive Models
#>   (DGAMs) for forecasting discrete ecological time series. Methods in
#>   Ecology and Evolution, 14, 771-784. doi.org/10.1111/2041-210X.13974
#> Burkner, PC (2017). brms: An R Package for Bayesian Multilevel Models
#>   Using Stan. Journal of Statistical Software, 80(1), 1-28.
#>   doi:10.18637/jss.v080.i01
#> Wood, SN (2017). Generalized Additive Models: An Introduction with R
#>   (2nd edition). Chapman and Hall/CRC.
#> Wang W and Yan J (2021). Shape-Restricted Regression Splines with R
#>   Package splines2. Journal of Data Science, 19(3), 498-517.
#>   doi:10.6339/21-JDS1020 https://doi.org/10.6339/21-JDS1020.
#> Heaps, SE (2023). Enforcing stationarity through the prior in vector
#>   autoregressions. Journal of Computational and Graphical Statistics 32,
#>   74-83.
#> Clark NJ, Ernest SKM, Senyondo H, Simonis J, White EP, Yenni GM,
#>   Karunarathna KANK (2025). Beyond single-species models: leveraging
#>   multispecies forecasts to navigate the dynamics of ecological
#>   predictability. PeerJ 13:e18929.
#> Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M,
#>   Brubaker M, Guo J, Li P and Riddell A (2017). Stan: A probabilistic
#>   programming language. Journal of Statistical Software 76.
#> Gabry J, Cesnovar R, Johnson A, and Bronder S (2026). cmdstanr: R
#>   Interface to 'CmdStan'. https://mc-stan.org/cmdstanr/,
#>   https://discourse.mc-stan.org.
#> Vehtari A, Gelman A, Simpson D, Carpenter B, and Burkner P (2021).
#>   Rank-normalization, folding, and localization: An improved Rhat for
#>   assessing convergence of MCMC (with discussion). Bayesian Analysis 16(2)
#>   667-718. https://doi.org/10.1214/20-BA1221.
#> 
#> Other useful references
#> Arel-Bundock V, Greifer N, and Heiss A (2024). How to interpret
#>   statistical models using marginaleffects for R and Python. Journal of
#>   Statistical Software, 111(9), 1-32.
#>   https://doi.org/10.18637/jss.v111.i09
#> Gabry J, Simpson D, Vehtari A, Betancourt M, and Gelman A (2019).
#>   Visualization in Bayesian workflow. Journal of the Royal Statatistical
#>   Society A, 182, 389-402. doi:10.1111/rssa.12378.
#> Vehtari A, Gelman A, and Gabry J (2017). Practical Bayesian model
#>   evaluation using leave-one-out cross-validation and WAIC. Statistics and
#>   Computing, 27, 1413-1432. doi:10.1007/s11222-016-9696-4.
#> Burkner PC, Gabry J, and Vehtari A. (2020). Approximate leave-future-out
#>   cross-validation for Bayesian time series models. Journal of Statistical
#>   Computation and Simulation, 90(14), 2499-2523.
#>   https://doi.org/10.1080/00949655.2020.1783262

More advanced hierarchical panel VAR models can also be handled by using the gr and subgr arguments in VAR(). These models are useful if you have data for the same set of series (subgr) that are measured in different regions (gr), such as species measured in different sampling regions or financial series measured in different countries. A hypothetical sketch is shown below.
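In this sketch (the region and species factors are invented for illustration and are not part of this vignette's data), the model might be specified as:

# hierarchical panel VAR: 'species' time series nested within 'region';
# supplying gr automatically enables correlated process errors
mod <- mvgam(
  formula = y ~ -1,
  trend_model = VAR(gr = region, subgr = species),
  family = gaussian(),
  data = dat
)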

Further reading

The following papers and resources offer a lot of useful material about multivariate State-Space models and how they can be applied in practice:

Auger‐Méthé, Marie, et al. A guide to state–space modeling of ecological time series. Ecological Monographs 91.4 (2021): e01470.

Clark, Nicholas J., et al. Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ 13 (2025): e18929.

Heaps, Sarah E. Enforcing stationarity through the prior in vector autoregressions. Journal of Computational and Graphical Statistics 32.1 (2023): 74-83.

Hannaford, Naomi E., et al. A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant. Computational Statistics & Data Analysis 179 (2023): 107659.

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. MARSS: multivariate autoregressive state-space models for analyzing time-series data. R Journal 4.1 (2012): 11.

Karunarathna, K.A.N.K., et al. Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models. Ecological Modelling 490 (2024): 110648.

Ward, Eric J., et al. Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico. Journal of Applied Ecology 47.1 (2010): 47-56.

Interested in contributing?

I’m actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of mvgam. Please see this small list of opportunities on my website and do reach out if you are interested (n.clark’at’uq.edu.au).

================================================ FILE: man/GP.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_trend_types.R \name{GP} \alias{GP} \title{Specify dynamic Gaussian process trends in \pkg{mvgam} models} \usage{ GP(...) } \arguments{ \item{...}{unused} } \value{ An object of class \code{mvgam_trend}, which contains a list of arguments to be interpreted by the parsing functions in \pkg{mvgam}. } \description{ Set up low-rank approximate Gaussian Process trend models using Hilbert basis expansions in \pkg{mvgam}. This function does not evaluate its arguments – it exists purely to help set up a model with particular GP trend models. } \details{ A GP trend is estimated for each series using Hilbert space approximate Gaussian Processes. In \code{mvgam}, latent squared exponential GP trends are approximated by default using \code{20} basis functions and a multiplicative factor of \code{c = 5/4}, which saves computational costs compared to fitting full GPs while adequately estimating GP \code{alpha} and \code{rho} parameters. } \references{ Riutort-Mayol G, Burkner PC, Andersen MR, Solin A and Vehtari A (2023). Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Statistics and Computing 33, 1. https://doi.org/10.1007/s11222-022-10167-2 } \seealso{ \code{\link[brms]{gp}} } \author{ Nicholas J Clark } ================================================ FILE: man/RW.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_trend_types.R \name{RW} \alias{RW} \alias{AR} \alias{CAR} \alias{VAR} \title{Specify autoregressive dynamic processes in \pkg{mvgam}} \usage{ RW(ma = FALSE, cor = FALSE, gr = NA, subgr = NA) AR(p = 1, ma = FALSE, cor = FALSE, gr = NA, subgr = NA) CAR(p = 1) VAR(ma = FALSE, cor = FALSE, gr = NA, subgr = NA) } \arguments{ \item{ma}{\code{Logical}. Include moving average terms of order \code{1}? Default is \code{FALSE}.} \item{cor}{\code{Logical}. Include correlated process errors as part of a multivariate normal process model? If \code{TRUE} and if \code{n_series > 1} in the supplied data, a fully structured covariance matrix will be estimated for the process errors. Default is \code{FALSE}.} \item{gr}{An optional grouping variable, which must be a \code{factor} in the supplied \code{data}, for setting up hierarchical residual correlation structures. If specified, this will automatically set \code{cor = TRUE} and set up a model where the residual correlations for a specific level of \code{gr} are modelled hierarchically: \eqn{\Omega_{group} = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local}}, where \eqn{\Omega_{global}} is a \emph{global} correlation matrix, \eqn{\Omega_{group, local}} is a \emph{local deviation} correlation matrix and \eqn{\alpha_{cor}} is a weighting parameter controlling how strongly the group-level correlation matrix \eqn{\Omega_{group}} is shrunk towards the global correlation matrix \eqn{\Omega_{global}} (larger values of \eqn{\alpha_{cor}} indicate a greater degree of shrinkage, i.e. a greater degree of partial pooling). When used within a \code{VAR()} model, this essentially sets up a hierarchical panel vector autoregression where both the autoregressive and correlation matrices are learned hierarchically.
If \code{gr} is supplied then \code{subgr} \emph{must} also be supplied.} \item{subgr}{A subgrouping \code{factor} variable specifying which element in \code{data} represents the different time series. Defaults to \code{series}, but note that models that use the hierarchical correlations, where the \code{subgr} time series are measured in each level of \code{gr}, \emph{should not} include a \code{series} element in \code{data}. Rather, this element will be created internally based on the supplied variables for \code{gr} and \code{subgr}. For example, if you are modelling temporal counts for a group of species (labelled as \code{species} in \code{data}) across three different geographical regions (labelled as \code{region}), and you would like the residuals to be correlated within regions, then you should specify \code{gr = region} and \code{subgr = species}. Internally, \code{mvgam()} will create the \code{series} element for the data using: \code{series = interaction(group, subgroup, drop = TRUE)}} \item{p}{A non-negative integer specifying the autoregressive (AR) order. Default is \code{1}. Cannot currently be larger than \code{3} for \code{AR} terms, and cannot be anything other than \code{1} for continuous time AR (\code{CAR}) terms.} } \value{ An object of class \code{mvgam_trend}, which contains a list of arguments to be interpreted by the parsing functions in \pkg{mvgam}. } \description{ Set up autoregressive or autoregressive moving average trend models in \pkg{mvgam}. These functions do not evaluate their arguments – they exist purely to help set up a model with particular autoregressive trend models. } \details{ Use \code{vignette("mvgam_overview")} to see the full details of available stochastic trend types in \pkg{mvgam}, or view the rendered version on the package website at: https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html } \examples{ \dontrun{ # A short example to illustrate CAR(1) models # Function to simulate CAR1 data with seasonality sim_corcar1 = function(n = 125, phi = 0.5, sigma = 2, sigma_obs = 0.75) { # Sample irregularly spaced time intervals time_dis <- c(1, runif(n - 1, 0, 5)) # Set up the latent dynamic process x <- vector(length = n); x[1] <- -0.3 for (i in 2:n) { # zero-distances will cause problems in sampling, so mvgam uses a # minimum threshold; this simulation function emulates that process if (time_dis[i] == 0) { x[i] <- rnorm( 1, mean = (phi^1e-3) * x[i - 1], sd = sigma * (1 - phi^(2 * 1e-3)) / (1 - phi^2) ) } else { x[i] <- rnorm( 1, mean = (phi^time_dis[i]) * x[i - 1], sd = sigma * (1 - phi^(2 * time_dis[i])) / (1 - phi^2) ) } } # Add 12-month seasonality cov1 <- sin(2 * pi * (1:n) / 12) cov2 <- cos(2 * pi * (1:n) / 12) beta1 <- runif(1, 0.3, 0.7) beta2 <- runif(1, 0.2, 0.5) seasonality <- beta1 * cov1 + beta2 * cov2 # Take Gaussian observations with error and return data.frame( y = rnorm(n, mean = x + seasonality, sd = sigma_obs), season = rep(1:12, 20)[1:n], time = cumsum(time_dis) ) } # Sample two time series dat <- rbind( dplyr::bind_cols( sim_corcar1(phi = 0.65, sigma_obs = 0.55), data.frame(series = 'series1') ), dplyr::bind_cols( sim_corcar1(phi = 0.8, sigma_obs = 0.35), data.frame(series = 'series2') ) ) \%>\% dplyr::mutate(series = as.factor(series)) # mvgam with CAR(1) trends and series-level seasonal smooths mod <- mvgam( formula = y ~ -1, trend_formula = ~ s(season, bs = 'cc', k = 5, by = trend), trend_model = CAR(), priors = c( prior(exponential(3), class = sigma), prior(beta(4, 4), class = sigma_obs) ), data = dat, 
family = gaussian(), chains = 2, silent = 2 ) # View usual summaries and plots summary(mod) conditional_effects(mod, type = 'expected') plot(mod, type = 'trend', series = 1) plot(mod, type = 'trend', series = 2) plot(mod, type = 'residuals', series = 1) plot(mod, type = 'residuals', series = 2) mcmc_plot( mod, variable = 'ar1', regex = TRUE, type = 'hist' ) # Now an example illustrating hierarchical dynamics set.seed(123) # Simulate three species monitored in three different regions simdat1 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3) ) simdat2 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3) ) simdat3 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3) ) # Set up the data but DO NOT include 'series' all_dat <- rbind( simdat1$data_train \%>\% dplyr::mutate(region = 'qld'), simdat2$data_train \%>\% dplyr::mutate(region = 'nsw'), simdat3$data_train \%>\% dplyr::mutate(region = 'vic') ) \%>\% dplyr::mutate( species = gsub('series', 'species', series), species = as.factor(species), region = as.factor(region) ) \%>\% dplyr::arrange(series, time) \%>\% dplyr::select(-series) # Check priors for a hierarchical AR1 model get_mvgam_priors( formula = y ~ species, trend_model = AR(gr = region, subgr = species), data = all_dat ) # Fit the model mod <- mvgam( formula = y ~ species, trend_model = AR(gr = region, subgr = species), data = all_dat, chains = 2, silent = 2 ) # Check standard outputs summary(mod) # Inspect posterior estimates for the correlation weighting parameter mcmc_plot(mod, variable = 'alpha_cor', type = 'hist') } } \author{ Nicholas J Clark } ================================================ FILE: man/ZMVN.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_trend_types.R \name{ZMVN} \alias{ZMVN} \title{Specify correlated residual processes in \pkg{mvgam}} \usage{ ZMVN(unit = time, gr = NA, subgr = series) } \arguments{ \item{unit}{The unquoted name of the variable that represents the unit of analysis in \code{data} over which latent residuals should be correlated. This variable should be either a \code{numeric} or \code{integer} variable in the supplied \code{data}. Defaults to \code{time} to be consistent with other functionalities in \pkg{mvgam}, though note that the data need not be time series in this case. See examples below for further details and explanations} \item{gr}{An optional grouping variable, which must be a \code{factor} in the supplied \code{data}, for setting up hierarchical residual correlation structures. If specified, this will automatically set up a model where the residual correlations for a specific level of \code{gr} are modelled hierarchically: \eqn{\Omega_{group} = p\Omega_{global} + (1 - p)\Omega_{group, local}}, where \eqn{\Omega_{global}} is a \emph{global} correlation matrix, \eqn{\Omega_{group, local}} is a \emph{local deviation} correlation matrix, and \eqn{p} is a weighting parameter controlling how strongly the group-level correlation matrix \eqn{\Omega_{group}} is shrunk towards the global correlation matrix \eqn{\Omega_{global}}. If \code{gr} is supplied then \code{subgr} \emph{must} also be supplied.} \item{subgr}{A subgrouping \code{factor} variable specifying which element in \code{data} represents the different observational units.
Defaults to \code{series} to be consistent with other functionalities in \pkg{mvgam}, though note that the data need not be time series in this case. Models that use the hierarchical correlations (by supplying a value for \code{gr}) \emph{should not} include a \code{series} element in \code{data}. Rather, this element will be created internally based on the supplied variables for \code{gr} and \code{subgr}. For example, if you are modelling counts for a group of species (labelled as \code{species} in the data) across sampling sites (labelled as \code{site} in the data) in three different geographical regions (labelled as \code{region}), and you would like the residuals to be correlated within regions, then you should specify \code{unit = site}, \code{gr = region}, and \code{subgr = species}. Internally, \code{mvgam()} will appropriately order the data by \code{unit} (in this case, by \code{site}) and create the \code{series} element for the data using something like: \code{series = as.factor(paste0(group, '_', subgroup))}} \value{ An object of class \code{mvgam_trend}, which contains a list of arguments to be interpreted by the parsing functions in \pkg{mvgam}. } \description{ Set up latent correlated multivariate Gaussian residual processes in \pkg{mvgam}. This function does not evaluate its arguments – it exists purely to help set up a model with particular error processes. } \examples{ \dontrun{ # Simulate counts of four species over ten sampling locations site_dat <- data.frame( site = rep(1:10, 4), species = as.factor(sort(rep(letters[1:4], 10))), y = c(NA, rpois(39, 3)) ) head(site_dat) # Set up a correlated residual (i.e. Joint Species Distribution) model trend_model <- ZMVN(unit = site, subgr = species) mod <- mvgam( y ~ species, trend_model = ZMVN(unit = site, subgr = species), data = site_dat, chains = 2, silent = 2 ) # Inspect the estimated species-species residual covariances mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist') # A hierarchical correlation example Sigma <- matrix( c(1, -0.4, 0.5, -0.4, 1, 0.3, 0.5, 0.3, 1), byrow = TRUE, nrow = 3 ) make_site_dat <- function(...) { errors <- mgcv::rmvn( n = 30, mu = c(0.6, 0.8, 1.8), V = Sigma ) site_dat <- do.call(rbind, lapply(1:3, function(spec) { data.frame( y = rpois(30, lambda = exp(errors[, spec])), species = paste0('species', spec), site = 1:30 ) })) site_dat } site_dat <- rbind( make_site_dat() \%>\% dplyr::mutate(group = 'group1'), make_site_dat() \%>\% dplyr::mutate(group = 'group2') ) \%>\% dplyr::mutate( species = as.factor(species), group = as.factor(group) ) # Fit the hierarchical correlated residual model mod <- mvgam( y ~ species, trend_model = ZMVN(unit = site, gr = group, subgr = species), data = site_dat, chains = 2, silent = 2 ) # Inspect the estimated species-species residual covariances mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist') } } ================================================ FILE: man/add_residuals.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/add_residuals.R \name{add_residuals.mvgam} \alias{add_residuals.mvgam} \alias{add_residuals} \title{Calculate randomized quantile residuals for \pkg{mvgam} objects} \usage{ add_residuals(object, ...) \method{add_residuals}{mvgam}(object, ...) } \arguments{ \item{object}{\code{list} object of class \code{mvgam}.
See \code{\link[=mvgam]{mvgam()}}} \item{...}{unused} } \value{ A list object of class \code{mvgam} with residuals included in the \code{'resids'} slot } \description{ Calculate randomized quantile residuals for \pkg{mvgam} objects } \details{ For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics. If the fitted model is appropriate then Dunn-Smyth residuals will be standard normal in distribution and no autocorrelation will be evident. When a particular observation is missing, the residual is calculated by comparing independent draws from the model's posterior distribution. } ================================================ FILE: man/all_neon_tick_data.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/all_neon_tick_data.R \docType{data} \name{all_neon_tick_data} \alias{all_neon_tick_data} \title{NEON Amblyomma and Ixodes tick abundance survey data} \format{ A tibble/dataframe containing covariate information alongside the main fields of: \describe{ \item{Year}{Year of sampling} \item{epiWeek}{Epidemiological week of sampling} \item{plot_ID}{NEON plot ID for survey location} \item{siteID}{NEON site ID for survey location} \item{amblyomma_americanum}{Counts of A. americanum nymphs} \item{ixodes_scapularis}{Counts of I. scapularis nymphs} } } \source{ \url{https://www.neonscience.org/data} } \usage{ all_neon_tick_data } \description{ A dataset containing time series of Amblyomma americanum and Ixodes scapularis nymph abundances at NEON sites } \keyword{datasets} ================================================ FILE: man/augment.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/tidier_methods.R \name{augment.mvgam} \alias{augment.mvgam} \title{Augment an \code{mvgam} object's data} \usage{ \method{augment}{mvgam}(x, robust = FALSE, probs = c(0.025, 0.975), ...) } \arguments{ \item{x}{An object of class \code{mvgam}.} \item{robust}{If \code{FALSE} (the default) the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If \code{TRUE}, the median and the median absolute deviation (MAD) are applied instead.} \item{probs}{The percentiles to be computed by the quantile function.} \item{...}{Unused, included for generic consistency only.} } \value{ A \code{list} or \code{tibble} (see details) combining: \itemize{ \item The data supplied to \code{mvgam()}. \item The outcome variable, named as \code{.observed}. \item The fitted backcasts, along with their variability and credible bounds. \item The residuals, along with their variability and credible bounds. } } \description{ Add fits and residuals to the data, implementing the generic \code{augment} from the package \pkg{broom}. } \details{ A \code{list} is returned if \code{class(x$obs_data) == 'list'}, otherwise a \code{tibble} is returned, but the contents of either object are the same. The arguments \code{robust} and \code{probs} are applied to both the fit and residuals calls (see \code{\link[=fitted.mvgam]{fitted.mvgam()}} and \code{\link[=residuals.mvgam]{residuals.mvgam()}} for details).
} \examples{ \dontrun{ set.seed(0) dat <- sim_mvgam( T = 80, n_series = 3, mu = 2, trend_model = AR(p = 1), prop_missing = 0.1, prop_trend = 0.6 ) mod1 <- mvgam( formula = y ~ s(season, bs = 'cc', k = 6), data = dat$data_train, trend_model = AR(), family = poisson(), noncentred = TRUE, chains = 2, silent = 2 ) augment(mod1, robust = TRUE, probs = c(0.25, 0.75)) } } \seealso{ \code{\link{residuals.mvgam}}, \code{\link{fitted.mvgam}} Other tidiers: \code{\link{tidy.mvgam}()} } \concept{tidiers} ================================================ FILE: man/code.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/stan_utils.R \name{code} \alias{code} \alias{stancode.mvgam_prefit} \alias{stancode.mvgam} \alias{standata.mvgam_prefit} \title{Stan code and data objects for \pkg{mvgam} models} \usage{ code(object) \method{stancode}{mvgam_prefit}(object, ...) \method{stancode}{mvgam}(object, ...) \method{standata}{mvgam_prefit}(object, ...) } \arguments{ \item{object}{An object of class \code{mvgam} or \code{mvgam_prefit}, returned from a call to \code{mvgam}} \item{...}{ignored} } \value{ Either a character string containing the fully commented \pkg{Stan} code to fit a \pkg{mvgam} model or a named list containing the data objects needed to fit the model in Stan. } \description{ Generate Stan code and data objects for \pkg{mvgam} models } \examples{ \dontrun{ simdat <- sim_mvgam() mod <- mvgam(y ~ s(season) + s(time, by = series), family = poisson(), data = simdat$data_train, run_model = FALSE) # View Stan model code stancode(mod) # View Stan model data sdata <- standata(mod) str(sdata) } } ================================================ FILE: man/conditional_effects.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/conditional_effects.R \name{conditional_effects.mvgam} \alias{conditional_effects.mvgam} \alias{plot.mvgam_conditional_effects} \alias{print.mvgam_conditional_effects} \title{Display conditional effects of predictors for \pkg{mvgam} models} \usage{ \method{conditional_effects}{mvgam}( x, effects = NULL, type = "expected", points = FALSE, rug = FALSE, ... ) \method{plot}{mvgam_conditional_effects}(x, plot = TRUE, ask = FALSE, ...) \method{print}{mvgam_conditional_effects}(x, ...) } \arguments{ \item{x}{Object of class \code{mvgam}, \code{jsdgam} or \code{mvgam_conditional_effects}} \item{effects}{An optional character vector naming effects (main effects or interactions) for which to compute conditional plots. Interactions are specified by a \code{:} between variable names. If \code{NULL} (the default), plots are generated for all main effects and two-way interactions estimated in the model. When specifying \code{effects} manually, \emph{all} two-way interactions (including grouping variables) may be plotted even if not originally modeled.} \item{type}{\code{character} specifying the scale of predictions. When this has the value \code{link} the linear predictor is calculated on the link scale. If \code{expected} is used (the default), predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. When \code{response} is used, the predictions take uncertainty in the observation process into account to return predictions on the outcome scale. 
Two special cases are also allowed: type \code{latent_N} will return the estimated latent abundances from an N-mixture distribution, while type \code{detection} will return the estimated detection probability from an N-mixture distribution.} \item{points}{\code{Logical}. Indicates if the original data points should be added, but only if \code{type == 'response'}. Default is \code{FALSE}.} \item{rug}{\code{Logical}. Indicates if tick marks should be displayed on the axes to mark the distribution of raw data, but only if \code{type == 'response'}. Default is \code{FALSE}.} \item{...}{other arguments to pass to \code{\link[marginaleffects]{plot_predictions}}} \item{plot}{Logical; indicates if plots should be plotted directly in the active graphic device. Defaults to \code{TRUE}.} \item{ask}{\code{Logical}. Indicates if the user is prompted before a new page is plotted. Only used if plot is \code{TRUE}. Default is \code{FALSE}.} } \value{ \code{conditional_effects} returns an object of class \code{mvgam_conditional_effects} which is a named list with one slot per effect containing a \code{\link[ggplot2]{ggplot}} object, which can be further customized using the \pkg{ggplot2} package. The corresponding \code{plot} method will draw these plots in the active graphic device. } \description{ Display conditional effects of one or more numeric and/or categorical predictors in models of class \code{mvgam} and \code{jsdgam}, including two-way interaction effects. } \details{ This function acts as a wrapper to the more flexible \code{\link[marginaleffects]{plot_predictions}}. When creating \code{conditional_effects} for a particular predictor (or interaction of two predictors), one has to choose the values of all other predictors to condition on. By default, the mean is used for continuous variables and the reference category is used for factors. Use \code{\link[marginaleffects]{plot_predictions}} to change these and create more bespoke conditional effects plots. } \examples{ \dontrun{ # Simulate some data simdat <- sim_mvgam( family = poisson(), seasonality = 'hierarchical' ) # Fit a model mod <- mvgam( y ~ s(season, by = series, k = 5) + year:series, family = poisson(), data = simdat$data_train, chains = 2, silent = 2 ) # Plot all main effects on the response scale conditional_effects(mod) # Change the prediction interval to 70\% using plot_predictions() argument # 'conf_level' conditional_effects(mod, conf_level = 0.7) # Plot all main effects on the link scale conditional_effects(mod, type = 'link') # Works the same for smooth terms, including smooth interactions set.seed(0) dat <- mgcv::gamSim(1, n = 200, scale = 2) mod <- mvgam( y ~ te(x0, x1, k = 5) + s(x2, k = 6) + s(x3, k = 6), data = dat, family = gaussian(), chains = 2, silent = 2 ) conditional_effects(mod) conditional_effects(mod, conf_level = 0.5, type = 'link') # ggplot objects can be modified and combined with the help of many # additional packages.
Here is an example using the patchwork package # Simulate some nonlinear data dat <- mgcv::gamSim(1, n = 200, scale = 2) mod <- mvgam( y ~ s(x1, bs = 'moi') + te(x0, x2), data = dat, family = gaussian(), chains = 2, silent = 2 ) # Extract the list of ggplot conditional_effect plots m <- plot(conditional_effects(mod), plot = FALSE) # Add custom labels and arrange plots together using patchwork::wrap_plots() library(patchwork) library(ggplot2) wrap_plots( m[[1]] + labs(title = 's(x1, bs = "moi")'), m[[2]] + labs(title = 'te(x0, x2)') ) } } \seealso{ \code{\link[marginaleffects]{plot_predictions}}, \code{\link[marginaleffects]{plot_slopes}} } \author{ Nicholas J Clark } ================================================ FILE: man/dynamic.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/dynamic.R \name{dynamic} \alias{dynamic} \title{Defining dynamic coefficients in \pkg{mvgam} formulae} \usage{ dynamic(variable, k, rho = 5, stationary = TRUE, scale = TRUE) } \arguments{ \item{variable}{The variable that the dynamic smooth will be a function of} \item{k}{Optional number of basis functions for computing approximate GPs. If missing, \code{k} will be set as large as possible to accurately estimate the nonlinear function.} \item{rho}{Either a positive numeric stating the length scale to be used for approximating the squared exponential Gaussian Process smooth (see \code{\link[mgcv]{gp.smooth}} for details) or missing, in which case the length scale will be estimated by setting up a Hilbert space approximate GP.} \item{stationary}{Logical. If \code{TRUE} (the default) and \code{rho} is supplied, the latent Gaussian Process smooth will not have a linear trend component. If \code{FALSE}, a linear trend in the covariate is added to the Gaussian Process smooth. Leave at \code{TRUE} if you do not believe the coefficient is evolving with much trend, as the linear component of the basis functions can be hard to penalize to zero. This sometimes causes divergence issues in \code{Stan}. See \code{\link[mgcv]{gp.smooth}} for details. Ignored if \code{rho} is missing (in which case a Hilbert space approximate GP is used).} \item{scale}{Logical; If \code{TRUE} (the default) and \code{rho} is missing, predictors are scaled so that the maximum Euclidean distance between two points is \code{1}. This often improves sampling speed and convergence. Scaling also affects the estimated length-scale parameters in that they resemble those of scaled predictors (not of the original predictors) if scale is \code{TRUE}.} } \value{ a \code{list} object for internal usage in 'mvgam' } \description{ Set up time-varying (dynamic) coefficients for use in \pkg{mvgam} models. Currently, only low-rank Gaussian Process smooths are available for estimating the dynamics of the time-varying coefficient. } \details{ \code{mvgam} currently sets up dynamic coefficients as low-rank squared exponential Gaussian Process smooths via the call \code{s(time, by = variable, bs = "gp", m = c(2, rho, 2))}. These smooths, if specified with reasonable values for the length scale parameter, will give more realistic out of sample forecasts than standard splines such as thin plate or cubic. But the user must set the value for \code{rho}, as there is currently no support for estimating this value in \code{mgcv}. This may not be too big of a problem, as estimating latent length scales is often difficult anyway. 
The \code{rho} parameter should be thought of as a prior on the smoothness of the latent dynamic coefficient function (where higher values of \code{rho} lead to smoother functions with more temporal covariance structure). Values of \code{k} are set automatically to ensure enough basis functions are used to approximate the expected wiggliness of the underlying dynamic function (\code{k} will increase as \code{rho} decreases). } \examples{ \dontrun{ # Simulate a time-varying coefficient # (as a Gaussian Process with length scale = 10) set.seed(1111) N <- 200 # A function to simulate from a squared exponential Gaussian Process sim_gp <- function(N, c, alpha, rho) { Sigma <- alpha ^ 2 * exp(-0.5 * ((outer(1:N, 1:N, "-") / rho) ^ 2)) + diag(1e-9, N) c + mgcv::rmvn(1, mu = rep(0, N), V = Sigma) } beta <- sim_gp(alpha = 0.75, rho = 10, c = 0.5, N = N) plot( beta, type = 'l', lwd = 3, bty = 'l', xlab = 'Time', ylab = 'Coefficient', col = 'darkred' ) # Simulate the predictor as a standard normal predictor <- rnorm(N, sd = 1) # Simulate a Gaussian outcome variable out <- rnorm(N, mean = 4 + beta * predictor, sd = 0.25) time <- seq_along(predictor) plot( out, type = 'l', lwd = 3, bty = 'l', xlab = 'Time', ylab = 'Outcome', col = 'darkred' ) # Gather into a data.frame and fit a dynamic coefficient model data <- data.frame(out, predictor, time) # Split into training and testing data_train <- data[1:190, ] data_test <- data[191:200, ] # Fit a model using the dynamic function mod <- mvgam( out ~ # mis-specify the length scale slightly as this # won't be known in practice dynamic(predictor, rho = 8, stationary = TRUE), family = gaussian(), data = data_train, chains = 2, silent = 2 ) # Inspect the summary summary(mod) # Plot the time-varying coefficient estimates plot(mod, type = 'smooths') # Extrapolate the coefficient forward in time plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 190, lty = 'dashed', lwd = 2) # Overlay the true simulated time-varying coefficient lines(beta, lwd = 2.5, col = 'white') lines(beta, lwd = 2) } } \author{ Nicholas J Clark } ================================================ FILE: man/ensemble.mvgam_forecast.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ensemble.R \name{ensemble.mvgam_forecast} \alias{ensemble.mvgam_forecast} \alias{ensemble} \title{Combine forecasts from \pkg{mvgam} models into evenly weighted ensembles} \usage{ ensemble(object, ...) \method{ensemble}{mvgam_forecast}(object, ..., ndraws = 5000) } \arguments{ \item{object}{\code{list} object of class \code{mvgam_forecast}. See \code{\link[=forecast.mvgam]{forecast.mvgam()}}} \item{...}{More \code{mvgam_forecast} objects.} \item{ndraws}{Positive integer specifying the number of draws to use from each forecast distribution for creating the ensemble. If some of the ensemble members have fewer draws than \code{ndraws}, their forecast distributions will be resampled with replacement to achieve the correct number of draws} } \value{ An object of class \code{mvgam_forecast} containing the ensemble predictions. This object can be readily used with the supplied S3 functions \code{plot} and \code{score}. } \description{ Generate evenly weighted ensemble forecast distributions from \code{mvgam_forecast} objects. } \details{ It is widely recognised in the forecasting literature that combining forecasts from different models often results in improved forecast accuracy. 
The simplest way to create an ensemble is to use evenly weighted combinations of forecasts from the different models. This is straightforward to do in a Bayesian setting with \pkg{mvgam} as the posterior MCMC draws contained in each \code{mvgam_forecast} object will already implicitly capture correlations among the temporal posterior predictions. } \examples{ \dontrun{ # Simulate some series and fit a few competing dynamic models set.seed(1) simdat <- sim_mvgam( n_series = 1, prop_trend = 0.6, mu = 1 ) plot_mvgam_series( data = simdat$data_train, newdata = simdat$data_test ) m1 <- mvgam( y ~ 1, trend_formula = ~ time + s(season, bs = 'cc', k = 9), trend_model = AR(p = 1), noncentred = TRUE, data = simdat$data_train, newdata = simdat$data_test, chains = 2, silent = 2 ) m2 <- mvgam( y ~ time, trend_model = RW(), noncentred = TRUE, data = simdat$data_train, newdata = simdat$data_test, chains = 2, silent = 2 ) # Calculate forecast distributions for each model fc1 <- forecast(m1) fc2 <- forecast(m2) # Generate the ensemble forecast ensemble_fc <- ensemble(fc1, fc2) # Plot forecasts plot(fc1) plot(fc2) plot(ensemble_fc) # Score forecasts score(fc1) score(fc2) score(ensemble_fc) } } \seealso{ \code{\link{plot.mvgam_forecast}}, \code{\link{score.mvgam_forecast}} } \author{ Nicholas J Clark } ================================================ FILE: man/evaluate_mvgams.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/evaluate_mvgams.R \name{evaluate_mvgams} \alias{evaluate_mvgams} \alias{eval_mvgam} \alias{roll_eval_mvgam} \alias{compare_mvgams} \title{Evaluate forecasts from fitted \pkg{mvgam} objects} \usage{ eval_mvgam( object, n_samples = 5000, eval_timepoint = 3, fc_horizon = 3, n_cores = 1, score = "drps", log = FALSE, weights ) roll_eval_mvgam( object, n_evaluations = 5, evaluation_seq, n_samples = 5000, fc_horizon = 3, n_cores = 1, score = "drps", log = FALSE, weights ) compare_mvgams( model1, model2, n_samples = 1000, fc_horizon = 3, n_evaluations = 10, n_cores = 1, score = "drps", log = FALSE, weights ) } \arguments{ \item{object}{\code{list} object returned from \code{mvgam}} \item{n_samples}{\code{integer} specifying the number of samples to generate from the model's posterior distribution} \item{eval_timepoint}{\code{integer} indexing the timepoint that represents our last 'observed' set of outcome data} \item{fc_horizon}{\code{integer} specifying the length of the forecast horizon for evaluating forecasts} \item{n_cores}{Deprecated. Parallel processing is no longer supported} \item{score}{\code{character} specifying the type of ranked probability score to use for evaluation. Options are: \code{variogram}, \code{drps} or \code{crps}} \item{log}{\code{logical}. Should the forecasts and truths be logged prior to scoring? This is often appropriate for comparing performance of models when series vary in their observation ranges} \item{weights}{optional \code{vector} of weights (where \code{length(weights) == n_series}) for weighting pairwise correlations when evaluating the variogram score for multivariate forecasts. Useful for down-weighting series that have larger magnitude observations or that are of less interest when forecasting. Ignored if \code{score != 'variogram'}} \item{n_evaluations}{\code{integer} specifying the total number of evaluations to perform} \item{evaluation_seq}{Optional \code{integer sequence} specifying the exact set of timepoints for evaluating the model's forecasts. 
This sequence cannot have values \code{<3} or \code{> max(training timepoints) - fc_horizon}} \item{model1}{\code{list} object returned from \code{mvgam} representing the first model to be evaluated} \item{model2}{\code{list} object returned from \code{mvgam} representing the second model to be evaluated} } \value{ For \code{eval_mvgam}, a \code{list} object containing information on specific evaluations for each series (if using \code{drps} or \code{crps} as the score) or a vector of scores when using \code{variogram}. For \code{roll_eval_mvgam}, a \code{list} object containing information on specific evaluations for each series as well as a total evaluation summary (taken by summing the forecast score for each series at each evaluation and averaging the coverages at each evaluation) For \code{compare_mvgams}, a series of plots comparing forecast Rank Probability Scores for each competing model. A lower score is preferred. Note however that it is possible to select a model that ultimately would perform poorly in true out-of-sample forecasting. For example if a wiggly smooth function of 'year' is included in the model then this function will be learned prior to evaluating rolling window forecasts, and the model could generate very tight predictions as a result. But when forecasting ahead to timepoints that the model has not seen (i.e. next year), the smooth function will end up extrapolating, sometimes in very strange and unexpected ways. It is therefore recommended to only use smooth functions for covariates that are adequately measured in the data (i.e. 'seasonality', for example) to reduce possible extrapolation of smooths and let the latent trends in the \code{mvgam} model capture any temporal dependencies in the data. These trends are time series models and so will provide much more stable forecasts } \description{ Evaluate forecasts from fitted \pkg{mvgam} objects } \details{ \code{eval_mvgam} may be useful when both repeated fitting of a model using \code{\link{update.mvgam}} for exact leave-future-out cross-validation and approximate leave-future-out cross-validation using \code{\link{lfo_cv}} are impractical. The function generates a set of samples representing fixed parameters estimated from the full \code{mvgam} model and latent trend states at a given point in time. The trends are rolled forward a total of \code{fc_horizon} timesteps according to their estimated state space dynamics to generate an 'out-of-sample' forecast that is evaluated against the true observations in the horizon window. This function therefore simulates a situation where the model's parameters had already been estimated but we have only observed data up to the evaluation timepoint and would like to generate forecasts from the latent trends that have been observed up to that timepoint. Evaluation involves calculating an appropriate Rank Probability Score and a binary indicator for whether or not the true value lies within the forecast's 90\% prediction interval \code{roll_eval_mvgam} sets up a sequence of evaluation timepoints along a rolling window and iteratively calls \code{eval_mvgam} to evaluate 'out-of-sample' forecasts. Evaluation involves calculating the Rank Probability Scores and a binary indicator for whether or not the true value lies within the forecast's 90\% prediction interval \code{compare_mvgams} automates the evaluation to compare two fitted models using rolling window forecast evaluation and provides a series of summary plots to facilitate model selection. 
It is essentially a wrapper for \code{roll_eval_mvgam} } \examples{ \dontrun{ # Simulate from a Poisson-AR2 model with a seasonal smooth set.seed(1) dat <- sim_mvgam( T = 75, n_series = 1, prop_trend = 0.75, trend_model = AR(p = 2), family = poisson() ) # Fit an appropriate model mod_ar2 <- mvgam( formula = y ~ s(season, bs = 'cc'), trend_model = AR(p = 2), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2 ) # Fit a less appropriate model mod_rw <- mvgam( formula = y ~ 1, trend_model = RW(), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2 ) # Compare Discrete Ranked Probability Scores for the testing period fc_ar2 <- forecast(mod_ar2) fc_rw <- forecast(mod_rw) score_ar2 <- score( object = fc_ar2, score = 'drps' ) score_rw <- score( object = fc_rw, score = 'drps' ) sum(score_ar2$series_1$score) sum(score_rw$series_1$score) # Use rolling evaluation for approximate comparisons of 3-step ahead # forecasts across the training period compare_mvgams( model1 = mod_ar2, model2 = mod_rw, fc_horizon = 3, n_samples = 1000, n_evaluations = 5 ) # A more appropriate comparison would be to use approximate # leave-future-out CV to compare forecasts (see ?mvgam::lfo_cv()) } } \seealso{ \code{\link{forecast}}, \code{\link{score}}, \code{\link{lfo_cv}} } ================================================ FILE: man/fevd.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/fevd.mvgam.R \name{fevd.mvgam} \alias{fevd.mvgam} \alias{fevd} \title{Calculate latent VAR forecast error variance decompositions} \usage{ fevd(object, ...) \method{fevd}{mvgam}(object, h = 10, ...) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} resulting from a call to \code{\link[=mvgam]{mvgam()}} that used a Vector Autoregressive latent process model (either as \code{VAR(cor = FALSE)} or \code{VAR(cor = TRUE)}; see \code{\link[=VAR]{VAR()}} for details)} \item{...}{ignored} \item{h}{Positive \code{integer} specifying the forecast horizon over which to calculate the forecast error variance decompositions} } \value{ See \code{\link{mvgam_fevd-class}} for a full description of the quantities that are computed and returned by this function, along with key references. } \description{ Compute forecast error variance decompositions from \code{mvgam} models with Vector Autoregressive dynamics } \examples{ \dontrun{ # Simulate some time series that follow a latent VAR(1) process simdat <- sim_mvgam( family = gaussian(), n_series = 4, trend_model = VAR(cor = TRUE), prop_trend = 1 ) plot_mvgam_series(data = simdat$data_train, series = "all") # Fit a model that uses a latent VAR(1) mod <- mvgam( formula = y ~ -1, trend_formula = ~ 1, trend_model = VAR(cor = TRUE), family = gaussian(), data = simdat$data_train, chains = 2, silent = 2 ) # Plot the autoregressive coefficient distributions; # use 'dir = "v"' to arrange the order of facets # correctly mcmc_plot( mod, variable = 'A', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Calculate forecast error variance decompositions for each series fevds <- fevd(mod, h = 12) # Plot median contributions to forecast error variance plot(fevds) # View a summary of the error variance decompositions summary(fevds) } } \references{ Lütkepohl, H. (2007). New Introduction to Multiple Time Series Analysis. 2nd ed. Springer-Verlag Berlin Heidelberg.
} \seealso{ \code{\link[=VAR]{VAR()}}, \code{\link[=irf]{irf()}}, \code{\link[=stability]{stability()}}, \code{\link{mvgam_fevd-class}} } \author{ Nicholas J Clark } ================================================ FILE: man/fitted.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/posterior_epred.mvgam.R \name{fitted.mvgam} \alias{fitted.mvgam} \title{Expected values of the posterior predictive distribution for \pkg{mvgam} objects} \usage{ \method{fitted}{mvgam}( object, process_error = TRUE, scale = c("response", "linear"), summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... ) } \arguments{ \item{object}{An object of class \code{mvgam}} \item{process_error}{Logical. If \code{TRUE} and a dynamic trend model was fit, expected uncertainty in the process model is accounted for by using draws from a stationary, zero-centred multivariate Normal distribution using any estimated process variance-covariance parameters. If \code{FALSE}, uncertainty in the latent trend component is ignored when calculating predictions} \item{scale}{Either \code{"response"} or \code{"linear"}. If \code{"response"}, results are returned on the scale of the response variable. If \code{"linear"}, results are returned on the scale of the linear predictor term, that is without applying the inverse link function or other transformations.} \item{summary}{Should summary statistics be returned instead of the raw values? Default is \code{TRUE}.} \item{robust}{If \code{FALSE} (the default) the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If \code{TRUE}, the median and the median absolute deviation (MAD) are applied instead. Only used if \code{summary} is \code{TRUE}.} \item{probs}{The percentiles to be computed by the \code{quantile} function. Only used if \code{summary} is \code{TRUE}.} \item{...}{Further arguments passed to \code{\link[brms]{prepare_predictions}} that control several aspects of data validation and prediction.} } \value{ An \code{array} of predicted \emph{mean} response values. If \code{summary = FALSE} the output resembles those of \code{\link{posterior_epred.mvgam}} and \code{\link{predict.mvgam}}. If \code{summary = TRUE} the output is an \code{n_observations} x \code{E} matrix. The number of summary statistics \code{E} is equal to \code{2 + length(probs)}: The \code{Estimate} column contains point estimates (either mean or median depending on argument \code{robust}), while the \code{Est.Error} column contains uncertainty estimates (either standard deviation or median absolute deviation depending on argument \code{robust}). The remaining columns starting with \code{Q} contain quantile estimates as specified via argument \code{probs}. } \description{ This method extracts posterior estimates of the fitted values (i.e. the actual predictions, including estimates for any trend states, that were obtained when fitting the model). It also includes an option for obtaining summaries of the computed draws. } \details{ This method gives the actual fitted values from the model (i.e. what you will see if you generate hindcasts from the fitted model using \code{\link{hindcast.mvgam}} with \code{type = 'expected'}). These predictions can be overly precise if a flexible dynamic trend component was included in the model. This is in contrast to the set of predict functions (i.e.
\code{\link{posterior_epred.mvgam}} or \code{\link{predict.mvgam}}), which will assume any dynamic trend component has reached stationarity when returning hypothetical predictions. } \examples{ \dontrun{ # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Extract fitted values (posterior expectations) expectations <- fitted(mod) str(expectations) } } \seealso{ \code{\link{hindcast.mvgam}} } \author{ Nicholas J Clark } ================================================ FILE: man/forecast.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/forecast.mvgam.R \name{forecast.mvgam} \alias{forecast.mvgam} \title{Extract or compute hindcasts and forecasts for a fitted \code{mvgam} object} \usage{ \method{forecast}{mvgam}(object, newdata, data_test, n_cores = 1, type = "response", ...) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} or \code{jsdgam}. See \code{\link[=mvgam]{mvgam()}}} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing the same variables that were included in the original \code{data} used to fit the model. If included, the covariate information in \code{newdata} will be used to generate forecasts from the fitted model equations. If this same \code{newdata} was originally included in the call to \code{mvgam}, then forecasts have already been produced by the generative model and these will simply be extracted and plotted. However if no \code{newdata} was supplied to the original model call, an assumption is made that the \code{newdata} supplied here comes sequentially after the data supplied in the original model (i.e. we assume there is no time gap between the last observation of series 1 in the original data and the first observation for series 1 in \code{newdata})} \item{data_test}{Deprecated. Still works in place of \code{newdata} but users are recommended to use \code{newdata} instead for more seamless integration into \code{R} workflows} \item{n_cores}{Deprecated. Parallel processing is no longer supported} \item{type}{When this has the value \code{link} (default) the linear predictor is calculated on the link scale. If \code{expected} is used, predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. When \code{response} is used, the predictions take uncertainty in the observation process into account to return predictions on the outcome scale. When \code{variance} is used, the variance of the response with respect to the mean (mean-variance relationship) is returned. When \code{type = "terms"}, each component of the linear predictor is returned separately in the form of a \code{list} (possibly with standard errors, if \code{summary = TRUE}): this includes parametric model components, followed by each smooth component, but excludes any offset and any intercept. Two special cases are also allowed: type \code{latent_N} will return the estimated latent abundances from an N-mixture distribution, while type \code{detection} will return the estimated detection probability from an N-mixture distribution} \item{...}{Ignored} } \value{ An object of class \code{mvgam_forecast} containing hindcast and forecast distributions. See \code{\link{mvgam_forecast-class}} for details. 
} \description{ Extract or compute hindcasts and forecasts for a fitted \code{mvgam} object } \details{ Posterior predictions are drawn from the fitted \code{mvgam} and used to simulate a forecast distribution } \examples{ \dontrun{ # Simulate data with 3 series and AR trend model simdat <- sim_mvgam(n_series = 3, trend_model = AR()) # Fit mvgam model mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Hindcasts on response scale hc <- hindcast(mod) str(hc) # Use summary() to extract hindcasts / forecasts for custom plotting head(summary(hc), 12) # Or just use the plot() function for quick plots plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Forecasts on response scale fc <- forecast( mod, newdata = simdat$data_test ) str(fc) head(summary(fc), 12) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Forecasts as expectations fc <- forecast( mod, newdata = simdat$data_test, type = 'expected' ) head(summary(fc), 12) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Dynamic trend extrapolations fc <- forecast( mod, newdata = simdat$data_test, type = 'trend' ) head(summary(fc), 12) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) } } \seealso{ \code{\link[=hindcast.mvgam]{hindcast.mvgam()}}, \code{\link[=plot.mvgam_forecast]{plot.mvgam_forecast()}}, \code{\link[=summary.mvgam_forecast]{summary.mvgam_forecast()}}, \code{\link[=score.mvgam_forecast]{score.mvgam_forecast()}}, \code{\link[=ensemble.mvgam_forecast]{ensemble.mvgam_forecast()}} } ================================================ FILE: man/formula.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/formula.mvgam.R \name{formula.mvgam} \alias{formula.mvgam} \alias{formula.mvgam_prefit} \title{Extract formulae from \pkg{mvgam} objects} \usage{ \method{formula}{mvgam}(x, trend_effects = FALSE, ...) \method{formula}{mvgam_prefit}(x, trend_effects = FALSE, ...) } \arguments{ \item{x}{\code{mvgam}, \code{jsdgam} or \code{mvgam_prefit} object} \item{trend_effects}{\code{logical}, return the formula from the observation model (if \code{FALSE}) or from the underlying process model (if \code{TRUE})} \item{...}{Ignored} } \value{ A \code{formula} object } \description{ Extract formulae from \pkg{mvgam} objects } \author{ Nicholas J Clark } ================================================ FILE: man/get_mvgam_priors.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/get_mvgam_priors.R \name{get_mvgam_priors} \alias{get_mvgam_priors} \title{Extract information on default prior distributions for an \pkg{mvgam} model} \usage{ get_mvgam_priors( formula, trend_formula, factor_formula, knots, trend_knots, trend_model = "None", family = poisson(), data, unit = time, species = series, use_lv = FALSE, n_lv, trend_map, ... ) } \arguments{ \item{formula}{A \code{formula} object specifying the GAM observation model formula. These are exactly like the formula for a GLM except that smooth terms, \code{s()}, \code{te()}, \code{ti()}, \code{t2()}, as well as time-varying \code{dynamic()} terms, nonparametric \code{gp()} terms and offsets using \code{offset()}, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these).
In \code{nmix()} family models, the \code{formula} is used to set up a linear predictor for the detection probability. Details of the formula syntax used by \pkg{mvgam} can be found in \code{\link{mvgam_formulae}}} \item{trend_formula}{An optional \code{formula} object specifying the GAM process model formula. If supplied, a linear predictor will be modelled for the latent trends to capture process model evolution separately from the observation model. \strong{Important notes:} \itemize{ \item Should not have a response variable specified on the left-hand side (e.g., \code{~ season + s(year)}) \item Use \code{trend} instead of \code{series} for effects that vary across time series \item Only available for \code{RW()}, \code{AR()} and \code{VAR()} trend models \item In \code{nmix()} family models, sets up linear predictor for latent abundance \item Consider dropping one intercept using \code{- 1} convention to avoid estimation challenges }} \item{factor_formula}{Can be supplied instead of \code{trend_formula} to match syntax from \link{jsdgam}} \item{knots}{An optional \code{list} containing user-specified knot values for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the \code{k} value supplied. Different terms can use different numbers of knots, unless they share a covariate.} \item{trend_knots}{As for \code{knots} above, this is an optional \code{list} of knot values for smooth functions within the \code{trend_formula}.} \item{trend_model}{\code{character} or \code{function} specifying the time series dynamics for the latent trend. \strong{Available options:} \itemize{ \item \code{None}: No latent trend component (GAM component only, like \code{\link[mgcv]{gam}}) \item \code{ZMVN} or \code{ZMVN()}: Zero-Mean Multivariate Normal (Stan only) \item \code{'RW'} or \code{RW()}: Random Walk \item \code{'AR1'}, \code{'AR2'}, \code{'AR3'} or \code{AR(p = 1, 2, 3)}: Autoregressive models \item \code{'CAR1'} or \code{CAR(p = 1)}: Continuous-time AR (Ornstein–Uhlenbeck process) \item \code{'VAR1'} or \code{VAR()}: Vector Autoregressive (Stan only) \item \code{'PWlogistic'}, \code{'PWlinear'} or \code{PW()}: Piecewise trends (Stan only) \item \code{'GP'} or \code{GP()}: Gaussian Process with squared exponential kernel (Stan only) } \strong{Additional features:} \itemize{ \item Moving average and/or correlated process error terms available for most types (e.g., \code{RW(cor = TRUE)} for multivariate Random Walk) \item Hierarchical correlations possible for structured data \item See \link{mvgam_trends} for details and \code{\link[=ZMVN]{ZMVN()}} for examples }} \item{family}{\code{family} specifying the exponential observation family for the series.
\strong{Supported families:} \itemize{ \item \code{gaussian()}: Real-valued data \item \code{betar()}: Proportional data on \verb{(0,1)} \item \code{lognormal()}: Non-negative real-valued data \item \code{student_t()}: Real-valued data \item \code{Gamma()}: Non-negative real-valued data \item \code{bernoulli()}: Binary data \item \code{poisson()}: Count data (default) \item \code{nb()}: Overdispersed count data \item \code{binomial()}: Count data with imperfect detection when number of trials is known (use \code{cbind()} to bind observations and trials) \item \code{beta_binomial()}: As \code{binomial()} but allows for overdispersion \item \code{nmix()}: Count data with imperfect detection when number of trials is unknown (State-Space N-Mixture model with Poisson latent states and Binomial observations) } See \code{\link{mvgam_families}} for more details.} \item{data}{A \code{dataframe} or \code{list} containing the model response variable and covariates required by the GAM \code{formula} and optional \code{trend_formula}. \strong{Required columns for most models:} \itemize{ \item \code{series}: A \code{factor} index of the series IDs (number of levels should equal number of unique series labels) \item \code{time}: \code{numeric} or \code{integer} index of time points. For most dynamic trend types, time should be measured in discrete, regularly spaced intervals (i.e., \code{c(1, 2, 3, ...)}). Irregular spacing is allowed for \code{trend_model = CAR(1)}, but zero intervals are adjusted to \code{1e-12} to prevent sampling errors. } \strong{Special cases:} \itemize{ \item Models with hierarchical temporal correlation (e.g., \code{AR(gr = region, subgr = species)}) should NOT include a \code{series} identifier \item Models without temporal dynamics (\code{trend_model = 'None'} or \code{trend_model = ZMVN()}) don't require a \code{time} variable }} \item{unit}{The unquoted name of the variable that represents the unit of analysis in \code{data} over which latent residuals should be correlated. This variable should be either a \code{numeric} or \code{integer} variable in the supplied \code{data}. Defaults to \code{time} to be consistent with other functionalities in \pkg{mvgam}, though note that the data need not be time series in this case. See examples below for further details and explanations} \item{species}{The unquoted name of the \code{factor} variable that indexes the different response units in \code{data} (usually \code{'species'} in a JSDM). Defaults to \code{series} to be consistent with other \code{mvgam} models} \item{use_lv}{\code{logical}. If \code{TRUE}, use dynamic factors to estimate series' latent trends in a reduced dimension format. Only available for \code{RW()}, \code{AR()} and \code{GP()} trend models. Default is \code{FALSE}. See \code{\link{lv_correlations}} for examples.} \item{n_lv}{\code{integer} specifying the number of latent dynamic factors to use if \code{use_lv == TRUE}. Cannot exceed \code{n_series}. Default is \code{min(2, floor(n_series / 2))}.} \item{trend_map}{Optional \code{data.frame} specifying which series should depend on which latent trends. Enables multiple series to depend on the same latent trend process with different observation processes. 
\strong{Required structure:} \itemize{ \item Column \code{series}: Single unique entry for each series (matching factor levels in data) \item Column \code{trend}: Integer values indicating which trend each series depends on } \strong{Notes:} \itemize{ \item Sets up a latent factor model by enabling \code{use_lv = TRUE} \item Process model intercept is NOT automatically suppressed \item Not yet supported for continuous time models (\code{CAR()}) }} \item{...}{Not currently used} } \value{ Either a \code{data.frame} containing the prior definitions (if any suitable priors can be altered by the user) or \code{NULL}, indicating that no priors in the model can be modified } \description{ This function lists the parameters that can have their prior distributions changed for a given model, as well as listing their default distributions } \details{ Users can supply a model formula, prior to fitting the model, so that default priors can be inspected and altered. To make alterations, change the contents of the \code{prior} column and supply this \code{data.frame} to the \code{\link{mvgam}} or \code{\link{jsdgam}} functions using the argument \code{priors}. If using \code{Stan} as the backend, users can also modify the parameter bounds by editing the \code{new_lowerbound} and/or \code{new_upperbound} columns. This will be necessary when using restrictive distributions for some parameters, such as a Beta distribution for the trend sd parameters (Beta only has support on \code{(0,1)}, so the upper bound cannot be above \code{1}). Another option is to make use of the prior modification functions in \pkg{brms} (i.e. \code{\link[brms]{prior}}) to change prior distributions and bounds (just use the name of the parameter that you'd like to change as the \code{class} argument; see examples below) } \note{ Only the \code{prior}, \code{new_lowerbound} and/or \code{new_upperbound} columns of the output should be altered when defining user-defined priors for the model. Use only if you are familiar with the underlying probabilistic programming language. There are no sanity checks done to ensure that the code is legal (i.e.
to check that lower bounds are smaller than upper bounds, for example) } \examples{ \dontrun{ # ======================================================================== # Example 1: Simulate data and inspect default priors # ======================================================================== dat <- sim_mvgam(trend_rel = 0.5) # Get a model file that uses default mvgam priors for inspection (not # always necessary, but this can be useful for testing whether your # updated priors are written correctly) mod_default <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), run_model = FALSE ) # Inspect the model file with default mvgam priors stancode(mod_default) # Look at which priors can be updated in mvgam test_priors <- get_mvgam_priors( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2) ) test_priors # ======================================================================== # Example 2: Modify priors manually # ======================================================================== # Make a few changes; first, change the population mean for the # series-level random intercepts test_priors$prior[2] <- "mu_raw ~ normal(0.2, 0.5);" # Now use stronger regularisation for the series-level AR2 coefficients test_priors$prior[5] <- "ar2 ~ normal(0, 0.25);" # Check that the changes are made to the model file without any warnings # by setting 'run_model = FALSE' mod <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), priors = test_priors, run_model = FALSE ) stancode(mod) # No warnings, the model is ready for fitting now in the usual way with # the addition of the 'priors' argument # ======================================================================== # Example 3: Use brms syntax for prior modification # ======================================================================== # The same can be done using 'brms' functions; here we will also change # the ar1 prior and put some bounds on the ar coefficients to enforce # stationarity; we set the prior using the 'class' argument in all brms # prior functions brmsprior <- c( prior(normal(0.2, 0.5), class = mu_raw), prior(normal(0, 0.25), class = ar1, lb = -1, ub = 1), prior(normal(0, 0.25), class = ar2, lb = -1, ub = 1) ) brmsprior mod <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), priors = brmsprior, run_model = FALSE ) stancode(mod) # ======================================================================== # Example 4: Error handling example # ======================================================================== # Look at what is returned when an incorrect spelling is used test_priors$prior[5] <- "ar2_bananas ~ normal(0, 0.25);" mod <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), priors = test_priors, run_model = FALSE ) stancode(mod) # ======================================================================== # Example 5: Parametric (fixed effect) priors # ======================================================================== simdat <- sim_mvgam() # Add a fake covariate simdat$data_train$cov <- rnorm(NROW(simdat$data_train)) priors <- get_mvgam_priors( y ~ cov + s(season), data = simdat$data_train, family = poisson(), trend_model = AR() ) # Change priors for the intercept and fake 
covariate effects priors$prior[1] <- "(Intercept) ~ normal(0, 1);" priors$prior[2] <- "cov ~ normal(0, 0.1);" mod2 <- mvgam( y ~ cov + s(season), data = simdat$data_train, trend_model = AR(), family = poisson(), priors = priors, run_model = FALSE ) stancode(mod2) # ======================================================================== # Example 6: Alternative brms syntax for fixed effects # ======================================================================== # Likewise using 'brms' utilities (note that you can use Intercept rather # than `(Intercept)`) to change priors on the intercept brmsprior <- c( prior(normal(0.2, 0.5), class = cov), prior(normal(0, 0.25), class = Intercept) ) brmsprior mod2 <- mvgam( y ~ cov + s(season), data = simdat$data_train, trend_model = AR(), family = poisson(), priors = brmsprior, run_model = FALSE ) stancode(mod2) # ======================================================================== # Example 7: Bulk prior assignment # ======================================================================== # The "class = 'b'" shortcut can be used to put the same prior on all # 'fixed' effect coefficients (apart from any intercepts) set.seed(0) dat <- mgcv::gamSim(1, n = 200, scale = 2) dat$time <- 1:NROW(dat) mod <- mvgam( y ~ x0 + x1 + s(x2) + s(x3), priors = prior(normal(0, 0.75), class = "b"), data = dat, family = gaussian(), run_model = FALSE ) stancode(mod) } } \seealso{ \code{\link{mvgam}}, \code{\link{mvgam_formulae}}, \code{\link[brms]{prior}} } \author{ Nicholas J Clark } ================================================ FILE: man/gratia_mvgam_enhancements.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/gratia_methods.R \name{gratia_mvgam_enhancements} \alias{gratia_mvgam_enhancements} \alias{drawDotmvgam} \alias{draw.mvgam} \alias{eval_smoothDothilbertDotsmooth} \alias{eval_smooth.hilbert.smooth} \alias{eval_smoothDotmodDotsmooth} \alias{eval_smooth.mod.smooth} \alias{eval_smoothDotmoiDotsmooth} \alias{eval_smooth.moi.smooth} \title{Enhance post-processing of \pkg{mvgam} models using \pkg{gratia} functionality} \usage{ drawDotmvgam( object, trend_effects = FALSE, data = NULL, select = NULL, parametric = FALSE, terms = NULL, residuals = FALSE, scales = c("free", "fixed"), ci_level = 0.95, n = 100, n_3d = 16, n_4d = 4, unconditional = FALSE, overall_uncertainty = TRUE, constant = NULL, fun = NULL, dist = 0.1, rug = TRUE, contour = TRUE, grouped_by = FALSE, ci_alpha = 0.2, ci_col = "black", smooth_col = "black", resid_col = "steelblue3", contour_col = "black", n_contour = NULL, partial_match = FALSE, discrete_colour = NULL, discrete_fill = NULL, continuous_colour = NULL, continuous_fill = NULL, position = "identity", angle = NULL, ncol = NULL, nrow = NULL, guides = "keep", widths = NULL, heights = NULL, crs = NULL, default_crs = NULL, lims_method = "cross", wrap = TRUE, envir = environment(formula(object)), ... ) eval_smoothDothilbertDotsmooth( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ... ) eval_smoothDotmodDotsmooth( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ... ) eval_smoothDotmoiDotsmooth( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ... 
) } \arguments{ \item{object}{a fitted \code{mvgam}, the result of a call to \code{\link[=mvgam]{mvgam()}}} \item{trend_effects}{logical specifying whether smooth terms from the \code{trend_formula} should be drawn. If \code{FALSE}, only terms from the observation formula are drawn. If \code{TRUE}, only terms from the \code{trend_formula} are drawn} \item{data}{a data frame of covariate values at which to evaluate the model's smooth functions} \item{select}{character, logical, or numeric; which smooths to plot. If \code{NULL}, the default, then all model smooths are drawn. Character \code{select} matches the labels for smooths as shown for example in the output from \code{summary(object)}. Logical \code{select} operates as per numeric \code{select} in the order that smooths are stored} \item{parametric}{logical; plot parametric terms also? Note that \code{select} is used for selecting which smooths to plot. The \code{terms} argument is used to select which parametric effects are plotted. The default, as with \code{\link[mgcv:plot.gam]{mgcv::plot.gam()}}, is to not draw parametric effects} \item{terms}{character; which model parametric terms should be drawn? The default of \code{NULL} will plot all parametric terms that can be drawn.} \item{residuals}{currently ignored for \code{mvgam} models} \item{scales}{character; should all univariate smooths be plotted with the same y-axis scale? If \code{scales = "free"}, the default, each univariate smooth has its own y-axis scale. If \code{scales = "fixed"}, a common y-axis scale is used for all univariate smooths. Currently does not affect the y-axis scale of plots of the parametric terms} \item{ci_level}{numeric between 0 and 1; the coverage of the credible interval.} \item{n}{numeric; the number of points over the range of the covariate at which to evaluate the smooth} \item{n_3d, n_4d}{numeric; the number of points over the range of the last covariate in a 3D or 4D smooth. The default is \code{NULL}, which achieves the standard behaviour of using \code{n} points over the range of all covariates, resulting in \code{n^d} evaluation points, where \code{d} is the dimension of the smooth. For \code{d > 2} this can result in very many evaluation points and slow performance. For smooths of \code{d > 4}, the value of \code{n_4d} will be used for all dimensions \verb{> 4}, unless this is \code{NULL}, in which case the default behaviour (using \code{n} for all dimensions) will be observed} \item{unconditional}{ignored for \code{mvgam} models as all appropriate uncertainties are already included in the posterior estimates} \item{overall_uncertainty}{ignored for \code{mvgam} models as all appropriate uncertainties are already included in the posterior estimates} \item{constant}{numeric; a constant to add to the estimated values of the smooth. \code{constant}, if supplied, will be added to the estimated value before the confidence band is computed} \item{fun}{function; a function that will be applied to the estimated values and confidence interval before plotting. Can be a function or the name of a function. Function \code{fun} will be applied after adding any \code{constant}, if provided} \item{dist}{numeric; if greater than 0, this is used to determine when a location is too far from data to be plotted when plotting 2-D smooths. The data are scaled into the unit square before deciding what to exclude, and \code{dist} is a distance within the unit square.
See \code{\link[mgcv:exclude.too.far]{mgcv::exclude.too.far()}} for further details} \item{rug}{logical; draw a rug plot at the bottom of each plot for 1-D smooths, or plot the locations of data for higher dimensions.} \item{contour}{logical; should contours be drawn on the plot using \code{\link[ggplot2:geom_contour]{ggplot2::geom_contour()}}?} \item{grouped_by}{logical; should factor by smooths be drawn as one panel per level of the factor (\code{FALSE}, the default), or should the individual smooths be combined into a single panel containing all levels (\code{TRUE})?} \item{ci_alpha}{numeric; alpha transparency for the confidence or simultaneous interval} \item{ci_col}{colour specification for the confidence/credible interval band. Affects the fill of the interval} \item{smooth_col}{colour specification for the smooth line} \item{resid_col}{colour specification for residual points. Ignored} \item{contour_col}{colour specification for contour lines} \item{n_contour}{numeric; the number of contour bins. Will result in \code{n_contour - 1} contour lines being drawn. See \code{\link[ggplot2:geom_contour]{ggplot2::geom_contour()}}} \item{partial_match}{logical; should smooths be selected by partial matches with \code{select}? If \code{TRUE}, \code{select} can only be a single string to match against} \item{discrete_colour}{a suitable colour scale to be used when plotting discrete variables} \item{discrete_fill}{a suitable fill scale to be used when plotting discrete variables.} \item{continuous_colour}{a suitable colour scale to be used when plotting continuous variables} \item{continuous_fill}{a suitable fill scale to be used when plotting continuous variables} \item{position}{Position adjustment, either as a string, or the result of a call to a position adjustment function} \item{angle}{numeric; the angle at which the x-axis tick labels are to be drawn, passed to the \code{angle} argument of \code{\link[ggplot2:guide_axis]{ggplot2::guide_axis()}}} \item{ncol, nrow}{numeric; the numbers of columns and rows over which to spread the plots} \item{guides}{character; one of \code{"keep"} (the default), \code{"collect"}, or \code{"auto"}. Passed to \code{\link[patchwork:plot_layout]{patchwork::plot_layout()}}} \item{widths, heights}{The relative widths and heights of each column and row in the grid. Will get repeated to match the dimensions of the grid. If there is more than 1 plot and \code{widths = NULL}, the value of \code{widths} will be set internally to \code{widths = 1} to accommodate plots of smooths that use a fixed aspect ratio} \item{crs}{the coordinate reference system (CRS) to use for the plot. All data will be projected into this CRS. See \code{\link[ggplot2:ggsf]{ggplot2::coord_sf()}} for details} \item{default_crs}{the coordinate reference system (CRS) to use for the non-sf layers in the plot. If left at the default \code{NULL}, the CRS used is 4326 (WGS84), which is appropriate for spline-on-the-sphere smooths, which are parameterized in terms of latitude and longitude as coordinates. See \code{\link[ggplot2:ggsf]{ggplot2::coord_sf()}} for more details} \item{lims_method}{character; affects how the axis limits are determined. See \code{\link[ggplot2:ggsf]{ggplot2::coord_sf()}}. Be careful; in testing, changing this to \code{"orthogonal"} with the chlorophyll-a example from Simon Wood's GAM book quickly used up all the RAM on my test system and the OS killed R.
This could be incorrect usage on my part; right now the grid of points at which SOS smooths are evaluated (if not supplied by the user) can produce invalid coordinates for the corners of tiles, as the grid is generated for tile centres without respect to the spacing of those tiles} \item{wrap}{logical; wrap plots as a patchwork? If \code{FALSE}, a list of ggplot objects is returned, 1 per term plotted} \item{envir}{an environment to look up the data within} \item{...}{additional arguments passed to other methods} \item{smooth}{a smooth object of class \code{"gp.smooth"} (returned from a model using either the \code{dynamic()} function or the \code{gp()} function) or of class \code{"moi.smooth"} or \code{"mod.smooth"} (returned from a model using the 'moi' or 'mod' basis)} \item{model}{a fitted \code{mgcv} model of class \code{gam} or \code{bam}} } \description{ These evaluation and plotting functions exist to allow some popular \code{gratia} methods to work with \code{mvgam} or \code{jsdgam} models } \details{ These methods allow \code{mvgam} models to be \emph{enhanced} if users have the \code{gratia} package installed, making available the popular \code{draw()} function to plot partial effects of \code{mvgam} smooth functions using \code{\link[ggplot2:ggplot]{ggplot2::ggplot()}} utilities } \examples{ \dontrun{ # Fit a simple GAM and draw partial effects of smooths using 'gratia' set.seed(0) dat <- mgcv::gamSim( eg = 1, n = 200, scale = 2 ) mod <- mvgam( formula = y ~ s(x1, bs = 'moi') + te(x0, x2), data = dat, family = gaussian(), chains = 2, silent = 2 ) if (require("gratia")) { gratia::draw(mod) } } } \author{ Nicholas J Clark } ================================================ FILE: man/hindcast.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/hindcast.mvgam.R \name{hindcast.mvgam} \alias{hindcast.mvgam} \alias{hindcast} \title{Extract hindcasts for a fitted \code{mvgam} object} \usage{ hindcast(object, ...) \method{hindcast}{mvgam}(object, type = "response", ...) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} or \code{jsdgam}. See \code{\link[=mvgam]{mvgam()}}} \item{...}{Ignored} \item{type}{When this has the value \code{link}, the linear predictor is calculated on the link scale. If \code{expected} is used, predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. When \code{response} is used (the default), the predictions take uncertainty in the observation process into account to return predictions on the outcome scale. When \code{variance} is used, the variance of the response with respect to the mean (mean-variance relationship) is returned. When \code{type = "terms"}, each component of the linear predictor is returned separately in the form of a \code{list} (possibly with standard errors, if \code{summary = TRUE}): this includes parametric model components, followed by each smooth component, but excludes any offset and any intercept. Two special cases are also allowed: type \code{latent_N} will return the estimated latent abundances from an N-mixture distribution, while type \code{detection} will return the estimated detection probability from an N-mixture distribution} } \value{ An object of class \code{mvgam_forecast} containing hindcast distributions. See \code{\link{mvgam_forecast-class}} for details. } \description{ Extract hindcasts for a fitted \code{mvgam} object } \details{ Posterior hindcasts (i.e.
retrodictions) are drawn from the fitted \code{mvgam} and organized into a convenient format for plotting } \examples{ \dontrun{ simdat <- sim_mvgam(n_series = 3, trend_model = AR()) mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2) # Hindcasts on response scale hc <- hindcast(mod) str(hc) head(summary(hc), 12) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Hindcasts as expectations hc <- hindcast(mod, type = 'expected') head(summary(hc), 12) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Estimated latent trends hc <- hindcast(mod, type = 'trend') head(summary(hc), 12) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) } } \seealso{ \code{\link[=plot.mvgam_forecast]{plot.mvgam_forecast()}}, \code{\link[=summary.mvgam_forecast]{summary.mvgam_forecast()}}, \code{\link[=forecast.mvgam]{forecast.mvgam()}}, \code{\link[=fitted.mvgam]{fitted.mvgam()}}, \code{\link[=predict.mvgam]{predict.mvgam()}} } ================================================ FILE: man/how_to_cite.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/how_to_cite.R \name{how_to_cite.mvgam} \alias{how_to_cite.mvgam} \alias{how_to_cite} \title{Generate a methods description for \pkg{mvgam} models} \usage{ how_to_cite(object, ...) \method{how_to_cite}{mvgam}(object, ...) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} resulting from a call to \code{\link[=mvgam]{mvgam()}} or \code{\link[=jsdgam]{jsdgam()}}} \item{...}{ignored} } \value{ An object of class \code{how_to_cite} containing a text description of the methods as well as lists of both primary and additional references. } \description{ Create a brief but fully referenced methods description, along with a useful list of references, for fitted \code{mvgam} and \code{jsdgam} models. } \details{ This function uses the model's structure to come up with a very basic but hopefully useful methods description that can help users to appropriately acknowledge the hard work of developers and champion open science. Please do not consider the text returned by this function to be a completely adequate methods section; it is only meant to get you started. 
} \examples{ \dontrun{ #-------------------------------------------------- # Simulate 4 time series with hierarchical seasonality # and a VAR(1) dynamic process #-------------------------------------------------- set.seed(0) simdat <- sim_mvgam( seasonality = 'hierarchical', trend_model = VAR(cor = TRUE), family = gaussian() ) # Fit an appropriate model mod1 <- mvgam( y ~ s(season, bs = 'cc', k = 6), data = simdat$data_train, family = gaussian(), trend_model = VAR(cor = TRUE), chains = 2, silent = 2 ) how_to_cite(mod1) #-------------------------------------------------- # For a GP example, simulate data using the mgcv package #-------------------------------------------------- dat <- mgcv::gamSim(1, n = 30, scale = 2) # Fit a model that uses an approximate GP from brms mod2 <- mvgam( y ~ gp(x2, k = 12), data = dat, family = gaussian(), chains = 2, silent = 2 ) how_to_cite(mod2) } } \seealso{ \code{\link[utils]{citation}}, \code{\link{mvgam}}, \code{\link{jsdgam}} } \author{ Nicholas J Clark } ================================================ FILE: man/index-mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/index-mvgam.R \name{index-mvgam} \alias{index-mvgam} \alias{variables} \alias{variables.mvgam} \title{Index \code{mvgam} objects} \usage{ \method{variables}{mvgam}(x, ...) } \arguments{ \item{x}{\code{list} object returned from \code{mvgam}. See \code{\link[=mvgam]{mvgam()}}} \item{...}{Arguments passed to individual methods (if applicable).} } \value{ a \code{list} object of the variables that can be extracted, along with their aliases } \description{ Index \code{mvgam} objects } \examples{ \dontrun{ # Simulate data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Extract model variables variables(mod) } } \author{ Nicholas J Clark } ================================================ FILE: man/irf.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/irf.mvgam.R \name{irf.mvgam} \alias{irf.mvgam} \alias{irf} \title{Calculate latent VAR impulse response functions} \usage{ irf(object, ...) \method{irf}{mvgam}(object, h = 10, cumulative = FALSE, orthogonal = FALSE, ...) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} resulting from a call to \code{\link[=mvgam]{mvgam()}} that used a Vector Autoregressive latent process model (either as \code{VAR(cor = FALSE)} or \code{VAR(cor = TRUE)}; see \code{\link[=VAR]{VAR()}} for details)} \item{...}{ignored} \item{h}{Positive \code{integer} specifying the forecast horizon over which to calculate the IRF} \item{cumulative}{\code{Logical} flag indicating whether the IRF should be cumulative} \item{orthogonal}{\code{Logical} flag indicating whether orthogonalized IRFs should be calculated. Note that the order of the variables matters when calculating these} } \value{ An object of \code{\link{mvgam_irf-class}} containing the posterior IRFs.
This object can be used with the supplied S3 functions \code{\link[=plot.mvgam_irf]{plot.mvgam_irf()}} and \code{\link[=summary.mvgam_irf]{summary.mvgam_irf()}} } \description{ Compute Generalized or Orthogonalized Impulse Response Functions (IRFs) from \code{mvgam} models with Vector Autoregressive dynamics } \details{ See \code{\link{mvgam_irf-class}} for a full description of the quantities that are computed and returned by this function, along with key references. } \examples{ \dontrun{ # Fit a model to the portal time series that uses a latent VAR(1) mod <- mvgam( formula = captures ~ -1, trend_formula = ~ trend, trend_model = VAR(cor = TRUE), family = poisson(), data = portal_data, chains = 2, silent = 2 ) # Plot the autoregressive coefficient distributions; # use 'dir = "v"' to arrange the order of facets # correctly mcmc_plot( mod, variable = 'A', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Calculate Generalized IRFs for each series irfs <- irf( mod, h = 12, cumulative = FALSE ) # Plot them plot(irfs, series = 1) plot(irfs, series = 2) plot(irfs, series = 3) plot(irfs, series = 4) # Calculate posterior median, upper and lower 95th quantiles # of the impulse responses summary(irfs) } } \seealso{ \code{\link{mvgam_irf-class}}, \code{\link[=VAR]{VAR()}}, \code{\link[=plot.mvgam_irf]{plot.mvgam_irf()}}, \code{\link[=stability]{stability()}}, \code{\link[=fevd]{fevd()}} } \author{ Nicholas J Clark } ================================================ FILE: man/jsdgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/jsdgam.R \name{jsdgam} \alias{jsdgam} \title{Fit Joint Species Distribution Models in \pkg{mvgam}} \usage{ jsdgam( formula, factor_formula = ~-1, knots, factor_knots, data, newdata, family = poisson(), unit = time, species = series, share_obs_params = FALSE, priors, n_lv = 2, backend = getOption("brms.backend", "cmdstanr"), algorithm = getOption("brms.algorithm", "sampling"), control = list(max_treedepth = 10, adapt_delta = 0.8), chains = 4, burnin = 500, samples = 500, thin = 1, parallel = TRUE, threads = 1, silent = 1, run_model = TRUE, return_model_data = FALSE, residuals = TRUE, ... ) } \arguments{ \item{formula}{A \code{formula} object specifying the GAM observation model formula. These are exactly like the formula for a GLM except that smooth terms, \code{s()}, \code{te()}, \code{ti()}, \code{t2()}, as well as time-varying \code{dynamic()} terms, nonparametric \code{gp()} terms and offsets using \code{offset()}, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these). Details of the formula syntax used by \pkg{mvgam} can be found in \code{\link{mvgam_formulae}}} \item{factor_formula}{A \code{formula} object specifying the linear predictor effects for the latent factors. Use \code{by = trend} within calls to functional terms (i.e. \code{s()}, \code{te()}, \code{ti()}, \code{t2()}, \code{dynamic()}, or \code{gp()}) to ensure that each factor captures a different axis of variation. See the example below as an illustration} \item{knots}{An optional \code{list} containing user specified knot values for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the \code{k} value supplied.
Different terms can use different numbers of knots, unless they share a covariate.} \item{factor_knots}{An optional \code{list} containing user specified knot values to be used for basis construction of any smooth terms in \code{factor_formula}. For most bases the user simply supplies the knots to be used, which must match up with the \code{k} value supplied (note that the number of knots is not always just \code{k}). Different terms can use different numbers of knots, unless they share a covariate} \item{data}{A \code{dataframe} or \code{list} containing the model response variable and covariates required by the GAM \code{formula} and \code{factor_formula} objects} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing the same variables as in \code{data}. If included, observations in variable \code{y} will be set to \code{NA} when fitting the model so that posterior simulations can be obtained.} \item{family}{\code{family} specifying the observation family for the outcomes. Currently supported families are: \itemize{ \item\code{gaussian()} for real-valued data \item\code{betar()} for proportional data on \verb{(0,1)} \item\code{lognormal()} for non-negative real-valued data \item\code{student_t()} for real-valued data \item\code{Gamma()} for non-negative real-valued data \item\code{bernoulli()} for binary data \item\code{poisson()} for count data \item\code{nb()} for overdispersed count data \item\code{binomial()} for count data with imperfect detection when the number of trials is known; note that the \code{cbind()} function must be used to bind the discrete observations and the discrete number of trials \item\code{beta_binomial()} as for \code{binomial()} but allows for overdispersion} Default is \code{poisson()}. See \code{\link{mvgam_families}} for more details} \item{unit}{The unquoted name of the variable that represents the unit of analysis in \code{data} over which latent residuals should be correlated. This variable should be either a \code{numeric} or \code{integer} variable in the supplied \code{data}. Defaults to \code{time} to be consistent with other functionalities in \pkg{mvgam}, though note that the data need not be time series in this case. See examples below for further details and explanations} \item{species}{The unquoted name of the \code{factor} variable that indexes the different response units in \code{data} (usually \code{'species'} in a JSDM). Defaults to \code{series} to be consistent with other \code{mvgam} models} \item{share_obs_params}{\code{logical}. If \code{TRUE} and the \code{family} has additional family-specific observation parameters (e.g., variance components, dispersion parameters), these will be shared across all outcome variables. Useful when multiple outcomes share properties. Default is \code{FALSE}.} \item{priors}{An optional \code{data.frame} with prior definitions (in Stan syntax) or, preferentially, a vector containing objects of class \code{brmsprior} (see \code{\link[brms]{prior}} for details). See \link{get_mvgam_priors} for more information on changing default prior distributions} \item{n_lv}{\code{integer} specifying the number of latent factors to use for modelling residual associations. Cannot be \verb{> n_species}. Defaults arbitrarily to \code{2}} \item{backend}{Character string naming the package for Stan model fitting. Options are \code{"cmdstanr"} (default) or \code{"rstan"}. Can be set globally via the \code{"brms.backend"} option.
See https://mc-stan.org/rstan/ and https://mc-stan.org/cmdstanr/ for details.} \item{algorithm}{Character string naming the estimation approach: \itemize{ \item \code{"sampling"}: MCMC (default) \item \code{"meanfield"}: Variational inference with factorized normal distributions \item \code{"fullrank"}: Variational inference with multivariate normal distribution \item \code{"laplace"}: Laplace approximation (cmdstanr only) \item \code{"pathfinder"}: Pathfinder algorithm (cmdstanr only) } Can be set globally via \code{"brms.algorithm"} option. Limited testing suggests \code{"meanfield"} performs best among non-MCMC approximations for dynamic GAMs.} \item{control}{Named \code{list} for controlling sampler behaviour. Valid elements include \code{max_treedepth}, \code{adapt_delta} and \code{init}.} \item{chains}{\code{integer} specifying the number of parallel chains for the model. Ignored for variational inference algorithms.} \item{burnin}{\code{integer} specifying the number of warmup iterations to tune sampling algorithms. Ignored for variational inference algorithms.} \item{samples}{\code{integer} specifying the number of post-warmup iterations for sampling the posterior distribution.} \item{thin}{Thinning interval for monitors. Ignored for variational inference algorithms.} \item{parallel}{\code{logical} specifying whether to use multiple cores for parallel MCMC simulation. If \code{TRUE}, uses \code{min(c(chains, parallel::detectCores() - 1))} cores.} \item{threads}{\code{integer} Experimental option to use multithreading for within-chain parallelisation in \code{Stan}. We recommend its use only if you are experienced with \code{Stan}'s \code{reduce_sum} function and have a slow running model that cannot be sped up by any other means. Currently works for all families when using \pkg{cmdstanr} as the backend} \item{silent}{Verbosity level between \code{0} and \code{2}. If \code{1} (default), most informational messages are suppressed. If \code{2}, even more messages are suppressed. Sampling progress is still printed - set \code{refresh = 0} to disable. For \code{backend = "rstan"}, also set \code{open_progress = FALSE} to prevent additional progress bars.} \item{run_model}{\code{logical}. If \code{FALSE}, the model is not fitted but instead the function returns the model file and the data/initial values needed to fit the model outside of \code{mvgam}.} \item{return_model_data}{\code{logical}. If \code{TRUE}, the list of data needed to fit the model is returned, along with initial values for smooth and AR parameters, once the model is fitted. Helpful for users who wish to modify the model file to add other stochastic elements. Default is \code{FALSE} unless \code{run_model == FALSE}.} \item{residuals}{\code{logical}. Whether to compute series-level randomized quantile residuals. Default is \code{TRUE}. Set to \code{FALSE} to save time and reduce object size (can add later using \link{add_residuals}).} \item{...}{Other arguments to pass to \link{mvgam}} } \value{ A \code{list} object of class \code{mvgam} containing model output, the text representation of the model file, the mgcv model output (for easily generating simulations at unsampled covariate values), Dunn-Smyth residuals for each species and key information needed for other functions in the package. See \code{\link{mvgam-class}} for details. 
Use \code{methods(class = "mvgam")} for an overview of available methods } \description{ This function sets up a Joint Species Distribution Model whereby the residual associations among species can be modelled in a reduced-rank format using a set of latent factors. The factor specification is extremely flexible, allowing users to include spatial, temporal or any other type of predictor effects to more efficiently capture unmodelled residual associations, while the observation model can also be highly flexible (including all smooth, GP and other effects that \pkg{mvgam} can handle) } \details{ Joint Species Distribution Models allow for responses of multiple species to be learned hierarchically, whereby responses to environmental variables in \code{formula} can be partially pooled and any latent, unmodelled residual associations can also be learned. In \pkg{mvgam}, both of these effects can be modelled with the full power of latent factor Hierarchical GAMs, providing unmatched flexibility to model full communities of species. When calling \link{jsdgam}, an initial State-Space model using \code{trend_model = 'None'} is set up and then modified to include the latent factors and their linear predictors. Consequently, you can inspect priors for these models using \link{get_mvgam_priors} by supplying the relevant \code{formula}, \code{factor_formula}, \code{data} and \code{family} arguments and keeping the default \code{trend_model = 'None'}. In a JSDGAM, the expectation of response \eqn{Y_{ij}} is modelled with \deqn{g(\mu_{ij}) = X_i\beta + u_i\theta_j,} where \eqn{g(.)} is a known link function, \eqn{X} is a design matrix of linear predictors (with associated \eqn{\beta} coefficients), \eqn{u} are \eqn{n_{lv}}-variate latent factors (\eqn{n_{lv} << n_{species}}) and \eqn{\theta_j} are species-specific loadings on the latent factors, respectively. The design matrix \eqn{X} and \eqn{\beta} coefficients are constructed and modelled using \code{formula} and can contain any of \code{mvgam}'s predictor effects, including random intercepts and slopes, multidimensional penalized smooths, GP effects, etc. The factor loadings \eqn{\theta_j} are constrained for identifiability but can be used to reconstruct an estimate of the species' residual variance-covariance matrix using \eqn{\Theta \Theta'} (see the example below and \code{\link[=residual_cor]{residual_cor()}} for details). The latent factors are further modelled using: \deqn{ u_i \sim \text{Normal}(Q_i\beta_{factor}, 1) } where the second design matrix \eqn{Q} and associated \eqn{\beta_{factor}} coefficients are constructed and modelled using \code{factor_formula}. Again, the effects that make up this linear predictor can contain any of \code{mvgam}'s allowed predictor effects, providing enormous flexibility for modelling species' communities.
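As a minimal illustrative sketch of the \eqn{\Theta \Theta'} reconstruction described above (self-contained and hypothetical: the loadings matrix \code{Theta} below is simulated rather than extracted from any fitted model), the residual covariance and correlations implied by a set of factor loadings can be computed directly:

\preformatted{
# Hypothetical loadings for 10 species on 2 latent factors
set.seed(1)
Theta <- matrix(rnorm(10 * 2), nrow = 10, ncol = 2)

# Residual covariance implied by the loadings
# (Theta multiplied by its transpose)
Sigma_resid <- Theta \%*\% t(Theta)

# Residual correlations among the 10 species; residual_cor()
# returns posterior summaries of these same quantities
# from a fitted model
cov2cor(Sigma_resid)
}

In practice, use \code{\link[=residual_cor]{residual_cor()}} on the fitted model rather than computing these quantities by hand (see the examples below).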
} \examples{ \dontrun{ # ======================================================================== # Example 1: Basic JSDGAM with Portal Data # ======================================================================== # Fit a JSDGAM to the portal_data captures mod <- jsdgam( formula = captures ~ # Fixed effects of NDVI and mintemp, row effect as a GP of time ndvi_ma12:series + mintemp:series + gp(time, k = 15), factor_formula = ~ -1, data = portal_data, unit = time, species = series, family = poisson(), n_lv = 2, silent = 2, chains = 2 ) # Plot covariate effects library(ggplot2); theme_set(theme_bw()) plot_predictions( mod, condition = c('ndvi_ma12', 'series', 'series') ) plot_predictions( mod, condition = c('mintemp', 'series', 'series') ) # A residual correlation plot plot(residual_cor(mod)) # An ordination biplot can also be constructed # from the factor scores and their loadings if(requireNamespace('ggrepel', quietly = TRUE)){ ordinate(mod, alpha = 0.7) } # ======================================================================== # Example 2: Advanced JSDGAM with Spatial Predictors # ======================================================================== # Simulate latent count data for 500 spatial locations and 10 species set.seed(0) N_points <- 500 N_species <- 10 # Species-level intercepts (on the log scale) alphas <- runif(N_species, 2, 2.25) # Simulate a covariate and species-level responses to it temperature <- rnorm(N_points) betas <- runif(N_species, -0.5, 0.5) # Simulate points uniformly over a space lon <- runif(N_points, min = 150, max = 155) lat <- runif(N_points, min = -20, max = -19) # Set up spatial basis functions as a tensor product of lat and lon sm <- mgcv::smoothCon( mgcv::te(lon, lat, k = 5), data = data.frame(lon, lat), knots = NULL )[[1]] # The design matrix for this smooth is in the 'X' slot des_mat <- sm$X dim(des_mat) # Function to generate a random covariance matrix where all variables # have unit variance (i.e. 
diagonals are all 1) random_Sigma = function(N){ L_Omega <- matrix(0, N, N); L_Omega[1, 1] <- 1; for (i in 2 : N) { bound <- 1; for (j in 1 : (i - 1)) { L_Omega[i, j] <- runif(1, -sqrt(bound), sqrt(bound)); bound <- bound - L_Omega[i, j] ^ 2; } L_Omega[i, i] <- sqrt(bound); } Sigma <- L_Omega \%*\% t(L_Omega); return(Sigma) } # Simulate a variance-covariance matrix for the correlations among # basis coefficients Sigma <- random_Sigma(N = NCOL(des_mat)) # Now simulate the species-level basis coefficients hierarchically, where # spatial basis function correlations are a convex sum of a base correlation # matrix and a species-level correlation matrix basis_coefs <- matrix(NA, nrow = N_species, ncol = NCOL(Sigma)) base_field <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = Sigma) for(t in 1:N_species){ corOmega <- (cov2cor(Sigma) * 0.7) + (0.3 * cov2cor(random_Sigma(N = NCOL(des_mat)))) basis_coefs[t, ] <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = corOmega) } # Simulate the latent spatial processes st_process <- do.call(rbind, lapply(seq_len(N_species), function(t){ data.frame( lat = lat, lon = lon, species = paste0('species_', t), temperature = temperature, process = alphas[t] + betas[t] * temperature + des_mat \%*\% basis_coefs[t,] ) })) # Now take noisy observations at some of the points (60) obs_points <- sample(1:N_points, size = 60, replace = FALSE) obs_points <- data.frame( lat = lat[obs_points], lon = lon[obs_points], site = 1:60 ) # Keep only the process data at these points st_process \%>\% dplyr::inner_join(obs_points, by = c('lat', 'lon')) \%>\% # now take noisy Poisson observations of the process dplyr::mutate(count = rpois(NROW(.), lambda = exp(process))) \%>\% dplyr::mutate(species = factor( species, levels = paste0('species_', 1:N_species) )) \%>\% dplyr::group_by(lat, lon) -> dat # View the count distributions for each species ggplot(dat, aes(x = count)) + geom_histogram() + facet_wrap(~ species, scales = 'free') ggplot(dat, aes(x = lon, y = lat, col = log(count + 1))) + geom_point(size = 2.25) + facet_wrap(~ species, scales = 'free') + scale_color_viridis_c() # ------------------------------------------------------------------------ # Model Fitting with Custom Priors # ------------------------------------------------------------------------ # Inspect default priors for a joint species model with three spatial factors priors <- get_mvgam_priors( formula = count ~ # Environmental model includes random slopes for # a linear effect of temperature s(species, bs = 're', by = temperature), # Each factor estimates a different nonlinear spatial process, using # 'by = trend' as in other mvgam State-Space models factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1, n_lv = 3, # The data and grouping variables data = dat, unit = site, species = species, # Poisson observations family = poisson() ) head(priors) # Fit a JSDM that estimates hierarchical temperature responses # and that uses three latent spatial factors mod <- jsdgam( formula = count ~ # Environmental model includes random slopes for a # linear effect of temperature s(species, bs = 're', by = temperature), # Each factor estimates a different nonlinear spatial process, using # 'by = trend' as in other mvgam State-Space models factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1, n_lv = 3, # Change default priors for fixed random effect variances and # factor GP marginal deviations to standard normal priors = c( prior(std_normal(), class = sigma_raw), prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend1`), 
prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend2`), prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend3`) ), # The data and the grouping variables data = dat, unit = site, species = species, # Poisson observations family = poisson(), chains = 2, silent = 2 ) # ------------------------------------------------------------------------ # Model Visualization and Diagnostics # ------------------------------------------------------------------------ # Plot the implicit species-level intercept estimates plot_predictions(mod, condition = 'species', type = 'link') # Plot species' hierarchical responses to temperature plot_predictions( mod, condition = c('temperature', 'species', 'species'), type = 'link' ) # Plot posterior median estimates of the latent spatial factors plot(mod, type = 'smooths', trend_effects = TRUE) # Or using gratia, if you have it installed if(requireNamespace('gratia', quietly = TRUE)){ gratia::draw(mod, trend_effects = TRUE, dist = 0) } # Plot species' randomized quantile residual distributions # as a function of latitude pp_check( mod, type = 'resid_ribbon_grouped', group = 'species', x = 'lat', ndraws = 200 ) # ------------------------------------------------------------------------ # Residual Correlation Analysis # ------------------------------------------------------------------------ # Calculate residual spatial correlations post_cors <- residual_cor(mod) names(post_cors) # Look at lower and upper credible interval estimates for # some of the estimated correlations post_cors$cor[1:5, 1:5] post_cors$cor_upper[1:5, 1:5] post_cors$cor_lower[1:5, 1:5] # Plot of the posterior median correlations for those estimated # to be non-zero plot(post_cors, cluster = TRUE) # An ordination biplot can also be constructed # from the factor scores and their loadings if(requireNamespace('ggrepel', quietly = TRUE)){ ordinate(mod) } # ------------------------------------------------------------------------ # Model Validation and Prediction # ------------------------------------------------------------------------ # Posterior predictive checks and ELPD-LOO can ascertain model fit pp_check( mod, type = "pit_ecdf_grouped", group = "species", ndraws = 200 ) loo(mod) # Forecast log(counts) for entire region (site value doesn't matter as long # as each spatial location has a different and unique site identifier); # note this calculation takes a few minutes because of the need to calculate # draws from the stochastic latent factors newdata <- st_process \%>\% dplyr::mutate(species = factor( species, levels = paste0('species_', 1:N_species) )) \%>\% dplyr::group_by(lat, lon) \%>\% dplyr::mutate(site = dplyr::cur_group_id()) \%>\% dplyr::ungroup() preds <- predict(mod, newdata = newdata) # Plot the median log(count) predictions on a grid newdata$log_count <- preds[,1] ggplot(newdata, aes(x = lon, y = lat, col = log_count)) + geom_point(size = 1.5) + facet_wrap(~ species, scales = 'free') + scale_color_viridis_c() + theme_classic() # Not needed for general use; cleans up connections for automated testing closeAllConnections() } } \references{ Nicholas J Clark & Konstans Wells (2023). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. 14:3, 771-784. \cr \cr David I Warton, F Guillaume Blanchet, Robert B O'Hara, Otso Ovaskainen, Sara Taskinen, Steven C Walker & Francis KC Hui (2015). So many variables: joint modeling in community ecology. Trends in Ecology & Evolution 30:12, 766-779. 
} \seealso{ \code{\link[=mvgam]{mvgam()}}, \code{\link[=residual_cor]{residual_cor()}} } \author{ Nicholas J Clark } ================================================ FILE: man/lfo_cv.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/lfo_cv.mvgam.R \name{lfo_cv.mvgam} \alias{lfo_cv.mvgam} \alias{lfo_cv} \title{Approximate leave-future-out cross-validation of fitted \pkg{mvgam} objects} \usage{ lfo_cv(object, ...) \method{lfo_cv}{mvgam}( object, data, min_t, fc_horizon = 1, pareto_k_threshold = 0.7, silent = 1, ... ) } \arguments{ \item{object}{\code{list} object of class \code{mvgam}. See \code{\link[=mvgam]{mvgam()}}} \item{...}{Ignored} \item{data}{A \code{dataframe} or \code{list} containing the model response variable and covariates required by the GAM \code{formula}. Should include columns: 'series' (character or factor index of the series IDs) and 'time' (numeric index of the time point for each observation). Any other variables to be included in the linear predictor of \code{formula} must also be present} \item{min_t}{Integer specifying the minimum training time required before making predictions from the data. Default is either the \code{30}th timepoint in the observational data, or whatever training time allows for at least \code{10} lfo-cv calculations, if possible. This value is essentially arbitrary, so it is highly recommended to change it to something that is more suitable to the data and models being evaluated.} \item{fc_horizon}{Integer specifying the number of time steps ahead for evaluating forecasts} \item{pareto_k_threshold}{Proportion specifying the threshold over which the Pareto shape parameter is considered unstable, triggering a model refit. Default is \code{0.7}} \item{silent}{Verbosity level between \code{0} and \code{2}. If \code{1} (the default), most of the informational messages of compiler and sampler are suppressed. If \code{2}, even more messages are suppressed. The actual sampling progress is still printed. Set \code{refresh = 0} to turn this off as well. If using \code{backend = "rstan"} you can also set \code{open_progress = FALSE} to prevent opening additional progress bars.} } \value{ A \code{list} of class \code{mvgam_lfo} containing the approximate ELPD scores, the Pareto-k shape values and the specified \code{pareto_k_threshold} } \description{ Approximate leave-future-out cross-validation of fitted \pkg{mvgam} objects } \details{ Approximate leave-future-out cross-validation uses an expanding training window scheme to evaluate a model on its forecasting ability. The steps used in this function mirror those laid out in the \href{https://mc-stan.org/loo/articles/loo2-lfo.html}{lfo vignette from the \code{loo} package}, written by Paul Bürkner, Jonah Gabry and Aki Vehtari. First, we refit the model using the first \code{min_t} observations to perform a single exact \code{fc_horizon}-ahead forecast step. This forecast is evaluated against the \code{min_t + fc_horizon} out of sample observations using the Expected Log Predictive Density (ELPD). Next, we approximate each successive round of expanding window forecasts by moving forward one step at a time \verb{for i in 1:N_evaluations} and re-weighting draws from the model's posterior predictive distribution using Pareto Smoothed Importance Sampling (PSIS). In each iteration \code{i}, PSIS weights are obtained for the next observation that would have been included in the model if we had re-fit (i.e.
the last observation that would have been in the training data, or \code{min_t + i}). If these importance ratios are stable, we consider the approximation adequate and use the re-weighted posterior's forecast for evaluating the next holdout set of testing observations (\code{(min_t + i + 1):(min_t + i + fc_horizon)}). At some point the importance ratio variability will become too large and importance sampling will fail. This is indicated by the estimated shape parameter \code{k} of the generalized Pareto distribution crossing a certain threshold \code{pareto_k_threshold}. Only then do we refit the model using all of the observations up to the time of the failure. We then restart the process and iterate forward until the next refit is triggered (Bürkner et al. 2020). } \examples{ \dontrun{ # Simulate from a Poisson-AR2 model with a seasonal smooth set.seed(100) dat <- sim_mvgam(T = 75, n_series = 1, prop_trend = 0.75, trend_model = 'AR2', family = poisson()) # Plot the time series plot_mvgam_series(data = dat$data_train, newdata = dat$data_test, series = 1) # Fit an appropriate model mod_ar2 <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(p = 2), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2) # Fit a less appropriate model mod_rw <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = RW(), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2) # Compare Discrete Ranked Probability Scores for the testing period fc_ar2 <- forecast(mod_ar2) fc_rw <- forecast(mod_rw) score_ar2 <- score(fc_ar2, score = 'drps') score_rw <- score(fc_rw, score = 'drps') sum(score_ar2$series_1$score) sum(score_rw$series_1$score) # Now use approximate leave-future-out CV to compare # rolling forecasts; start at time point 40 to reduce # computational time and to ensure enough data is available # for estimating model parameters lfo_ar2 <- lfo_cv(mod_ar2, min_t = 40, fc_horizon = 3, silent = 2) lfo_rw <- lfo_cv(mod_rw, min_t = 40, fc_horizon = 3, silent = 2) # Plot Pareto-K values and ELPD estimates plot(lfo_ar2) plot(lfo_rw) # Proportion of timepoints in which AR2 model gives better forecasts length(which((lfo_ar2$elpds - lfo_rw$elpds) > 0)) / length(lfo_ar2$elpds) # A higher total ELPD is preferred lfo_ar2$sum_ELPD lfo_rw$sum_ELPD } } \references{ Paul-Christian Bürkner, Jonah Gabry & Aki Vehtari (2020). Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation. 90:14, 2499-2523. } \seealso{ \code{\link{forecast}}, \code{\link{score}}, \code{\link{compare_mvgams}} } \author{ Nicholas J Clark } ================================================ FILE: man/logLik.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/logLik.mvgam.R \name{logLik.mvgam} \alias{logLik.mvgam} \title{Compute pointwise Log-Likelihoods from fitted \pkg{mvgam} objects} \usage{ \method{logLik}{mvgam}(object, linpreds, newdata, family_pars, include_forecast = TRUE, ...) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} or \code{jsdgam}} \item{linpreds}{Optional \code{matrix} of linear predictor draws to use for calculating pointwise log-likelihoods.} \item{newdata}{Optional \code{data.frame} or \code{list} object specifying which series each column in \code{linpreds} belongs to.
If \code{linpreds} is supplied, then \code{newdata} must also be supplied.} \item{family_pars}{Optional \code{list} containing posterior draws of family-specific parameters (i.e. shape, scale or overdispersion parameters). Required if \code{linpreds} and \code{newdata} are supplied.} \item{include_forecast}{Logical. If \code{newdata} were fed to the model to compute forecasts, should the log-likelihood draws for these observations also be returned? Defaults to \code{TRUE}.} \item{...}{Ignored} } \value{ A \code{matrix} of dimension \verb{n_samples x n_observations} containing the pointwise log-likelihood draws for all observations in \code{newdata}. If no \code{newdata} is supplied, log-likelihood draws are returned for all observations that were originally fed to the model (training observations and, if supplied to the original model via the \code{newdata} argument in \code{\link{mvgam}}, testing observations). } \description{ Compute pointwise Log-Likelihoods from fitted \pkg{mvgam} objects } \examples{ \dontrun{ # Simulate some data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Extract log-likelihood values lls <- logLik(mod) str(lls) } } \author{ Nicholas J Clark } ================================================ FILE: man/loo.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/loo.mvgam.R \name{loo.mvgam} \alias{loo.mvgam} \alias{loo_compare.mvgam} \title{LOO information criteria for \pkg{mvgam} models} \usage{ \method{loo}{mvgam}(x, incl_dynamics = FALSE, ...) \method{loo_compare}{mvgam}(x, ..., model_names = NULL, incl_dynamics = FALSE) } \arguments{ \item{x}{Object of class \code{mvgam}} \item{incl_dynamics}{Deprecated and currently ignored} \item{...}{More \code{mvgam} objects} \item{model_names}{If \code{NULL} (the default) will use model names derived from deparsing the call. Otherwise will use the passed values as model names} } \value{ For \code{loo.mvgam}, an object of class \code{psis_loo} (see \code{\link[loo:loo]{loo::loo()}} for details). For \code{loo_compare.mvgam}, an object of class \code{compare.loo} (see \code{\link[loo:loo_compare]{loo::loo_compare()}} for details). } \description{ Extract the LOOIC (leave-one-out information criterion) using \code{\link[loo:loo]{loo::loo()}}. } \details{ When comparing two (or more) fitted \code{mvgam} models, we can estimate the difference in their in-sample predictive accuracies using the Expected Log Predictive Density (ELPD). This metric can be approximated using Pareto Smoothed Importance Sampling (PSIS), which re-weights posterior draws to approximate predictions for a datapoint had it not been included in the original model fit (i.e. leave-one-out cross-validation). See \code{\link[loo:loo]{loo::loo()}} and \code{\link[loo:loo_compare]{loo::loo_compare()}} for further details on how this importance sampling works. Note: In-sample predictive metrics such as PSIS-LOO can sometimes be overly optimistic for models that include process error components (e.g. those with \code{trend_model}, \code{trend_formula}, or \code{factor_formula}). Consider using out-of-sample evaluations for further scrutiny (see \code{\link{forecast.mvgam}}, \code{\link{score.mvgam_forecast}}, \code{\link{lfo_cv}}).
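Because \code{\link{logLik.mvgam}} returns the full \verb{n_samples x n_observations} matrix of pointwise log-likelihood draws, a closely related ELPD estimate can also be computed by passing that matrix to \code{\link[loo:loo]{loo::loo()}} directly. A minimal sketch, assuming a fitted object \code{mod} (note that the matrix method of \code{loo()} treats posterior draws as independent unless an \code{r_eff} argument is supplied):

\preformatted{
# Pointwise log-likelihood draws (n_samples x n_observations)
ll <- logLik(mod)

# PSIS-LOO computed from the raw log-likelihood matrix
loo::loo(ll)
}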
} \examples{ \dontrun{ #-------------------------------------------------- # Simulate 4 time series with hierarchical seasonality # and independent AR1 dynamic processes #-------------------------------------------------- set.seed(111) simdat <- sim_mvgam( seasonality = 'hierarchical', trend_model = AR(), family = gaussian() ) # Fit a model with shared seasonality mod1 <- mvgam( y ~ s(season, bs = 'cc', k = 6), data = rbind(simdat$data_train, simdat$data_test), family = gaussian(), chains = 2, silent = 2 ) conditional_effects(mod1) mc.cores.def <- getOption('mc.cores') options(mc.cores = 1) loo(mod1) # Fit a model with hierarchical seasonality mod2 <- update( mod1, formula = y ~ s(season, bs = 'cc', k = 6) + s(season, series, bs = 'fs', xt = list(bs = 'cc'), k = 4), chains = 2, silent = 2 ) conditional_effects(mod2) loo(mod2) # Add AR1 dynamic errors to mod2 mod3 <- update( mod2, trend_model = AR(), chains = 2, silent = 2 ) conditional_effects(mod3) plot(mod3, type = 'trend') loo(mod3) #-------------------------------------------------- # Compare models using LOO #-------------------------------------------------- loo_compare(mod1, mod2, mod3) options(mc.cores = mc.cores.def) #-------------------------------------------------- # Compare forecast abilities using LFO-CV #-------------------------------------------------- lfo_mod2 <- lfo_cv(mod2, min_t = 92) lfo_mod3 <- lfo_cv(mod3, min_t = 92) # Plot forecast ELPD differences plot( y = lfo_mod2$elpds - lfo_mod3$elpds, x = lfo_mod2$eval_timepoints, pch = 16, ylab = 'ELPD_mod2 - ELPD_mod3', xlab = 'Evaluation timepoint' ) abline(h = 0, lty = 'dashed') } } \author{ Nicholas J Clark } ================================================ FILE: man/lv_correlations.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/lv_correlations.R \name{lv_correlations} \alias{lv_correlations} \title{Calculate trend correlations based on latent factor loadings for \pkg{mvgam} models} \usage{ lv_correlations(object) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} that used latent factors, either with \code{use_lv = TRUE} or by supplying a \code{trend_map}. See \code{\link[=mvgam]{mvgam()}} for details and for an example.} } \value{ A \code{list} object containing the mean posterior correlations and the full array of posterior correlations. } \description{ This function uses factor loadings from a fitted dynamic factor \code{mvgam} model to calculate temporal correlations among series' trends. } \details{ Although this function will still work, it is now recommended to use \code{\link[=residual_cor]{residual_cor()}} to obtain residual correlation information in a more user-friendly format that allows for a deeper investigation of relationships among the time series. 
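For example, a minimal sketch (assuming \code{mod} is a fitted \code{mvgam} model that used latent factors):
\preformatted{
# Posterior trend correlations derived from the factor loadings
lvcors <- lv_correlations(mod)
names(lvcors)

# Recommended alternative, which returns posterior summaries of the
# correlations in a format that supports plotting
rescors <- residual_cor(mod)
plot(rescors)
}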
} \examples{ \dontrun{ #-------------------------------------------------- # Fit a model that uses two AR(1) dynamic factors to model # the temporal dynamics of the four rodent species in the portal_data #-------------------------------------------------- mod <- mvgam( captures ~ series, trend_model = AR(), use_lv = TRUE, n_lv = 2, data = portal_data, chains = 2, silent = 2 ) # Plot the two dynamic factors plot(mod, type = 'factors') # Calculate correlations among the series lvcors <- lv_correlations(mod) names(lvcors) lapply(lvcors, class) # Recommended: use residual_cor() instead lvcors <- residual_cor(mod) names(lvcors) lvcors$cor # Plot credible correlations as a matrix plot(lvcors, cluster = TRUE) # Not needed for general use; cleans up connections for automated testing closeAllConnections() } } \seealso{ \code{\link[=residual_cor]{residual_cor()}}, \code{\link[=plot.mvgam_residcor]{plot.mvgam_residcor()}} } ================================================ FILE: man/mcmc_plot.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mcmc_plot.mvgam.R \name{mcmc_plot.mvgam} \alias{mcmc_plot.mvgam} \title{MCMC plots of \pkg{mvgam} parameters, as implemented in \pkg{bayesplot}} \usage{ \method{mcmc_plot}{mvgam}( object, type = "intervals", variable = NULL, regex = FALSE, use_alias = TRUE, ... ) } \arguments{ \item{object}{An \R object typically of class \code{mvgam}} \item{type}{The type of the plot. Supported types are (as names) \code{hist}, \code{dens}, \code{hist_by_chain}, \code{dens_overlay}, \code{violin}, \code{intervals}, \code{areas}, \code{areas_ridges}, \code{combo}, \code{acf}, \code{acf_bar}, \code{trace}, \code{trace_highlight}, \code{scatter}, \code{hex}, \code{pairs}, \code{rhat}, \code{rhat_hist}, \code{neff}, \code{neff_hist} and \code{nuts_energy}. For an overview on the various plot types see \code{\link[bayesplot:MCMC-overview]{MCMC-overview}}.} \item{variable}{Names of the variables (parameters) to plot, as given by a character vector or a regular expression (if \code{regex = TRUE}). By default, a hopefully not too large selection of variables is plotted.} \item{regex}{Logical; indicates whether \code{variable} should be treated as regular expressions. Defaults to \code{FALSE}.} \item{use_alias}{Logical. If more informative names for parameters are available (i.e. for beta coefficients \code{b} or for smoothing parameters \code{rho}), replace the uninformative names with the more informative alias. Defaults to \code{TRUE}.} \item{...}{Additional arguments passed to the plotting functions. See \code{\link[bayesplot:MCMC-overview]{MCMC-overview}} for more details.} } \value{ A \code{\link[ggplot2:ggplot]{ggplot}} object that can be further customized using the \pkg{ggplot2} package.
} \description{ Convenient way to call MCMC plotting functions implemented in the \pkg{bayesplot} package for \pkg{mvgam} models } \examples{ \dontrun{ simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2) mcmc_plot(mod) mcmc_plot(mod, type = 'neff_hist') mcmc_plot(mod, variable = 'betas', type = 'areas') mcmc_plot(mod, variable = 'trend_params', type = 'combo') } } \seealso{ \code{\link{mvgam_draws}} for an overview of some of the shortcut strings that can be used for argument \code{variable} } ================================================ FILE: man/model.frame.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/model.frame.mvgam.R \name{model.frame.mvgam} \alias{model.frame.mvgam} \alias{model.frame.mvgam_prefit} \title{Extract model.frame from a fitted \pkg{mvgam} object} \usage{ \method{model.frame}{mvgam}(formula, trend_effects = FALSE, ...) \method{model.frame}{mvgam_prefit}(formula, trend_effects = FALSE, ...) } \arguments{ \item{formula}{a model \code{\link[stats]{formula}} or \code{\link[stats]{terms}} object or an \R object.} \item{trend_effects}{\code{logical}, return the model.frame from the observation model (if \code{FALSE}) or from the underlying process model (if \code{TRUE})} \item{...}{Ignored} } \value{ A \code{matrix} containing the fitted model frame } \description{ Extract model.frame from a fitted \pkg{mvgam} object } \author{ Nicholas J Clark } ================================================ FILE: man/monotonic.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/monotonic.R \name{monotonic} \alias{monotonic} \alias{smooth.construct.moi.smooth.spec} \alias{smooth.construct.mod.smooth.spec} \alias{Predict.matrix.moi.smooth} \alias{Predict.matrix.mod.smooth} \title{Monotonic splines in \pkg{mvgam} models} \usage{ \method{smooth.construct}{moi.smooth.spec}(object, data, knots) \method{smooth.construct}{mod.smooth.spec}(object, data, knots) \method{Predict.matrix}{moi.smooth}(object, data) \method{Predict.matrix}{mod.smooth}(object, data) } \arguments{ \item{object}{A smooth specification object, usually generated by a term \code{s(x, bs = "moi", ...)} or \code{s(x, bs = "mod", ...)}} \item{data}{a list containing just the data (including any \code{by} variable) required by this term, with names corresponding to \code{object$term} (and \code{object$by}). The \code{by} variable is the last element.} \item{knots}{a list containing any knots supplied for basis setup --- in same order and with same names as \code{data}. Can be \code{NULL}. See details for further information.} } \value{ An object of class \code{"moi.smooth"} or \code{"mod.smooth"}. In addition to the usual elements of a smooth class documented under \code{\link[mgcv]{smooth.construct}}, this object will contain a slot called \code{boundary} that defines the endpoints beyond which the spline will begin extrapolating (extrapolation is flat due to the first order penalty placed on the smooth function) } \description{ Uses constructors from package \pkg{splines2} to build monotonically increasing or decreasing splines. Details also in Wang & Yan (2021). } \details{ The constructor is not normally called directly, but is rather used internally by \link{mvgam}. 
If knots are not supplied, they are placed evenly throughout the covariate values to which the term refers: for example, if fitting 101 data points with an 11-knot spline of x, there would be a knot at every 10th (ordered) x value. The spline is an implementation of the closed-form I-spline basis based on the recursion formula given by Ramsay (1988), in which the basis coefficients must be constrained to either be non-negative (for monotonically increasing functions) or non-positive (monotonically decreasing) \cr \cr Take note that when using either monotonic basis, the number of basis functions \code{k} must be supplied as an even integer due to the manner in which monotonic basis functions are constructed. } \note{ This constructor will result in a valid smooth if using a call to \code{\link[mgcv]{gam}} or \code{\link[mgcv]{bam}}; however, the resulting functions will not be guaranteed to be monotonic because constraints on basis coefficients will not be enforced } \examples{ \dontrun{ # Simulate data from a monotonically increasing function set.seed(123123) x <- runif(80) * 4 - 1 x <- sort(x) f <- exp(4 * x) / (1 + exp(4 * x)) y <- f + rnorm(80) * 0.1 plot(x, y) # A standard TPRS smooth doesn't capture monotonicity library(mgcv) mod_data <- data.frame(y = y, x = x) mod <- gam( y ~ s(x, k = 16), data = mod_data, family = gaussian() ) library(marginaleffects) plot_predictions( mod, by = 'x', newdata = data.frame( x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100) ), points = 0.5 ) # Using the 'moi' basis in mvgam rectifies this mod_data$time <- 1:NROW(mod_data) mod2 <- mvgam( y ~ s(x, bs = 'moi', k = 18), data = mod_data, family = gaussian(), chains = 2, silent = 2 ) plot_predictions( mod2, by = 'x', newdata = data.frame( x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100) ), points = 0.5 ) plot(mod2, type = 'smooth', realisations = TRUE) # 'by' terms that produce a different smooth for each level of the 'by' # factor are also allowed x <- runif(80) * 4 - 1 x <- sort(x) # Two different monotonic smooths, one for each factor level f <- exp(4 * x) / (1 + exp(4 * x)) f2 <- exp(3.5 * x) / (1 + exp(3 * x)) fac <- c(rep('a', 80), rep('b', 80)) y <- c( f + rnorm(80) * 0.1, f2 + rnorm(80) * 0.2 ) plot(x, y[1:80]) plot(x, y[81:160]) # Gather all data into a data.frame, including the factor 'by' variable mod_data <- data.frame(y, x, fac = as.factor(fac)) mod_data$time <- 1:NROW(mod_data) # Fit a model with different smooths per factor level mod <- mvgam( y ~ s(x, bs = 'moi', by = fac, k = 8), data = mod_data, family = gaussian(), chains = 2, silent = 2 ) # Visualise the different monotonic functions plot_predictions( mod, condition = c('x', 'fac', 'fac'), points = 0.5 ) plot(mod, type = 'smooth', realisations = TRUE) # First derivatives (on the link scale) should never be # negative for either factor level (derivs <- slopes( mod, variables = 'x', by = c('x', 'fac'), type = 'link' )) all(derivs$estimate > 0) } } \references{ Wang, Wenjie, and Jun Yan. "Shape-Restricted Regression Splines with R Package splines2." Journal of Data Science 19.3 (2021). \cr \cr Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science, 3(4), 425--441.
} \author{ Nicholas J Clark } ================================================ FILE: man/mvgam-class.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam-class.R \name{mvgam-class} \alias{mvgam-class} \title{Fitted \code{mvgam} object description} \description{ A fitted \code{mvgam} object returned by function \code{\link{mvgam}}. Run \code{methods(class = "mvgam")} to see an overview of available methods. } \details{ A \code{mvgam} object contains the following elements: \itemize{ \item \code{call} the original observation model formula \item \code{trend_call} If a \code{trend_formula} was supplied, the original trend model formula is returned. Otherwise \code{NULL} \item \code{family} \code{character} description of the observation distribution \item \code{trend_model} \code{character} description of the latent trend model \item \code{trend_map} \code{data.frame} describing the mapping of trend states to observations, if supplied in the original model. Otherwise \code{NULL} \item \code{drift} Logical specifying whether a drift term was used in the trend model \item \code{priors} If the model priors were updated from their defaults, the prior \code{dataframe} will be returned. Otherwise \code{NULL} \item \code{model_output} The \code{MCMC} object returned by the fitting engine. If the model was fitted using \code{Stan}, this will be an object of class \code{stanfit} (see \code{\link[rstan]{stanfit-class}} for details). If \code{JAGS} was used as the backend, this will be an object of class \code{runjags} (see \code{\link[runjags]{runjags-class}} for details) \item \code{model_file} The \code{character} string model file used to describe the model in either \code{Stan} or \code{JAGS} syntax \item \code{model_data} If \code{return_model_data} was set to \code{TRUE} when fitting the model, the \code{list} object containing all data objects needed to condition the model is returned. Each item in the \code{list} is described in detail at the top of the \code{model_file}. Otherwise \code{NULL} \item \code{inits} If \code{return_model_data} was set to \code{TRUE} when fitting the model, the initial value functions used to initialise the MCMC chains will be returned. Otherwise \code{NULL} \item \code{monitor_pars} The parameters that were monitored during MCMC sampling are returned as a \verb{character vector} \item \code{sp_names} A \verb{character vector} specifying the names for each smoothing parameter \item \code{mgcv_model} An object of class \code{gam} containing the \code{mgcv} version of the observation model. This object is used for generating the linear predictor matrix when making predictions for new data. The coefficients in this model object will contain the posterior median coefficients from the GAM linear predictor, but these are only used if generating plots of smooth functions that \code{mvgam} currently cannot handle (such as plots for three-dimensional smooths). This model therefore should not be used for inference. See \code{\link[mgcv]{gamObject}} for details \item \code{trend_mgcv_model} If a \code{trend_formula} was supplied, an object of class \code{gam} containing the \code{mgcv} version of the trend model. Otherwise \code{NULL} \item \code{ytimes} The \code{matrix} object used in model fitting for indexing which series and timepoints were observed in each row of the supplied data.
Used internally by some downstream plotting and prediction functions \item \code{resids} A named \code{list} object containing posterior draws of Dunn-Smyth randomized quantile residuals \item \code{use_lv} Logical flag indicating whether latent dynamic factors were used in the model \item \code{n_lv} If \code{use_lv == TRUE}, the number of latent dynamic factors used in the model \item \code{upper_bounds} If bounds were supplied in the original model fit, they will be returned. Otherwise \code{NULL} \item \code{obs_data} The original data object (either a \code{list} or \code{dataframe}) supplied in model fitting. \item \code{test_data} If test data were supplied (as argument \code{newdata} in the original model), it will be returned. Otherwise \code{NULL} \item \code{fit_engine} \code{Character} describing the fit engine, either as \code{stan} or \code{jags} \item \code{backend} \code{Character} describing the backend used for modelling, either as \code{rstan}, \code{cmdstanr} or \code{rjags} \item \code{algorithm} \code{Character} describing the algorithm used for finding the posterior, either as \code{sampling}, \code{laplace}, \code{pathfinder}, \code{meanfield} or \code{fullrank} \item \code{max_treedepth} If the model was fitted using \code{Stan}, the value supplied for the maximum treedepth tuning parameter is returned (see \code{\link[rstan]{stan}} for details). Otherwise \code{NULL} \item \code{adapt_delta} If the model was fitted using \code{Stan}, the value supplied for the adapt_delta tuning parameter is returned (see \code{\link[rstan]{stan}} for details). Otherwise \code{NULL} } } \seealso{ \link{mvgam} } \author{ Nicholas J Clark } ================================================ FILE: man/mvgam-package.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam-package.R \docType{package} \name{mvgam-package} \alias{mvgam-package} \title{mvgam: Multivariate (Dynamic) Generalized Additive Models} \description{ \if{html}{\figure{logo.png}{options: style='float: right' alt='logo' width='120'}} Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2023) \doi{10.1111/2041-210X.13974}.
} \seealso{ Useful links: \itemize{ \item \url{https://github.com/nicholasjclark/mvgam} \item \url{https://nicholasjclark.github.io/mvgam/} \item Report bugs at \url{https://github.com/nicholasjclark/mvgam/issues} } } \author{ \strong{Maintainer}: Nicholas J Clark \email{nicholas.j.clark1214@gmail.com} (\href{https://orcid.org/0000-0001-7131-3301}{ORCID}) Other contributors: \itemize{ \item KANK Karunarathna (\href{https://orcid.org/0000-0002-8995-5502}{ORCID}) (ARMA parameterisations and factor models) [contributor] \item Sarah Heaps (\href{https://orcid.org/0000-0002-5543-037X}{ORCID}) (VARMA parameterisations) [contributor] \item Scott Pease (\href{https://orcid.org/0009-0006-8977-9285}{ORCID}) (broom enhancements) [contributor] \item Matthijs Hollanders (\href{https://orcid.org/0000-0003-0796-1018}{ORCID}) (ggplot visualizations) [contributor] } } \keyword{internal} ================================================ FILE: man/mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam.R \name{mvgam} \alias{mvgam} \title{Fit a Bayesian Dynamic GAM to Univariate or Multivariate Time Series} \usage{ mvgam( formula, trend_formula, knots, trend_knots, trend_model = "None", noncentred = FALSE, family = poisson(), share_obs_params = FALSE, data, newdata, use_lv = FALSE, n_lv, trend_map, priors, run_model = TRUE, prior_simulation = FALSE, residuals = TRUE, return_model_data = FALSE, backend = getOption("brms.backend", "cmdstanr"), algorithm = getOption("brms.algorithm", "sampling"), control = list(max_treedepth = 10, adapt_delta = 0.8), chains = 4, burnin = 500, samples = 500, thin = 1, parallel = TRUE, threads = 1, save_all_pars = FALSE, silent = 1, autoformat = TRUE, refit = FALSE, lfo = FALSE, ... ) } \arguments{ \item{formula}{A \code{formula} object specifying the GAM observation model formula. These are exactly like the formula for a GLM except that smooth terms, \code{s()}, \code{te()}, \code{ti()}, \code{t2()}, as well as time-varying \code{dynamic()} terms, nonparametric \code{gp()} terms and offsets using \code{offset()}, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these). In \code{nmix()} family models, the \code{formula} is used to set up a linear predictor for the detection probability. Details of the formula syntax used by \pkg{mvgam} can be found in \code{\link{mvgam_formulae}}} \item{trend_formula}{An optional \code{formula} object specifying the GAM process model formula. If supplied, a linear predictor will be modelled for the latent trends to capture process model evolution separately from the observation model. \strong{Important notes:} \itemize{ \item Should not have a response variable specified on the left-hand side (e.g., \code{~ season + s(year)}) \item Use \code{trend} instead of \code{series} for effects that vary across time series \item Only available for \code{RW()}, \code{AR()} and \code{VAR()} trend models \item In \code{nmix()} family models, sets up linear predictor for latent abundance \item Consider dropping one intercept using \code{- 1} convention to avoid estimation challenges }} \item{knots}{An optional \code{list} containing user specified knot values for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the \code{k} value supplied. 
Different terms can use different numbers of knots, unless they share a covariate.} \item{trend_knots}{As for \code{knots} above, this is an optional \code{list} of knot values for smooth functions within the \code{trend_formula}.} \item{trend_model}{\code{character} or \code{function} specifying the time series dynamics for the latent trend. \strong{Available options:} \itemize{ \item \code{None}: No latent trend component (GAM component only, like \code{\link[mgcv]{gam}}) \item \code{ZMVN} or \code{ZMVN()}: Zero-Mean Multivariate Normal (Stan only) \item \code{'RW'} or \code{RW()}: Random Walk \item \code{'AR1'}, \code{'AR2'}, \code{'AR3'} or \code{AR(p = 1, 2, 3)}: Autoregressive models \item \code{'CAR1'} or \code{CAR(p = 1)}: Continuous-time AR (Ornstein–Uhlenbeck process) \item \code{'VAR1'} or \code{VAR()}: Vector Autoregressive (Stan only) \item \code{'PWlogistic'}, \code{'PWlinear'} or \code{PW()}: Piecewise trends (Stan only) \item \code{'GP'} or \code{GP()}: Gaussian Process with squared exponential kernel (Stan only) } \strong{Additional features:} \itemize{ \item Moving average and/or correlated process error terms available for most types (e.g., \code{RW(cor = TRUE)} for multivariate Random Walk) \item Hierarchical correlations possible for structured data \item See \link{mvgam_trends} for details and \code{\link[=ZMVN]{ZMVN()}} for examples }} \item{noncentred}{\code{logical}. Use non-centred parameterisation for autoregressive trend models? Can improve efficiency by avoiding degeneracies in latent dynamic random effects estimation. Benefits vary by model - highly informative data may perform worse with this option. Available for \code{RW()}, \code{AR()}, \code{CAR()}, or \code{trend = 'None'} with \code{trend_formula}. Not available for moving average or correlated error models.} \item{family}{\code{family} specifying the exponential observation family for the series. \strong{Supported families:} \itemize{ \item \code{gaussian()}: Real-valued data \item \code{betar()}: Proportional data on \verb{(0,1)} \item \code{lognormal()}: Non-negative real-valued data \item \code{student_t()}: Real-valued data \item \code{Gamma()}: Non-negative real-valued data \item \code{bernoulli()}: Binary data \item \code{poisson()}: Count data (default) \item \code{nb()}: Overdispersed count data \item \code{binomial()}: Count data with imperfect detection when number of trials is known (use \code{cbind()} to bind observations and trials) \item \code{beta_binomial()}: As \code{binomial()} but allows for overdispersion \item \code{nmix()}: Count data with imperfect detection when number of trials is unknown (State-Space N-Mixture model with Poisson latent states and Binomial observations) } See \code{\link{mvgam_families}} for more details.} \item{share_obs_params}{\code{logical}. If \code{TRUE} and the \code{family} has additional family-specific observation parameters (e.g., variance components, dispersion parameters), these will be shared across all outcome variables. Useful when multiple outcomes share properties. Default is \code{FALSE}.} \item{data}{A \code{dataframe} or \code{list} containing the model response variable and covariates required by the GAM \code{formula} and optional \code{trend_formula}. \strong{Required columns for most models:} \itemize{ \item \code{series}: A \code{factor} index of the series IDs (number of levels should equal number of unique series labels) \item \code{time}: \code{numeric} or \code{integer} index of time points. 
For most dynamic trend types, time should be measured in discrete, regularly spaced intervals (i.e., \code{c(1, 2, 3, ...)}). Irregular spacing is allowed for \code{trend_model = CAR(1)}, but zero intervals are adjusted to \code{1e-12} to prevent sampling errors. } \strong{Special cases:} \itemize{ \item Models with hierarchical temporal correlation (e.g., \code{AR(gr = region, subgr = species)}) should NOT include a \code{series} identifier \item Models without temporal dynamics (\code{trend_model = 'None'} or \code{trend_model = ZMVN()}) don't require a \code{time} variable }} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing the same variables as in \code{data}. If included, observations in variable \code{y} will be set to \code{NA} when fitting the model so that posterior simulations can be obtained.} \item{use_lv}{\code{logical}. If \code{TRUE}, use dynamic factors to estimate series' latent trends in a reduced dimension format. Only available for \code{RW()}, \code{AR()} and \code{GP()} trend models. Default is \code{FALSE}. See \code{\link{lv_correlations}} for examples.} \item{n_lv}{\code{integer} specifying the number of latent dynamic factors to use if \code{use_lv == TRUE}. Cannot exceed \code{n_series}. Default is \code{min(2, floor(n_series / 2))}.} \item{trend_map}{Optional \code{data.frame} specifying which series should depend on which latent trends. Enables multiple series to depend on the same latent trend process with different observation processes. \strong{Required structure:} \itemize{ \item Column \code{series}: Single unique entry for each series (matching factor levels in data) \item Column \code{trend}: Integer values indicating which trend each series depends on } \strong{Notes:} \itemize{ \item Sets up latent factor model by enabling \code{use_lv = TRUE} \item Process model intercept is NOT automatically suppressed \item Not yet supported for continuous time models (\code{CAR()}) }} \item{priors}{An optional \code{data.frame} with prior definitions or, preferably, a vector of \code{brmsprior} objects (see \code{\link[brms]{prior}()}). See \code{\link[=get_mvgam_priors]{get_mvgam_priors()}} and Details for more information.} \item{run_model}{\code{logical}. If \code{FALSE}, the model is not fitted but instead the function returns the model file and the data/initial values needed to fit the model outside of \code{mvgam}.} \item{prior_simulation}{\code{logical}. If \code{TRUE}, no observations are fed to the model, and instead simulations from prior distributions are returned.} \item{residuals}{\code{logical}. Whether to compute series-level randomized quantile residuals. Default is \code{TRUE}. Set to \code{FALSE} to save time and reduce object size (can add later using \link{add_residuals}).} \item{return_model_data}{\code{logical}. If \code{TRUE}, the list of data needed to fit the model is returned, along with initial values for smooth and AR parameters, once the model is fitted. Helpful for users who wish to modify the model file to add other stochastic elements. Default is \code{FALSE} unless \code{run_model == FALSE}.} \item{backend}{Character string naming the package for Stan model fitting. Options are \code{"cmdstanr"} (default) or \code{"rstan"}. Can be set globally via \code{"brms.backend"} option. 
See https://mc-stan.org/rstan/ and https://mc-stan.org/cmdstanr/ for details.} \item{algorithm}{Character string naming the estimation approach: \itemize{ \item \code{"sampling"}: MCMC (default) \item \code{"meanfield"}: Variational inference with factorized normal distributions \item \code{"fullrank"}: Variational inference with multivariate normal distribution \item \code{"laplace"}: Laplace approximation (cmdstanr only) \item \code{"pathfinder"}: Pathfinder algorithm (cmdstanr only) } Can be set globally via \code{"brms.algorithm"} option. Limited testing suggests \code{"meanfield"} performs best among non-MCMC approximations for dynamic GAMs.} \item{control}{Named \code{list} for controlling sampler behaviour. Valid elements include \code{max_treedepth}, \code{adapt_delta} and \code{init}.} \item{chains}{\code{integer} specifying the number of parallel chains for the model. Ignored for variational inference algorithms.} \item{burnin}{\code{integer} specifying the number of warmup iterations to tune sampling algorithms. Ignored for variational inference algorithms.} \item{samples}{\code{integer} specifying the number of post-warmup iterations for sampling the posterior distribution.} \item{thin}{Thinning interval for monitors. Ignored for variational inference algorithms.} \item{parallel}{\code{logical} specifying whether to use multiple cores for parallel MCMC simulation. If \code{TRUE}, uses \code{min(c(chains, parallel::detectCores() - 1))} cores.} \item{threads}{\code{integer}. Experimental option for within-chain parallelisation in Stan using \code{reduce_sum}. Recommended only for experienced Stan users with slow models. Currently works for all families except \code{nmix()}, and only when using the \code{cmdstanr} backend.} \item{save_all_pars}{\code{logical}. Save draws from all variables defined in Stan's \code{parameters} block. Default is \code{FALSE}.} \item{silent}{Verbosity level between \code{0} and \code{2}. If \code{1} (default), most informational messages are suppressed. If \code{2}, even more messages are suppressed. Sampling progress is still printed; set \code{refresh = 0} to disable. For \code{backend = "rstan"}, also set \code{open_progress = FALSE} to prevent additional progress bars.} \item{autoformat}{\code{logical}. Use \code{stanc} parser to automatically format Stan code and check for deprecations. For development purposes; leave as \code{TRUE}.} \item{refit}{\code{logical}. Indicates whether this is a refit called using \code{\link[=update.mvgam]{update.mvgam()}}. Users should leave as \code{FALSE}.} \item{lfo}{\code{logical}. Indicates whether this is part of a \link{lfo_cv.mvgam} call. Returns lighter model version for speed. Users should leave as \code{FALSE}.} \item{...}{Further arguments passed to Stan: \itemize{ \item For \code{backend = "rstan"}: passed to \code{\link[rstan]{sampling}()} or \code{\link[rstan]{vb}()} \item For \code{backend = "cmdstanr"}: passed to \code{cmdstanr::sample}, \code{cmdstanr::variational}, \code{cmdstanr::laplace} or \code{cmdstanr::pathfinder} methods }} } \value{ A \code{list} object of class \code{mvgam} containing model output, the text representation of the model file, the mgcv model output (for easily generating simulations at unsampled covariate values), Dunn-Smyth residuals for each series and key information needed for other functions in the package. See \code{\link{mvgam-class}} for details. Use \code{methods(class = "mvgam")} for an overview on available methods.
} \description{ This function estimates the posterior distribution for Generalised Additive Models (GAMs) that can include smooth spline functions, specified in the GAM formula, as well as latent temporal processes, specified by \code{trend_model}. Further modelling options include State-Space representations to allow covariates and dynamic processes to occur on the latent 'State' level while also capturing observation-level effects. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. In addition, model fits can easily be assessed and compared with posterior predictive checks, forecast comparisons and leave-one-out / leave-future-out cross-validation. } \details{ Dynamic GAMs are useful when we wish to predict future values from time series that show temporal dependence but we do not want to rely on extrapolating from a smooth term (which can sometimes lead to unpredictable and unrealistic behaviours). In addition, smooths can often try to wiggle excessively to capture any autocorrelation that is present in a time series, which exacerbates the problem of forecasting ahead. As GAMs are very naturally viewed through a Bayesian lens, and we often must model time series that show complex distributional features and missing data, parameters for \pkg{mvgam} models are estimated in a Bayesian framework using Markov Chain Monte Carlo by default. \strong{Getting Started Resources:} \itemize{ \item General overview: \code{vignette("mvgam_overview")} and \code{vignette("data_in_mvgam")} \item Full list of vignettes: \code{vignette(package = "mvgam")} \item Real-world examples: \code{\link{mvgam_use_cases}} \item Quick reference: \href{https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.pdf}{mvgam cheatsheet} } } \section{Model Specification Details}{ \strong{Formula Syntax:} Details of the formula syntax used by \pkg{mvgam} can be found in \code{\link{mvgam_formulae}}. Note that it is possible to supply an empty formula where there are no predictors or intercepts in the observation model (i.e. \code{y ~ 0} or \code{y ~ -1}). In this case, an intercept-only observation model will be set up but the intercept coefficient will be fixed at zero. This can be handy if you wish to fit pure State-Space models where the variation in the dynamic trend controls the average expectation, and/or where intercepts are non-identifiable (as in piecewise trends). \strong{Families and Link Functions:} Details of families supported by \pkg{mvgam} can be found in \code{\link{mvgam_families}}. \strong{Trend Models:} Details of latent error process models supported by \pkg{mvgam} can be found in \code{\link{mvgam_trends}}. } \section{Prior Specifications}{ Default priors for intercepts and any variance parameters are chosen to be vaguely informative, but these should always be checked by the user. Prior distributions for most important model parameters can be altered (see \code{\link[=get_mvgam_priors]{get_mvgam_priors()}} for details). Note that latent trends are estimated on the link scale so choose priors accordingly. However more control over the model specification can be accomplished by setting \code{run_model = FALSE} and then editing the model code (found in the \code{model_file} slot in the returned object) before running the model using either \pkg{rstan} or \pkg{cmdstanr}. This is encouraged for complex modelling tasks. 
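For example, a minimal sketch of inspecting and then overwriting default priors (the formula, the \code{dat} data object and the \code{sigma} parameter class are illustrative placeholders; check the output of \code{get_mvgam_priors()} for the parameter names that apply to your own model):
\preformatted{
# Inspect the default priors for a candidate model specification
get_mvgam_priors(
  formula = y ~ s(season, bs = 'cc', k = 6),
  trend_model = AR(),
  family = poisson(),
  data = dat$data_train
)

# Supply brms-style prior definitions to overwrite the defaults;
# match the class to a parameter name reported by get_mvgam_priors()
mod <- mvgam(
  formula = y ~ s(season, bs = 'cc', k = 6),
  trend_model = AR(),
  family = poisson(),
  data = dat$data_train,
  priors = brms::prior(normal(0, 0.5), class = sigma)
)
}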
\strong{Important:} No priors are formally checked to ensure they are in the right syntax, so it is up to the user to ensure these are correct. } \section{Model Components}{ \strong{Random Effects:} For any smooth terms using the random effect basis (\code{\link[mgcv]{smooth.construct.re.smooth.spec}}), a non-centred parameterisation is automatically employed to avoid degeneracies that are common in hierarchical models. Note however that centred versions may perform better for series that are particularly informative, so as with any foray into Bayesian modelling, it is worth building an understanding of the model's assumptions and limitations by following a principled workflow. Also note that models are parameterised using \code{drop.unused.levels = FALSE} in \code{\link[mgcv]{jagam}} to ensure predictions can be made for all levels of the supplied factor variable. \strong{Observation Level Parameters:} When more than one series is included in \code{data} and an observation family that contains more than one parameter is used, additional observation family parameters (i.e. \code{phi} for \code{nb()} or \code{sigma} for \code{gaussian()}) are by default estimated independently for each series. But if you wish for the series to share the same observation parameters, set \code{share_obs_params = TRUE}. } \section{Model Diagnostics}{ \strong{Residuals:} For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics. If the fitted model is appropriate then Dunn-Smyth residuals will be standard normal in distribution and no autocorrelation will be evident. When a particular observation is missing, the residual is calculated by comparing independent draws from the model's posterior distribution. } \section{Computational Backend}{ \strong{Using Stan:} \pkg{mvgam} is primarily designed to use Hamiltonian Monte Carlo for parameter estimation via the software \code{Stan} (using either the \code{cmdstanr} or \code{rstan} interface). There are great advantages when using \code{Stan} over Gibbs / Metropolis-Hastings samplers, which include the option to estimate nonlinear effects via \href{https://arxiv.org/abs/2004.11408}{Hilbert space approximate Gaussian Processes}, the availability of a variety of inference algorithms (e.g. variational inference, Laplace approximation) and \href{https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648}{capabilities to enforce stationarity for complex Vector Autoregressions}. Because of the many advantages of \code{Stan} over \code{JAGS}, \strong{further development of the package will only be applied to \code{Stan}}. This includes the planned addition of more response distributions, plans to handle zero-inflation, and plans to incorporate a greater variety of trend models. Users are strongly encouraged to opt for \code{Stan} over \code{JAGS} in any subsequent workflows. } \section{Recommended Workflow}{ \strong{How to Start:} The \href{https://github.com/nicholasjclark/mvgam/raw/master/misc/mvgam_cheatsheet.pdf}{\code{mvgam} cheatsheet} is a good starting place if you are just learning to use the package. It gives an overview of the package's key functions and objects, as well as providing a reasonable workflow that new users can follow.
\strong{Recommended Steps:} \enumerate{ \item \strong{Data Preparation:} Check that your data are in a suitable tidy format for \pkg{mvgam} modeling (see the \href{https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html}{data formatting vignette} for guidance) \item \strong{Data Exploration:} Inspect features of the data using \code{\link{plot_mvgam_series}}. Now is also a good time to familiarise yourself with the package's example workflows that are detailed in the vignettes: \itemize{ \item \href{https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html}{Getting started vignette} \item \href{https://nicholasjclark.github.io/mvgam/articles/shared_states.html}{Shared latent states vignette} \item \href{https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html}{Time-varying effects vignette} \item \href{https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html}{State-Space models vignette} \item \href{https://nicholasjclark.github.io/mvgam/articles/nmixtures.html}{"Fitting N-mixture models in \code{mvgam}"} \item \href{https://nicholasjclark.github.io/mvgam/reference/jsdgam.html}{"Joint Species Distribution Models in \code{mvgam}"} \item \href{https://ecogambler.netlify.app/blog/time-varying-seasonality/}{"Incorporating time-varying seasonality in forecast models"} \item \href{https://ecogambler.netlify.app/blog/autocorrelated-gams/}{"Temporal autocorrelation in GAMs and the \code{mvgam} package"} } \item \strong{Model Structure:} Carefully think about how to structure linear predictor effects (i.e. smooth terms using \code{\link[mgcv]{s}()}, \code{\link[mgcv]{te}()} or \code{\link[mgcv]{ti}()}, GPs using \code{\link[brms]{gp}()}, dynamic time-varying effects using \code{\link[=dynamic]{dynamic()}}, and parametric terms), latent temporal trend components (see \code{\link{mvgam_trends}}) and the appropriate observation family (see \code{\link{mvgam_families}}). Use \code{\link[=get_mvgam_priors]{get_mvgam_priors()}} to see default prior distributions for stochastic parameters. \item \strong{Prior Specification:} Change default priors using appropriate prior knowledge (see \code{\link[brms]{prior}()}). When using State-Space models with a \code{trend_formula}, pay particular attention to priors for any variance parameters such as process errors and observation errors. Default priors on these parameters are chosen to be vaguely informative and to avoid zero (using Inverse Gamma priors), but more informative priors will often help with model efficiency and convergence. \item \strong{Model Fitting:} Fit the model using either Hamiltonian Monte Carlo or an approximation algorithm (i.e. change the \code{algorithm} argument) and use \code{\link[=summary.mvgam]{summary.mvgam()}}, \code{\link[=conditional_effects.mvgam]{conditional_effects.mvgam()}}, \code{\link[=mcmc_plot.mvgam]{mcmc_plot.mvgam()}}, \code{\link[=pp_check.mvgam]{pp_check.mvgam()}}, \code{\link[=pairs.mvgam]{pairs.mvgam()}} and \code{\link[=plot.mvgam]{plot.mvgam()}} to inspect / interrogate the model.
\item \strong{Model Comparison:} Update the model as needed and use \code{\link[=loo_compare.mvgam]{loo_compare.mvgam()}} for in-sample model comparisons, or alternatively use \code{\link[=forecast.mvgam]{forecast.mvgam()}}, \code{\link[=lfo_cv.mvgam]{lfo_cv.mvgam()}} and \code{\link[=score.mvgam_forecast]{score.mvgam_forecast()}} to compare models based on out-of-sample forecasts (see the \href{https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html}{forecast evaluation vignette} for guidance). \item \strong{Inference and Prediction:} When satisfied with the model structure, use \code{\link[=predict.mvgam]{predict.mvgam()}}, \code{\link[marginaleffects]{plot_predictions}()} and/or \code{\link[marginaleffects]{plot_slopes}()} for more targeted simulation-based inferences (see \href{https://ecogambler.netlify.app/blog/interpreting-gams/}{"How to interpret and report nonlinear effects from Generalized Additive Models"} for some guidance on interpreting GAMs). For time series models, use \code{\link[=hindcast.mvgam]{hindcast.mvgam()}}, \code{\link[=fitted.mvgam]{fitted.mvgam()}}, \code{\link[=augment.mvgam]{augment.mvgam()}} and \code{\link[=forecast.mvgam]{forecast.mvgam()}} to inspect posterior hindcast / forecast distributions. \item \strong{Documentation:} Use \code{\link[=how_to_cite]{how_to_cite()}} to obtain a scaffold methods section (with full references) to begin describing this model in scientific publications. } } \examples{ \dontrun{ # ============================================================================= # Basic Multi-Series Time Series Modeling # ============================================================================= # Simulate three time series that have shared seasonal dynamics, # independent AR(1) trends, and Poisson observations set.seed(0) dat <- sim_mvgam( T = 80, n_series = 3, mu = 2, trend_model = AR(p = 1), prop_missing = 0.1, prop_trend = 0.6 ) # Plot key summary statistics for a single series plot_mvgam_series(data = dat$data_train, series = 1) # Plot all series together plot_mvgam_series(data = dat$data_train, series = "all") # Formulate a model using Stan where series share a cyclic smooth for # seasonality and each series has an independent AR1 temporal process. # Note that 'noncentred = TRUE' will likely give performance gains. 
# Set run_model = FALSE to inspect the returned objects mod1 <- mvgam( formula = y ~ s(season, bs = "cc", k = 6), data = dat$data_train, trend_model = AR(), family = poisson(), noncentred = TRUE, run_model = FALSE ) # View the model code in Stan language stancode(mod1) # View the data objects needed to fit the model in Stan sdata1 <- standata(mod1) str(sdata1) # Now fit the model mod1 <- mvgam( formula = y ~ s(season, bs = "cc", k = 6), data = dat$data_train, trend_model = AR(), family = poisson(), noncentred = TRUE, chains = 2, silent = 2 ) # Extract the model summary summary(mod1) # Plot the historical trend and hindcast distributions for one series hc_trend <- hindcast(mod1, type = "trend") plot(hc_trend) hc_predicted <- hindcast(mod1, type = "response") plot(hc_predicted) # Residual diagnostics plot(mod1, type = "residuals", series = 1) resids <- residuals(mod1) str(resids) # Fitted values and residuals can be added directly to the training data augment(mod1) # Compute the forecast using covariate information in data_test fc <- forecast(mod1, newdata = dat$data_test) str(fc) fc_summary <- summary(fc) head(fc_summary, 12) plot(fc) # Plot the estimated seasonal smooth function plot(mod1, type = "smooths") # Plot estimated first derivatives of the smooth plot(mod1, type = "smooths", derivatives = TRUE) # Plot partial residuals of the smooth plot(mod1, type = "smooths", residuals = TRUE) # Plot posterior realisations for the smooth plot(mod1, type = "smooths", realisations = TRUE) # Plot conditional response predictions using marginaleffects conditional_effects(mod1) plot_predictions(mod1, condition = "season", points = 0.5) # Generate posterior predictive checks using bayesplot pp_check(mod1) # Extract observation model beta coefficient draws as a data.frame beta_draws_df <- as.data.frame(mod1, variable = "betas") head(beta_draws_df) str(beta_draws_df) # Investigate model fit mc.cores.def <- getOption("mc.cores") options(mc.cores = 1) loo(mod1) options(mc.cores = mc.cores.def) # ============================================================================= # Vector Autoregressive (VAR) Models # ============================================================================= # Fit a model to the portal time series that uses a latent # Vector Autoregression of order 1 mod <- mvgam( formula = captures ~ -1, trend_formula = ~ trend, trend_model = VAR(cor = TRUE), family = poisson(), data = portal_data, chains = 2, silent = 2 ) # Plot the autoregressive coefficient distributions; # use 'dir = "v"' to arrange the order of facets correctly mcmc_plot( mod, variable = 'A', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Plot the process error variance-covariance matrix in the same way mcmc_plot( mod, variable = 'Sigma', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Calculate Generalized Impulse Response Functions for each series irfs <- irf( mod, h = 12, cumulative = FALSE ) # Plot some of them plot(irfs, series = 1) plot(irfs, series = 2) # Calculate forecast error variance decompositions for each series fevds <- fevd(mod, h = 12) # Plot median contributions to forecast error variance plot(fevds) # ============================================================================= # Dynamic Factor Models # ============================================================================= # Now fit a model that uses two RW dynamic factors to model # the temporal dynamics of the four rodent species mod <- mvgam( captures ~ series, trend_model = RW(), use_lv = TRUE, n_lv = 2, data = 
portal_data, chains = 2, silent = 2 ) # Plot the factors plot(mod, type = 'factors') # Plot the hindcast distributions hcs <- hindcast(mod) plot(hcs, series = 1) plot(hcs, series = 2) plot(hcs, series = 3) plot(hcs, series = 4) # Use residual_cor() to calculate temporal correlations among the series # based on the factor loadings lvcors <- residual_cor(mod) names(lvcors) lvcors$cor # For those correlations whose credible intervals did not include # zero, plot them as a correlation matrix (all other correlations # are shown as zero on this plot) plot(lvcors, cluster = TRUE) # ============================================================================= # Shared Latent Trends with Custom Trend Mapping # ============================================================================= # Example of supplying a trend_map so that some series can share # latent trend processes sim <- sim_mvgam(n_series = 3) mod_data <- sim$data_train # Here, we specify only two latent trends; series 1 and 2 share a trend, # while series 3 has its own unique latent trend trend_map <- data.frame( series = unique(mod_data$series), trend = c(1, 1, 2) ) # Fit the model using AR1 trends mod <- mvgam( formula = y ~ s(season, bs = "cc", k = 6), trend_map = trend_map, trend_model = AR(), data = mod_data, return_model_data = TRUE, chains = 2, silent = 2 ) # The mapping matrix is now supplied as data to the model in the 'Z' element mod$model_data$Z # The first two series share an identical latent trend; the third is different plot(residual_cor(mod)) plot(mod, type = "trend", series = 1) plot(mod, type = "trend", series = 2) plot(mod, type = "trend", series = 3) # ============================================================================= # Time-Varying (Dynamic) Coefficients # ============================================================================= # Example of how to use dynamic coefficients # Simulate a time-varying coefficient for the effect of temperature set.seed(123) N <- 200 beta_temp <- vector(length = N) beta_temp[1] <- 0.4 for (i in 2:N) { beta_temp[i] <- rnorm(1, mean = beta_temp[i - 1] - 0.0025, sd = 0.05) } plot(beta_temp) # Simulate a covariate called 'temp' temp <- rnorm(N, sd = 1) # Simulate some noisy Gaussian observations out <- rnorm(N, mean = 4 + beta_temp * temp, sd = 0.5 ) # Gather necessary data into a data.frame; split into training / testing data <- data.frame(out, temp, time = seq_along(temp)) data_train <- data[1:180, ] data_test <- data[181:200, ] # Fit the model using the dynamic() function mod <- mvgam( formula = out ~ dynamic( temp, scale = FALSE, k = 40 ), family = gaussian(), data = data_train, newdata = data_test, chains = 2, silent = 2 ) # Inspect the model summary, forecast and time-varying coefficient distribution summary(mod) plot(mod, type = "smooths") fc <- forecast(mod, newdata = data_test) plot(fc) # Propagating the smooth term shows how the coefficient is expected to evolve plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 180, lty = "dashed", lwd = 2) points(beta_temp, pch = 16) # ============================================================================= # Working with Offset Terms # ============================================================================= # Example showing how to incorporate an offset; simulate some count data # with different means per series set.seed(100) dat <- sim_mvgam( prop_trend = 0, mu = c(0, 2, 2), seasonality = "hierarchical" ) # Add offset terms to the training and testing data dat$data_train$offset <- 0.5 * 
as.numeric(dat$data_train$series) dat$data_test$offset <- 0.5 * as.numeric(dat$data_test$series) # Fit a model that includes the offset in the linear predictor as well as # hierarchical seasonal smooths mod <- mvgam( formula = y ~ offset(offset) + s(series, bs = "re") + s(season, bs = "cc") + s(season, by = series, m = 1, k = 5), data = dat$data_train, chains = 2, silent = 2 ) # Inspect the model file to see the modification to the linear predictor (eta) stancode(mod) # Forecasts for the first two series will differ in magnitude fc <- forecast(mod, newdata = dat$data_test) plot(fc, series = 1, ylim = c(0, 75)) plot(fc, series = 2, ylim = c(0, 75)) # Changing the offset for the testing data should lead to changes in # the forecast dat$data_test$offset <- dat$data_test$offset - 2 fc <- forecast(mod, newdata = dat$data_test) plot(fc) # Relative Risks can be computed by fixing the offset to the same value # for each series dat$data_test$offset <- rep(1, NROW(dat$data_test)) preds_rr <- predict(mod, type = "link", newdata = dat$data_test, summary = FALSE ) series1_inds <- which(dat$data_test$series == "series_1") series2_inds <- which(dat$data_test$series == "series_2") # Relative Risks are now more comparable among series layout(matrix(1:2, ncol = 2)) plot(preds_rr[1, series1_inds], type = "l", col = "grey75", ylim = range(preds_rr), ylab = "Series1 Relative Risk", xlab = "Time" ) for (i in 2:50) { lines(preds_rr[i, series1_inds], col = "grey75") } plot(preds_rr[1, series2_inds], type = "l", col = "darkred", ylim = range(preds_rr), ylab = "Series2 Relative Risk", xlab = "Time" ) for (i in 2:50) { lines(preds_rr[i, series2_inds], col = "darkred") } layout(1) # ============================================================================= # Binomial Family Models # ============================================================================= # Example showcasing how cbind() is needed for Binomial observations # Simulate two time series of Binomial trials trials <- sample(c(20:25), 50, replace = TRUE) x <- rnorm(50) detprob1 <- plogis(-0.5 + 0.9 * x) detprob2 <- plogis(-0.1 - 0.7 * x) dat <- rbind( data.frame( y = rbinom(n = 50, size = trials, prob = detprob1), time = 1:50, series = "series1", x = x, ntrials = trials ), data.frame( y = rbinom(n = 50, size = trials, prob = detprob2), time = 1:50, series = "series2", x = x, ntrials = trials ) ) dat <- dplyr::mutate(dat, series = as.factor(series)) dat <- dplyr::arrange(dat, time, series) plot_mvgam_series(data = dat, series = "all") # Fit a model using the binomial() family; must specify observations # and number of trials in the cbind() wrapper mod <- mvgam( formula = cbind(y, ntrials) ~ series + s(x, by = series), family = binomial(), data = dat, chains = 2, silent = 2 ) summary(mod) pp_check(mod, type = "bars_grouped", group = "series", ndraws = 50 ) pp_check(mod, type = "ecdf_overlay_grouped", group = "series", ndraws = 50 ) conditional_effects(mod, type = "link") # To view predictions on the probability scale, # use ntrials = 1 in datagrid() plot_predictions( mod, by = c('x', 'series'), newdata = datagrid( x = runif(100, -2, 2), series = unique, ntrials = 1 ), type = 'expected' ) # Not needed for general use; cleans up connections for automated testing closeAllConnections() } } \references{ Nicholas J Clark & Konstans Wells (2023). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. 14:3, 771-784. 
Nicholas J Clark, SK Morgan Ernest, Henry Senyondo, Juniper Simonis, Ethan P White, Glenda M Yenni, KANK Karunarathna (2025). Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ, 13:e18929. \doi{10.7717/peerj.18929} } \seealso{ \code{\link[mgcv]{jagam}()}, \code{\link[mgcv]{gam}()}, \code{\link[mgcv]{gam.models}}, \code{\link[=get_mvgam_priors]{get_mvgam_priors()}}, \code{\link[=jsdgam]{jsdgam()}}, \code{\link[=hindcast.mvgam]{hindcast.mvgam()}}, \code{\link[=forecast.mvgam]{forecast.mvgam()}}, \code{\link[=predict.mvgam]{predict.mvgam()}} } \author{ Nicholas J Clark } ================================================ FILE: man/mvgam_diagnostics.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_diagnostics.R \name{mvgam_diagnostics} \alias{mvgam_diagnostics} \alias{nuts_params} \alias{rhat} \alias{neff_ratio} \alias{nuts_params.mvgam} \alias{log_posterior.mvgam} \alias{rhat.mvgam} \alias{neff_ratio.mvgam} \title{Extract diagnostic quantities of \pkg{mvgam} models} \usage{ \method{nuts_params}{mvgam}(object, pars = NULL, ...) \method{log_posterior}{mvgam}(object, ...) \method{rhat}{mvgam}(x, pars = NULL, ...) \method{neff_ratio}{mvgam}(object, pars = NULL, ...) } \arguments{ \item{object, x}{A \code{mvgam} or \code{jsdgam} object.} \item{pars}{An optional character vector of parameter names. For \code{nuts_params} these will be NUTS sampler parameter names rather than model parameters. If \code{pars} is omitted, all parameters are included.} \item{...}{Arguments passed to individual methods.} } \value{ The exact form of the output depends on the method. } \description{ Extract quantities that can be used to diagnose sampling behavior of the algorithms applied by \pkg{Stan} at the back-end of \pkg{mvgam}. } \details{ For more details see \code{\link[bayesplot:bayesplot-extractors]{bayesplot-extractors}}. } \examples{ \dontrun{ simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2) np <- nuts_params(mod) head(np) # extract the number of divergent transitions sum(subset(np, Parameter == "divergent__")$Value) head(neff_ratio(mod)) } } ================================================ FILE: man/mvgam_draws.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/as.data.frame.mvgam.R \name{mvgam_draws} \alias{mvgam_draws} \alias{as.data.frame.mvgam} \alias{as.matrix.mvgam} \alias{as.array.mvgam} \alias{as_draws.mvgam} \alias{as_draws_matrix.mvgam} \alias{as_draws_df.mvgam} \alias{as_draws_array.mvgam} \alias{as_draws_list.mvgam} \alias{as_draws_rvars.mvgam} \title{Extract posterior draws from fitted \pkg{mvgam} objects} \usage{ \method{as.data.frame}{mvgam}( x, row.names = NULL, optional = TRUE, variable = "betas", use_alias = TRUE, regex = FALSE, ... ) \method{as.matrix}{mvgam}(x, variable = "betas", regex = FALSE, use_alias = TRUE, ...) \method{as.array}{mvgam}(x, variable = "betas", regex = FALSE, use_alias = TRUE, ...) \method{as_draws}{mvgam}( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) \method{as_draws_matrix}{mvgam}( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) \method{as_draws_df}{mvgam}( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ...
) \method{as_draws_array}{mvgam}( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) \method{as_draws_list}{mvgam}( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) \method{as_draws_rvars}{mvgam}(x, variable = NULL, regex = FALSE, inc_warmup = FALSE, ...) } \arguments{ \item{x}{\code{list} object of class \code{mvgam}} \item{row.names}{Ignored} \item{optional}{Ignored} \item{variable}{A character specifying which parameters to extract. Can either be one of the following options: \itemize{ \item \code{obs_params} (other parameters specific to the observation model, such as overdispersions for negative binomial models or observation error SD for gaussian / student-t models) \item \code{betas} (beta coefficients from the GAM observation model linear predictor; default) \item \code{smooth_params} (smoothing parameters from the GAM observation model) \item \code{linpreds} (estimated linear predictors on whatever link scale was used in the model) \item \code{trend_params} (parameters governing the trend dynamics, such as AR parameters, trend SD parameters or Gaussian Process parameters) \item \code{trend_betas} (beta coefficients from the GAM latent process model linear predictor; only available if a \code{trend_formula} was supplied in the original model) \item \code{trend_smooth_params} (process model GAM smoothing parameters; only available if a \code{trend_formula} was supplied in the original model) \item \code{trend_linpreds} (process model linear predictors on the identity scale; only available if a \code{trend_formula} was supplied in the original model) } OR can be a character vector providing the variables to extract.} \item{use_alias}{Logical. If more informative names for parameters are available (i.e. for beta coefficients \code{b} or for smoothing parameters \code{rho}), replace the uninformative names with the more informative alias. Defaults to \code{TRUE}.} \item{regex}{Logical. If not using one of the prespecified options for extractions, should \code{variable} be treated as a (vector of) regular expressions? Any variable in \code{x} matching at least one of the regular expressions will be selected. Defaults to \code{FALSE}.} \item{...}{Ignored} \item{inc_warmup}{Should warmup draws be included? Defaults to \code{FALSE}.} } \value{ A \code{data.frame}, \code{matrix}, or \code{array} containing the posterior draws. } \description{ Extract posterior draws in conventional formats as data.frames, matrices, or arrays. } \examples{ \dontrun{ sim <- sim_mvgam(family = Gamma()) mod1 <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), data = sim$data_train, family = Gamma(), chains = 2, silent = 2 ) beta_draws_df <- as.data.frame(mod1, variable = 'betas') head(beta_draws_df) str(beta_draws_df) beta_draws_mat <- as.matrix(mod1, variable = 'betas') head(beta_draws_mat) str(beta_draws_mat) shape_pars <- as.matrix(mod1, variable = 'shape', regex = TRUE) head(shape_pars) } } \author{ Nicholas J Clark } ================================================ FILE: man/mvgam_families.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/families.R \name{mvgam_families} \alias{mvgam_families} \alias{tweedie} \alias{student_t} \alias{betar} \alias{nb} \alias{lognormal} \alias{student} \alias{bernoulli} \alias{beta_binomial} \alias{nmix} \title{Supported \pkg{mvgam} families} \usage{ tweedie(link = "log") student_t(link = "identity") betar(...) nb(...) lognormal(...) 
student(...) bernoulli(...) beta_binomial(...) nmix(link = "log") } \arguments{ \item{link}{a specification for the family link function. At present these cannot be changed} \item{...}{Arguments to be passed to the \pkg{mgcv} version of the associated functions} } \value{ Objects of class \code{family} } \description{ Supported \pkg{mvgam} families } \details{ \code{mvgam} currently supports the following standard observation families: \itemize{ \item \code{\link[stats]{gaussian}} with identity link, for real-valued data \item \code{\link[stats]{poisson}} with log-link, for count data \item \code{\link[stats]{Gamma}} with log-link, for non-negative real-valued data \item \code{\link[stats]{binomial}} with logit-link, for count data when the number of trials is known (and must be supplied) } In addition, the following extended families from the \code{mgcv} and \code{brms} packages are supported: \itemize{ \item \code{\link[mgcv]{betar}} with logit-link, for proportional data on \verb{(0,1)} \item \code{\link[mgcv]{nb}} with log-link, for count data \item \code{\link[brms]{lognormal}} with identity-link, for non-negative real-valued data \item \code{\link[brms]{bernoulli}} with logit-link, for binary data \item \code{\link[brms]{beta_binomial}} with logit-link, as for \code{binomial()} but allows for overdispersion } Finally, \code{mvgam} supports the three extended families described here: \itemize{ \item \code{tweedie} with log-link, for count data (power parameter \code{p} fixed at \code{1.5}) \item \code{student_t()} (or \code{\link[brms]{student}}) with identity-link, for real-valued data \item \code{nmix} for count data with imperfect detection modeled via a State-Space N-Mixture model. The latent states are Poisson (with log link), capturing the 'true' latent abundance, while the observation process is Binomial to account for imperfect detection. The observation \code{formula} in these models is used to set up a linear predictor for the detection probability (with logit link). See the example below for a more detailed worked explanation of the \code{nmix()} family } Only \code{poisson()}, \code{nb()}, and \code{tweedie()} are available if using \code{JAGS}. All families, apart from \code{tweedie()}, are supported if using \code{Stan}. 
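For example, one of these extended families can be supplied directly to the \code{family} argument (a minimal sketch, assuming data simulated with \code{sim_mvgam()}):
\preformatted{
simdat <- sim_mvgam(family = nb())
mod <- mvgam(
  y ~ s(season, bs = 'cc'),
  family = nb(),
  data = simdat$data_train,
  chains = 2
)
}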
Note that currently it is not possible to change the default link functions in \pkg{mvgam}, so any call to change these will be silently ignored.
}
\examples{
\dontrun{
# =============================================================================
# N-mixture Models
# =============================================================================
set.seed(999)

# Simulate observations for species 1, which shows a declining trend and
# 0.7 detection probability
data.frame(
  site = 1,
  # five replicates per year; six years
  replicate = rep(1:5, 6),
  time = sort(rep(1:6, 5)),
  species = 'sp_1',
  # true abundance declines nonlinearly
  truth = c(
    rep(28, 5),
    rep(26, 5),
    rep(23, 5),
    rep(16, 5),
    rep(14, 5),
    rep(14, 5)
  ),
  # observations are taken with detection prob = 0.7
  obs = c(
    rbinom(5, 28, 0.7),
    rbinom(5, 26, 0.7),
    rbinom(5, 23, 0.7),
    rbinom(5, 16, 0.7),
    rbinom(5, 14, 0.7),
    rbinom(5, 14, 0.7)
  )
) \%>\%
  # add 'series' information, which is an identifier of site, replicate
  # and species
  dplyr::mutate(
    series = paste0(
      'site_', site, '_', species, '_rep_', replicate
    ),
    time = as.numeric(time),
    # add a 'cap' variable that defines the maximum latent N to
    # marginalize over when estimating latent abundance; in other words
    # how large do we realistically think the true abundance could be?
    cap = 80
  ) \%>\%
  dplyr::select(-replicate) -> testdat

# Now add another species that has a different temporal trend and a
# smaller detection probability (0.45 for this species)
testdat <- testdat \%>\%
  dplyr::bind_rows(
    data.frame(
      site = 1,
      replicate = rep(1:5, 6),
      time = sort(rep(1:6, 5)),
      species = 'sp_2',
      truth = c(
        rep(4, 5),
        rep(7, 5),
        rep(15, 5),
        rep(16, 5),
        rep(19, 5),
        rep(18, 5)
      ),
      obs = c(
        rbinom(5, 4, 0.45),
        rbinom(5, 7, 0.45),
        rbinom(5, 15, 0.45),
        rbinom(5, 16, 0.45),
        rbinom(5, 19, 0.45),
        rbinom(5, 18, 0.45)
      )
    ) \%>\%
      dplyr::mutate(
        series = paste0(
          'site_', site, '_', species, '_rep_', replicate
        ),
        time = as.numeric(time),
        cap = 50
      ) \%>\%
      dplyr::select(-replicate)
  )

# series identifiers
testdat$species <- factor(
  testdat$species,
  levels = unique(testdat$species)
)
testdat$series <- factor(
  testdat$series,
  levels = unique(testdat$series)
)

# Construct the trend_map to state how replicates are structured
testdat \%>\%
  # each unique combination of site*species is a separate process
  dplyr::mutate(
    trend = as.numeric(factor(paste0(site, species)))
  ) \%>\%
  dplyr::select(trend, series) \%>\%
  dplyr::distinct() -> trend_map
trend_map

# Fit a model
mod <- mvgam(
  # the observation formula sets up linear predictors for
  # detection probability on the logit scale
  formula = obs ~ species - 1,
  # the trend_formula sets up the linear predictors for
  # the latent abundance processes on the log scale
  trend_formula = ~ s(time, by = trend, k = 4) + species,
  # the trend_map takes care of the mapping
  trend_map = trend_map,
  # nmix() family and data
  family = nmix(),
  data = testdat,
  # priors can be set in the usual way
  priors = c(
    prior(std_normal(), class = b),
    prior(normal(1, 1.5), class = Intercept_trend)
  ),
  chains = 2
)

# The usual diagnostics
summary(mod)

# Plotting conditional effects
library(ggplot2)
plot_predictions(
  mod,
  condition = 'species',
  type = 'detection'
) +
  ylab('Pr(detection)') +
  ylim(c(0, 1)) +
  theme_classic() +
  theme(legend.position = 'none')

# =============================================================================
# Binomial Models
# =============================================================================
# Simulate two time series of Binomial trials
trials <- sample(c(20:25), 50, replace = TRUE)
x <- rnorm(50)
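# The two detection probabilities respond in opposite directions to the
# covariate x; plogis() maps the logit-scale linear predictors onto the
# probability scale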
detprob1 <- plogis(-0.5 + 0.9 * x)
detprob2 <- plogis(-0.1 - 0.7 * x)
dat <- rbind(
  data.frame(
    y = rbinom(n = 50, size = trials, prob = detprob1),
    time = 1:50,
    series = 'series1',
    x = x,
    ntrials = trials
  ),
  data.frame(
    y = rbinom(n = 50, size = trials, prob = detprob2),
    time = 1:50,
    series = 'series2',
    x = x,
    ntrials = trials
  )
)
dat <- dplyr::mutate(dat, series = as.factor(series))
dat <- dplyr::arrange(dat, time, series)

# Fit a model using the binomial() family; must specify observations
# and number of trials in the cbind() wrapper
mod <- mvgam(
  cbind(y, ntrials) ~ series + s(x, by = series),
  family = binomial(),
  data = dat
)
summary(mod)
}
}
\author{
Nicholas J Clark
}

================================================
FILE: man/mvgam_fevd-class.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mvgam_fevd-class.R
\name{mvgam_fevd-class}
\alias{mvgam_fevd-class}
\title{\code{mvgam_fevd} object description}
\description{
A \code{mvgam_fevd} object returned by function \code{\link[=fevd]{fevd()}}.
Run \code{methods(class = "mvgam_fevd")} to see an overview of available methods.
}
\details{
A forecast error variance decomposition is useful for quantifying the amount of information each series in a Vector Autoregression contributes to the forecast distributions of the other series in the autoregression. This object contains the forecast error variance decomposition using the orthogonalised impulse response coefficient matrices \eqn{\Psi_h}, which can be used to quantify the contribution of series \eqn{j} to the h-step forecast error variance of series \eqn{k}:
\deqn{
\sigma_k^2(h) = \sum_{j=1}^K(\psi_{kj, 0}^2 + \ldots + \psi_{kj, h-1}^2)
}
If the orthogonalised impulse responses \eqn{(\psi_{kj, 0}^2 + \ldots + \psi_{kj, h-1}^2)} are divided by the variance of the forecast error \eqn{\sigma_k^2(h)}, this yields an interpretable percentage representing how much of the forecast error variance for \eqn{k} can be explained by an exogenous shock to \eqn{j}. This percentage is what is calculated and returned in objects of class \code{mvgam_fevd}, where the posterior distribution of variance decompositions for each variable in the original model is contained in a separate slot within the returned \code{list} object.
}
\references{
Lütkepohl, H (2006). New Introduction to Multiple Time Series Analysis. Springer, New York.
}
\seealso{
\code{\link[=mvgam]{mvgam()}}, \code{\link[=VAR]{VAR()}}
}
\author{
Nicholas J Clark
}

================================================
FILE: man/mvgam_forecast-class.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mvgam_forecast-class.R
\name{mvgam_forecast-class}
\alias{mvgam_forecast-class}
\title{\code{mvgam_forecast} object description}
\description{
A \code{mvgam_forecast} object returned by function \code{\link{hindcast}} or \code{\link{forecast}}.
Run \code{methods(class = "mvgam_forecast")} to see an overview of available methods.
}
\details{
A \code{mvgam_forecast} object contains the following elements:
\itemize{
\item \code{call} the original observation model formula
\item \code{trend_call} If a \code{trend_formula} was supplied, the original trend model formula is returned. Otherwise \code{NULL}
\item \code{family} \code{character} description of the observation distribution
\item \code{family_pars} \code{list} containing draws of family-specific parameters (i.e.
shape, scale or overdispersion parameters). Only returned if \code{type = link}. Otherwise \code{NULL} \item \code{trend_model} \code{character} description of the latent trend model \item \code{drift} Logical specifying whether a drift term was used in the trend model \item \code{use_lv} Logical flag indicating whether latent dynamic factors were used in the model \item \code{fit_engine} \code{Character} describing the fit engine, either as \code{stan} or \code{jags} \item \code{type} The type of predictions included (either \code{link}, \code{response} or \code{trend}) \item \code{series_names} Names of the time series, taken from \code{levels(data$series)} in the original model fit \item \code{train_observations} A \code{list} of training observation vectors of length \code{n_series} \item \code{train_times} A \code{list} of the unique training times of length \code{n_series} \item \code{test_observations} If the \code{\link{forecast}} function was used, a \code{list} of test observation vectors of length \code{n_series}. Otherwise \code{NULL} \item \code{test_times} If the \code{\link{forecast}} function was used, a \code{list} of the unique testing (validation) times of length \code{n_series}. Otherwise \code{NULL} \item \code{hindcasts} A \code{list} of posterior hindcast distributions of length \code{n_series}. \item \code{forecasts} If the \code{\link{forecast}} function was used, a \code{list} of posterior forecast distributions of length \code{n_series}. Otherwise \code{NULL} } } \seealso{ \link{mvgam}, \link{hindcast.mvgam}, \link{forecast.mvgam} } \author{ Nicholas J Clark } ================================================ FILE: man/mvgam_formulae.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_formulae.R \name{mvgam_formulae} \alias{mvgam_formulae} \title{Details of formula specifications in \pkg{mvgam} models} \description{ Details of formula specifications in \pkg{mvgam} models } \details{ \code{\link{mvgam}} will accept an observation model formula and an optional process model formula (via the argument \code{trend_formula}). Neither of these formulae can be specified as lists, contrary to the accepted behaviour in some \code{mgcv} or \code{brms} models. \cr \cr Note that it is possible to supply an empty formula where there are no predictors or intercepts in the observation model (i.e. \code{y ~ 0} or \code{y ~ -1}). In this case, an intercept-only observation model will be set up but the intercept coefficient will be fixed at zero. This can be handy if you wish to fit pure State-Space models where the variation in the dynamic trend controls the average expectation, and/or where intercepts are non-identifiable. \cr \cr The formulae supplied to \code{\link{mvgam}} and \code{\link{jsdgam}} are exactly like those supplied to \code{\link{glm}} except that smooth terms, \code{\link[mgcv]{s}}, \code{\link[mgcv]{te}}, \code{\link[mgcv]{ti}} and \code{\link[mgcv]{t2}}, time-varying effects using \code{\link{dynamic}}, monotonically increasing (using \code{s(x, bs = 'moi')}) or decreasing splines (using \code{s(x, bs = 'mod')}; see \code{\link{smooth.construct.moi.smooth.spec}} for details), as well as Gaussian Process functions using \code{\link[brms]{gp}} and offsets using \code{\link[stats]{offset}} can be added to the right hand side (and \code{.} is not supported in \code{mvgam} formulae). 
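\cr
\cr
For example, a single observation formula might combine several of these term types (a sketch only; \code{x}, \code{z} and \code{log_effort} are hypothetical predictors):
\preformatted{
y ~ s(season, bs = 'cc') +  # cyclic smooth of season
  s(x, bs = 'moi') +        # monotonically increasing smooth of x
  gp(z) +                   # Gaussian Process function of z
  offset(log_effort)        # offset added on the link scale
}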
\cr
\cr
Further details on specifying different kinds of smooth functions, and how to control their behaviours by modifying their potential complexities and/or how the penalties behave, can be found in the extensive documentation for the \code{mgcv} package.
}
\seealso{
\code{\link{mvgam}},
\code{\link[mgcv]{formula.gam}},
\code{\link[mgcv]{gam.models}},
\code{\link[mgcv]{jagam}},
\code{\link[mgcv]{gam}},
\code{\link[mgcv]{s}},
\code{\link[brms]{gp}},
\code{\link[stats]{formula}}
}
\author{
Nicholas J Clark
}

================================================
FILE: man/mvgam_irf-class.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mvgam_irf-class.R
\name{mvgam_irf-class}
\alias{mvgam_irf-class}
\title{\code{mvgam_irf} object description}
\description{
A \code{mvgam_irf} object returned by function \code{\link{irf}}.
Run \code{methods(class = "mvgam_irf")} to see an overview of available methods.
}
\details{
Generalized or Orthogonalized Impulse Response Functions can be computed using the posterior estimates of Vector Autoregressive parameters. This function generates a positive "shock" for a target process at time \code{t = 0} and then calculates how each of the remaining processes in the latent VAR are expected to respond over the forecast horizon \code{h}. The function computes IRFs for all processes in the object and returns them in an array that can be plotted using the S3 \code{plot} function. To inspect community-level metrics of stability using latent VAR processes, you can use the related \code{\link[=stability]{stability()}} function. A \code{mvgam_irf} object contains a \code{list} of posterior impulse response functions, each stored as its own \code{list}.
}
\references{
H Hashem Pesaran & Yongcheol Shin (1998). Generalized impulse response analysis in linear multivariate models. Economics Letters 58: 17–29.
}
\seealso{
\link{mvgam}, \link{VAR}
}
\author{
Nicholas J Clark
}

================================================
FILE: man/mvgam_marginaleffects.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/marginaleffects.mvgam.R
\name{mvgam_marginaleffects}
\alias{mvgam_marginaleffects}
\alias{get_coef.mvgam}
\alias{set_coef.mvgam}
\alias{get_vcov.mvgam}
\alias{get_predict.mvgam}
\alias{get_data.mvgam}
\alias{get_data.mvgam_prefit}
\alias{find_predictors.mvgam}
\alias{find_predictors.mvgam_prefit}
\title{Helper functions for \pkg{marginaleffects} calculations in \pkg{mvgam} models}
\usage{
\method{get_coef}{mvgam}(model, trend_effects = FALSE, ...)

\method{set_coef}{mvgam}(model, coefs, trend_effects = FALSE, ...)

\method{get_vcov}{mvgam}(model, vcov = NULL, ...)

\method{get_predict}{mvgam}(
  model,
  newdata,
  type = "response",
  mfx,
  newparams,
  ndraws,
  se.fit,
  process_error = FALSE,
  ...
)

\method{get_data}{mvgam}(x, source = "environment", verbose = TRUE, ...)

\method{get_data}{mvgam_prefit}(x, source = "environment", verbose = TRUE, ...)

\method{find_predictors}{mvgam}(
  x,
  effects = c("fixed", "random", "all"),
  component = c("all", "conditional", "zi", "zero_inflated", "dispersion",
    "instruments", "correlation", "smooth_terms"),
  flatten = FALSE,
  verbose = TRUE,
  ...
)

\method{find_predictors}{mvgam_prefit}(
  x,
  effects = c("fixed", "random", "all"),
  component = c("all", "conditional", "zi", "zero_inflated", "dispersion",
    "instruments", "correlation", "smooth_terms"),
  flatten = FALSE,
  verbose = TRUE,
  ...
)
}
\arguments{
\item{model}{Model object}

\item{trend_effects}{\code{logical}, extract from the process model component (only applicable if a \code{trend_formula} was specified in the model)}

\item{...}{Additional arguments are passed to the \code{predict()} method supplied by the modeling package. These arguments are particularly useful for mixed-effects or Bayesian models (see the online vignettes on the \code{marginaleffects} website). Available arguments can vary from model to model, depending on the range of supported arguments by each modeling package. See the "Model-Specific Arguments" section of the \code{?slopes} documentation for a non-exhaustive list of available arguments.}

\item{coefs}{vector of coefficients to insert in the model object}

\item{vcov}{Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
\itemize{
\item FALSE: Do not compute standard errors. This can speed up computation considerably.
\item TRUE: Unit-level standard errors using the default \code{vcov(model)} variance-covariance matrix.
\item String which indicates the kind of uncertainty estimates to return.
\itemize{
\item Heteroskedasticity-consistent: \code{"HC"}, \code{"HC0"}, \code{"HC1"}, \code{"HC2"}, \code{"HC3"}, \code{"HC4"}, \code{"HC4m"}, \code{"HC5"}. See \code{?sandwich::vcovHC}
\item Heteroskedasticity and autocorrelation consistent: \code{"HAC"}
\item Mixed-Models degrees of freedom: "satterthwaite", "kenward-roger"
\item Other: \code{"NeweyWest"}, \code{"KernHAC"}, \code{"OPG"}. See the \code{sandwich} package documentation.
\item "rsample", "boot", "fwb", and "simulation" are passed to the \code{method} argument of the \code{inferences()} function. To customize the bootstrap or simulation process, call \code{inferences()} directly.
}
\item One-sided formula which indicates the name of cluster variables (e.g., \code{~unit_id}). This formula is passed to the \code{cluster} argument of the \code{sandwich::vcovCL} function.
\item Square covariance matrix
\item Function which returns a covariance matrix (e.g., \code{stats::vcov(model)})
}}

\item{newdata}{Grid of predictor values at which we evaluate the slopes.
\itemize{
\item Warning: Please avoid modifying your dataset between fitting the model and calling a \code{marginaleffects} function. This can sometimes lead to unexpected results.
\item \code{NULL} (default): Unit-level slopes for each observed value in the dataset (empirical distribution). The dataset is retrieved using \code{\link[insight:get_data]{insight::get_data()}}, which tries to extract data from the environment. This may produce unexpected results if the original data frame has been altered since fitting the model.
\item \code{\link[marginaleffects:datagrid]{datagrid()}} call to specify a custom grid of regressors. For example:
\itemize{
\item \code{newdata = datagrid(cyl = c(4, 6))}: \code{cyl} variable equal to 4 and 6 and other regressors fixed at their means or modes.
\item See the Examples section and the \code{\link[marginaleffects:datagrid]{datagrid()}} documentation.
}
\item \code{\link[=subset]{subset()}} call with a single argument to select a subset of the dataset used to fit the model, ex: \code{newdata = subset(treatment == 1)}
\item \code{\link[dplyr:filter]{dplyr::filter()}} call with a single argument to select a subset of the dataset used to fit the model, ex: \code{newdata = filter(treatment == 1)}
\item string:
\itemize{
\item "mean": Slopes evaluated when each predictor is held at its mean or mode.
\item "median": Slopes evaluated when each predictor is held at its median or mode. \item "balanced": Slopes evaluated on a balanced grid with every combination of categories and numeric variables held at their means. \item "tukey": Slopes evaluated at Tukey's 5 numbers. \item "grid": Slopes evaluated on a grid of representative numbers (Tukey's 5 numbers and unique values of categorical predictors). } }} \item{type}{string indicates the type (scale) of the predictions used to compute contrasts or slopes. This can differ based on the model type, but will typically be a string such as: "response", "link", "probs", or "zero". When an unsupported string is entered, the model-specific list of acceptable values is returned in an error message. When \code{type} is \code{NULL}, the first entry in the error message is used by default.} \item{mfx}{Ignored} \item{newparams}{Ignored} \item{ndraws}{Ignored} \item{se.fit}{Ignored} \item{process_error}{\code{logical}. If \code{TRUE}, uncertainty in the latent process (or trend) model is incorporated in predictions} \item{x}{A fitted model.} \item{source}{String, indicating from where data should be recovered. If \code{source = "environment"} (default), data is recovered from the environment (e.g. if the data is in the workspace). This option is usually the fastest way of getting data and ensures that the original variables used for model fitting are returned. Note that always the \emph{current} data is recovered from the environment. Hence, if the data was modified \emph{after} model fitting (e.g., variables were recoded or rows filtered), the returned data may no longer equal the model data. If \code{source = "frame"} (or \code{"mf"}), the data is taken from the model frame. Any transformed variables are back-transformed, if possible. This option returns the data even if it is not available in the environment, however, in certain edge cases back-transforming to the original data may fail. If \code{source = "environment"} fails to recover the data, it tries to extract the data from the model frame; if \code{source = "frame"} and data cannot be extracted from the model frame, data will be recovered from the environment. Both ways only returns observations that have no missing data in the variables used for model fitting.} \item{verbose}{Toggle messages and warnings.} \item{effects}{Should model data for fixed effects (\code{"fixed"}), random effects (\code{"random"}) or both (\code{"all"}) be returned? Only applies to mixed or gee models.} \item{component}{Which type of parameters to return, such as parameters for the conditional model, the zero-inflated part of the model, the dispersion term, the instrumental variables or marginal effects be returned? Applies to models with zero-inflated and/or dispersion formula, or to models with instrumental variables (so called fixed-effects regressions), or models with marginal effects (from \strong{mfx}). See details in section \emph{Model Components} .May be abbreviated. Note that the \emph{conditional} component also refers to the \emph{count} or \emph{mean} component - names may differ, depending on the modeling package. There are three convenient shortcuts (not applicable to \emph{all} model classes): \itemize{ \item \code{component = "all"} returns all possible parameters. 
\item If \code{component = "location"}, location parameters such as \code{conditional}, \code{zero_inflated}, \code{smooth_terms}, or \code{instruments} are returned (everything that is a fixed or random effect, depending on the \code{effects} argument, but no auxiliary parameters).
\item For \code{component = "distributional"} (or \code{"auxiliary"}), components like \code{sigma}, \code{dispersion}, \code{beta} or \code{precision} (and other auxiliary parameters) are returned.
}}

\item{flatten}{Logical, if \code{TRUE}, the values are returned as a character vector, not as a list. Duplicated values are removed.}
}
\value{
Objects suitable for internal 'marginaleffects' functions to proceed. See \code{\link[marginaleffects:get_coef]{marginaleffects::get_coef()}}, \code{\link[marginaleffects:set_coef]{marginaleffects::set_coef()}}, \code{\link[marginaleffects:get_vcov]{marginaleffects::get_vcov()}}, \code{\link[marginaleffects:get_predict]{marginaleffects::get_predict()}}, \code{\link[insight:get_data]{insight::get_data()}} and \code{\link[insight:find_predictors]{insight::find_predictors()}} for details
}
\description{
Helper functions for \pkg{marginaleffects} calculations in \pkg{mvgam} models

Functions needed for working with \pkg{marginaleffects}

Functions needed for getting data / objects with \pkg{insight}
}
\author{
Nicholas J Clark
}

================================================
FILE: man/mvgam_residcor-class.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mvgam_residcor-class.R
\name{mvgam_residcor-class}
\alias{mvgam_residcor-class}
\title{\code{mvgam_residcor} object description}
\value{
Objects of this class are structured as a \code{list} with the following components:
\item{cor, cor_lower, cor_upper}{A set of \eqn{p \times p} correlation matrices, containing either the posterior median or mean estimate, plus lower and upper limits of the corresponding credible intervals supplied to \code{probs}}
\item{sig_cor}{A \eqn{p \times p} correlation matrix containing only those correlations whose credible interval does not contain zero. All other correlations are set to zero}
\item{prec, prec_lower, prec_upper}{A set of \eqn{p \times p} precision matrices, containing either the posterior median or mean estimate, plus lower and upper limits of the corresponding credible intervals supplied to \code{probs}}
\item{sig_prec}{A \eqn{p \times p} precision matrix containing only those precisions whose credible interval does not contain zero. All other precisions are set to zero}
\item{cov}{A \eqn{p \times p} posterior median or mean covariance matrix}
\item{trace}{The median/mean point estimator of the trace (sum of the diagonal elements) of the residual covariance matrix \code{cov}}
}
\description{
A \code{mvgam_residcor} object returned by function \code{\link[=residual_cor]{residual_cor()}}.
Run \code{methods(class = "mvgam_residcor")} to see an overview of available methods.
}
\details{
Hui (2016) provides an excellent description of the quantities that this function calculates, so this passage is heavily paraphrased from his associated \pkg{boral} package.

In latent factor models, the residual covariance matrix is calculated from the matrix of latent factor loadings \eqn{\Theta}, where the residual covariance matrix \eqn{\Sigma = \Theta\Theta'}.
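This calculation can be sketched directly in \code{R} (an illustration only, assuming a hypothetical \eqn{p \times 2} matrix of loadings \code{Theta}):
\preformatted{
# hypothetical loadings for p = 4 species on 2 latent factors
Theta <- matrix(rnorm(8), nrow = 4, ncol = 2)
Sigma <- Theta \%*\% t(Theta)  # residual covariance matrix
R <- cov2cor(Sigma)            # residual correlation matrix
sum(diag(Sigma))               # trace of the residual covariance
}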
A strong residual covariance/correlation matrix between two species can be interpreted as evidence of species interactions (e.g., facilitation or competition), missing covariates, as well as any additional between-species correlation not accountedted for by the shared environmental responses captured in \code{formula}.

The residual precision matrix (also known as the partial correlation matrix; Ovaskainen et al., 2016) is defined as the inverse of the residual correlation matrix. The precision matrix is often used to identify direct or causal relationships between two species, e.g., two species can have a zero precision but still be correlated, which can be interpreted as saying that the two species are not directly associated, but they are still correlated \emph{through} other species. In other words, they are conditionally independent given the other species. It is important to note that the precision matrix does not exhibit the exact same properties as the correlation matrix, e.g., the diagonal elements are not equal to 1. Nevertheless, relatively larger values of precision may imply stronger direct relationships between two species.

In addition to the residual correlation and precision matrices, the median or mean point estimator of the trace of the residual covariance matrix is returned, \eqn{\sum\limits_{j=1}^p [\Theta\Theta']_{jj}}. Often used in other areas of multivariate statistics, the trace may be interpreted as the amount of covariation explained by the latent factors. One situation where the trace may be useful is when comparing a pure latent factor model (where no terms are supplied to \code{formula}) versus a model with latent factors and some additional predictors in \code{formula} -- the proportional difference in trace between these two models may be interpreted as the proportion of covariation between species explained by the predictors in \code{formula}. Of course, the trace itself is random due to the MCMC sampling, and so it is not always guaranteed to produce sensible answers.
}
\references{
Francis KC Hui (2016). BORAL - Bayesian ordination and regression analysis of multivariate abundance data in R. Methods in Ecology and Evolution. 7, 744-750.
\cr
\cr
Otso Ovaskainen et al. (2016). Using latent variable models to identify large networks of species-to-species associations at different spatial scales. Methods in Ecology and Evolution, 7, 549-555.
}
\seealso{
\code{\link[=jsdgam]{jsdgam()}}, \code{\link[=residual_cor]{residual_cor()}}
}
\author{
Nicholas J Clark
}

================================================
FILE: man/mvgam_trends.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/trends.R
\name{mvgam_trends}
\alias{mvgam_trends}
\title{Supported latent trend models in \pkg{mvgam}}
\description{
Supported latent trend models in \pkg{mvgam}
}
\details{
\code{mvgam} currently supports the following dynamic trend models:
\itemize{
\item \code{None} (no latent trend component; i.e.
the GAM component is all that contributes to the linear predictor, and the observation process is the only source of error; similar to what is estimated by \code{\link[mgcv]{gam}}) \item \code{ZMVN()} (zero-mean correlated errors, useful for modelling time series where no autoregressive terms are needed or for modelling data that are not sampled as time series) \item \code{RW()} \item \verb{AR(p = 1, 2, or 3)} \item \code{CAR(p = 1)} (continuous time autoregressive trends; only available in \code{Stan}) \item \code{VAR()} (only available in \code{Stan}) \item \code{PW()} (piecewise linear or logistic trends; only available in \code{Stan}) \item \code{GP()} (Gaussian Process with squared exponential kernel; only available in \code{Stan}) } For most dynamic trend types available in \code{mvgam} (see argument \code{trend_model}), time should be measured in discrete, regularly spaced intervals (i.e. \code{c(1, 2, 3, ...)}). However, you can use irregularly spaced intervals if using \code{trend_model = CAR(1)}, though note that any temporal intervals that are exactly \code{0} will be adjusted to a very small number (\code{1e-12}) to prevent sampling errors. For all autoregressive trend types apart from \code{CAR()}, moving average and/or correlated process error terms can also be estimated (for example, \code{RW(cor = TRUE)} will set up a multivariate Random Walk if \code{data} contains \verb{>1} series). Hierarchical process error correlations can also be handled if the data contain relevant observation units that are nested into relevant grouping and subgrouping levels (i.e. using \code{AR(gr = region, subgr = species)}). Note that only \code{RW}, \code{AR1}, \code{AR2} and \code{AR3} are available if using \code{JAGS}. All trend models are supported if using \code{Stan}. Dynamic factor models can be used in which the latent factors evolve as either \code{RW}, \code{AR1-3}, \code{VAR} or \code{GP}. For \code{VAR} models (i.e. \code{VAR} and \code{VARcor} models), users can either fix the trend error covariances to be \code{0} (using \code{VAR}) or estimate them and potentially allow for contemporaneously correlated errors using \code{VARcor}. For all \code{VAR} models, stationarity of the latent process is enforced through the prior using the parameterisation given by Heaps (2022). Stationarity is not enforced when using \code{AR1}, \code{AR2} or \code{AR3} models, though this can be changed by the user by specifying lower and upper bounds on autoregressive parameters using functionality in \link{get_mvgam_priors} and the \code{priors} argument in \link{mvgam}. Piecewise trends follow the formulation in the popular \code{prophet} package produced by \code{Facebook}, where users can allow for changepoints to control the potential flexibility of the trend. See Taylor and Letham (2018) for details. } \references{ Sarah E. Heaps (2022) Enforcing stationarity through the prior in Vector Autoregressions. Journal of Computational and Graphical Statistics. 32:1, 1–10. Sean J. Taylor and Benjamin Letham (2018) Forecasting at scale. The American Statistician 72.1, 37–45. 
} \seealso{ \code{\link{RW}}, \code{\link{AR}}, \code{\link{CAR}}, \code{\link{VAR}}, \code{\link{PW}}, \code{\link{GP}}, \code{\link{ZMVN}} } ================================================ FILE: man/mvgam_use_cases.Rd ================================================ \name{mvgam_use_cases} \alias{mvgam_use_cases} \title{Example use cases for \pkg{mvgam}} \description{ \pkg{mvgam} is a package for fitting dynamic generalized additive models (GAMs) to univariate or multivariate data. It combines the flexibility of smooth functions with latent temporal processes to model autocorrelation, seasonality, and uncertainty. The package supports both univariate and multivariate time series, making it especially useful for ecological and environmental forecasting. Bayesian inference via Stan allows for full uncertainty quantification and forecasting in complex, non-Gaussian settings. This help page provides external links to example applications and discussions relevant to the use of \pkg{mvgam} models. These examples span non-Gaussian time series modelling, multivariate abundance forecasting, and the use of complex predictors such as time-varying seasonality, monotonic nonlinear effects and Gaussian processes. } \details{ \strong{Non-Gaussian time series modelling and forecasting} \pkg{mvgam} is designed for real-world time series data that include discrete, zero-inflated, or overdispersed observations. It supports latent dynamic components and smooth terms to model autocorrelation, trends, and uncertainty. \itemize{ \item \href{https://stats.stackexchange.com/questions/657495}{Uncertain serial autocorrelation in GAM count model residuals} \item \href{https://discourse.mc-stan.org/t/fitting-an-autoregressive-model-and-poisson-process-interdependently/37268}{Fitting an autoregressive model and Poisson process interdependently} \item \href{https://stats.stackexchange.com/questions/652174}{Cyclical residual patterns and variable selection in GAMs} \item \href{https://stats.stackexchange.com/questions/437125}{Causality between two binary time series} \item \href{https://stats.stackexchange.com/questions/285100}{Logistic regression on time series data} \item \href{https://discourse.mc-stan.org/t/autocorrelation-for-unevenly-spaced-time-series/10001}{Autocorrelation for unevenly spaced time series} \item \href{https://stats.stackexchange.com/questions/664160}{Visualising autocorrelation in irregularly spaced count data} \item \href{https://ecogambler.netlify.app/blog/vector-autoregressions/}{Blog post: State-Space Vector Autoregressions in \code{mvgam}} \item \href{https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html}{Vignette: State-Space models in \code{mvgam}} \item \href{https://www.youtube.com/watch?v=0zZopLlomsQ&t=4s}{Video tutorial: Ecological forecasting with Dynamic Generalized Additive Models} } \strong{Multivariate time series modelling and forecasting} \pkg{mvgam} supports multivariate models with shared or correlated latent trends, making it suitable for a broad range of applications that gather data on multiple time series simultaneously. 
\itemize{ \item \href{https://stats.stackexchange.com/questions/172645}{Ecological modelling: multivariate abundance time-series data} \item \href{https://discourse.mc-stan.org/t/account-for-relationships-between-species-in-a-multivariate-brms-model/32566}{Relationships between species in multivariate models} \item \href{https://discourse.mc-stan.org/t/confirmatory-factor-analysis-using-brms/23139}{Confirmatory factor analysis using \code{brms}} \item \href{https://discourse.mc-stan.org/t/chains-stuck-in-a-local-optimum-correlated-poisson-distributions/37414}{Chains stuck in a local optimum: correlated Poisson distributions} \item \href{https://ecogambler.netlify.app/blog/distributed-lags-mgcv/}{Blog post: Hierarchical distributed lag models in \code{mgcv} and \code{mvgam}} \item \href{https://nicholasjclark.github.io/mvgam/articles/shared_states.html}{Vignette: Multivariate series with shared latent states} \item \href{https://www.youtube.com/watch?v=2POK_FVwCHk}{Video tutorial: Time series in R and Stan using the \code{mvgam} package: hierarchical GAMs} } \strong{Seasonality and other complex predictors} \pkg{mvgam} allows for flexible modelling of seasonal patterns and nonlinear effects using cyclic smooths, Gaussian processes, monotonic smooths and hierarchical structures. \itemize{ \item \href{https://stats.stackexchange.com/questions/478384}{Gaussian process smoothers (\code{bs = "gp"}) in GAMs} \item \href{https://stats.stackexchange.com/questions/612312}{Fitting a GAM with double seasonality to a daily time series} \item \href{https://stats.stackexchange.com/questions/648143}{Simulating time series with different seasonal effects} \item \href{https://discourse.mc-stan.org/t/adding-time-as-monotne-predictor/37109}{Adding time as a monotone predictor} \item \href{https://ecogambler.netlify.app/blog/time-varying-seasonality/}{Blog post: Incorporating time-varying seasonality in forecast models} \item \href{https://nicholasjclark.github.io/mvgam/articles/time_varying_effects.html}{Vignette: Time-varying effects in \code{mvgam}} \item \href{https://www.youtube.com/watch?v=fzPJUW8x6DU}{Video tutorial: Time series in R and Stan using the \code{mvgam} package: an introduction} } } \author{ Nicholas J Clark } ================================================ FILE: man/ordinate.jsdgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ordinate.jsdgam.R \name{ordinate.jsdgam} \alias{ordinate.jsdgam} \alias{ordinate} \title{Latent variable ordination plots from jsdgam objects} \usage{ ordinate(object, ...) \method{ordinate}{jsdgam}( object, which_lvs = c(1, 2), biplot = TRUE, alpha = 0.5, label_sites = TRUE, ... ) } \arguments{ \item{object}{\code{list} object of class \code{jsdgam} resulting from a call to \code{\link[=jsdgam]{jsdgam()}}} \item{...}{ignored} \item{which_lvs}{A \code{vector} of indices indicating the two latent variables to be plotted (if number of the latent variables specified in the model was more than 2). Defaults to \code{c(1, 2)}} \item{biplot}{\code{Logical}. If \code{TRUE}, both the site and the species scores will be plotted, with names for the taxa interpreted based on the \code{species} argument in the original call to \code{\link[=jsdgam]{jsdgam()}}. 
If \code{FALSE}, only the site scores will be plotted}

\item{alpha}{A proportional numeric scalar between \code{0} and \code{1} that controls the relative scaling of the latent variables and their loading coefficients}

\item{label_sites}{\code{Logical} flag. If \code{TRUE}, site scores will be plotted as labels using names based on the \code{unit} argument in the original call to \code{\link[=jsdgam]{jsdgam()}}. If \code{FALSE}, site scores will be shown as points only}
}
\value{
A \code{ggplot} object
}
\description{
Plot an ordination of latent variables and their factor loadings from \code{jsdgam} models
}
\details{
This function constructs a two-dimensional scatterplot in ordination space. The chosen latent variables are first re-rotated using singular value decomposition, so that the first plotted latent variable does not have to be the first latent variable that was estimated in the original model. Posterior median estimates of the variables and the species' loadings on these variables are then used to construct the resulting plot. Some attempt at de-cluttering the resulting plot is made by using \code{geom_label_repel()} and \code{geom_text_repel()} from the \pkg{ggrepel} package, but if there are many sites and/or species then some labels may be removed automatically. Note that you can typically get better, more readable plot layouts if you also have the \pkg{ggarrow} and \pkg{ggpp} packages installed.
}
\examples{
\dontrun{
# Fit a JSDGAM to the portal_data captures
mod <- jsdgam(
  formula = captures ~
    # Fixed effects of NDVI and mintemp, row effect as a GP of time
    ndvi_ma12:series + mintemp:series + gp(time, k = 15),
  factor_formula = ~ -1,
  data = portal_data,
  unit = time,
  species = series,
  family = poisson(),
  n_lv = 2,
  silent = 2,
  chains = 2
)

# Plot a residual ordination biplot
ordinate(
  mod,
  alpha = 0.7
)

# Compare to a residual correlation plot
plot(
  residual_cor(mod)
)
}
}
\seealso{
\code{\link[=jsdgam]{jsdgam()}}, \code{\link[=residual_cor]{residual_cor()}}
}
\author{
Nicholas J Clark
}

================================================
FILE: man/pairs.mvgam.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pairs.mvgam.R
\name{pairs.mvgam}
\alias{pairs.mvgam}
\title{Create a matrix of output plots from a \code{mvgam} object}
\usage{
\method{pairs}{mvgam}(x, variable = NULL, regex = FALSE, use_alias = TRUE, ...)
}
\arguments{
\item{x}{An object of class \code{mvgam} or \code{jsdgam}}

\item{variable}{Names of the variables (parameters) to plot, as given by a character vector or a regular expression (if \code{regex = TRUE}). By default, a hopefully not too large selection of variables is plotted.}

\item{regex}{Logical; indicates whether \code{variable} should be treated as regular expressions. Defaults to \code{FALSE}.}

\item{use_alias}{Logical. If more informative names for parameters are available (i.e. for beta coefficients \code{b} or for smoothing parameters \code{rho}), replace the uninformative names with the more informative alias. Defaults to \code{TRUE}.}

\item{...}{Further arguments to be passed to \code{\link[bayesplot:MCMC-scatterplots]{mcmc_pairs}}.}
}
\value{
Plottable objects whose classes depend on the arguments supplied. See \code{\link[bayesplot:MCMC-scatterplots]{mcmc_pairs}} for details.
}
\description{
A \code{\link[graphics:pairs]{pairs}} method that is customized for MCMC output.
}
\details{
For a detailed description see \code{\link[bayesplot:MCMC-scatterplots]{mcmc_pairs}}.
} \examples{ \dontrun{ simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2) pairs(mod) pairs(mod, variable = c('ar1', 'sigma'), regex = TRUE) } } ================================================ FILE: man/piecewise_trends.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_trend_types.R \name{PW} \alias{PW} \title{Specify piecewise linear or logistic trends in \pkg{mvgam} models} \usage{ PW( n_changepoints = 10, changepoint_range = 0.8, changepoint_scale = 0.05, growth = "linear" ) } \arguments{ \item{n_changepoints}{A non-negative integer specifying the number of potential changepoints. Potential changepoints are selected uniformly from the first \code{changepoint_range} proportion of timepoints in \code{data}. Default is \code{10}.} \item{changepoint_range}{Proportion of history in \code{data} in which trend changepoints will be estimated. Defaults to \code{0.8} for the first 80\%.} \item{changepoint_scale}{Parameter modulating the flexibility of the automatic changepoint selection by altering the scale parameter of a Laplace distribution. The resulting prior will be \code{double_exponential(0, changepoint_scale)}. Large values will allow many changepoints and a more flexible trend, while small values will allow few changepoints. Default is \code{0.05}.} \item{growth}{Character string specifying either \code{'linear'} or \code{'logistic'} growth of the trend. If \code{'logistic'}, a variable labelled \code{cap} MUST be in \code{data} to specify the maximum saturation point for the trend (see details and examples in \code{\link{mvgam}} for more information). Default is \code{'linear'}.} } \value{ An object of class \code{mvgam_trend}, which contains a list of arguments to be interpreted by the parsing functions in \code{mvgam}. } \description{ Set up piecewise linear or logistic trend models in \code{mvgam}. These functions do not evaluate their arguments – they exist purely to help set up a model with particular piecewise trend models. } \details{ \emph{Offsets and intercepts}: For each of these trend models, an offset parameter is included in the trend estimation process. This parameter will be incredibly difficult to identify if you also include an intercept in the observation formula. For that reason, it is highly recommended that you drop the intercept from the formula (i.e. \code{y ~ x + 0} or \code{y ~ x - 1}, where \code{x} are your optional predictor terms). \emph{Logistic growth and the cap variable}: When forecasting growth, there is often some maximum achievable point that a time series can reach. For example, total market size, total population size or carrying capacity in population dynamics. It can be advantageous for the forecast to saturate at or near this point so that predictions are more sensible. This function allows you to make forecasts using a logistic growth trend model, with a specified carrying capacity. Note that this capacity does not need to be static over time; it can vary with each series × timepoint combination if necessary. But you must supply a \code{cap} value for each observation in the data when using \code{growth = 'logistic'}. For observation families that use a non-identity link function, the \code{cap} value will be internally transformed to the link scale (i.e. 
your specified \code{cap} will be log-transformed if you are using a \code{poisson()} or \code{nb()} family). It is therefore important that you specify the \code{cap} values on the scale of your outcome. Note also that no missing values are allowed in \code{cap}. } \examples{ \dontrun{ # Example of logistic growth with possible changepoints dNt = function(r, N, k) { r * N * (k - N) } Nt = function(r, N, t, k) { for (i in 1:(t - 1)) { if (i \%in\% c(5, 15, 25, 41, 45, 60, 80)) { N[i + 1] <- max( 1, N[i] + dNt(r + runif(1, -0.1, 0.1), N[i], k) ) } else { N[i + 1] <- max(1, N[i] + dNt(r, N[i], k)) } } N } set.seed(11) expected <- Nt(0.004, 2, 100, 30) plot(expected, xlab = 'Time') y <- rpois(100, expected) plot(y, xlab = 'Time') mod_data <- data.frame( y = y, time = 1:100, cap = 35, series = as.factor('series_1') ) plot_mvgam_series(data = mod_data) mod <- mvgam( y ~ 0, trend_model = PW(growth = 'logistic'), family = poisson(), data = mod_data, chains = 2, silent = 2 ) summary(mod) hc <- hindcast(mod) plot(hc) library(ggplot2) mcmc_plot(mod, variable = 'delta_trend', regex = TRUE) + scale_y_discrete(labels = mod$trend_model$changepoints) + labs( y = 'Potential changepoint', x = 'Rate change' ) how_to_cite(mod) } } \references{ Taylor, Sean J., and Benjamin Letham. "Forecasting at scale." The American Statistician 72.1 (2018): 37–45. } \author{ Nicholas J Clark } ================================================ FILE: man/pipe.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/utils-pipe.R \name{\%>\%} \alias{\%>\%} \title{Pipe operator} \usage{ lhs \%>\% rhs } \arguments{ \item{lhs}{A value or the magrittr placeholder.} \item{rhs}{A function call using the magrittr semantics.} } \value{ The result of calling \code{rhs(lhs)}. } \description{ See \code{magrittr::\link[magrittr:pipe]{\%>\%}} for details. } \keyword{internal} ================================================ FILE: man/plot.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot.mvgam.R \name{plot.mvgam} \alias{plot.mvgam} \title{Default plots for \pkg{mvgam} models} \usage{ \method{plot}{mvgam}( x, type = "residuals", series = 1, residuals = FALSE, newdata, data_test, trend_effects = FALSE, ... ) } \arguments{ \item{x}{\code{list} object returned from \code{mvgam}. See \code{\link[=mvgam]{mvgam()}}} \item{type}{\code{character} specifying which type of plot to return. Options are: \code{"series"}, \code{"residuals"}, \code{"smooths"}, \code{"re"} (random effect smooths), \code{"pterms"} (parametric effects), \code{"forecast"}, \code{"trend"}, \code{"uncertainty"}, \code{"factors"}} \item{series}{\code{integer} specifying which series in the set is to be plotted. This is ignored if \code{type == 're'}} \item{residuals}{\code{logical}. If \code{TRUE} and \code{type = 'smooths'}, posterior quantiles of partial residuals are added to plots of 1-D smooths as a series of ribbon rectangles. Partial residuals for a smooth term are the median Dunn-Smyth residuals that would be obtained by dropping the term concerned from the model, while leaving all other estimates fixed (i.e. the estimates for the term plus the original median Dunn-Smyth residuals). 
Note that because \code{mvgam} works with Dunn-Smyth residuals and not working residuals, which are used by \code{mgcv}, the magnitudes of partial residuals will be different to what you would expect from \code{\link[mgcv]{plot.gam}}. Interpretation is similar though, as these partial residuals should be evenly scattered around the smooth function if the function is well estimated} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing at least 'series' and 'time' in addition to any other variables included in the linear predictor of the original \code{formula}. This argument is optional when plotting out of sample forecast period observations (when \code{type = forecast}) and required when plotting uncertainty components (\code{type = uncertainty}).} \item{data_test}{Deprecated. Still works in place of \code{newdata} but users are recommended to use \code{newdata} instead for more seamless integration into \code{R} workflows} \item{trend_effects}{logical. If \code{TRUE} and a \code{trend_formula} was used in model fitting, terms from the trend (i.e. process) model will be plotted} \item{...}{Additional arguments for each individual plotting function.} } \value{ A base R plot or set of plots } \description{ This function takes a fitted \code{mvgam} object and produces plots of smooth functions, forecasts, trends and uncertainty components } \details{ These plots are useful for getting an overview of the fitted model and its estimated random effects or smooth functions, but the individual plotting functions and the functions from the \code{marginaleffects} and \code{gratia} packages offer far more customisation. } \examples{ \dontrun{ # Simulate some time series dat <- sim_mvgam( T = 80, n_series = 3 ) # Fit a basic model mod <- mvgam( y ~ s(season, bs = 'cc') + s(series, bs = 're'), data = dat$data_train, trend_model = RW(), chains = 2, silent = 2 ) # Plot predictions and residuals for each series plot(mod, type = 'forecast', series = 1) plot(mod, type = 'forecast', series = 2) plot(mod, type = 'forecast', series = 3) plot(mod, type = 'residuals', series = 1) plot(mod, type = 'residuals', series = 2) plot(mod, type = 'residuals', series = 3) # Plot model effects plot(mod, type = 'smooths') plot(mod, type = 're') # More flexible plots with 'marginaleffects' utilities library(marginaleffects) plot_predictions( mod, condition = 'season', type = 'link' ) plot_predictions( mod, condition = c('season', 'series', 'series'), type = 'link' ) plot_predictions( mod, condition = 'series', type = 'link' ) # When using a State-Space model with predictors on the process # model, set trend_effects = TRUE to visualise process effects mod <- mvgam( y ~ -1, trend_formula = ~ s(season, bs = 'cc'), data = dat$data_train, trend_model = RW(), chains = 2, silent = 2 ) plot(mod, type = 'smooths', trend_effects = TRUE) # But 'marginaleffects' functions work without any modification plot_predictions( mod, condition = 'season', type = 'link' ) } } \seealso{ \code{\link{plot_mvgam_resids}}, \code{\link{plot_mvgam_smooth}}, \code{\link{plot_mvgam_fc}}, \code{\link{plot_mvgam_trend}}, \code{\link{plot_mvgam_uncertainty}}, \code{\link{plot_mvgam_factors}}, \code{\link{plot_mvgam_randomeffects}}, \code{\link{conditional_effects.mvgam}}, \code{\link[marginaleffects]{plot_predictions}}, \code{\link[marginaleffects]{plot_slopes}}, \code{\link{gratia_mvgam_enhancements}} } \author{ Nicholas J Clark } ================================================ FILE: man/plot.mvgam_fevd.Rd 
================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_fevd-class.R \name{plot.mvgam_fevd} \alias{plot.mvgam_fevd} \title{Plot forecast error variance decompositions from an \code{mvgam_fevd} object} \usage{ \method{plot}{mvgam_fevd}(x, ...) } \arguments{ \item{x}{\code{list} object of class \code{mvgam_fevd}. See \code{\link[=fevd]{fevd()}}} \item{...}{ignored} } \value{ A \code{\link[ggplot2]{ggplot}} object, which can be further customized using the \pkg{ggplot2} package } \description{ This function takes an \code{mvgam_fevd} object and produces a plot of the posterior median contributions to forecast variance for each series in the fitted Vector Autoregression } \author{ Nicholas J Clark } ================================================ FILE: man/plot.mvgam_irf.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_irf-class.R \name{plot.mvgam_irf} \alias{plot.mvgam_irf} \title{Plot impulse responses from an \code{mvgam_irf} object} \usage{ \method{plot}{mvgam_irf}(x, series = 1, ...) } \arguments{ \item{x}{\code{list} object of class \code{mvgam_irf}. See \code{\link[=irf]{irf()}}} \item{series}{\code{integer} specifying which process series should be given the shock} \item{...}{ignored} } \value{ A \code{ggplot} object showing the expected response of each latent time series to a shock of the focal \code{series} } \description{ This function takes an \code{mvgam_irf} object and produces plots of Impulse Response Functions } \author{ Nicholas J Clark } ================================================ FILE: man/plot.mvgam_lfo.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/lfo_cv.mvgam.R \name{plot.mvgam_lfo} \alias{plot.mvgam_lfo} \title{Plot Pareto-k and ELPD values from a \code{mvgam_lfo} object} \usage{ \method{plot}{mvgam_lfo}(x, ...) } \arguments{ \item{x}{An object of class \code{mvgam_lfo}} \item{...}{Ignored} } \value{ A \code{ggplot} object presenting Pareto-k and ELPD values over the evaluation timepoints. For the Pareto-k plot, a dashed red line indicates the specified threshold chosen for triggering model refits. For the ELPD plot, a dashed red line indicates the bottom 10\% quantile of ELPD values. Points below this threshold may represent outliers that were more difficult to forecast } \description{ This function takes an object of class \code{mvgam_lfo} and creates several informative diagnostic plots } ================================================ FILE: man/plot.mvgam_residcor.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_residcor-class.R \name{plot.mvgam_residcor} \alias{plot.mvgam_residcor} \title{Plot residual correlations based on latent factors} \usage{ \method{plot}{mvgam_residcor}(x, cluster = FALSE, ...) } \arguments{ \item{x}{\code{list} object of class \code{mvgam_residcor} resulting from a call to \code{residual_cor(..., summary = TRUE)}} \item{cluster}{Logical. Should the variables be re-arranged within the plot to group the correlation matrix into clusters of positive and negative correlations? 
Defaults to \code{FALSE}} \item{...}{ignored} } \value{ A \code{ggplot} object } \description{ Plot residual correlation estimates from Joint Species Distribution (\code{jsdgam}) or dynamic factor (\code{mvgam}) models } \details{ This function plots the significant residual correlations from a \code{mvgam_residcor} object, whereby the posterior mean (if \code{robust = FALSE}) or posterior median (if \code{robust = TRUE}) correlations are shown only for those correlations whose credible interval does not contain zero. All other correlations are set to zero in the returned plot } \seealso{ \code{\link[=jsdgam]{jsdgam()}}, \code{\link[=lv_correlations]{lv_correlations()}}, \code{\link[=residual_cor]{residual_cor()}} } \author{ Nicholas J Clark } ================================================ FILE: man/plot_mvgam_factors.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot_mvgam_factors.R \name{plot_mvgam_factors} \alias{plot_mvgam_factors} \title{Latent factor summaries for a fitted \pkg{mvgam} object} \usage{ plot_mvgam_factors(object, plot = TRUE) } \arguments{ \item{object}{\code{list} object returned from \code{mvgam}. See \code{\link[=mvgam]{mvgam()}}} \item{plot}{\code{logical} specifying whether factors should be plotted} } \value{ A \code{data.frame} of factor contributions } \description{ This function takes a fitted \code{mvgam} object and returns plots and summary statistics for the latent dynamic factors } \details{ If the model in \code{object} was estimated using dynamic factors, it is possible that not all factors contributed to the estimated trends. This is due to the regularisation penalty that acts independently on each factor's Gaussian precision, which will squeeze unneeded factors to a white noise process (effectively dropping that factor from the model). In this function, each factor is tested against a null hypothesis of white noise by calculating the sum of the factor's 2nd derivatives. A factor that has a larger contribution will have a larger sum due to the weaker penalty on the factor's precision. If \code{plot = TRUE}, the factors are also plotted. } \examples{ \dontrun{ simdat <- sim_mvgam() mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), use_lv = TRUE, n_lv = 2, data = simdat$data_train, chains = 2, silent = 2 ) plot_mvgam_factors(mod) } } \author{ Nicholas J Clark } ================================================ FILE: man/plot_mvgam_forecasts.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot_mvgam_fc.R \name{plot_mvgam_forecasts} \alias{plot_mvgam_forecasts} \alias{plot_mvgam_fc} \alias{plot.mvgam_forecast} \title{Plot posterior forecast predictions from \pkg{mvgam} models} \usage{ plot_mvgam_fc( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, hide_xlabels = FALSE, xlab, ylab, ylim, n_cores = 1, return_forecasts = FALSE, return_score = FALSE, ... ) \method{plot}{mvgam_forecast}( x, series = 1, realisations = FALSE, n_realisations = 15, xlab, ylab, ylim, ... ) } \arguments{ \item{object}{\code{list} object of class \code{mvgam}.
See \code{\link[=mvgam]{mvgam()}}} \item{series}{\code{integer} specifying which series in the set is to be plotted} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing at least 'series' and 'time' in addition to any other variables included in the linear predictor of the original \code{formula}. If included, the covariate information in \code{newdata} will be used to generate forecasts from the fitted model equations. If this same \code{newdata} was originally included in the call to \code{mvgam}, then forecasts have already been produced by the generative model and these will simply be extracted and plotted. However if no \code{newdata} was supplied to the original model call, an assumption is made that the \code{newdata} supplied here comes sequentially after the data supplied as \code{data} in the original model (i.e. we assume there is no time gap between the last observation of series 1 in \code{data} and the first observation for series 1 in \code{newdata}). If \code{newdata} contains observations in column \code{y}, these observations will be used to compute a Discrete Rank Probability Score for the forecast distribution} \item{data_test}{Deprecated. Still works in place of \code{newdata} but users are recommended to use \code{newdata} instead for more seamless integration into \code{R} workflows} \item{realisations}{\code{logical}. If \code{TRUE}, forecast realisations are shown as a spaghetti plot, making it easier to visualise the diversity of possible forecasts. If \code{FALSE}, the default, empirical quantiles of the forecast distribution are shown} \item{n_realisations}{\code{integer} specifying the number of posterior realisations to plot, if \code{realisations = TRUE}. Ignored otherwise} \item{hide_xlabels}{\code{logical}. If \code{TRUE}, no xlabels are printed to allow the user to add custom labels using \code{axis} from base \code{R}} \item{xlab}{Label for x axis} \item{ylab}{Label for y axis} \item{ylim}{Optional \code{vector} of y-axis limits (min, max)} \item{n_cores}{\code{integer} specifying number of cores for generating forecasts in parallel} \item{return_forecasts}{\code{logical}. If \code{TRUE}, the function will plot the forecast as well as returning the forecast object (as a \code{matrix} of dimension \code{n_samples} x \code{horizon})} \item{return_score}{\code{logical}. If \code{TRUE} and out of sample test data is provided as \code{newdata}, a probabilistic score will be calculated and returned. The score used will depend on the observation family from the fitted model. Discrete families (\code{poisson}, \code{negative binomial}, \code{tweedie}) use the Discrete Rank Probability Score. Other families use the Continuous Rank Probability Score. The value returned is the \code{sum} of all scores within the out of sample forecast horizon} \item{...}{Further \code{\link[graphics]{par}} graphical parameters} \item{x}{Object of class \code{mvgam_forecast}} } \value{ A base \code{R} graphics plot (for \code{plot_mvgam_fc}) or a \code{ggplot} object (for \code{plot.mvgam_forecast}) and an optional \code{list} containing the forecast distribution and the out of sample probabilistic forecast score } \description{ Plot posterior forecast predictions from \pkg{mvgam} models } \details{ \code{plot_mvgam_fc} generates posterior predictions from an object of class \code{mvgam}, calculates posterior empirical quantiles and plots them against the observed data. 
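As a minimal sketch of this workflow (assuming a fitted model \code{mod} and a hypothetical test frame \code{newdat} covering the forecast horizon; not run):
\preformatted{
# Plot the forecast for series 1 while also returning the forecast
# draws (a matrix of dimension n_samples x horizon) and the summed
# out-of-sample probabilistic score
fc_info <- plot_mvgam_fc(
  mod,
  series = 1,
  newdata = newdat,
  return_forecasts = TRUE,
  return_score = TRUE
)
str(fc_info)
}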
If \code{realisations = FALSE}, the returned plot shows 90, 60, 40 and 20 percent posterior quantiles (as ribbons of increasingly darker shades of red) as well as the posterior median (as a dark red line). If \code{realisations = TRUE}, a set of \code{n_realisations} posterior draws are shown. This function produces an older style base \code{R} plot, as opposed to \code{plot.mvgam_forecast}. \code{plot.mvgam_forecast} takes an object of class \code{mvgam_forecast}, in which forecasts have already been computed, and plots the resulting forecast distribution as a \code{ggplot} object. This function is therefore more versatile and is recommended over the older and clunkier \code{plot_mvgam_fc} version. If \code{realisations = FALSE}, these posterior quantiles are plotted along with the true observed data that was used to train the model. Otherwise, a spaghetti plot is returned to show possible forecast paths. } \examples{ \dontrun{ simdat <- sim_mvgam( n_series = 3, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Hindcasts on response scale hc <- hindcast(mod) str(hc) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Forecasts on response scale fc <- forecast( mod, newdata = simdat$data_test ) str(fc) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Forecasts as expectations fc <- forecast( mod, newdata = simdat$data_test, type = 'expected' ) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Dynamic trend extrapolations fc <- forecast( mod, newdata = simdat$data_test, type = 'trend' ) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) } } \author{ Nicholas J Clark } ================================================ FILE: man/plot_mvgam_pterms.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot_mvgam_pterms.R \name{plot_mvgam_pterms} \alias{plot_mvgam_pterms} \title{Plot parametric term partial effects for \pkg{mvgam} models} \usage{ plot_mvgam_pterms(object, trend_effects = FALSE) } \arguments{ \item{object}{\code{list} object of class \code{mvgam}. See \code{\link[=mvgam]{mvgam()}}} \item{trend_effects}{logical. If \code{TRUE} and a \code{trend_formula} was used in model fitting, terms from the trend (i.e. process) model will be plotted} } \value{ A base \code{R} graphics plot } \description{ This function plots posterior empirical quantiles for partial effects of parametric terms } \details{ Posterior empirical quantiles of each parametric term's partial effect estimates (on the link scale) are calculated and visualised as ribbon plots. These effects can be interpreted as the partial effect that a parametric term contributes when all other terms in the model have been set to \code{0} } \author{ Nicholas J Clark } ================================================ FILE: man/plot_mvgam_randomeffects.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot_mvgam_randomeffects.R \name{plot_mvgam_randomeffects} \alias{plot_mvgam_randomeffects} \title{Plot random effect terms from \pkg{mvgam} models} \usage{ plot_mvgam_randomeffects(object, trend_effects = FALSE) } \arguments{ \item{object}{\code{list} object of class \code{mvgam}. See \code{\link[=mvgam]{mvgam()}}} \item{trend_effects}{logical.
If \code{TRUE} and a \code{trend_formula} was used in model fitting, terms from the trend (i.e. process) model will be plotted} } \value{ A base \code{R} graphics plot } \description{ This function plots posterior empirical quantiles for random effect smooths (bs = re) } \details{ Posterior empirical quantiles of random effect coefficient estimates (on the link scale) are calculated and visualised as ribbon plots. Labels for coefficients are taken from the levels of the original factor variable that was used to specify the smooth in the model's formula } \author{ Nicholas J Clark } ================================================ FILE: man/plot_mvgam_resids.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot_mvgam_resids.R \name{plot_mvgam_resids} \alias{plot_mvgam_resids} \title{Residual diagnostics for a fitted \pkg{mvgam} object} \usage{ plot_mvgam_resids(object, series = 1, n_draws = 100L, n_points = 1000L) } \arguments{ \item{object}{\code{list} object returned from \code{mvgam}. See \code{\link[=mvgam]{mvgam()}}} \item{series}{\code{integer} specifying which series in the set is to be plotted} \item{n_draws}{\code{integer} specifying the number of posterior residual draws to use for calculating uncertainty in the \code{"ACF"} and \code{"pACF"} frames. Default is \code{100}} \item{n_points}{\code{integer} specifying the maximum number of points to show in the "Resids vs Fitted" and "Normal Q-Q Plot" frames. Default is \code{1000}} } \value{ A facetted \code{ggplot} object } \description{ This function takes a fitted \code{mvgam} object and returns various residual diagnostic plots } \details{ A total of four ggplot plots are generated to examine posterior Dunn-Smyth residuals for the specified series. Plots include a residuals vs fitted values plot, a Q-Q plot, and two plots to check for any remaining temporal autocorrelation in the residuals. Note that all plots only report statistics from a sample of up to \code{100} posterior draws (to save computational time), so uncertainty in these relationships may not be adequately represented. } \examples{ \dontrun{ simdat <- sim_mvgam( n_series = 3, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Plot Dunn Smyth residuals for some series plot_mvgam_resids(mod) plot_mvgam_resids(mod, series = 2) } } \author{ Nicholas J Clark and Matthijs Hollanders } ================================================ FILE: man/plot_mvgam_series.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot_mvgam_series.R \name{plot_mvgam_series} \alias{plot_mvgam_series} \title{Plot observed time series used for \pkg{mvgam} modelling} \usage{ plot_mvgam_series( object, data, newdata, y = "y", lines = TRUE, series = 1, n_bins = NULL, log_scale = FALSE ) } \arguments{ \item{object}{Optional \code{list} object returned from \code{mvgam}. Either \code{object} or \code{data} must be supplied} \item{data}{Optional \code{data.frame} or \code{list} of training data containing at least 'series' and 'time'.
Use this argument if training data have been gathered in the correct format for \code{mvgam} modelling but no model has yet been fitted.} \item{newdata}{Optional \code{data.frame} or \code{list} of test data containing at least 'series' and 'time' for the forecast horizon, in addition to any other variables included in the linear predictor of \code{formula}. If included, the observed values in the test data are compared to the model's forecast distribution for exploring biases in model predictions} \item{y}{Character. What is the name of the outcome variable in the supplied data? Defaults to \code{'y'}} \item{lines}{Logical. If \code{TRUE}, line plots are used for visualizing time series. If \code{FALSE}, points are used.} \item{series}{Either an \code{integer} specifying which series in the set is to be plotted or the string 'all', which plots all series available in the supplied data} \item{n_bins}{\code{integer} specifying the number of bins to use for binning observed values when plotting a histogram. Default is to use the number of bins returned by a call to \code{hist} in base \code{R}} \item{log_scale}{\code{logical}. If \code{series == 'all'}, this flag is used to control whether the time series plot is shown on the log scale (using \code{log(Y + 1)}). This can be useful when visualizing many series that may have different observed ranges. Default is \code{FALSE}} } \value{ A set of ggplot objects. If \code{series} is an integer, the plots will show observed time series, autocorrelation and cumulative distribution functions, and a histogram for the series. If \code{series == 'all'}, a set of observed time series plots is returned in which all series are shown on each plot but only a single focal series is highlighted, with all remaining series shown as faint gray lines. } \description{ This function takes either a fitted \code{mvgam} object or a \code{data.frame} object and produces plots of observed time series, ACF, CDF and histograms for exploratory data analysis } \examples{ # Simulate and plot series with observations bounded at 0 and 1 (Beta responses) sim_data <- sim_mvgam( family = betar(), trend_model = RW(), prop_trend = 0.6 ) plot_mvgam_series( data = sim_data$data_train, series = 'all' ) plot_mvgam_series( data = sim_data$data_train, newdata = sim_data$data_test, series = 1 ) # Now simulate series with overdispersed discrete observations sim_data <- sim_mvgam( family = nb(), trend_model = RW(), prop_trend = 0.6, phi = 10 ) plot_mvgam_series( data = sim_data$data_train, series = 'all' ) } \author{ Nicholas J Clark and Matthijs Hollanders } ================================================ FILE: man/plot_mvgam_smooth.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot_mvgam_smooth.R \name{plot_mvgam_smooth} \alias{plot_mvgam_smooth} \title{Plot smooth terms from \pkg{mvgam} models} \usage{ plot_mvgam_smooth( object, trend_effects = FALSE, series = 1, smooth, residuals = FALSE, n_resid_bins = 25, realisations = FALSE, n_realisations = 15, derivatives = FALSE, newdata ) } \arguments{ \item{object}{\code{list} object of class \code{mvgam}. See \code{\link[=mvgam]{mvgam()}}} \item{trend_effects}{logical. If \code{TRUE} and a \code{trend_formula} was used in model fitting, terms from the trend (i.e. 
process) model will be plotted} \item{series}{\code{integer} specifying which series in the set is to be plotted} \item{smooth}{Either a \code{character} or \code{integer} specifying which smooth term to be plotted} \item{residuals}{\code{logical}. If \code{TRUE}, posterior quantiles of partial residuals are added to plots of 1-D smooths as a series of ribbon rectangles. Partial residuals for a smooth term are the median Dunn-Smyth residuals that would be obtained by dropping the term concerned from the model, while leaving all other estimates fixed (i.e. the estimates for the term plus the original median Dunn-Smyth residuals). Note that because \code{mvgam} works with Dunn-Smyth residuals and not working residuals, which are used by \code{mgcv}, the magnitudes of partial residuals will be different to what you would expect from \code{\link[mgcv]{plot.gam}}. Interpretation is similar though, as these partial residuals should be evenly scattered around the smooth function if the function is well estimated} \item{n_resid_bins}{\code{integer} specifying the number of bins to group the covariate into when plotting partial residuals. Setting this argument too high can make for messy plots that are difficult to interpret, while setting it too low will likely mask some potentially useful patterns in the partial residuals. Default is \code{25}} \item{realisations}{\code{logical}. If \code{TRUE}, posterior realisations are shown as a spaghetti plot, making it easier to visualise the diversity of possible functions. If \code{FALSE}, the default, empirical quantiles of the posterior distribution are shown} \item{n_realisations}{\code{integer} specifying the number of posterior realisations to plot, if \code{realisations = TRUE}. Ignored otherwise} \item{derivatives}{\code{logical}. If \code{TRUE}, an additional plot will be returned to show the estimated 1st derivative for the specified smooth (Note: this only works for univariate smooths)} \item{newdata}{Optional \code{dataframe} for predicting the smooth, containing at least 'series' in addition to any other variables included in the linear predictor of the original model's \code{formula}. Note that this currently is only supported for plotting univariate smooths} } \value{ A base \code{R} graphics plot } \description{ This function plots posterior empirical quantiles for a series-specific smooth term } \details{ Smooth functions are shown as empirical quantiles (or spaghetti plots) of posterior partial expectations across a sequence of values between the variable's \code{min} and \code{max}, while zeroing out effects of all other variables. At present, only univariate and bivariate smooth plots are allowed, though note that bivariate smooths rely on default behaviour from \code{\link[mgcv]{plot.gam}}. \code{plot_mvgam_smooth} generates posterior predictions from an object of class \code{mvgam}, calculates posterior empirical quantiles and plots them. If \code{realisations = FALSE}, the returned plot shows 90, 60, 40 and 20 percent posterior quantiles (as ribbons of increasingly darker shades of red) as well as the posterior median (as a dark red line). If \code{realisations = TRUE}, a set of \code{n_realisations} posterior draws are shown. For more nuanced visualisation, supply \code{newdata} just as you would when predicting from a \code{\link[mgcv]{gam}} model or use the more flexible \code{\link{conditional_effects.mvgam}}. 
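For instance, a minimal sketch (assuming a fitted model \code{mod} whose formula included a smooth of a \code{season} covariate; object and variable names are illustrative only):
\preformatted{
# Plot the first smooth term, overlaying partial residuals
plot_mvgam_smooth(mod, smooth = 1, residuals = TRUE)

# Plot posterior realisations of the seasonal smooth
# across a custom prediction grid
plot_mvgam_smooth(
  mod,
  smooth = 'season',
  newdata = data.frame(
    season = seq(1, 12, length.out = 100),
    series = 'series_1'
  ),
  realisations = TRUE,
  n_realisations = 30
)
}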
Alternatively, if you prefer to use partial effect plots in the style of \code{gratia}, and if you have the \code{gratia} package installed, you can use \code{draw.mvgam}. See \code{\link{gratia_mvgam_enhancements}} for details. } \seealso{ \code{\link[mgcv]{plot.gam}}, \code{\link{conditional_effects.mvgam}}, \code{\link{gratia_mvgam_enhancements}} } \author{ Nicholas J Clark } ================================================ FILE: man/plot_mvgam_trend.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot_mvgam_trend.R \name{plot_mvgam_trend} \alias{plot_mvgam_trend} \title{Plot latent trend predictions from \pkg{mvgam} models} \usage{ plot_mvgam_trend( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, n_cores = 1, derivatives = FALSE, xlab, ylab ) } \arguments{ \item{object}{\code{list} object returned from \code{mvgam}. See \code{\link[=mvgam]{mvgam()}}} \item{series}{\code{integer} specifying which series in the set is to be plotted} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing at least 'series' and 'time' in addition to any other variables included in the linear predictor of the original \code{formula}.} \item{data_test}{Deprecated. Still works in place of \code{newdata} but users are recommended to use \code{newdata} instead for more seamless integration into \code{R} workflows} \item{realisations}{\code{logical}. If \code{TRUE}, posterior trend realisations are shown as a spaghetti plot, making it easier to visualise the diversity of possible trend paths. If \code{FALSE}, the default, empirical quantiles of the posterior distribution are shown} \item{n_realisations}{\code{integer} specifying the number of posterior realisations to plot, if \code{realisations = TRUE}. Ignored otherwise} \item{n_cores}{Deprecated. Parallel processing is no longer supported} \item{derivatives}{\code{logical}. If \code{TRUE}, an additional plot will be returned to show the estimated 1st derivative for the estimated trend} \item{xlab}{Label for x axis} \item{ylab}{Label for y axis} } \value{ A \code{ggplot} object } \description{ Plot latent trend predictions from \pkg{mvgam} models } \examples{ \dontrun{ simdat <- sim_mvgam( n_series = 3, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2 ) # Plot estimated trends for some series plot_mvgam_trend(mod) plot_mvgam_trend(mod, series = 2) # Extrapolate trends forward in time and plot on response scale plot_mvgam_trend( mod, newdata = simdat$data_test ) plot_mvgam_trend( mod, newdata = simdat$data_test, series = 2 ) # But it is recommended to compute extrapolations for all series # first and then plot trend_fc <- forecast( mod, newdata = simdat$data_test ) plot(trend_fc, series = 1) plot(trend_fc, series = 2) } } \author{ Nicholas J Clark } ================================================ FILE: man/plot_mvgam_uncertainty.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot_mvgam_uncertainty.R \name{plot_mvgam_uncertainty} \alias{plot_mvgam_uncertainty} \title{Plot forecast uncertainty contributions from \pkg{mvgam} models} \usage{ plot_mvgam_uncertainty( object, series = 1, newdata, data_test, legend_position = "topleft", hide_xlabels = FALSE ) } \arguments{ \item{object}{\code{list} object returned from \code{mvgam}. 
See \code{\link[=mvgam]{mvgam()}}} \item{series}{\code{integer} specifying which series in the set is to be plotted} \item{newdata}{A \code{dataframe} or \code{list} containing at least 'series' and 'time' for the forecast horizon, in addition to any other variables included in the linear predictor of \code{formula}} \item{data_test}{Deprecated. Still works in place of \code{newdata} but users are recommended to use \code{newdata} instead for more seamless integration into \code{R} workflows} \item{legend_position}{The location of the legend, given as a single keyword from the list: "none", "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center". The legend is placed on the inside of the plot frame at the given location, or hidden entirely if "none".} \item{hide_xlabels}{\code{logical}. If \code{TRUE}, no xlabels are printed to allow the user to add custom labels using \code{axis} from base \code{R}} } \value{ A base \code{R} graphics plot } \description{ Plot forecast uncertainty contributions from \pkg{mvgam} models } \details{ The basic idea of this function is to compute forecasts by ignoring one of the two primary components in a correlated residual model (i.e. by either ignoring the linear predictor effects or by ignoring the residual dynamics). Some caution is required, however, as this function was designed early in the \pkg{mvgam} development cycle and there are now many types of models that it cannot handle very well. For example, models with shared latent states, or any type of State-Space models that include terms in the \code{trend_formula}, will either fail or give nonsensical results. Improvements are in the works to provide a more general way to decompose forecast uncertainties, so please check back at a later date. } ================================================ FILE: man/portal_data.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/portal_data.R \docType{data} \name{portal_data} \alias{portal_data} \title{Portal Project rodent capture survey data} \format{ A \code{data.frame} containing the following fields: \describe{ \item{time}{time of sampling, in lunar monthly cycles} \item{series}{factor indicator of the time series, i.e. the species} \item{captures}{total captures across all control plots at each time point} \item{ndvi_ma12}{12-month moving average of the mean Normalised Difference Vegetation Index} \item{mintemp}{monthly mean of minimum temperature} } } \source{ \url{https://github.com/weecology/PortalData/blob/main/SiteandMethods/Methods.md} } \usage{ portal_data } \description{ A dataset containing time series of total captures (across all control plots) for select rodent species from the Portal Project } \keyword{datasets} ================================================ FILE: man/posterior_epred.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/posterior_epred.mvgam.R \name{posterior_epred.mvgam} \alias{posterior_epred.mvgam} \title{Draws from the expected value of the posterior predictive distribution for \pkg{mvgam} objects} \usage{ \method{posterior_epred}{mvgam}( object, newdata, data_test, ndraws = NULL, process_error = TRUE, ... ) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} or \code{jsdgam}.
See \code{\link[=mvgam]{mvgam()}}} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing the same variables that were included in the original \code{data} used to fit the model. If not supplied, predictions are generated for the original observations used for the model fit.} \item{data_test}{Deprecated. Still works in place of \code{newdata} but users are recommended to use \code{newdata} instead for more seamless integration into \code{R} workflows} \item{ndraws}{Positive \code{integer} indicating how many posterior draws should be used. If \code{NULL} (the default) all draws are used.} \item{process_error}{\code{logical}. If \code{TRUE} and \code{newdata} is supplied, expected uncertainty in the process model is accounted for by using draws from any latent trend SD parameters. If \code{FALSE}, uncertainty in the latent trend component is ignored when calculating predictions. If no \code{newdata} is supplied, draws from the fitted model's posterior predictive distribution will be used (which will always include uncertainty in any latent trend components)} \item{...}{Ignored} } \value{ A \code{matrix} of dimension \code{n_samples x n_obs}, where \code{n_samples} is the number of posterior samples from the fitted object and \code{n_obs} is the number of observations in \code{newdata} } \description{ Compute posterior draws of the expected value of the posterior predictive distribution (i.e. the conditional expectation). Can be performed for the data used to fit the model (posterior predictive checks) or for new data. By definition, these predictions have smaller variance than the posterior predictions performed by the \code{\link{posterior_predict.mvgam}} method. This is because only the uncertainty in the expected value of the posterior predictive distribution is incorporated in the draws computed by \code{posterior_epred} while the residual error is ignored there. However, the estimated means of both methods averaged across draws should be very similar. } \details{ Note that for all types of predictions for models that did not include a \code{trend_formula}, uncertainty in the dynamic trend component can be ignored by setting \code{process_error = FALSE}. However, if a \code{trend_formula} was supplied in the model, predictions for this component cannot be ignored. If \code{process_error = TRUE}, trend predictions will ignore autocorrelation coefficients or GP length scale coefficients, ultimately assuming the process is stationary. This method is similar to the types of posterior predictions returned from \code{brms} models when using autocorrelated error predictions for newdata. This function is therefore more suited to posterior simulation from the GAM components of a \code{mvgam} model, while the forecasting functions \code{\link{plot_mvgam_fc}} and \code{\link{forecast.mvgam}} are better suited to generate h-step ahead forecasts that respect the temporal dynamics of estimated latent trends. 
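As a quick sketch of the variance relationship described above (assuming a fitted model \code{mod}; not run):
\preformatted{
# Expectation draws vs full posterior predictive draws
mu <- posterior_epred(mod)
yrep <- posterior_predict(mod)

# Means across draws should be very similar...
plot(colMeans(mu), colMeans(yrep))
abline(0, 1)

# ...while the predictive draws should be more dispersed
mean(apply(yrep, 2, var) > apply(mu, 2, var))
}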
} \examples{ \dontrun{ # Simulate some data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Compute posterior expectations expectations <- posterior_epred(mod) str(expectations) } } \seealso{ \code{\link{hindcast.mvgam}}, \code{\link{posterior_linpred.mvgam}}, \code{\link{posterior_predict.mvgam}} } \author{ Nicholas J Clark } ================================================ FILE: man/posterior_linpred.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/posterior_epred.mvgam.R \name{posterior_linpred.mvgam} \alias{posterior_linpred.mvgam} \title{Posterior draws of the linear predictor for \pkg{mvgam} objects} \usage{ \method{posterior_linpred}{mvgam}( object, transform = FALSE, newdata, ndraws = NULL, data_test, process_error = TRUE, ... ) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} or \code{jsdgam}. See \code{\link[=mvgam]{mvgam()}}} \item{transform}{\code{logical}; if \code{FALSE} (the default), draws of the linear predictor are returned. If \code{TRUE}, draws of the transformed linear predictor, i.e. the conditional expectation, are returned.} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing the same variables that were included in the original \code{data} used to fit the model. If not supplied, predictions are generated for the original observations used for the model fit.} \item{ndraws}{Positive \code{integer} indicating how many posterior draws should be used. If \code{NULL} (the default) all draws are used.} \item{data_test}{Deprecated. Still works in place of \code{newdata} but users are recommended to use \code{newdata} instead for more seamless integration into \code{R} workflows} \item{process_error}{\code{logical}. If \code{TRUE} and \code{newdata} is supplied, expected uncertainty in the process model is accounted for by using draws from any latent trend SD parameters. If \code{FALSE}, uncertainty in the latent trend component is ignored when calculating predictions. If no \code{newdata} is supplied, draws from the fitted model's posterior predictive distribution will be used (which will always include uncertainty in any latent trend components)} \item{...}{Ignored} } \value{ A \code{matrix} of dimension \code{n_samples x n_obs}, where \code{n_samples} is the number of posterior samples from the fitted object and \code{n_obs} is the number of observations in \code{newdata} } \description{ Compute posterior draws of the linear predictor, that is draws before applying any link functions or other transformations. Can be performed for the data used to fit the model (posterior predictive checks) or for new data. } \details{ Note that for all types of predictions for models that did not include a \code{trend_formula}, uncertainty in the dynamic trend component can be ignored by setting \code{process_error = FALSE}. However, if a \code{trend_formula} was supplied in the model, predictions for this component cannot be ignored. If \code{process_error = TRUE}, trend predictions will ignore autocorrelation coefficients or GP length scale coefficients, ultimately assuming the process is stationary. This method is similar to the types of posterior predictions returned from \code{brms} models when using autocorrelated error predictions for newdata. 
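As a minimal sketch of the \code{transform} argument (assuming a fitted model \code{mod}; not run):
\preformatted{
# Draws of the linear predictor on the link scale
eta <- posterior_linpred(mod)

# Draws after applying the inverse link function, which match
# the conditional expectations from posterior_epred()
mu <- posterior_linpred(mod, transform = TRUE)
}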
This function is therefore more suited to posterior simulation from the GAM components of a \code{mvgam} model, while the forecasting functions \code{\link{plot_mvgam_fc}} and \code{\link{forecast.mvgam}} are better suited to generate h-step ahead forecasts that respect the temporal dynamics of estimated latent trends. } \examples{ \dontrun{ # Simulate some data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Extract linear predictor values linpreds <- posterior_linpred(mod) str(linpreds) } } \seealso{ \code{\link{hindcast.mvgam}}, \code{\link{posterior_epred.mvgam}}, \code{\link{posterior_predict.mvgam}} } \author{ Nicholas J Clark } ================================================ FILE: man/posterior_predict.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/posterior_epred.mvgam.R \name{posterior_predict.mvgam} \alias{posterior_predict.mvgam} \title{Draws from the posterior predictive distribution for \pkg{mvgam} objects} \usage{ \method{posterior_predict}{mvgam}( object, newdata, data_test, ndraws = NULL, process_error = TRUE, ... ) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} or \code{jsdgam}. See \code{\link[=mvgam]{mvgam()}}} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing the same variables that were included in the original \code{data} used to fit the model. If not supplied, predictions are generated for the original observations used for the model fit.} \item{data_test}{Deprecated. Still works in place of \code{newdata} but users are recommended to use \code{newdata} instead for more seamless integration into \code{R} workflows} \item{ndraws}{Positive \code{integer} indicating how many posterior draws should be used. If \code{NULL} (the default) all draws are used.} \item{process_error}{Logical. If \code{TRUE} and \code{newdata} is supplied, expected uncertainty in the process model is accounted for by using draws from any latent trend SD parameters. If \code{FALSE}, uncertainty in the latent trend component is ignored when calculating predictions. If no \code{newdata} is supplied, draws from the fitted model's posterior predictive distribution will be used (which will always include uncertainty in any latent trend components)} \item{...}{Ignored} } \value{ A \code{matrix} of dimension \code{n_samples x n_obs}, where \code{n_samples} is the number of posterior samples from the fitted object and \code{n_obs} is the number of observations in \code{newdata} } \description{ Compute posterior draws of the posterior predictive distribution. Can be performed for the data used to fit the model (posterior predictive checks) or for new data. By definition, these draws have higher variance than draws of the expected value of the posterior predictive distribution computed by \code{\link{posterior_epred.mvgam}}. This is because the residual error is incorporated in \code{posterior_predict}. However, the estimated means of both methods averaged across draws should be very similar. } \details{ Note that for all types of predictions for models that did not include a \code{trend_formula}, uncertainty in the dynamic trend component can be ignored by setting \code{process_error = FALSE}. However, if a \code{trend_formula} was supplied in the model, predictions for this component cannot be ignored.
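A minimal sketch of this toggle (assuming a fitted model \code{mod} that used a dynamic trend without a \code{trend_formula}, and a hypothetical test frame \code{newdat}; not run):
\preformatted{
# Predictive draws that include uncertainty from any
# latent trend SD parameters
yrep <- posterior_predict(mod, newdata = newdat, process_error = TRUE)

# Predictive draws that ignore the latent trend uncertainty
yrep_fixed <- posterior_predict(mod, newdata = newdat, process_error = FALSE)
}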
If \code{process_error = TRUE}, trend predictions will ignore autocorrelation coefficients or GP length scale coefficients, ultimately assuming the process is stationary. This method is similar to the types of posterior predictions returned from \code{brms} models when using autocorrelated error predictions for newdata. This function is therefore more suited to posterior simulation from the GAM components of a \code{mvgam} model, while the forecasting functions \code{\link{plot_mvgam_fc}} and \code{\link{forecast.mvgam}} are better suited to generate h-step ahead forecasts that respect the temporal dynamics of estimated latent trends. } \examples{ \dontrun{ # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Compute posterior predictions predictions <- posterior_predict(mod) str(predictions) } } \seealso{ \code{\link{hindcast.mvgam}}, \code{\link{posterior_linpred.mvgam}}, \code{\link{posterior_epred.mvgam}} } \author{ Nicholas J Clark } ================================================ FILE: man/pp_check.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ppc.mvgam.R \name{pp_check.mvgam} \alias{pp_check.mvgam} \alias{pp_check} \title{Posterior Predictive Checks for \code{mvgam} models} \usage{ \method{pp_check}{mvgam}( object, type, ndraws = NULL, prefix = c("ppc", "ppd"), group = NULL, x = NULL, newdata = NULL, ... ) } \arguments{ \item{object}{An object of class \code{mvgam}} \item{type}{Type of the ppc plot as given by a character string. See \code{\link[bayesplot:PPC-overview]{PPC}} for an overview of currently supported types. You may also use an invalid type (e.g. \code{type = "xyz"}) to get a list of supported types in the resulting error message.} \item{ndraws}{Positive integer indicating how many posterior draws should be used. If \code{NULL} all draws are used. If not specified, the number of posterior draws is chosen automatically. Ignored if \code{draw_ids} is not \code{NULL}.} \item{prefix}{The prefix of the \pkg{bayesplot} function to be applied. Either `"ppc"` (posterior predictive check; the default) or `"ppd"` (posterior predictive distribution), the latter being the same as the former except that the observed data is not shown for `"ppd"`.} \item{group}{Optional name of a factor variable in the model by which to stratify the ppc plot. This argument is required for ppc \code{*_grouped} types and ignored otherwise.} \item{x}{Optional name of a variable in the model. Only used for ppc types having an \code{x} argument and ignored otherwise.} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing the variables included in the linear predictor of \code{formula}. If not supplied, predictions are generated for the original observations used for the model fit. Ignored if using one of the residual plots (i.e. 'resid_hist')} \item{...}{Further arguments passed to \code{\link{predict.mvgam}} as well as to the PPC function specified in \code{type}} } \value{ A ggplot object that can be further customized using the \pkg{ggplot2} package. } \description{ Perform unconditional posterior predictive checks with the help of the \pkg{bayesplot} package. } \details{ Unlike the conditional posterior checks provided by \code{\link{ppc}}, this function computes \emph{unconditional} posterior predictive checks (i.e.
it generates predictions for fake data without considering the true observations associated with those fake data). For a detailed explanation of each of the ppc functions, see the \code{\link[bayesplot:PPC-overview]{PPC}} documentation of the \pkg{\link[bayesplot:bayesplot-package]{bayesplot}} package. } \examples{ \dontrun{ simdat <- sim_mvgam(seasonality = "hierarchical") mod <- mvgam( y ~ series + s(season, bs = "cc", k = 6) + s(season, series, bs = "fs", k = 4), data = simdat$data_train, chains = 2, silent = 2 ) # Use pp_check(mod, type = "xyz") for a list of available plot types # Default is a density overlay for all observations pp_check(mod) # Rootograms particularly useful for count data pp_check(mod, type = "rootogram") # Grouping plots by series is useful pp_check(mod, type = "bars_grouped", group = "series", ndraws = 50 ) pp_check(mod, type = "ecdf_overlay_grouped", group = "series", ndraws = 50 ) pp_check(mod, type = "stat_freqpoly_grouped", group = "series", ndraws = 50 ) # Several types can be used to plot distributions of randomized # quantile residuals pp_check( object = mod, x = "season", type = "resid_ribbon" ) pp_check( object = mod, x = "season", group = "series", type = "resid_ribbon_grouped" ) pp_check(mod, ndraws = 5, type = "resid_hist_grouped", group = "series" ) # Custom functions accepted pp_check(mod, type = "stat", stat = function(x) mean(x == 0)) pp_check(mod, type = "stat_grouped", stat = function(x) mean(x == 0), group = "series" ) # Some functions accept covariates to set the x-axes pp_check(mod, x = "season", type = "ribbon_grouped", prob = 0.5, prob_outer = 0.8, group = "series" ) # Many plots can be made without the observed data pp_check(mod, prefix = "ppd") } } \seealso{ \code{\link{ppc}}, \code{\link{predict.mvgam}} } \author{ Nicholas J Clark } ================================================ FILE: man/ppc.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ppc.mvgam.R \name{ppc.mvgam} \alias{ppc.mvgam} \alias{ppc} \title{Plot conditional posterior predictive checks from \pkg{mvgam} models} \usage{ ppc(object, ...) \method{ppc}{mvgam}( object, newdata, data_test, series = 1, type = "hist", n_bins, legend_position, xlab, ylab, ... ) } \arguments{ \item{object}{\code{list} object returned from \code{mvgam}. See \code{\link[=mvgam]{mvgam()}}} \item{...}{Further \code{\link[graphics]{par}} graphical parameters} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing at least 'series' and 'time' for the forecast horizon, in addition to any other variables included in the linear predictor of \code{formula}. If included, the observed values in the test data are compared to the model's forecast distribution for exploring biases in model predictions. Note this is only useful if the same \code{newdata} was also included when fitting the original model.} \item{data_test}{Deprecated. Still works in place of \code{newdata} but users are recommended to use \code{newdata} instead for more seamless integration into \code{R} workflows} \item{series}{\code{integer} specifying which series in the set is to be plotted} \item{type}{\code{character} specifying the type of posterior predictive check to calculate and plot. Valid options are: 'rootogram', 'mean', 'hist', 'density', 'prop_zero', 'pit' and 'cdf'} \item{n_bins}{\code{integer} specifying the number of bins to use for binning observed values when plotting a rootogram or histogram. 
Default is \code{50} bins for a rootogram, which means that if there are >50 unique observed values, bins will be used to prevent overplotting and facilitate interpretation. Default for a histogram is to use the number of bins returned by a call to \code{hist} in base \code{R}} \item{legend_position}{The location of the legend, given as a single keyword from the list "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center". The legend is placed on the inside of the plot frame at the given location. Alternatively, use "none" to hide the legend.} \item{xlab}{Label for x axis} \item{ylab}{Label for y axis} } \value{ A base \code{R} graphics plot showing either a posterior rootogram (for \code{type == 'rootogram'}), the predicted vs observed mean for the series (for \code{type == 'mean'}), predicted vs observed proportion of zeroes for the series (for \code{type == 'prop_zero'}), predicted vs observed histogram for the series (for \code{type == 'hist'}), kernel density or empirical CDF estimates for posterior predictions (for \code{type == 'density'} or \code{type == 'cdf'}) or a Probability Integral Transform histogram (for \code{type == 'pit'}). } \description{ Plot conditional posterior predictive checks from \pkg{mvgam} models } \details{ Conditional posterior predictions are drawn from the fitted \code{mvgam} and compared against the empirical distribution of the observed data for a specified series to help evaluate the model's ability to generate unbiased predictions. For all plots apart from \code{type = 'rootogram'}, posterior predictions can also be compared to out of sample observations as long as these observations were included as 'data_test' in the original model fit and supplied here. Rootograms are currently only plotted using the 'hanging' style. Note that the predictions used for these plots are \emph{conditional on the observed data}, i.e. they are those predictions that have been generated directly within the \code{mvgam()} model. They can be misleading if the model included flexible dynamic trend components. For a broader range of posterior checks that are created using \emph{unconditional} "new data" predictions, see \code{\link{pp_check.mvgam}} } \examples{ \dontrun{ # Simulate some smooth effects and fit a model set.seed(0) dat <- mgcv::gamSim( 1, n = 200, scale = 2 ) mod <- mvgam( y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, family = gaussian(), chains = 2, silent = 2 ) # Posterior checks ppc(mod, type = "hist") ppc(mod, type = "density") ppc(mod, type = "cdf") # Many more options are available with pp_check() pp_check(mod) pp_check(mod, type = "ecdf_overlay") pp_check(mod, type = "freqpoly") } } \seealso{ \code{\link{pp_check.mvgam}}, \code{\link{predict.mvgam}} } \author{ Nicholas J Clark } ================================================ FILE: man/predict.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/predict.mvgam.R \name{predict.mvgam} \alias{predict.mvgam} \title{Predict from a fitted \pkg{mvgam} model} \usage{ \method{predict}{mvgam}( object, newdata, data_test, type = "link", process_error = FALSE, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... ) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} or \code{jsdgam}.
See \code{\link[=mvgam]{mvgam()}}} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing the same variables that were included in the original \code{data} used to fit the model. If not supplied, predictions are generated for the original observations used for the model fit.} \item{data_test}{Deprecated. Still works in place of \code{newdata} but users are recommended to use \code{newdata} instead for more seamless integration into \code{R} workflows} \item{type}{When this has the value \code{link} (default) the linear predictor is calculated on the link scale. If \code{expected} is used, predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. When \code{response} is used, the predictions take uncertainty in the observation process into account to return predictions on the outcome scale. When \code{variance} is used, the variance of the response with respect to the mean (mean-variance relationship) is returned. When \code{type = "terms"}, each component of the linear predictor is returned separately in the form of a \code{list} (possibly with standard errors, if \code{summary = TRUE}): this includes parametric model components, followed by each smooth component, but excludes any offset and any intercept. Two special cases are also allowed: type \code{latent_N} will return the estimated latent abundances from an N-mixture distribution, while type \code{detection} will return the estimated detection probability from an N-mixture distribution} \item{process_error}{Logical. If \code{TRUE} and a dynamic trend model was fit, expected uncertainty in the process model is accounted for by using draws from a stationary, zero-centred multivariate Normal distribution using any estimated process variance-covariance parameters. If \code{FALSE}, uncertainty in the latent trend component is ignored when calculating predictions} \item{summary}{Should summary statistics be returned instead of the raw values? Default is \code{TRUE}.} \item{robust}{If \code{FALSE} (the default) the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If \code{TRUE}, the median and the median absolute deviation (MAD) are applied instead. Only used if \code{summary} is \code{TRUE}.} \item{probs}{The percentiles to be computed by the \code{quantile} function. Only used if \code{summary} is \code{TRUE}.} \item{...}{Ignored} } \value{ Predicted values on the appropriate scale. If \code{summary = FALSE} and \code{type != "terms"}, the output is a matrix of dimension \verb{n_draw x n_observations} containing predicted values for each posterior draw in \code{object}. If \code{summary = TRUE} and \code{type != "terms"}, the output is an \code{n_observations} x \code{E} matrix. The number of summary statistics \code{E} is equal to \code{2 + length(probs)}: The \code{Estimate} column contains point estimates (either mean or median depending on argument \code{robust}), while the \code{Est.Error} column contains uncertainty estimates (either standard deviation or median absolute deviation depending on argument \code{robust}). The remaining columns starting with \code{Q} contain quantile estimates as specified via argument \code{probs}. If \code{type = "terms"} and \code{summary = FALSE}, the output is a named \code{list} containing a separate slot for each effect, with the effects returned as matrices of dimension \verb{n_draw x 1}.
If \code{summary = TRUE}, the output resembles that from \code{\link[mgcv]{predict.gam}} when using the call \code{predict.gam(object, type = "terms", se.fit = TRUE)}, where mean contributions from each effect are returned in \code{matrix} form while standard errors (representing the interval: \code{(max(probs) - min(probs)) / 2}) are returned in a separate \code{matrix} } \description{ Predict from a fitted \pkg{mvgam} model } \details{ Note that if your model included a latent temporal trend (i.e. if you used something other than \code{"None"} for the \code{trend_model} argument), the predictions returned by this function will ignore autocorrelation coefficients or GP length scale coefficients by \emph{assuming the process is stationary}. This approach is similar to how predictions are computed from other types of regression models that can include correlated residuals, \emph{ultimately treating the temporal dynamics as random effect nuisance parameters}. The \code{predict} function is therefore more suited to scenario-based posterior simulation from the GAM components of a \code{mvgam} model, while the hindcast / forecast functions \code{\link[=hindcast.mvgam]{hindcast.mvgam()}} and \code{\link[=forecast.mvgam]{forecast.mvgam()}} are better suited to generate predictions that respect the temporal dynamics of estimated latent trends at the actual time points supplied in \code{data} and \code{newdata}. } \examples{ \dontrun{ # Simulate 4 time series with hierarchical seasonality # and independent AR1 dynamic processes set.seed(123) simdat <- sim_mvgam( seasonality = 'hierarchical', prop_trend = 0.75, trend_model = AR(), family = gaussian() ) # Fit a model with shared seasonality # and AR(1) dynamics mod1 <- mvgam( y ~ s(season, bs = 'cc', k = 6), data = simdat$data_train, family = gaussian(), trend_model = AR(), noncentred = TRUE, chains = 2, silent = 2 ) # Generate predictions against observed data preds <- predict( mod1, summary = TRUE ) head(preds) # Generate predictions against test data preds <- predict( mod1, newdata = simdat$data_test, summary = TRUE ) head(preds) # Use plot_predictions(), which relies on predict() # to more easily see how the latent AR(1) dynamics are # being ignored when using predict() plot_predictions( mod1, by = c('time', 'series', 'series'), points = 0.5 ) # Using the hindcast() function will give a more accurate # representation of how the AR(1) processes were estimated to give # accurate predictions to the in-sample training data hc <- hindcast(mod1) plot(hc) + plot(hc, series = 2) + plot(hc, series = 3) } } \seealso{ \code{\link[=hindcast.mvgam]{hindcast.mvgam()}}, \code{\link[=forecast.mvgam]{forecast.mvgam()}}, \code{\link[=fitted.mvgam]{fitted.mvgam()}}, \code{\link[=augment.mvgam]{augment.mvgam()}} } \author{ Nicholas J Clark } ================================================ FILE: man/print.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/print.mvgam.R \name{print.mvgam} \alias{print.mvgam} \title{Print a fitted \pkg{mvgam} object} \usage{ \method{print}{mvgam}(x, ...) } \arguments{ \item{x}{\code{list} object returned from \code{mvgam}} \item{...}{Ignored} } \value{ A \code{list} is printed on-screen } \description{ This function takes a fitted \code{mvgam} or \code{jsdgam} object and prints a quick summary. 
} \details{ A brief summary of the model's call is printed } \author{ Nicholas J Clark } ================================================ FILE: man/print.mvgam_summary.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/summary.mvgam.R \name{print.mvgam_summary} \alias{print.mvgam_summary} \title{Print method for mvgam_summary objects} \usage{ \method{print}{mvgam_summary}(x, ...) } \arguments{ \item{x}{An object of class \code{mvgam_summary}} \item{...}{Additional arguments (ignored)} } \value{ Invisibly returns the input object after printing } \description{ Print method for mvgam_summary objects } ================================================ FILE: man/reexports.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/as.data.frame.mvgam.R, R/conditional_effects.R, % R/forecast.mvgam.R, R/get_mvgam_priors.R, R/loo.mvgam.R, % R/marginaleffects.mvgam.R, R/mcmc_plot.mvgam.R, R/mvgam_formulae.R, % R/posterior_epred.mvgam.R, R/stan_utils.R, R/tidier_methods.R \docType{import} \name{reexports} \alias{reexports} \alias{as_draws} \alias{as_draws_matrix} \alias{as_draws_df} \alias{as_draws_array} \alias{as_draws_list} \alias{as_draws_rvars} \alias{conditional_effects} \alias{forecast} \alias{prior} \alias{prior_} \alias{set_prior} \alias{prior_string} \alias{loo} \alias{loo_compare} \alias{predictions} \alias{avg_predictions} \alias{plot_predictions} \alias{slopes} \alias{plot_slopes} \alias{comparisons} \alias{plot_comparisons} \alias{datagrid} \alias{hypotheses} \alias{get_predict} \alias{get_data} \alias{mcmc_plot} \alias{gp} \alias{s} \alias{te} \alias{ti} \alias{t2} \alias{posterior_predict} \alias{posterior_epred} \alias{posterior_linpred} \alias{stancode} \alias{standata} \alias{tidy} \alias{augment} \title{Objects exported from other packages} \keyword{internal} \description{ These objects are imported from other packages. Follow the links below to see their documentation. 
\describe{ \item{brms}{\code{\link[brms:conditional_effects.brmsfit]{conditional_effects}}, \code{\link[brms]{gp}}, \code{\link[brms:mcmc_plot.brmsfit]{mcmc_plot}}, \code{\link[brms:set_prior]{prior}}, \code{\link[brms:set_prior]{prior_}}, \code{\link[brms:set_prior]{prior_string}}, \code{\link[brms]{set_prior}}, \code{\link[brms]{stancode}}, \code{\link[brms]{standata}}} \item{generics}{\code{\link[generics]{augment}}, \code{\link[generics]{forecast}}, \code{\link[generics]{tidy}}} \item{insight}{\code{\link[insight]{get_data}}} \item{loo}{\code{\link[loo]{loo}}, \code{\link[loo]{loo_compare}}} \item{marginaleffects}{\code{\link[marginaleffects:predictions]{avg_predictions}}, \code{\link[marginaleffects]{comparisons}}, \code{\link[marginaleffects]{datagrid}}, \code{\link[marginaleffects]{get_predict}}, \code{\link[marginaleffects]{hypotheses}}, \code{\link[marginaleffects]{plot_comparisons}}, \code{\link[marginaleffects]{plot_predictions}}, \code{\link[marginaleffects]{plot_slopes}}, \code{\link[marginaleffects]{predictions}}, \code{\link[marginaleffects]{slopes}}} \item{mgcv}{\code{\link[mgcv]{s}}, \code{\link[mgcv]{t2}}, \code{\link[mgcv]{te}}, \code{\link[mgcv:te]{ti}}} \item{posterior}{\code{\link[posterior:draws]{as_draws}}, \code{\link[posterior:draws_array]{as_draws_array}}, \code{\link[posterior:draws_df]{as_draws_df}}, \code{\link[posterior:draws_list]{as_draws_list}}, \code{\link[posterior:draws_matrix]{as_draws_matrix}}, \code{\link[posterior:draws_rvars]{as_draws_rvars}}} \item{rstantools}{\code{\link[rstantools]{posterior_epred}}, \code{\link[rstantools]{posterior_linpred}}, \code{\link[rstantools]{posterior_predict}}} }} ================================================ FILE: man/residual_cor.jsdgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/residual_cor.R \name{residual_cor.jsdgam} \alias{residual_cor.jsdgam} \alias{residual_cor} \alias{residual_cor.mvgam} \title{Extract residual correlations based on latent factors} \usage{ residual_cor(object, ...) \method{residual_cor}{mvgam}( object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... ) \method{residual_cor}{jsdgam}( object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... ) } \arguments{ \item{object}{\code{list} object of class \code{mvgam} resulting from a call to \code{\link[=jsdgam]{jsdgam()}} or a call to \code{\link[=mvgam]{mvgam()}} in which either \code{use_lv = TRUE} or a multivariate process was used with \code{cor = TRUE} (see \code{\link[=RW]{RW()}} and \code{\link[=VAR]{VAR()}} for examples)} \item{...}{ignored} \item{summary}{Should summary statistics be returned instead of the raw values? Default is \code{TRUE}.} \item{robust}{If \code{FALSE} (the default) the mean is used as a measure of central tendency. If \code{TRUE}, the median is used instead. Only used if \code{summary} is \code{TRUE}} \item{probs}{The percentiles to be computed by the \code{quantile} function. Only used if \code{summary} is \code{TRUE}.} } \value{ If \code{summary = TRUE}, a \code{list} of \code{\link{mvgam_residcor-class}} with the following components: \item{cor, cor_lower, cor_upper}{A set of \eqn{p \times p} correlation matrices, containing either the posterior median or mean estimate, plus lower and upper limits of the corresponding credible intervals supplied to \code{probs}} \item{sig_cor}{A \eqn{p \times p} correlation matrix containing only correlations whose credible interval does not contain zero.
All other correlations are set to zero} \item{prec, prec_lower, prec_upper}{A set of \eqn{p \times p} precision matrices, containing either the posterior median or mean estimate, plus lower and upper limits of the corresponding credible intervals supplied to \code{probs}} \item{sig_prec}{A \eqn{p \times p} precision matrix containing only precisions whose credible interval does not contain zero. All other precisions are set to zero} \item{cov}{A \eqn{p \times p} posterior median or mean covariance matrix} \item{trace}{The median/mean point estimator of the trace (sum of the diagonal elements) of the residual covariance matrix \code{cov}} If \code{summary = FALSE}, this function returns a \code{list} containing the following components: \item{all_cormat}{A \eqn{n_{draws} \times p \times p} \code{array} of posterior residual correlation matrix draws} \item{all_covmat}{A \eqn{n_{draws} \times p \times p} \code{array} of posterior residual covariance matrix draws} \item{all_presmat}{A \eqn{n_{draws} \times p \times p} \code{array} of posterior residual precision matrix draws} \item{all_trace}{A \eqn{n_{draws}} \code{vector} of posterior covariance trace draws} } \description{ Compute residual correlation estimates from Joint Species Distribution (\code{jsdgam}) or \code{mvgam} models that either used latent factors or included correlated process errors directly } \details{ See \code{\link{mvgam_residcor-class}} for a description of the quantities that are computed and returned by this function, along with key references. } \examples{ \dontrun{ # Fit a JSDGAM to the portal_data captures mod <- jsdgam( formula = captures ~ # Fixed effects of NDVI and mintemp, row effect as a GP of time ndvi_ma12:series + mintemp:series + gp(time, k = 15), factor_formula = ~ -1, data = portal_data, unit = time, species = series, family = poisson(), n_lv = 2, silent = 2, chains = 2 ) # Plot residual correlations plot( residual_cor(mod) ) # Compare to a residual ordination biplot if(requireNamespace('ggrepel', quietly = TRUE)){ ordinate(mod) } # Not needed for general use; cleans up connections for automated testing closeAllConnections() } } \references{ Hui, F. K. C. (2016). boral – Bayesian Ordination and Regression Analysis of Multivariate Abundance Data in R. \emph{Methods in Ecology and Evolution}, 7(6), 744-750. \doi{10.1111/2041-210X.12514} } \seealso{ \code{\link[=jsdgam]{jsdgam()}}, \code{\link[=lv_correlations]{lv_correlations()}}, \code{\link{mvgam_residcor-class}} } ================================================ FILE: man/residuals.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/residuals.mvgam.R \name{residuals.mvgam} \alias{residuals.mvgam} \title{Posterior draws of residuals from \pkg{mvgam} models} \usage{ \method{residuals}{mvgam}(object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ...) } \arguments{ \item{object}{An object of class \code{mvgam}} \item{summary}{Should summary statistics be returned instead of the raw values? Default is \code{TRUE}.} \item{robust}{If \code{FALSE} (the default) the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If \code{TRUE}, the median and the median absolute deviation (MAD) are applied instead. Only used if \code{summary} is \code{TRUE}.} \item{probs}{The percentiles to be computed by the \code{quantile} function.
Only used if \code{summary} is \code{TRUE}.} \item{...}{Ignored} } \value{ An \code{array} of randomized quantile residual values. If \code{summary = FALSE} the output resembles those of \code{\link{posterior_epred.mvgam}} and \code{\link{predict.mvgam}}. If \code{summary = TRUE} the output is an \code{n_observations} x \code{E} matrix. The number of summary statistics \code{E} is equal to \code{2 + length(probs)}. The \code{Estimate} column contains point estimates (either mean or median depending on argument \code{robust}), while the \code{Est.Error} column contains uncertainty estimates (either standard deviation or median absolute deviation depending on argument \code{robust}). The remaining columns starting with \code{Q} contain quantile estimates as specified via argument \code{probs}. } \description{ This method extracts posterior draws of Dunn-Smyth (randomized quantile) residuals in the order in which the data were supplied to the model. It includes additional arguments for obtaining summaries of the computed residuals. } \details{ This method gives residuals as Dunn-Smyth (randomized quantile) residuals. Any observations that were missing (i.e. \code{NA}) in the original data will have missing values in the residuals. } \examples{ \dontrun{ # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Extract posterior residuals resids <- residuals(mod) str(resids) # Or add them directly to the observed data, along with fitted values augment(mod, robust = FALSE, probs = c(0.25, 0.75)) } } \seealso{ \code{\link{augment.mvgam}} } \author{ Nicholas J Clark } ================================================ FILE: man/score.mvgam_forecast.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/score.mvgam_forecast.R \name{score.mvgam_forecast} \alias{score.mvgam_forecast} \alias{score} \title{Compute probabilistic forecast scores for \pkg{mvgam} models} \usage{ \method{score}{mvgam_forecast}( object, score = "crps", log = FALSE, weights, interval_width = 0.9, n_cores = 1, ... ) score(object, ...) } \arguments{ \item{object}{\code{mvgam_forecast} object. See \code{\link[=forecast.mvgam]{forecast.mvgam()}}.} \item{score}{\code{character} specifying the type of proper scoring rule to use for evaluation. Options are: \code{sis} (i.e. the Scaled Interval Score), \code{energy}, \code{variogram}, \code{elpd} (i.e. the Expected log pointwise Predictive Density), \code{drps} (i.e. the Discrete Rank Probability Score), \code{crps} (the Continuous Rank Probability Score) or \code{brier} (the latter of which is only applicable for \code{bernoulli} models). Note that when choosing \code{elpd}, the supplied object must have forecasts on the \code{link} scale so that expectations can be calculated prior to scoring. If choosing \code{brier}, the object must have forecasts on the \code{expected} scale (i.e. probability predictions). For all other scores, forecasts should be supplied on the \code{response} scale (i.e. posterior predictions)} \item{log}{\code{logical}. Should the forecasts and truths be logged prior to scoring? This is often appropriate for comparing performance of models when series vary in their observation ranges.
Ignored if \code{score = 'brier'}} \item{weights}{optional \code{vector} of weights (where \code{length(weights) == n_series}) for weighting pairwise correlations when evaluating the variogram score for multivariate forecasts. Useful for down-weighting series that have larger magnitude observations or that are of less interest when forecasting. Ignored if \code{score != 'variogram'}} \item{interval_width}{proportional value on \verb{[0.05,0.95]} defining the forecast interval for calculating coverage and, if \code{score = 'sis'}, for calculating the interval score. Ignored if \code{score = 'brier'}} \item{n_cores}{\code{integer} specifying number of cores for calculating scores in parallel} \item{...}{Ignored} } \value{ A \code{list} containing scores and interval coverages per forecast horizon. If \code{score \%in\% c('drps', 'crps', 'elpd', 'brier')}, the list will also contain the sum of all series-level scores per horizon. If \code{score \%in\% c('energy','variogram')}, no series-level scores are computed and the only score returned will be for all series. For all scores apart from \code{elpd} and \code{brier}, the \code{in_interval} column in each series-level slot is a binary indicator of whether or not the true value was within the forecast's corresponding posterior empirical quantiles. Intervals are not calculated when using \code{elpd} because forecasts will only contain the linear predictors } \description{ Compute probabilistic forecast scores for \pkg{mvgam} models } \examples{ \dontrun{ # Simulate observations for three count-valued time series data <- sim_mvgam() # Fit a dynamic model using 'newdata' to automatically produce forecasts mod <- mvgam( y ~ 1, trend_model = RW(), data = data$data_train, newdata = data$data_test, chains = 2, silent = 2 ) # Extract forecasts into a 'mvgam_forecast' object fc <- forecast(mod) plot(fc) # Compute Discrete Rank Probability Scores and 0.90 interval coverages fc_scores <- score(fc, score = 'drps') str(fc_scores) # An example using binary data data <- sim_mvgam(family = bernoulli()) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), data = data$data_train, newdata = data$data_test, family = bernoulli(), chains = 2, silent = 2 ) # Extract forecasts on the expectation (probability) scale fc <- forecast(mod, type = 'expected') plot(fc) # Compute Brier scores fc_scores <- score(fc, score = 'brier') str(fc_scores) } } \references{ Gneiting, T. and Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. \emph{Journal of the American Statistical Association}, 102(477), 359-378. \doi{10.1198/016214506000001437} } \seealso{ \code{\link{forecast.mvgam}}, \code{\link{ensemble}} } \author{ Nicholas J Clark } ================================================ FILE: man/series_to_mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/series_to_mvgam.R \name{series_to_mvgam} \alias{series_to_mvgam} \title{Convert timeseries object to format necessary for \pkg{mvgam} models} \usage{ series_to_mvgam(series, freq, train_prop = 0.85) } \arguments{ \item{series}{\code{\link[xts]{xts}} or \code{\link[stats]{ts}} object to be converted to \code{\link{mvgam}} format} \item{freq}{\code{integer}. The seasonal frequency of the series} \item{train_prop}{\code{numeric} stating the proportion of data to use for training.
Should be between \code{0.25} and \code{0.95}} } \value{ A \code{list} object containing outputs needed for \code{\link{mvgam}}, including 'data_train' and 'data_test' } \description{ This function converts univariate or multivariate time series (\code{xts} or \code{ts} objects) to the format necessary for \code{\link{mvgam}}. } \examples{ # A ts object example data("sunspots") series <- cbind(sunspots, sunspots) colnames(series) <- c('blood', 'bone') head(series) series_to_mvgam(series, frequency(series), 0.85) # An xts object example library(xts) dates <- seq(as.Date("2001-05-01"), length = 30, by = "quarter") data <- cbind( c(gas = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001)))), c(oil = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001)))) ) series <- xts(x = data, order.by = dates) colnames(series) <- c('gas', 'oil') head(series) series_to_mvgam(series, freq = 4, train_prop = 0.85) } ================================================ FILE: man/sim_mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/sim_mvgam.R \name{sim_mvgam} \alias{sim_mvgam} \title{Simulate a set of time series for modelling in \pkg{mvgam}} \usage{ sim_mvgam( T = 100, n_series = 3, seasonality = "shared", use_lv = FALSE, n_lv = 0, trend_model = RW(), drift = FALSE, prop_trend = 0.2, trend_rel, freq = 12, family = poisson(), phi, shape, sigma, nu, mu, prop_missing = 0, prop_train = 0.85 ) } \arguments{ \item{T}{\code{integer}. Number of observations (timepoints)} \item{n_series}{\code{integer}. Number of discrete time series} \item{seasonality}{\code{character}. Either \code{shared}, meaning that all series share the exact same seasonal pattern, or \code{hierarchical}, meaning that there is a global seasonality but each series' pattern can deviate slightly} \item{use_lv}{\code{logical}. If \code{TRUE}, use dynamic factors to estimate series' latent trends in a reduced dimension format. If \code{FALSE}, estimate independent latent trends for each series} \item{n_lv}{\code{integer}. Number of latent dynamic factors for generating the series' trends. Defaults to \code{0}, meaning that dynamics are estimated independently for each series} \item{trend_model}{\code{character} specifying the time series dynamics for the latent trend. Options are: \itemize{ \item \code{None} (no latent trend component; i.e. the GAM component is all that contributes to the linear predictor, and the observation process is the only source of error; similar to what is estimated by \code{\link[mgcv]{gam}}) \item \code{RW} (random walk with possible drift) \item \code{AR1} (with possible drift) \item \code{AR2} (with possible drift) \item \code{AR3} (with possible drift) \item \code{VAR1} (contemporaneously uncorrelated VAR1) \item \code{VAR1cor} (contemporaneously correlated VAR1) \item \code{GP} (Gaussian Process with squared exponential kernel) } See \link{mvgam_trends} for more details} \item{drift}{\code{logical}, simulate a drift term for each trend} \item{prop_trend}{\code{numeric}. Relative importance of the trend for each series. Should be between \code{0} and \code{1}} \item{trend_rel}{Deprecated. Use \code{prop_trend} instead} \item{freq}{\code{integer}. The seasonal frequency of the series} \item{family}{\code{family} specifying the exponential observation family for the series.
Currently supported families are: \code{nb()}, \code{poisson()}, \code{bernoulli()}, \code{tweedie()}, \code{gaussian()}, \code{betar()}, \code{lognormal()}, \code{student()} and \code{Gamma()}} \item{phi}{\code{vector} of dispersion parameters for the series (i.e. \code{size} for \code{nb()} or \code{phi} for \code{betar()}). If \code{length(phi) < n_series}, the first element of \code{phi} will be replicated \code{n_series} times. Defaults to \code{5} for \code{nb()} and \code{tweedie()}; \code{10} for \code{betar()}} \item{shape}{\code{vector} of shape parameters for the series (i.e. \code{shape} for \code{gamma()}). If \code{length(shape) < n_series}, the first element of \code{shape} will be replicated \code{n_series} times. Defaults to \code{10}} \item{sigma}{\code{vector} of scale parameters for the series (i.e. \code{sd} for \code{gaussian()} or \code{student()}, \code{log(sd)} for \code{lognormal()}). If \code{length(sigma) < n_series}, the first element of \code{sigma} will be replicated \code{n_series} times. Defaults to \code{0.5} for \code{gaussian()} and \code{student()}; \code{0.2} for \code{lognormal()}} \item{nu}{\code{vector} of degrees of freedom parameters for the series (i.e. \code{nu} for \code{student()}). If \code{length(nu) < n_series}, the first element of \code{nu} will be replicated \code{n_series} times. Defaults to \code{3}} \item{mu}{\code{vector} of location parameters for the series. If \code{length(mu) < n_series}, the first element of \code{mu} will be replicated \code{n_series} times. Defaults to small random values between \code{-0.5} and \code{0.5} on the link scale} \item{prop_missing}{\code{numeric} stating proportion of observations that are missing. Should be between \code{0} and \code{0.8}, inclusive} \item{prop_train}{\code{numeric} stating the proportion of data to use for training. Should be between \code{0.2} and \code{1}} } \value{ A \code{list} object containing outputs needed for \code{\link{mvgam}}, including 'data_train' and 'data_test', as well as some additional information about the simulated seasonality and trend dependencies } \description{ This function simulates sets of time series data for fitting a multivariate GAM that includes shared seasonality and dependence on State-Space latent dynamic factors. Random dependencies among series, i.e. correlations in their long-term trends, are included in the form of correlated loadings on the latent dynamic factors } \examples{ # Simulate series with observations bounded at 0 and 1 (Beta responses) sim_data <- sim_mvgam( family = betar(), trend_model = RW(), prop_trend = 0.6 ) plot_mvgam_series(data = sim_data$data_train, series = 'all') # Now simulate series with overdispersed discrete observations sim_data <- sim_mvgam( family = nb(), trend_model = RW(), prop_trend = 0.6, phi = 10 ) plot_mvgam_series(data = sim_data$data_train, series = 'all') } \references{ Clark, N. J. and Wells, K. (2022). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. \emph{Methods in Ecology and Evolution}, 13(11), 2388-2404. \doi{10.1111/2041-210X.13974} } ================================================ FILE: man/stability.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/stability.R \name{stability.mvgam} \alias{stability.mvgam} \alias{stability} \title{Calculate measures of latent VAR community stability} \usage{ stability(object, ...) \method{stability}{mvgam}(object, ...) 
} \arguments{ \item{object}{\code{list} object of class \code{mvgam} resulting from a call to \code{\link[=mvgam]{mvgam()}} that used a Vector Autoregressive latent process model (either as \code{VAR(cor = FALSE)} or \code{VAR(cor = TRUE)})} \item{...}{Ignored} } \value{ A \code{data.frame} containing posterior draws for each stability metric. } \description{ Compute reactivity, return rates and contributions of interactions to stationary forecast variance from \pkg{mvgam} models with Vector Autoregressive dynamics. } \details{ These measures of stability can be used to assess how important inter-series dependencies are to the variability of a multivariate system and to ask how systems are expected to respond to environmental perturbations. Using the formula for a latent VAR(1) as: \deqn{ \mu_t \sim \text{MVNormal}(A(\mu_{t - 1}), \Sigma) } this function will calculate the long-term stationary forecast distribution of the system, which has mean \eqn{\mu_{\infty}} and variance \eqn{\Sigma_{\infty}}, to then calculate the following quantities: \itemize{ \item \code{prop_int}: Proportion of the volume of the stationary forecast distribution that is attributable to lagged interactions: \deqn{ det(A)^2 } \item \code{prop_int_adj}: Same as \code{prop_int} but scaled by the number of series \eqn{p}: \deqn{ det(A)^{2/p} } \item \code{prop_int_offdiag}: Sensitivity of \code{prop_int} to inter-series interactions (off-diagonals of \eqn{A}): \deqn{ [2~det(A) (A^{-1})^T] } \item \code{prop_int_diag}: Sensitivity of \code{prop_int} to intra-series interactions (diagonals of \eqn{A}): \deqn{ [2~det(A) (A^{-1})^T] } \item \code{prop_cov_offdiag}: Sensitivity of \eqn{\Sigma_{\infty}} to inter-series error correlations: \deqn{ [2~det(\Sigma_{\infty}) (\Sigma_{\infty}^{-1})^T] } \item \code{prop_cov_diag}: Sensitivity of \eqn{\Sigma_{\infty}} to error variances: \deqn{ [2~det(\Sigma_{\infty}) (\Sigma_{\infty}^{-1})^T] } \item \code{reactivity}: Degree to which the system moves away from a stable equilibrium following a perturbation. If \eqn{\sigma_{max}(A)} is the largest singular value of \eqn{A}: \deqn{ \log\sigma_{max}(A) } \item \code{mean_return_rate}: Asymptotic return rate of the mean of the transition distribution to the stationary mean: \deqn{ \max(\lambda_{A}) } \item \code{var_return_rate}: Asymptotic return rate of the variance of the transition distribution to the stationary variance: \deqn{ \max(\lambda_{A \otimes A}) }
} Major advantages of using \pkg{mvgam} to compute these metrics are that well-calibrated uncertainties are available and that VAR processes are forced to be stationary. These properties make it simple and insightful to calculate and inspect aspects of both long-term and short-term stability. You can also inspect interactions among the time series in a latent VAR process using \code{\link{irf}} for impulse response functions or \code{\link{fevd}} for forecast error variance decompositions. } \examples{ \dontrun{ # Simulate some time series that follow a latent VAR(1) process simdat <- sim_mvgam( family = gaussian(), n_series = 4, trend_model = VAR(cor = TRUE), prop_trend = 1 ) plot_mvgam_series(data = simdat$data_train, series = 'all') # Fit a model that uses a latent VAR(1) mod <- mvgam( y ~ -1, trend_formula = ~ 1, trend_model = VAR(cor = TRUE), family = gaussian(), data = simdat$data_train, chains = 2, silent = 2 ) # Calculate stability metrics for this system metrics <- stability(mod) # Proportion of stationary forecast distribution attributable to interactions hist( metrics$prop_int, xlim = c(0, 1), xlab = 'Prop_int', main = '', col = '#B97C7C', border = 'white' ) # Inter- vs intra-series interaction contributions layout(matrix(1:2, nrow = 2)) hist( metrics$prop_int_offdiag, xlim = c(0, 1), xlab = '', main = 'Inter-series interactions', col = '#B97C7C', border = 'white' ) hist( metrics$prop_int_diag, xlim = c(0, 1), xlab = 'Contribution to interaction effect', main = 'Intra-series interactions (density dependence)', col = 'darkblue', border = 'white' ) layout(1) # Inter- vs intra-series contributions to forecast variance layout(matrix(1:2, nrow = 2)) hist( metrics$prop_cov_offdiag, xlim = c(0, 1), xlab = '', main = 'Inter-series covariances', col = '#B97C7C', border = 'white' ) hist( metrics$prop_cov_diag, xlim = c(0, 1), xlab = 'Contribution to forecast variance', main = 'Intra-series variances', col = 'darkblue', border = 'white' ) layout(1) # Reactivity: system response to perturbation hist( metrics$reactivity, main = '', xlab = 'Reactivity', col = '#B97C7C', border = 'white', xlim = c( -1 * max(abs(metrics$reactivity)), max(abs(metrics$reactivity)) ) ) abline(v = 0, lwd = 2.5) } } \references{ AR Ives, B Dennis, KL Cottingham & SR Carpenter (2003). Estimating community stability and ecological interactions from time-series data. \emph{Ecological Monographs}, 73, 301–330. } \seealso{ \code{\link{VAR}}, \code{\link{irf}}, \code{\link{fevd}} } \author{ Nicholas J Clark } ================================================ FILE: man/summary.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/summary.mvgam.R \name{summary.mvgam} \alias{summary.mvgam} \alias{summary.mvgam_prefit} \alias{coef.mvgam} \title{Summary for fitted \pkg{mvgam} models} \usage{ \method{summary}{mvgam}(object, include_betas = TRUE, smooth_test = TRUE, digits = 2, ...) \method{summary}{mvgam_prefit}(object, ...) \method{coef}{mvgam}(object, summarise = TRUE, ...) } \arguments{ \item{object}{\code{list} object returned from \code{mvgam}} \item{include_betas}{Logical. Print a summary that includes posterior summaries of all linear predictor beta coefficients (including spline coefficients)? Defaults to \code{TRUE} but use \code{FALSE} for a more concise summary} \item{smooth_test}{Logical. Compute estimated degrees of freedom and approximate p-values for smooth terms?
Defaults to \code{TRUE}, but users may wish to set to \code{FALSE} for complex models with many smooth or random effect terms} \item{digits}{The number of significant digits for printing out the summary; defaults to \code{2}.} \item{...}{Ignored} \item{summarise}{\code{logical}. Summaries of coefficients will be returned if \code{TRUE}. Otherwise the full posterior distribution will be returned} } \value{ For \code{summary.mvgam}, an object of class \code{mvgam_summary} containing: \itemize{ \item \code{model_spec}: Model specification details (formulas, family, dimensions) \item \code{parameters}: Parameter estimates and significance tests \item \code{diagnostics}: MCMC convergence diagnostics \item \code{sampling_info}: Sampling algorithm details } For \code{summary.mvgam_prefit}, a \code{list} is printed on-screen showing the model specifications. For \code{coef.mvgam}, either a \code{matrix} of posterior coefficient distributions (if \code{summarise == FALSE}) or a \code{data.frame} of coefficient summaries } \description{ These functions take a fitted \code{mvgam} or \code{jsdgam} object and return various useful summaries } \details{ \code{summary.mvgam} and \code{summary.mvgam_prefit} return brief summaries of the model's call, along with posterior intervals for some of the key parameters in the model. Note that some smooths have extra penalties on the null space, so summaries for the \code{rho} parameters may include more penalty terms than the number of smooths in the original model formula. Approximate p-values for smooth terms are also returned, with methods used for their calculation following those used for \code{mgcv} equivalents (see \code{\link[mgcv]{summary.gam}} for details). The Estimated Degrees of Freedom (edf) for smooth terms is computed using either \code{edf.type = 1} for models with no trend component, or \code{edf.type = 0} for models with trend components. These are described in the documentation for \code{\link[mgcv]{jagam}}. Experiments suggest these p-values tend to be more conservative than those that might be returned from an equivalent model fit with \code{\link[mgcv]{summary.gam}} using \code{method = 'REML'}. \code{coef.mvgam} returns either summaries or full posterior estimates for \code{GAM} component coefficients } \examples{ \dontrun{ simdat <- sim_mvgam(seasonality = "hierarchical") mod <- mvgam( y ~ series + s(season, bs = "cc", k = 6) + s(season, series, bs = "fs", k = 4), data = simdat$data_train, chains = 2, silent = 2 ) mod_summary <- summary(mod) mod_summary } } \author{ Nicholas J Clark } ================================================ FILE: man/summary.mvgam_fevd.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_fevd-class.R \name{summary.mvgam_fevd} \alias{summary.mvgam_fevd} \title{Posterior summary of forecast error variance decompositions} \usage{ \method{summary}{mvgam_fevd}(object, probs = c(0.025, 0.975), ...) } \arguments{ \item{object}{an object of class \code{mvgam_fevd} obtained using the \code{fevd()} function. This object will contain draws from the posterior distribution of the forecast error variance decompositions.} \item{probs}{The upper and lower percentiles to be computed by the \code{quantile} function, in addition to the median} \item{...}{ignored} } \value{ A long-format \code{tibble} / \code{data.frame} reporting the posterior median, upper and lower percentiles of the error variance decompositions of each series at all horizons.
} \description{ This function takes an \code{mvgam_fevd} object and calculates a posterior summary of the error variance decompositions of each series, at all horizons } \seealso{ \code{\link{fevd}}, \code{\link{plot.mvgam_fevd}} } \author{ Nicholas J Clark } ================================================ FILE: man/summary.mvgam_forecast.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_forecast-class.R \name{summary.mvgam_forecast} \alias{summary.mvgam_forecast} \title{Posterior summary of hindcast and forecast objects} \usage{ \method{summary}{mvgam_forecast}(object, probs = c(0.025, 0.975), ...) } \arguments{ \item{object}{an object of class \code{mvgam_forecast} obtained using either the \code{hindcast()} or \code{forecast()} function. This object will contain draws from the posterior distribution of hindcasts and forecasts.} \item{probs}{The upper and lower percentiles to be computed by the \code{quantile} function, in addition to the median} \item{...}{ignored} } \value{ A long-format \code{tibble} / \code{data.frame} reporting the posterior median, upper and lower percentiles of the predictions for each series at each of the timepoints that were originally supplied in \code{data} and, optionally, in \code{newdata}. } \description{ This function takes an \code{mvgam_forecast} object and calculates a posterior summary of the hindcast and forecast distributions of each series, along with any true values that were included in \code{data} and \code{newdata} if \code{type = 'response'} was used in the call to \code{hindcast()} or \code{forecast()} }
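\examples{
\dontrun{
# An illustrative example added here (not generated from the package source);
# it reuses the simulate-fit-forecast pattern shown in other reference pages
simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(
  y ~ s(season, bs = 'cc'),
  trend_model = AR(),
  data = simdat$data_train,
  newdata = simdat$data_test,
  chains = 2,
  silent = 2
)

# Extract forecasts and compute their posterior summary
fc <- forecast(mod)
summary(fc)
}
}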
\seealso{ \code{\link{forecast.mvgam}}, \code{\link{plot.mvgam_forecast}} } \author{ Nicholas J Clark } ================================================ FILE: man/summary.mvgam_irf.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/mvgam_irf-class.R \name{summary.mvgam_irf} \alias{summary.mvgam_irf} \title{Posterior summary of impulse responses} \usage{ \method{summary}{mvgam_irf}(object, probs = c(0.025, 0.975), ...) } \arguments{ \item{object}{an object of class \code{mvgam_irf} obtained using the \code{irf()} function. This object will contain draws from the posterior distribution of the impulse responses.} \item{probs}{The upper and lower percentiles to be computed by the \code{quantile} function, in addition to the median} \item{...}{ignored} } \value{ A long-format \code{tibble} / \code{data.frame} reporting the posterior median, upper and lower percentiles of the impulse responses of each series to shocks from each of the other series at all horizons. } \description{ This function takes an \code{mvgam_irf} object and calculates a posterior summary of the impulse responses of each series to shocks from each of the other series, at all horizons } \seealso{ \code{\link{irf}}, \code{\link{plot.mvgam_irf}} } \author{ Nicholas J Clark } ================================================ FILE: man/tidy.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/tidier_methods.R \name{tidy.mvgam} \alias{tidy.mvgam} \title{Tidy an \code{mvgam} object's parameter posteriors} \usage{ \method{tidy}{mvgam}(x, probs = c(0.025, 0.5, 0.975), ...) } \arguments{ \item{x}{An object of class \code{mvgam}.} \item{probs}{The desired probability levels of the parameters' posteriors. Defaults to \code{c(0.025, 0.5, 0.975)}, i.e. 2.5\%, 50\%, and 97.5\%.} \item{...}{Unused, included for generic consistency only.} } \value{ A \code{tibble} containing: \itemize{ \item "parameter": The parameter in question. \item "type": The component of the model that the parameter belongs to (see details). \item "mean": The posterior mean. \item "sd": The posterior standard deviation. \item percentile(s): Any percentiles of interest from these posteriors. } } \description{ Get parameters' posterior statistics, implementing the generic \code{tidy} from the package \pkg{broom}. } \details{ The parameters are categorized by the column "type". For instance, the intercept of the observation model (i.e. the "formula" arg to \code{mvgam()}) has the "type" "observation_beta". The possible "type"s are: \itemize{ \item observation_family_extra_param: any extra parameters for your observation model, e.g. sigma for a gaussian observation model. These parameters are not directly derived from the latent trend components (contrast to mu). \item observation_beta: betas from your observation model, excluding any smooths. If your formula was \code{y ~ x1 + s(x2, bs='cr')}, then your intercept and \code{x1}'s beta would be categorized as this. \item random_effect_group_level: Group-level random effects parameters, i.e. the mean and sd of the distribution from which the specific random intercepts/slopes are drawn. \item random_effect_beta: betas for the individual random intercepts/slopes. \item trend_model_param: parameters from your \code{trend_model}. \item trend_beta: analog of "observation_beta", but for any \code{trend_formula}. \item trend_random_effect_group_level: analog of "random_effect_group_level", but for any \code{trend_formula}. \item trend_random_effect_beta: analog of "random_effect_beta", but for any \code{trend_formula}. } Additionally, GP terms can be incorporated in several ways, leading to different "type"s (or absence!): \itemize{ \item \code{s(bs = "gp")}: No parameters returned. \item \code{gp()} in \code{formula}: "type" of "observation_param". \item \code{gp()} in \code{trend_formula}: "type" of "trend_formula_param". \item \code{GP()} in \code{trend_model}: "type" of "trend_model_param". } } \examples{ \dontrun{ set.seed(0) simdat <- sim_mvgam( T = 100, n_series = 3, trend_model = AR(), prop_trend = 0.75, family = gaussian() ) simdat$data_train$x <- rnorm(nrow(simdat$data_train)) simdat$data_train$year_fac <- factor(simdat$data_train$year) mod <- mvgam( y ~ -1 + s(time, by = series, bs = 'cr', k = 20) + x, trend_formula = ~ s(year_fac, bs = 're') - 1, trend_model = AR(cor = TRUE), family = gaussian(), data = simdat$data_train, silent = 2 ) tidy(mod, probs = c(0.2, 0.5, 0.8)) } } \seealso{ Other tidiers: \code{\link{augment.mvgam}()} } \concept{tidiers} ================================================ FILE: man/update.mvgam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/update.mvgam.R \name{update.mvgam} \alias{update.mvgam} \alias{update.jsdgam} \title{Update an existing \pkg{mvgam} model object} \usage{ \method{update}{mvgam}( object, formula, trend_formula, knots, trend_knots, trend_model, family, share_obs_params, data, newdata, trend_map, use_lv, n_lv, priors, chains, burnin, samples, threads, algorithm, lfo = FALSE, ...
) \method{update}{jsdgam}( object, formula, factor_formula, knots, factor_knots, data, newdata, n_lv, family, share_obs_params, priors, chains, burnin, samples, threads, algorithm, lfo = FALSE, ... ) } \arguments{ \item{object}{\code{list} object returned from \code{mvgam}. See \code{\link[=mvgam]{mvgam()}}} \item{formula}{Optional new \code{formula} object. Note, \code{mvgam} currently does not support dynamic formula updates such as removal of specific terms with \code{- term}. When updating, the entire formula needs to be supplied.} \item{trend_formula}{An optional \code{formula} object specifying the GAM process model formula. If supplied, a linear predictor will be modelled for the latent trends to capture process model evolution separately from the observation model. \strong{Important notes:} \itemize{ \item Should not have a response variable specified on the left-hand side (e.g., \code{~ season + s(year)}) \item Use \code{trend} instead of \code{series} for effects that vary across time series \item Only available for \code{RW()}, \code{AR()} and \code{VAR()} trend models \item In \code{nmix()} family models, sets up linear predictor for latent abundance \item Consider dropping one intercept using \code{- 1} convention to avoid estimation challenges }} \item{knots}{An optional \code{list} containing user specified knot values for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the \code{k} value supplied. Different terms can use different numbers of knots, unless they share a covariate.} \item{trend_knots}{As for \code{knots} above, this is an optional \code{list} of knot values for smooth functions within the \code{trend_formula}.} \item{trend_model}{\code{character} or \code{function} specifying the time series dynamics for the latent trend. \strong{Available options:} \itemize{ \item \code{None}: No latent trend component (GAM component only, like \code{\link[mgcv]{gam}}) \item \code{ZMVN} or \code{ZMVN()}: Zero-Mean Multivariate Normal (Stan only) \item \code{'RW'} or \code{RW()}: Random Walk \item \code{'AR1'}, \code{'AR2'}, \code{'AR3'} or \code{AR(p = 1, 2, 3)}: Autoregressive models \item \code{'CAR1'} or \code{CAR(p = 1)}: Continuous-time AR (Ornstein–Uhlenbeck process) \item \code{'VAR1'} or \code{VAR()}: Vector Autoregressive (Stan only) \item \code{'PWlogistic'}, \code{'PWlinear'} or \code{PW()}: Piecewise trends (Stan only) \item \code{'GP'} or \code{GP()}: Gaussian Process with squared exponential kernel (Stan only) } \strong{Additional features:} \itemize{ \item Moving average and/or correlated process error terms available for most types (e.g., \code{RW(cor = TRUE)} for multivariate Random Walk) \item Hierarchical correlations possible for structured data \item See \link{mvgam_trends} for details and \code{\link[=ZMVN]{ZMVN()}} for examples }} \item{family}{\code{family} specifying the exponential observation family for the series. 
\strong{Supported families:} \itemize{ \item \code{gaussian()}: Real-valued data \item \code{betar()}: Proportional data on \verb{(0,1)} \item \code{lognormal()}: Non-negative real-valued data \item \code{student_t()}: Real-valued data \item \code{Gamma()}: Non-negative real-valued data \item \code{bernoulli()}: Binary data \item \code{poisson()}: Count data (default) \item \code{nb()}: Overdispersed count data \item \code{binomial()}: Count data with imperfect detection when number of trials is known (use \code{cbind()} to bind observations and trials) \item \code{beta_binomial()}: As \code{binomial()} but allows for overdispersion \item \code{nmix()}: Count data with imperfect detection when number of trials is unknown (State-Space N-Mixture model with Poisson latent states and Binomial observations) } See \code{\link{mvgam_families}} for more details.} \item{share_obs_params}{\code{logical}. If \code{TRUE} and the \code{family} has additional family-specific observation parameters (e.g., variance components, dispersion parameters), these will be shared across all outcome variables. Useful when multiple outcomes share properties. Default is \code{FALSE}.} \item{data}{A \code{dataframe} or \code{list} containing the model response variable and covariates required by the GAM \code{formula} and optional \code{trend_formula}. \strong{Required columns for most models:} \itemize{ \item \code{series}: A \code{factor} index of the series IDs (number of levels should equal number of unique series labels) \item \code{time}: \code{numeric} or \code{integer} index of time points. For most dynamic trend types, time should be measured in discrete, regularly spaced intervals (i.e., \code{c(1, 2, 3, ...)}). Irregular spacing is allowed for \code{trend_model = CAR(1)}, but zero intervals are adjusted to \code{1e-12} to prevent sampling errors. } \strong{Special cases:} \itemize{ \item Models with hierarchical temporal correlation (e.g., \code{AR(gr = region, subgr = species)}) should NOT include a \code{series} identifier \item Models without temporal dynamics (\code{trend_model = 'None'} or \code{trend_model = ZMVN()}) don't require a \code{time} variable }} \item{newdata}{Optional \code{dataframe} or \code{list} of test data containing the same variables as in \code{data}. If included, observations in variable \code{y} will be set to \code{NA} when fitting the model so that posterior simulations can be obtained.} \item{trend_map}{Optional \code{data.frame} specifying which series should depend on which latent trends. Enables multiple series to depend on the same latent trend process with different observation processes. \strong{Required structure:} \itemize{ \item Column \code{series}: Single unique entry for each series (matching factor levels in data) \item Column \code{trend}: Integer values indicating which trend each series depends on } \strong{Notes:} \itemize{ \item Sets up latent factor model by enabling \code{use_lv = TRUE} \item Process model intercept is NOT automatically suppressed \item Not yet supported for continuous time models (\code{CAR()}) }} \item{use_lv}{\code{logical}. If \code{TRUE}, use dynamic factors to estimate series' latent trends in a reduced dimension format. Only available for \code{RW()}, \code{AR()} and \code{GP()} trend models. Default is \code{FALSE}. See \code{\link{lv_correlations}} for examples.} \item{n_lv}{\code{integer} specifying the number of latent dynamic factors to use if \code{use_lv == TRUE}. Cannot exceed \code{n_series}. 
Default is \code{min(2, floor(n_series / 2))}.} \item{priors}{An optional \code{data.frame} with prior definitions or, preferably, a vector of \code{brmsprior} objects (see \code{\link[brms]{prior}()}). See \code{\link[=get_mvgam_priors]{get_mvgam_priors()}} and Details for more information.} \item{chains}{\code{integer} specifying the number of parallel chains for the model. Ignored for variational inference algorithms.} \item{burnin}{\code{integer} specifying the number of warmup iterations to tune sampling algorithms. Ignored for variational inference algorithms.} \item{samples}{\code{integer} specifying the number of post-warmup iterations for sampling the posterior distribution.} \item{threads}{\code{integer}. Experimental option for within-chain parallelisation in Stan using \code{reduce_sum}. Recommended only for experienced Stan users with slow models. Currently works for all families except \code{nmix()}, and only when using the \code{cmdstanr} backend.} \item{algorithm}{Character string naming the estimation approach: \itemize{ \item \code{"sampling"}: MCMC (default) \item \code{"meanfield"}: Variational inference with factorized normal distributions \item \code{"fullrank"}: Variational inference with multivariate normal distribution \item \code{"laplace"}: Laplace approximation (cmdstanr only) \item \code{"pathfinder"}: Pathfinder algorithm (cmdstanr only) } Can be set globally via \code{"brms.algorithm"} option. Limited testing suggests \code{"meanfield"} performs best among non-MCMC approximations for dynamic GAMs.} \item{lfo}{\code{logical}. Indicates whether this is part of a \link{lfo_cv.mvgam} call. Returns lighter model version for speed. Users should leave as \code{FALSE}.} \item{...}{Other arguments to be passed to \code{\link{mvgam}} or \code{\link{jsdgam}}} \item{factor_formula}{Optional new \code{formula} object for the factor linear predictors} \item{factor_knots}{An optional \code{list} containing user specified knot values to be used for basis construction of any smooth terms in \code{factor_formula}. For most bases the user simply supplies the knots to be used, which must match up with the \code{k} value supplied (note that the number of knots is not always just \code{k}). Different terms can use different numbers of knots, unless they share a covariate} } \value{ A \code{list} object of class \code{mvgam} containing model output, the text representation of the model file, the mgcv model output (for easily generating simulations at unsampled covariate values), Dunn-Smyth residuals for each outcome variable and key information needed for other functions in the package. See \code{\link{mvgam-class}} for details. Use \code{methods(class = "mvgam")} for an overview on available methods. } \description{ This function allows a previously fitted \pkg{mvgam} model to be updated.
} \examples{ \dontrun{ # Simulate some data and fit a Poisson AR1 model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2 ) summary(mod) conditional_effects(mod, type = 'link') # Update to an AR2 model updated_mod <- update( mod, trend_model = AR(p = 2), noncentred = TRUE ) summary(updated_mod) conditional_effects(updated_mod, type = 'link') # Now update to a Binomial AR1 by adding information on trials # requires that we supply data that contains the 'trials' variable simdat$data_train$trials <- max(simdat$data_train$y) + 15 updated_mod <- update( mod, formula = cbind(y, trials) ~ s(season, bs = 'cc'), noncentred = TRUE, data = simdat$data_train, family = binomial() ) summary(updated_mod) conditional_effects(updated_mod, type = 'link') } } \author{ Nicholas J Clark } ================================================ FILE: memcheck.R ================================================ devtools::load_all() library(testthat) # Test the c++ functions, which can all be done using sim_mvgam() # and forecast() capture_output(sim_mvgam(family = gaussian(), trend_model = RW())) capture_output(sim_mvgam(family = gaussian(), trend_model = AR(p = 1))) capture_output(sim_mvgam(family = gaussian(), trend_model = AR(p = 2))) capture_output(sim_mvgam(family = gaussian(), trend_model = AR(p = 3))) capture_output(sim_mvgam(family = gaussian(), trend_model = VAR())) capture_output(sim_mvgam(family = gaussian(), trend_model = VAR(cor = TRUE))) fc <- forecast( mvgam:::mvgam_example1, newdata = mvgam:::mvgam_examp_dat$data_test )
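# A hypothetical extension of this script (not in the original memcheck.R):
# scoring the forecast object exercises the exported scoring routines as well,
# using only the public API documented in ?score.mvgam_forecast
crps_scores <- score(fc, score = 'crps')
str(crps_scores, max.level = 1)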
================================================ FILE: misc/BeamOptions.tex ================================================ \setbeamertemplate{bibliography item}[text] \setbeamertemplate{caption}[numbered] \usepackage{url,lmodern} \usepackage{amsmath,amsfonts,amssymb,graphicx, amsthm,subfigure,multirow,hyperref,textcomp,natbib} \usepackage[iso,american]{isodate} \usepackage{listings} \usepackage[dvipsnames]{xcolor} \hypersetup{colorlinks=true,urlcolor=BrickRed,linkcolor=BrickRed} \definecolor{mygray}{gray}{0.3} \setbeamertemplate{itemize item}{\color{mygray}$\blacktriangleright$} \addtobeamertemplate{frametitle}{\vskip 0.4in}{} \usepackage{eso-pic} \setlength{\paperwidth}{16.5in} \setlength{\paperheight}{10.5in} \setlength{\mathindent}{5pt} \usepackage{lmodern} \setbeamertemplate{navigation symbols}{} \defbeamertemplate*{footline}{infolines theme} { \leavevmode% \hbox{% \begin{beamercolorbox}[wd=.333333\paperwidth,ht=2.25ex,dp=1ex,center]{author in head/foot}% \usebeamerfont{author in head/foot}\insertshortauthor%~~(\insertshortinstitute) \end{beamercolorbox}% \begin{beamercolorbox}[wd=.333333\paperwidth,ht=2.25ex,dp=1ex,center]{title in head/foot}% \usebeamerfont{title in head/foot}\insertshorttitle \end{beamercolorbox}% \begin{beamercolorbox}[wd=.333333\paperwidth,ht=2.25ex,dp=1ex,right]{date in head/foot}% \usebeamerfont{date in head/foot}\insertshortdate{}\hspace*{2em} \insertframenumber{} / \inserttotalframenumber\hspace*{2ex} \end{beamercolorbox}}% \vskip0pt% } \newcommand\AtPagemyUpperLeft[1]{\AtPageLowerLeft{% \put(\LenToUnit{0.88\paperwidth},\LenToUnit{0.88\paperheight}){#1}}} \AddToShipoutPictureFG{ \AtPagemyUpperLeft{{ \href{https://nicholasjclark.github.io/mvgam/}{\includegraphics[scale=0.5,keepaspectratio]{mvgam_logo.png}} \hspace{0.025in} \href{https://mc-stan.org/}{\includegraphics[scale=0.75,keepaspectratio]{stan_logo.png}} }} }% ================================================ FILE: misc/cache/__packages ================================================ knitr ggplot2 nlme mgcv Rcpp brms marginaleffects insight mvgam ================================================ FILE: misc/mvgam_cheatsheet-concordance.tex ================================================ \Sconcordance{concordance:mvgam_cheatsheet.tex:mvgam_cheatsheet.Rnw:% 1 7 1 50 0 1 6 3 1 1 7 1 1 1 8 95 1 4 0 34 1 % 6 0 17 1 3 0 10 1 1 4 34 1 1 6 8 1 4 0 5 1 9 % 0 22 1} ================================================ FILE: misc/mvgam_cheatsheet.Rnw ================================================ \documentclass[final,9pt,fleqn]{beamer} \input{BeamOptions.tex} \setbeamertemplate{footline}{\hfill {\footnotesize \href{https://creativecommons.org/licenses/by-sa/4.0/}{CC BY-SA 4.0} $\circ$ Nicholas J. Clark $\circ$ Learn more at \href{https://nicholasjclark.github.io/mvgam/index.html}{https://nicholasjclark.github.io/mvgam/index.html} $\circ$ package version $1.0.9$ $\circ$ updated: \today} \hspace {0.1in} \vspace{0.1in}} \begin{document} <<>>= library(knitr) opts_chunk$set( concordance=TRUE ) @ <<>>= library(knitr) options(replace.assign=TRUE, width=50, digits=4) opts_knit[["set"]](progress=FALSE) library("ggplot2"); theme_set(theme_classic(base_family = 'serif')) library("mvgam") @ <<>>= set.seed(1234) simdat <- sim_mvgam(n_series = 1) model <- mvgam(y ~ s(season, bs = 'cc'), trend_model = RW(), data = simdat$data_train) fc <- forecast(model, newdata = simdat$data_test) @ \begin{frame}[fragile] \frametitle{{\fontsize{41}{43} \selectfont \textcolor{mygray}{mvgam ::}} {\fontsize{25}{25} \textbf{\textcolor{mygray}{CHEATSHEET}}}} \vspace{-0.6in} \begin{columns} \begin{column}{0.02\paperwidth} % left margin space \end{column} \begin{column}{0.3\paperwidth} \begin{block} \noindent\makebox[\linewidth]{\rule{0.3\paperwidth}{0.2pt}} The \texttt{mvgam} package provides tools for fitting and interrogating univariate or multivariate State-Space time series models that can include nonlinear smooth functions of covariates, dynamic temporal processes and random effects. A wide variety of latent dynamic processes can be specified. The package also provides tools for interpreting effects, computing and scoring forecasts, as well as generating model code and data objects for further customisation. Models are fitted using \texttt{Stan} for full Bayesian inference. \end{block} \begin{block}{{\fontsize{21}{21} \selectfont \color{BrickRed} Modelling with \texttt{\color{Orchid} mvgam()}}} Usage: \texttt{\color{Orchid} mvgam(formula, trend\_formula, data, trend\_model, family, ...)} \medskip \texttt{\color{Orchid} formula}: observation model regression formula, built off the \texttt{mgcv} package. See \texttt{\color{Orchid}?mvgam\_formulae} for more guidance \medskip \texttt{\color{Orchid} trend\_formula}: optional process model formula (see \href{https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html}{the State-Space model vignette} and \href{https://nicholasjclark.github.io/mvgam/articles/shared_states.html}{the shared latent states vignette} for guidance on using trend formulae) \medskip \texttt{\color{Orchid} data}: a \texttt{data.frame} or \texttt{list} containing the response variable(s) and optional predictor variables. See \href{https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html}{the data formatting vignette} for guidance on data preparation \medskip \texttt{\color{Orchid} trend\_model}: optional latent dynamic process.
Options include (among others): \begin{itemize} \item\texttt{\color{Orchid} None}: default, no dynamic process \item\texttt{\color{Orchid} RW(ma = FALSE, cor = FALSE)}: random walk \item\texttt{\color{Orchid} AR(p = 1, ma = FALSE, cor = FALSE)}: autoregressive \item\texttt{\color{Orchid} VAR(ma = FALSE, cor = FALSE)}: vector autoregressive \item\texttt{\color{Orchid} PW(growth = 'linear')}: piecewise linear \item\texttt{\color{Orchid} PW(growth = 'logistic')}: piecewise logistic, with max saturation \item\texttt{\color{Orchid} GP()}: squared exponential Gaussian Process \end{itemize} For autoregressive processes (\texttt{\color{Orchid} RW(), AR() or VAR()}), moving average and correlated process errors can also be specified by changing the \texttt{\color{Orchid} ma} and \texttt{\color{Orchid} cor} arguments \medskip \texttt{\color{Orchid} family}: observation distribution. Options include (among others): \begin{itemize} \item\texttt{\color{Orchid} gaussian()}: Gaussian with identity link \item\texttt{\color{Orchid} student\_t()}: Student's T with identity link \item\texttt{\color{Orchid} lognormal()}: LogNormal with identity link \item\texttt{\color{Orchid} Gamma()}: Gamma with log link \item\texttt{\color{Orchid} betar()}: Beta with logit link \item\texttt{\color{Orchid} poisson()}: Poisson with log link \item\texttt{\color{Orchid} nb()}: Negative Binomial with log link \end{itemize} \medskip See \href{https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html}{the introductory vignette} for more guidance on supported families and dynamic processes \medskip \texttt{\color{Orchid} ...}: other arguments such as user-specified \texttt{\color{Orchid} priors}, \texttt{\color{Orchid} newdata} for generating probabilistic forecasts and options to control \texttt{\color{Orchid} Stan} MCMC parameters \medskip \textbf{\color{BrickRed} Prior to modelling}, it is useful to: \begin{itemize} \item Inspect features of the data with \texttt{\color{Orchid} plot\_mvgam\_series()} \item Ensure there are no \texttt{\color{Orchid} NA}'s in predictors (though \texttt{\color{Orchid} NA}'s are allowed in response variables). See \href{https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html}{the data formatting vignette} for guidance on data preparation \item Inspect default priors with \texttt{\color{Orchid} get\_mvgam\_priors()} \item Make any necessary changes to default priors with \texttt{\color{Orchid} prior()} (see the sketch below) \end{itemize}
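\smallskip For example, a minimal sketch (added here for illustration; the parameter class used with \texttt{\color{Orchid} prior()} is assumed, so confirm the valid classes from the \texttt{\color{Orchid} get\_mvgam\_priors()} output before changing any prior):
<<eval=FALSE>>=
simdat <- sim_mvgam(n_series = 1)
# inspect the default priors for this model specification
get_mvgam_priors(y ~ s(season, bs = 'cc'),
                 trend_model = RW(),
                 data = simdat$data_train)
# supply a modified prior when fitting; the 'sigma' class shown here
# is assumed to appear in the get_mvgam_priors() output above
model <- mvgam(y ~ s(season, bs = 'cc'),
               trend_model = RW(),
               data = simdat$data_train,
               priors = prior(exponential(2), class = sigma))
@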
\medskip \texttt{\color{Orchid} sim\_mvgam()} is useful to generate simple example datasets <<>>= simdat <- sim_mvgam(n_series = 1) model <- mvgam(formula = y ~ s(season, bs = 'cc'), trend_model = RW(), data = simdat$data_train) @ Use \texttt{\color{Orchid} stancode(model)} to see the auto-generated \texttt{Stan} code \end{block} \end{column} \begin{column}{.03\paperwidth} \end{column} \begin{column}{0.3\paperwidth} \vspace{0.52in} \noindent\makebox[\linewidth]{\rule{0.3\paperwidth}{0.2pt}} \begin{block}{{\fontsize{21}{21} \selectfont \color{BrickRed} Diagnostics and Inference}} \smallskip {{\fontsize{11}{11} \selectfont \color{mygray} What effects has the model estimated?}} \medskip \texttt{\color{Orchid} summary(model)} and \texttt{\color{Orchid} coef(model)}: posterior summaries and diagnostics \medskip \texttt{\color{Orchid} fitted(model)}, \texttt{\color{Orchid} logLik(model)} and \texttt{\color{Orchid} residuals(model)}: posterior expectations, pointwise Log-Likelihoods and randomized quantile residuals \medskip \texttt{\color{Orchid} loo(model)} and \texttt{\color{Orchid} loo\_compare(model1, model2, ...)}: calculate approximate leave-one-out information criteria for model comparisons \medskip \texttt{\color{Orchid} mcmc\_plot(model)}: visualize posterior summaries, pairs plots and a wide range of MCMC diagnostics using functionality from the \texttt{bayesplot} package <<>>= mcmc_plot(model, variable = '(Intercept)', type = 'combo') @ \medskip Use \texttt{\color{Orchid} as.data.frame(model)}, \texttt{\color{Orchid} as.matrix(model)}, or \texttt{\color{Orchid} as.array(model)} to extract posterior parameter estimates. Use \texttt{\color{Orchid} variables(model)} to determine what parameters are available for extraction \medskip The \texttt{S3} \texttt{\color{Orchid} plot()} function applied to models can visualise smooth functions (\texttt{\color{Orchid} type = 'smooths'}), random effects (\texttt{\color{Orchid} type = 're'}), conditional predictions and trend estimates (\texttt{\color{Orchid} type = 'forecast'} or \texttt{\color{Orchid} type = 'trend'}), uncertainty contributions (\texttt{\color{Orchid} type = 'uncertainty'}) or randomized quantile residual diagnostics (\texttt{\color{Orchid} type = 'residuals'}). Use \texttt{\color{Orchid} trend\_effects = TRUE} to visualise effects from any process model formulae \medskip \texttt{\color{Orchid} conditional\_effects(model)} gives useful conditional effect plots on either the response or the link scale \smallskip <<>>= conditional_effects(model)[[1]] + xlim(c(1, 12.1)) + theme_classic(base_size = 10, base_family = 'serif') @ For most \texttt{mvgam} models, functions from the \texttt{marginaleffects} package can be used for more targeted prediction-based inference.
See \href{https://marginaleffects.com/}{The Marginal Effects Zoo} and \href{https://ecogambler.netlify.app/blog/interpreting-gams/}{How to interpret effects from GAMs} for guidance on inspecting predictions, slopes and comparisons <<>>= post_contrasts <- marginaleffects::avg_comparisons(model, variables = list(season = c(5, 11))) %>% marginaleffects::posteriordraws() post_contrasts %>% ggplot(aes(x = draw)) + tidybayes::stat_halfeye(fill = "#C79999") + labs(x = "(season = 11) − (season = 5) posterior contrast", y = "Density") + theme_classic(base_size = 10, base_family = 'serif') @ \end{block} \end{column} \begin{column}{.03\paperwidth} \end{column} \begin{column}{0.3\paperwidth} \vspace{0.37in} \noindent\makebox[\linewidth]{\rule{0.3\paperwidth}{0.2pt}} \begin{block}{{\fontsize{21}{21} \selectfont \color{BrickRed} Prediction and forecasting}} \smallskip {{\fontsize{11}{11} \selectfont \color{mygray} How good are model predictions?}} \medskip Use \texttt{\color{Orchid} predict(model)} with \texttt{\color{Orchid} newdata} to make predictions for inference purposes. Change the \texttt{\color{Orchid} type} argument for different types of predictions (link scale, expectation or response scale). Or use the \texttt{brms} package equivalents \texttt{\color{Orchid} posterior\_predict(model)}, \texttt{\color{Orchid} posterior\_linpred(model)} or \texttt{\color{Orchid} posterior\_epred(model)}. If generating forecasts for future timepoints, use the \texttt{\color{Orchid} forecast()} function (see below) \medskip Use \texttt{\color{Orchid} ppc(model)} and \texttt{\color{Orchid} pp\_check(model)} to compute conditional or unconditional posterior predictive checks and compare model predictions against the true observations \medskip Extract in-sample posterior predictions with \texttt{\color{Orchid} hindcast(model)}. If validation data exist, generate forecast predictions with \texttt{\color{Orchid} forecast(model, newdata = newdata)}. As above, change the \texttt{\color{Orchid} type} argument for predictions on different scales. Both functions generate an object of class \texttt{mvgam\_forecast} that can be plotted with an \texttt{S3} \texttt{\color{Orchid} plot()} function. See \href{https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html}{the forecasting vignette} for more details about how to produce forecasts.
<<echo=FALSE>>= par(family = "serif", las = 1, mar=c(3,3,2,2), mgp = c(2,0.5,0), bty = "l", cex.axis = 0.9, cex.lab = 0.9, cex.main = 0.9, xaxs = 'r', yaxs = 'r', pch = 16) plot(fc) @ \smallskip Compute probabilistic forecast scores using proper scoring rules with the \texttt{\color{Orchid} score()} function: <<eval=FALSE>>= fc <- forecast(model, newdata = simdat$data_test, type = 'response') crps <- score(fc, score = 'crps') dplyr::glimpse(crps$series_1) @ <<echo=FALSE>>= crps <- score(fc, score = 'crps') dplyr::glimpse(crps$series_1) @ \smallskip Available proper scoring rules in the \texttt{\color{Orchid} score()} function include: \begin{itemize} \item \texttt{\color{Orchid} score = 'crps'}: Continuous Ranked Probability Score (univariate) \item \texttt{\color{Orchid} score = 'drps'}: Discrete Ranked Probability Score (univariate) \item \texttt{\color{Orchid} score = 'elpd'}: Expected Log Predictive Density (univariate) \item \texttt{\color{Orchid} score = 'sis'}: Scaled Interval Score (univariate) \item \texttt{\color{Orchid} score = 'energy'}: Energy Score (multivariate) \item \texttt{\color{Orchid} score = 'variogram'}: Variogram Score (multivariate) \end{itemize} \medskip Use \texttt{\color{Orchid} lfo\_cv(model)} for approximate leave-future-out cross-validation with an expanding window training technique (see \href{https://www.tandfonline.com/doi/full/10.1080/00949655.2020.1783262}{Bürkner et al. 2020} for details of the algorithm). This generates expected log predictive density scores at user-specified forecast horizons, which can be used to compare different models \end{block} \end{column} \end{columns} \end{frame} \end{document} ================================================ FILE: misc/mvgam_cheatsheet.tex ================================================ \documentclass[final,9pt,fleqn]{beamer}\usepackage[]{graphicx}\usepackage[]{xcolor} % maxwidth is the original width if it is less than linewidth % otherwise use linewidth (to make sure the graphics do not exceed the margin) \makeatletter \def\maxwidth{ % \ifdim\Gin@nat@width>\linewidth \linewidth \else \Gin@nat@width \fi } \makeatother \definecolor{fgcolor}{rgb}{0.345, 0.345, 0.345} \newcommand{\hlnum}[1]{\textcolor[rgb]{0.686,0.059,0.569}{#1}}% \newcommand{\hlstr}[1]{\textcolor[rgb]{0.192,0.494,0.8}{#1}}% \newcommand{\hlcom}[1]{\textcolor[rgb]{0.678,0.584,0.686}{\textit{#1}}}% \newcommand{\hlopt}[1]{\textcolor[rgb]{0,0,0}{#1}}% \newcommand{\hlstd}[1]{\textcolor[rgb]{0.345,0.345,0.345}{#1}}% \newcommand{\hlkwa}[1]{\textcolor[rgb]{0.161,0.373,0.58}{\textbf{#1}}}% \newcommand{\hlkwb}[1]{\textcolor[rgb]{0.69,0.353,0.396}{#1}}% \newcommand{\hlkwc}[1]{\textcolor[rgb]{0.333,0.667,0.333}{#1}}% \newcommand{\hlkwd}[1]{\textcolor[rgb]{0.737,0.353,0.396}{\textbf{#1}}}% \let\hlipl\hlkwb \usepackage{framed} \makeatletter \newenvironment{kframe}{% \def\at@end@of@kframe{}% \ifinner\ifhmode% \def\at@end@of@kframe{\end{minipage}}% \begin{minipage}{\columnwidth}% \fi\fi% \def\FrameCommand##1{\hskip\@totalleftmargin \hskip-\fboxsep \colorbox{shadecolor}{##1}\hskip-\fboxsep % There is no \\@totalrightmargin, so: \hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}% \MakeFramed {\advance\hsize-\width \@totalleftmargin\z@ \linewidth\hsize \@setminipage}}% {\par\unskip\endMakeFramed% \at@end@of@kframe} \makeatother \definecolor{shadecolor}{rgb}{.97, .97, .97} \definecolor{messagecolor}{rgb}{0, 0, 0} \definecolor{warningcolor}{rgb}{1, 0, 1} \definecolor{errorcolor}{rgb}{1, 0, 0} \newenvironment{knitrout}{}{} % an empty environment to be redefined in TeX \usepackage{alltt}
\input{BeamOptions.tex} \setbeamertemplate{footline}{\hfill {\footnotesize \href{https://creativecommons.org/licenses/by-sa/4.0/}{CC BY-SA 4.0} $\circ$ Nicholas J. Clark $\circ$ Learn more at \href{https://nicholasjclark.github.io/mvgam/index.html}{https://nicholasjclark.github.io/mvgam/index.html} $\circ$ package version $1.0.9$ $\circ$ updated: \today} \hspace {0.1in} \vspace{0.1in}} \IfFileExists{upquote.sty}{\usepackage{upquote}}{} \begin{document} \begin{frame}[fragile] \frametitle{{\fontsize{41}{43} \selectfont \textcolor{mygray}{mvgam ::}} {\fontsize{25}{25} \textbf{\textcolor{mygray}{CHEATSHEET}}}} \vspace{-0.6in} \begin{columns} \begin{column}{0.02\paperwidth} % left margin space \end{column} \begin{column}{0.3\paperwidth} \begin{block} \noindent\makebox[\linewidth]{\rule{0.3\paperwidth}{0.2pt}} The \texttt{mvgam} package provides tools for fitting and interrogating univariate or multivariate State-Space time series models that can include nonlinear smooth functions of covariates, dynamic temporal processes and random effects. A wide variety of latent dynamic processes can be specified. The package also provides tools for interpreting effects, computing and scoring forecasts, as well as generating model code and data objects for further customisation. Models are fitted using \texttt{Stan} for full Bayesian inference. \end{block} \begin{block}{{\fontsize{21}{21} \selectfont \color{BrickRed} Modelling with \texttt{\color{Orchid} mvgam()}}} Usage: \texttt{\color{Orchid} mvgam(formula, trend\_formula, data, trend\_model, family, ...)} \medskip \texttt{\color{Orchid} formula}: observation model regression formula, built off the \texttt{mgcv} package. See \texttt{\color{Orchid}?mvgam\_formulae} for more guidance \medskip \texttt{\color{Orchid} trend\_formula}: optional process model formula (see \href{https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html}{the State-Space model vignette} and \href{https://nicholasjclark.github.io/mvgam/articles/shared_states.html}{the shared latent states vignette} for guidance on using trend formulae) \medskip \texttt{\color{Orchid} data}: a \texttt{data.frame} or \texttt{list} containing the response variable(s) and optional predictor variables. See \href{https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html}{the data formatting vignette} for guidance on data preparation \medskip \texttt{\color{Orchid} trend\_model}: optional latent dynamic process. Options include (among others): \begin{itemize} \item\texttt{\color{Orchid} None}: default, no dynamic process \item\texttt{\color{Orchid} RW(ma = FALSE, cor = FALSE)}: random walk \item\texttt{\color{Orchid} AR(p = 1, ma = FALSE, cor = FALSE)}: autoregressive \item\texttt{\color{Orchid} VAR(ma = FALSE, cor = FALSE)}: vector autoregressive \item\texttt{\color{Orchid} PW(growth = 'linear')}: piecewise linear \item\texttt{\color{Orchid} PW(growth = 'logistic')}: piecewise logistic, with max saturation \item\texttt{\color{Orchid} GP()}: squared exponential Gaussian Process \end{itemize} For autoregressive processes (\texttt{\color{Orchid} RW(), AR() or VAR()}), moving average and correlated process errors can also be specified by changing the \texttt{\color{Orchid} ma} and \texttt{\color{Orchid} cor} arguments \medskip \texttt{\color{Orchid} family}: observation distribution.
Options include (among others): \begin{itemize} \item\texttt{\color{Orchid} gaussian()}: Gaussian with identity link \item\texttt{\color{Orchid} student\_t()}: Student's t with identity link \item\texttt{\color{Orchid} lognormal()}: LogNormal with identity link \item\texttt{\color{Orchid} Gamma()}: Gamma with log link \item\texttt{\color{Orchid} betar()}: Beta with logit link \item\texttt{\color{Orchid} poisson()}: Poisson with log link \item\texttt{\color{Orchid} nb()}: Negative Binomial with log link \end{itemize} \medskip See \href{https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html}{the introductory vignette} for more guidance on supported families and dynamic processes \medskip \texttt{\color{Orchid} ...}: other arguments such as user-specified \texttt{\color{Orchid} priors}, \texttt{\color{Orchid} newdata} for generating probabilistic forecasts and options to control \texttt{\color{Orchid} Stan} MCMC parameters \medskip \textbf{\color{BrickRed} Prior to modelling}, it is useful to: \begin{itemize} \item Inspect features of the data with \texttt{\color{Orchid} plot\_mvgam\_series()} \item Ensure there are no \texttt{\color{Orchid} NA}'s in predictors (though \texttt{\color{Orchid} NA}'s are allowed in response variables). See \href{https://nicholasjclark.github.io/mvgam/articles/data_in_mvgam.html}{the data formatting vignette} for guidance on data preparation \item Inspect default priors with \texttt{\color{Orchid} get\_mvgam\_priors()} \item Make any necessary changes to default priors with \texttt{\color{Orchid} prior()} \end{itemize} \medskip \texttt{\color{Orchid} sim\_mvgam()} is useful for generating simple example datasets \begin{knitrout} \definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe} \begin{alltt} \hlstd{simdat} \hlkwb{<-} \hlkwd{sim_mvgam}\hlstd{(}\hlkwc{n_series} \hlstd{=} \hlnum{1}\hlstd{)} \hlstd{model} \hlkwb{<-} \hlkwd{mvgam}\hlstd{(}\hlkwc{formula} \hlstd{= y} \hlopt{~} \hlkwd{s}\hlstd{(season,} \hlkwc{bs} \hlstd{=} \hlstr{'cc'}\hlstd{),} \hlkwc{trend_model} \hlstd{=} \hlkwd{RW}\hlstd{(),} \hlkwc{data} \hlstd{= simdat}\hlopt{$}\hlstd{data_train)} \end{alltt} \end{kframe} \end{knitrout} Use \texttt{\color{Orchid} stancode(model)} to see the auto-generated \texttt{Stan} code \end{block} \end{column} \begin{column}{.03\paperwidth} \end{column} \begin{column}{0.3\paperwidth} \vspace{0.52in} \noindent\makebox[\linewidth]{\rule{0.3\paperwidth}{0.2pt}} \begin{block}{{\fontsize{21}{21} \selectfont \color{BrickRed} Diagnostics and Inference}} \smallskip {{\fontsize{11}{11} \selectfont \color{mygray} What effects has the model estimated?}} \medskip \texttt{\color{Orchid} summary(model)} and \texttt{\color{Orchid} coef(model)}: posterior summaries and diagnostics \medskip \texttt{\color{Orchid} fitted(model)}, \texttt{\color{Orchid} logLik(model)} and \texttt{\color{Orchid} residuals(model)}: posterior expectations, pointwise log-likelihoods and randomized quantile residuals \medskip \texttt{\color{Orchid} loo(model)} and \texttt{\color{Orchid} loo\_compare(model1, model2, ...)}: calculate approximate leave-one-out information criteria for model comparisons \medskip \texttt{\color{Orchid} mcmc\_plot(model)}: visualise posterior summaries, pairs plots and a wide range of MCMC diagnostics using functionality from the \texttt{bayesplot} package \begin{knitrout} \definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor} {\centering \includegraphics[width=\maxwidth]{figure/unnamed-chunk-4-1} } \end{knitrout} \medskip Use
\texttt{\color{Orchid} as.data.frame(model)}, \texttt{\color{Orchid} as.matrix(model)}, or \texttt{\color{Orchid} as.array(model)} to extract posterior parameter estimates. Use \texttt{\color{Orchid} variables(model)} to determine what parameters are available for extraction \medskip The \texttt{S3} \texttt{\color{Orchid} plot()} function applied to models can visualise smooth functions (\texttt{\color{Orchid} type = 'smooths'}), random effects (\texttt{\color{Orchid} type = 're'}), conditional predictions and trend estimates (\texttt{\color{Orchid} type = 'forecast'} or \texttt{\color{Orchid} type = 'trend'}), uncertainty contributions (\texttt{\color{Orchid} type = 'uncertainty'}) or randomized quantile residual diagnostics (\texttt{\color{Orchid} type = 'residuals'}). Use \texttt{\color{Orchid} trend\_effects = TRUE} to visualise effects from any process model formulae \medskip \texttt{\color{Orchid} conditional\_effects(model)} gives useful conditional effect plots on either the response or the link scale \smallskip \begin{knitrout} \definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor} {\centering \includegraphics[width=\maxwidth]{figure/unnamed-chunk-5-1} } \end{knitrout} For most \texttt{mvgam} models, functions from the \texttt{marginaleffects} package can be used for more targeted prediction-based inference. See \href{https://marginaleffects.com/}{The Marginal Effects Zoo} and \href{https://ecogambler.netlify.app/blog/interpreting-gams/}{How to interpret effects from GAMs} for guidance on inspecting predictions, slopes and comparisons \begin{knitrout} \definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor} {\centering \includegraphics[width=\maxwidth]{figure/unnamed-chunk-6-1} } \end{knitrout} \end{block} \end{column} \begin{column}{.03\paperwidth} \end{column} \begin{column}{0.3\paperwidth} \vspace{0.37in} \noindent\makebox[\linewidth]{\rule{0.3\paperwidth}{0.2pt}} \begin{block}{{\fontsize{21}{21} \selectfont \color{BrickRed} Prediction and forecasting}} \smallskip {{\fontsize{11}{11} \selectfont \color{mygray} How good are model predictions?}} \medskip Use \texttt{\color{Orchid} predict(model)} with \texttt{\color{Orchid} newdata} to make predictions for inference purposes. Change the \texttt{\color{Orchid} type} argument for different types of predictions (link scale, expectation or response scale). Or use the \texttt{brms} package equivalents \texttt{\color{Orchid} posterior\_predict(model)}, \texttt{\color{Orchid} posterior\_linpred(model)} or \texttt{\color{Orchid} posterior\_epred(model)}. If generating forecasts for future timepoints, use the \texttt{\color{Orchid} forecast()} function (see below) \medskip Use \texttt{\color{Orchid} ppc(model)} and \texttt{\color{Orchid} pp\_check(model)} to compute conditional or unconditional posterior predictive checks and compare model predictions against the true observations \medskip Extract in-sample posterior predictions with \texttt{\color{Orchid} hindcast(model)}. If validation data exist, generate forecast predictions with \texttt{\color{Orchid} forecast(model, newdata = newdata)}. As above, change the \texttt{\color{Orchid} type} argument for predictions on different scales. Both functions generate an object of class \texttt{mvgam\_forecast} that can be plotted with an \texttt{S3} \texttt{\color{Orchid} plot()} function. See \href{https://nicholasjclark.github.io/mvgam/articles/forecast_evaluation.html}{the forecasting vignette} for more details about how to produce forecasts.
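For example, a minimal sketch of this workflow (not evaluated here), reusing the \texttt{model} and \texttt{simdat} objects from the modelling example:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlcom{# In-sample posterior predictions}
\hlstd{hc} \hlkwb{<-} \hlkwd{hindcast}\hlstd{(model,} \hlkwc{type} \hlstd{=} \hlstr{'response'}\hlstd{)}
\hlcom{# Out-of-sample forecasts; both objects share the same S3 plot() method}
\hlstd{fc} \hlkwb{<-} \hlkwd{forecast}\hlstd{(model,} \hlkwc{newdata} \hlstd{= simdat}\hlopt{$}\hlstd{data_test,} \hlkwc{type} \hlstd{=} \hlstr{'response'}\hlstd{)}
\hlkwd{plot}\hlstd{(fc)}
\end{alltt}
\end{kframe}
\end{knitrout}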
\begin{knitrout} \definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor} {\centering \includegraphics[width=\maxwidth]{figure/unnamed-chunk-7-1} } \end{knitrout} \smallskip Compute probabilistic forecast scores using proper scoring rules with the \texttt{\color{Orchid} score()} function: \begin{knitrout} \definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe} \begin{alltt} \hlstd{fc} \hlkwb{<-} \hlkwd{forecast}\hlstd{(model,} \hlkwc{newdata} \hlstd{= simdat}\hlopt{$}\hlstd{data_test,} \hlkwc{type} \hlstd{=} \hlstr{'response'}\hlstd{)} \hlstd{crps} \hlkwb{<-} \hlkwd{score}\hlstd{(fc,} \hlkwc{score} \hlstd{=} \hlstr{'crps'}\hlstd{)} \hlstd{dplyr}\hlopt{::}\hlkwd{glimpse}\hlstd{(crps}\hlopt{$}\hlstd{series_1)} \end{alltt} \end{kframe} \end{knitrout} \begin{knitrout} \definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe} \begin{verbatim} ## Rows: 25 ## Columns: 5 ## $ score <dbl> 0.2315, 0.3944, 0.7198, 0~ ## $ in_interval <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 1~ ## $ interval_width <dbl> 0.9, 0.9, 0.9, 0.9, 0.9, ~ ## $ eval_horizon <int> 1, 2, 3, 4, 5, 6, 7, 8, 9~ ## $ score_type <chr> "crps", "crps", "crps", "~ \end{verbatim} \end{kframe} \end{knitrout} \smallskip Available proper scoring rules in the \texttt{\color{Orchid} score()} function include: \begin{itemize} \item \texttt{\color{Orchid} score = 'crps'}: Continuous Ranked Probability Score (univariate) \item \texttt{\color{Orchid} score = 'drps'}: Discrete Ranked Probability Score (univariate) \item \texttt{\color{Orchid} score = 'elpd'}: Expected Log Predictive Density (univariate) \item \texttt{\color{Orchid} score = 'sis'}: Scaled Interval Score (univariate) \item \texttt{\color{Orchid} score = 'energy'}: Energy Score (multivariate) \item \texttt{\color{Orchid} score = 'variogram'}: Variogram Score (multivariate) \end{itemize} \medskip Use \texttt{\color{Orchid} lfo\_cv(model)} for approximate leave-future-out cross-validation with an expanding window training technique (see \href{https://www.tandfonline.com/doi/full/10.1080/00949655.2020.1783262}{Bürkner et al. 2020} for details of the algorithm).
This generates expected log predictive density scores at user-specified forecast horizons, which can be used to compare different models \end{block} \end{column} \end{columns} \end{frame} \end{document} ================================================ FILE: pkgdown/_pkgdown.yml ================================================ url: https://nicholasjclark.github.io/mvgam/ code: run_dont_run: true authors: Nicholas J Clark: href: https://researchers.uq.edu.au/researcher/15140 template: bootstrap: 5 bootswatch: pulse navbar: structure: left: [intro, articles, reference, news] right: [search, github] ================================================ FILE: pkgdown/extra.css ================================================ table.ref-index{ overflow-x: visible; overflow-y: auto; } ================================================ FILE: pkgdown/extra.scss ================================================ $font-size-base: 0.8rem !default; // Assumes the browser default, typically `16px` ================================================ FILE: src/.gitignore ================================================ mvgam.dll ================================================ FILE: src/Makevars ================================================ PKG_CXXFLAGS = -DARMA_USE_CURRENT PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) ================================================ FILE: src/Makevars.win ================================================ PKG_CXXFLAGS = -DARMA_USE_CURRENT PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) ================================================ FILE: src/RcppExports.cpp ================================================ // Generated by using Rcpp::compileAttributes() -> do not edit by hand // Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 #include <RcppArmadillo.h> #include <Rcpp.h> using namespace Rcpp; #ifdef RCPP_USE_GLOBAL_ROSTREAM Rcpp::Rostream<true>& Rcpp::Rcout = Rcpp::Rcpp_cout_get(); Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get(); #endif // ar3_recursC Rcpp::NumericVector ar3_recursC(double drift, double ar1, double ar2, double ar3, Rcpp::NumericVector linpreds, Rcpp::NumericVector errors, Rcpp::NumericVector last_trends, int h); RcppExport SEXP _mvgam_ar3_recursC(SEXP driftSEXP, SEXP ar1SEXP, SEXP ar2SEXP, SEXP ar3SEXP, SEXP linpredsSEXP, SEXP errorsSEXP, SEXP last_trendsSEXP, SEXP hSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< double >::type drift(driftSEXP); Rcpp::traits::input_parameter< double >::type ar1(ar1SEXP); Rcpp::traits::input_parameter< double >::type ar2(ar2SEXP); Rcpp::traits::input_parameter< double >::type ar3(ar3SEXP); Rcpp::traits::input_parameter< Rcpp::NumericVector >::type linpreds(linpredsSEXP); Rcpp::traits::input_parameter< Rcpp::NumericVector >::type errors(errorsSEXP); Rcpp::traits::input_parameter< Rcpp::NumericVector >::type last_trends(last_trendsSEXP); Rcpp::traits::input_parameter< int >::type h(hSEXP); rcpp_result_gen = Rcpp::wrap(ar3_recursC(drift, ar1, ar2, ar3, linpreds, errors, last_trends, h)); return rcpp_result_gen; END_RCPP } // var1_recursC arma::mat var1_recursC(arma::mat A, arma::mat linpreds, arma::mat errors, arma::rowvec drift, arma::rowvec last_trends, int h); RcppExport SEXP _mvgam_var1_recursC(SEXP ASEXP, SEXP linpredsSEXP, SEXP errorsSEXP, SEXP driftSEXP, SEXP last_trendsSEXP, SEXP hSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< arma::mat >::type A(ASEXP); Rcpp::traits::input_parameter< arma::mat >::type
linpreds(linpredsSEXP); Rcpp::traits::input_parameter< arma::mat >::type errors(errorsSEXP); Rcpp::traits::input_parameter< arma::rowvec >::type drift(driftSEXP); Rcpp::traits::input_parameter< arma::rowvec >::type last_trends(last_trendsSEXP); Rcpp::traits::input_parameter< int >::type h(hSEXP); rcpp_result_gen = Rcpp::wrap(var1_recursC(A, linpreds, errors, drift, last_trends, h)); return rcpp_result_gen; END_RCPP } // varma_recursC arma::mat varma_recursC(arma::mat A, arma::mat A2, arma::mat A3, arma::mat theta, arma::mat linpreds, arma::mat errors, arma::rowvec drift, arma::mat last_trends, int h); RcppExport SEXP _mvgam_varma_recursC(SEXP ASEXP, SEXP A2SEXP, SEXP A3SEXP, SEXP thetaSEXP, SEXP linpredsSEXP, SEXP errorsSEXP, SEXP driftSEXP, SEXP last_trendsSEXP, SEXP hSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< arma::mat >::type A(ASEXP); Rcpp::traits::input_parameter< arma::mat >::type A2(A2SEXP); Rcpp::traits::input_parameter< arma::mat >::type A3(A3SEXP); Rcpp::traits::input_parameter< arma::mat >::type theta(thetaSEXP); Rcpp::traits::input_parameter< arma::mat >::type linpreds(linpredsSEXP); Rcpp::traits::input_parameter< arma::mat >::type errors(errorsSEXP); Rcpp::traits::input_parameter< arma::rowvec >::type drift(driftSEXP); Rcpp::traits::input_parameter< arma::mat >::type last_trends(last_trendsSEXP); Rcpp::traits::input_parameter< int >::type h(hSEXP); rcpp_result_gen = Rcpp::wrap(varma_recursC(A, A2, A3, theta, linpreds, errors, drift, last_trends, h)); return rcpp_result_gen; END_RCPP } static const R_CallMethodDef CallEntries[] = { {"_mvgam_ar3_recursC", (DL_FUNC) &_mvgam_ar3_recursC, 8}, {"_mvgam_var1_recursC", (DL_FUNC) &_mvgam_var1_recursC, 6}, {"_mvgam_varma_recursC", (DL_FUNC) &_mvgam_varma_recursC, 9}, {NULL, NULL, 0} }; RcppExport void R_init_mvgam(DllInfo *dll) { R_registerRoutines(dll, NULL, CallEntries, NULL, NULL); R_useDynamicSymbols(dll, FALSE); } ================================================ FILE: src/trend_funs.cpp ================================================ // [[Rcpp::depends("RcppArmadillo")]] #include <RcppArmadillo.h> #include <Rcpp.h> #define _USE_MATH_DEFINES #include <math.h> using namespace Rcpp; // function for recursively extending an AR3 process //' @noRd // [[Rcpp::export]] Rcpp::NumericVector ar3_recursC(double drift, double ar1, double ar2, double ar3, Rcpp::NumericVector linpreds, Rcpp::NumericVector errors, Rcpp::NumericVector last_trends, int h) { int T = h + 3; Rcpp::NumericVector states(T); states(0) = last_trends(0); states(1) = last_trends(1); states(2) = last_trends(2); for(int t = 3; t < T; ++t) { states(t) = drift + ar1 * (states(t - 1) - linpreds(t - 1)) + ar2 * (states(t - 2) - linpreds(t - 2)) + ar3 * (states(t - 3) - linpreds(t - 3)) + linpreds(t) + errors(t); } return states[Rcpp::Range(3, T-1)]; } // function for recursively extending a VAR1 process //' @noRd // [[Rcpp::export]] arma::mat var1_recursC(arma::mat A, arma::mat linpreds, arma::mat errors, arma::rowvec drift, arma::rowvec last_trends, int h) { int T = h + 1; int n_series = A.n_rows; arma::mat states(T, n_series); states.row(0) = last_trends; for (int t = 1; t < T; t++) { states.row(t) = (states.row(t-1) - linpreds.row(t-1)) * trans(A) + linpreds.row(t) + drift + errors.row(t); } return states.rows(1, h); } // function for recursively extending a VARMA(1-3,0-1) process //' @noRd // [[Rcpp::export]] arma::mat varma_recursC( arma::mat A, arma::mat A2, arma::mat A3, arma::mat theta, arma::mat linpreds, arma::mat errors, arma::rowvec
drift, arma::mat last_trends, int h) { // total number of timepoints int T = h + 3; // total number of series int n_series = A.n_rows; // states arma::mat states(T, n_series); // initialise states states.row(0) = last_trends.row(0); states.row(1) = last_trends.row(1); states.row(2) = last_trends.row(2); // VARMA(3,1) process for (int t = 3; t < T; t++) { states.row(t) = // autoregressive means (states.row(t-1) - linpreds.row(t-1)) * trans(A) + (states.row(t-2) - linpreds.row(t-2)) * trans(A2) + (states.row(t-3) - linpreds.row(t-3)) * trans(A3) + // moving averages errors.row(t-1) * trans(theta) + // linear predictor contributions linpreds.row(t) + // drift terms drift + // stochastic errors errors.row(t); } return states.rows(3, T-1); } ================================================ FILE: tests/local/setup_tests_local.R ================================================ # Setup models for tests locally library("testthat") library("mvgam") set.seed(123) expect_match2 <- function(object, regexp) { any(grepl(regexp, object, fixed = TRUE)) } expect_range <- function(object, lower = -Inf, upper = Inf, ...) { testthat::expect_true(all(object >= lower & object <= upper), ...) } expect_ggplot <- function(object, ...) { testthat::expect_true(is(object, "ggplot"), ...) } SM <- suppressMessages SW <- suppressWarnings context("local tests") ================================================ FILE: tests/local/tests-models1.R ================================================ source("setup_tests_local.R") #### Simulated data to test post-processing #### gaus_data <- sim_mvgam( family = gaussian(), T = 60, trend_model = 'AR1', seasonality = 'shared', mu = c(-1, 0, 1), prop_trend = 0.5, prop_missing = 0.2 ) pois_data <- sim_mvgam( family = poisson(), trend_model = AR(), prop_trend = 0.5, mu = c(1, 3, 5), T = 60 ) beta_data <- sim_mvgam( family = betar(), trend_model = GP(), mu = c(-1.5, 0, 1.5), prop_trend = 0.75, T = 60 ) #### Simple models, trying meanfield and sampling #### gaus_ar <- mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc', k = 5) - 1, trend_model = AR(), data = gaus_data$data_train, family = gaussian(), algorithm = 'meanfield', samples = 200 ) gaus_arfc <- mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc', k = 5) - 1, trend_model = AR(), data = gaus_data$data_train, newdata = gaus_data$data_test, family = gaussian(), algorithm = 'meanfield', samples = 200 ) pois_ar <- mvgam( y ~ series, trend_formula = ~ s(season, bs = 'cc', k = 5), trend_model = AR(), data = pois_data$data_train, family = poisson(), samples = 200 ) pois_arfc <- mvgam( y ~ series, trend_formula = ~ s(season, bs = 'cc', k = 5), trend_model = AR(), data = pois_data$data_train, newdata = pois_data$data_test, family = poisson(), samples = 200 ) beta_gp <- mvgam( y ~ s(series, bs = 're'), trend_formula = ~ gp(time, by = trend), data = beta_data$data_train, family = betar(), priors = prior(normal(0, 0.1), class = ar1), trend_model = AR(), samples = 200 ) #### Tests for the simple models #### test_that("lfo_cv working properly", { lfcv <- lfo_cv(gaus_arfc, min_t = 42) expect_true(inherits(lfcv, 'mvgam_lfo')) expect_true(all.equal(lfcv$eval_timepoints, c(43, 44))) }) test_that('mvgam poisson forecasts agree with Stan', { # Forecasts made from within Stan should broadly agree with forecasts # made from forecast() score_stan <- plot_mvgam_fc( pois_arfc, series = 1, newdata = pois_data$data_test, return_score = TRUE ) score_mvgam <- plot_mvgam_fc( pois_ar, series = 1, newdata = pois_data$data_test, return_score = TRUE )
expect_equal(score_mvgam$score, score_stan$score, tolerance = 3) }) test_that("loo working properly", { loomod <- SW(loo(pois_arfc)) expect_true(inherits(loomod, 'psis_loo')) loomod <- SW(loo(gaus_arfc)) expect_true(inherits(loomod, 'psis_loo')) }) test_that("gp model gives correct predictions", { p <- conditional_effects(beta_gp) expect_true(inherits(p, 'mvgam_conditional_effects')) expect_ggplot(conditional_effects(beta_gp)[[1]]) expect_ggplot(conditional_effects(beta_gp)[[2]]) post_modes <- coef(beta_gp)[, 2] expect_true(post_modes[2] < post_modes[3]) expect_true(post_modes[3] < post_modes[4]) ar <- colMeans(as.matrix(beta_gp, variable = "ar1", regex = TRUE)) expect_range(ar[1], -0.15, 0.15) expect_range(ar[2], -0.15, 0.15) expect_range(ar[3], -0.15, 0.15) expect_equal(dim(fitted(beta_gp)), c(NROW(beta_data$data_train), 4)) }) #### A continuous time AR example #### sim_corcar1 = function(n = 120, phi = 0.5, sigma = 1, sigma_obs = 0.75) { # Sample irregularly spaced time intervals time_dis <- c(0, runif(n - 1, -0.1, 1)) time_dis[time_dis < 0] <- 0 time_dis <- time_dis * 5 # Set up the latent dynamic process x <- vector(length = n) x[1] <- -0.3 for (i in 2:n) { # zero-distances will cause problems in sampling, so mvgam uses a # minimum threshold; this simulation function emulates that process if (time_dis[i] == 0) { x[i] <- rnorm(1, mean = (phi^1e-12) * x[i - 1], sd = sigma) } else { x[i] <- rnorm(1, mean = (phi^time_dis[i]) * x[i - 1], sd = sigma) } } # Add 12-month seasonality cov1 <- sin(2 * pi * (1:n) / 12) cov2 <- cos(2 * pi * (1:n) / 12) beta1 <- runif(1, 0.3, 0.7) beta2 <- runif(1, 0.2, 0.5) seasonality <- beta1 * cov1 + beta2 * cov2 # Take Gaussian observations with error and return data.frame( y = rnorm(n, mean = x + seasonality, sd = sigma_obs), season = rep(1:12, 20)[1:n], time = cumsum(time_dis) ) } # Sample two time series dat <- rbind( dplyr::bind_cols( sim_corcar1(phi = 0.65, sigma_obs = 0.55), data.frame(series = 'series1') ), dplyr::bind_cols( sim_corcar1(phi = 0.8, sigma_obs = 0.35), data.frame(series = 'series2') ) ) %>% dplyr::mutate(series = as.factor(series)) test_that("CAR model runs properly", { # mvgam with CAR(1) trends and series-level seasonal smooths mod <- mvgam( formula = y ~ s(season, bs = 'cc', k = 5, by = series), trend_model = CAR(), data = dat, family = gaussian(), samples = 200 ) expect_true(inherits(mod, 'mvgam')) p <- conditional_effects(mod) expect_true(inherits(p, 'mvgam_conditional_effects')) expect_ggplot(conditional_effects(mod)[[1]]) ar <- colMeans(as.matrix(mod, variable = "ar1", regex = TRUE)) expect_range(ar[1], 0.35, 0.75) expect_range(ar[2], 0.55, 0.95) expect_equal(dim(fitted(mod)), c(NROW(dat), 4)) # State-space formulation should also work mod <- mvgam( formula = y ~ 1, trend_formula = ~ s(season, bs = 'cc', k = 5, by = trend), trend_model = CAR(), data = dat, family = gaussian() ) expect_true(inherits(mod, 'mvgam')) p <- conditional_effects(mod) expect_true(inherits(p, 'mvgam_conditional_effects')) expect_ggplot(conditional_effects(mod)[[1]]) ar <- colMeans(as.matrix(mod, variable = "ar1", regex = TRUE)) expect_range(ar[1], 0.35, 0.75) expect_range(ar[2], 0.55, 0.95) expect_equal(dim(fitted(mod)), c(NROW(dat), 4)) }) #### A monotonic smooth example #### # 'by' terms that produce a different smooth for each level of the 'by' # factor set.seed(123123) x <- runif(80) * 4 - 1 x <- sort(x) # Two different monotonic smooths, one for each factor level f <- exp(4 * x) / (1 + exp(4 * x)) f2 <- exp(3.5 * x) / (1 + exp(3 * x)) fac <- c(rep('a', 80),
rep('b', 80)) y <- c(f + rnorm(80) * 0.1, f2 + rnorm(80) * 0.2) # Gather all data into a data.frame, including the factor 'by' variable mod_data <- data.frame(y, x, fac = as.factor(fac)) mod_data$time <- 1:NROW(mod_data) test_that("monotonic smooths behave properly", { # Fit a model with different smooths per factor level mod <- mvgam( y ~ s(x, bs = 'moi', by = fac, k = 8), data = mod_data, family = gaussian(), samples = 200 ) expect_true(inherits(mod$mgcv_model$smooth[[1]], 'moi.smooth')) expect_ggplot(SM(pp_check(mod))) # First derivatives (on link scale) should never be # negative for either factor level derivs <- slopes(mod, variables = 'x', by = c('x', 'fac'), type = 'link') expect_true(all(derivs$estimate > 0)) }) ================================================ FILE: tests/mvgam_examples.R ================================================ # Small mvgam examples for testing post-fitting functions such as # predict, forecast, hindcast etc... testthat::skip_on_cran() library(mvgam) mvgam_examp_dat <- list( data_train = structure( list( y = c( -1.6435760529886, 0.0576506632876403, -0.398982741359959, 0.166263635072232, NA, -0.178792865387502, 0.0378992006898741, -0.46704324582468, -0.20005752901963, NA, -0.7648331324566, -1.95818875683478, -0.489141832766607, NA, -0.781926449502298, -0.173065622618926, NA, -0.431888938737423, -1.33563987611521, -0.30668079493666, -1.59343527302515, -2.08089938293457 ), season = c( 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L ), year = c( 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L ), series = structure( c( 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L ), levels = c("series_1", "series_2"), class = "factor" ), time = c( 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L ) ), class = "data.frame", row.names = c(NA, -22L) ), data_test = structure( list( y = c( -0.825903793273796, NA, -0.409364591883054, -0.801934825421605, NA, 0.993612304219531, 0.465708559827663, -0.268653159692507 ), season = c(12L, 12L, 1L, 1L, 2L, 2L, 3L, 3L), year = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), series = structure( c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), levels = c("series_1", "series_2"), class = "factor" ), time = c(12L, 12L, 13L, 13L, 14L, 14L, 15L, 15L) ), class = "data.frame", row.names = c(NA, -8L) ) ) # Univariate process without trend_formula mvgam_example1 <- mvgam( y ~ s(season, k = 4), trend_model = RW(), family = gaussian(), data = mvgam_examp_dat$data_train, burnin = 50, samples = 5, chains = 1, backend = 'rstan' ) # Univariate process with trend_formula, trend_map and correlated process errors trend_map <- data.frame( series = unique(mvgam_examp_dat$data_train$series), trend = c(1, 1) ) mvgam_example2 <- mvgam( y ~ 1, trend_formula = ~ s(season, k = 4), trend_model = RW(cor = TRUE), trend_map = trend_map, family = gaussian(), data = mvgam_examp_dat$data_train, burnin = 50, samples = 5, chains = 1, backend = 'rstan' ) # Multivariate process without trend_formula mvgam_example3 <- mvgam( y ~ s(season, k = 4), trend_model = VAR(cor = TRUE), family = gaussian(), data = mvgam_examp_dat$data_train, burnin = 50, samples = 5, chains = 1, backend = 'rstan', lfo = TRUE ) # GP dynamic factors (use list format to ensure it works in tests) list_data <- list() for (i in 1:NCOL(mvgam_examp_dat$data_train)) { list_data[[i]] <- mvgam_examp_dat$data_train[, i] } names(list_data) <- colnames(mvgam_examp_dat$data_train) 
mvgam_example4 <- mvgam( y ~ series + s(season, k = 4), trend_model = GP(), family = gaussian(), use_lv = TRUE, n_lv = 2, data = list_data, burnin = 50, samples = 5, chains = 1, backend = 'rstan', lfo = TRUE ) # Save examples as internal data usethis::use_data( mvgam_examp_dat, mvgam_example1, mvgam_example2, mvgam_example3, mvgam_example4, internal = TRUE, overwrite = TRUE, compress = 'xz' ) ================================================ FILE: tests/testthat/_snaps/tidier_methods.md ================================================ # `tidy()` snapshot value of `mvgam_example1` { "type": "list", "attributes": { "names": { "type": "character", "attributes": {}, "value": ["parameter", "type", "mean", "sd", "2.5%", "50%", "97.5%"] }, "row.names": { "type": "integer", "attributes": {}, "value": [1, 2, 3, 4, 5] }, "class": { "type": "character", "attributes": {}, "value": ["tbl_df", "tbl", "data.frame"] } }, "value": [ { "type": "character", "attributes": {}, "value": ["sigma_obs[1]", "sigma_obs[2]", "(Intercept)", "sigma[1]", "sigma[2]"] }, { "type": "character", "attributes": {}, "value": ["observation_family_extra_param", "observation_family_extra_param", "observation_beta", "trend_model_param", "trend_model_param"] }, { "type": "double", "attributes": {}, "value": [0.73, 0.22, -2.4, 0.53, 1] }, { "type": "double", "attributes": {}, "value": [0.24, 0.081, 0.064, 0.18, 0.24] }, { "type": "double", "attributes": {}, "value": [0.5, 0.14, -2.4, 0.39, 0.69] }, { "type": "double", "attributes": {}, "value": [0.7, 0.2, -2.4, 0.45, 0.96] }, { "type": "double", "attributes": {}, "value": [1, 0.33, -2.3, 0.79, 1.3] } ] } # `tidy()` snapshot value of `mvgam_example2` { "type": "list", "attributes": { "names": { "type": "character", "attributes": {}, "value": ["parameter", "type", "mean", "sd", "2.5%", "50%", "97.5%"] }, "row.names": { "type": "integer", "attributes": {}, "value": [1, 2, 3, 4, 5] }, "class": { "type": "character", "attributes": {}, "value": ["tbl_df", "tbl", "data.frame"] } }, "value": [ { "type": "character", "attributes": {}, "value": ["sigma_obs[1]", "sigma_obs[2]", "(Intercept)", "Sigma[1,1]", "(Intercept)_trend"] }, { "type": "character", "attributes": {}, "value": ["observation_family_extra_param", "observation_family_extra_param", "observation_beta", "trend_model_param", "trend_beta"] }, { "type": "double", "attributes": {}, "value": [0.52, 0.87, -1, 0.61, 0.23] }, { "type": "double", "attributes": {}, "value": [0.12, 0.19, 0.31, 0.28, 0.74] }, { "type": "double", "attributes": {}, "value": [0.34, 0.65, -1.3, 0.27, -0.7] }, { "type": "double", "attributes": {}, "value": [0.57, 0.81, -1.1, 0.53, 0.14] }, { "type": "double", "attributes": {}, "value": [0.61, 1.1, -0.56, 0.91, 1.2] } ] } # `tidy()` snapshot value of `mvgam_example3` { "type": "list", "attributes": { "names": { "type": "character", "attributes": {}, "value": ["parameter", "type", "mean", "sd", "2.5%", "50%", "97.5%"] }, "row.names": { "type": "integer", "attributes": {}, "value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] }, "class": { "type": "character", "attributes": {}, "value": ["tbl_df", "tbl", "data.frame"] } }, "value": [ { "type": "character", "attributes": {}, "value": ["sigma_obs[1]", "sigma_obs[2]", "(Intercept)", "A[1,1]", "A[2,1]", "A[1,2]", "A[2,2]", "Sigma[1,1]", "Sigma[2,1]", "Sigma[1,2]", "Sigma[2,2]"] }, { "type": "character", "attributes": {}, "value": ["observation_family_extra_param", "observation_family_extra_param", "observation_beta", "trend_model_param", "trend_model_param", "trend_model_param", 
"trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param"] }, { "type": "double", "attributes": {}, "value": [0.41, 0.55, -0.78, 0.47, -0.28, -0.044, 0.32, 0.17, -0.0057, -0.0057, 0.71] }, { "type": "double", "attributes": {}, "value": [0.064, 0.23, 0.19, 0.35, 0.42, 0.17, 0.51, 0.061, 0.2, 0.2, 0.73] }, { "type": "double", "attributes": {}, "value": [0.34, 0.3, -0.96, 0.061, -0.78, -0.29, -0.25, 0.094, -0.31, -0.31, 0.16] }, { "type": "double", "attributes": {}, "value": [0.4, 0.51, -0.88, 0.55, -0.37, -0.017, 0.66, 0.17, 0.086, 0.086, 0.5] }, { "type": "double", "attributes": {}, "value": [0.5, 0.87, -0.52, 0.82, 0.24, 0.13, 0.71, 0.25, 0.14, 0.14, 1.8] } ] } # `tidy()` snapshot value of `mvgam_example4` { "type": "list", "attributes": { "names": { "type": "character", "attributes": {}, "value": ["parameter", "type", "mean", "sd", "2.5%", "50%", "97.5%"] }, "row.names": { "type": "integer", "attributes": {}, "value": [1, 2, 3, 4, 5, 6, 7, 8] }, "class": { "type": "character", "attributes": {}, "value": ["tbl_df", "tbl", "data.frame"] } }, "value": [ { "type": "character", "attributes": {}, "value": ["sigma_obs[1]", "sigma_obs[2]", "(Intercept)", "seriesseries_2", "rho_gp[1]", "rho_gp[2]", "alpha_gp[1]", "alpha_gp[2]"] }, { "type": "character", "attributes": {}, "value": ["observation_family_extra_param", "observation_family_extra_param", "observation_beta", "observation_beta", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param"] }, { "type": "double", "attributes": {}, "value": [0.73, 0.86, -0.99, 0.51, 12, 7.7, 0.25, 0.25] }, { "type": "double", "attributes": {}, "value": [0.18, 0.12, 0.14, 0.15, 7.3, 5.6, 0, 0] }, { "type": "double", "attributes": {}, "value": [0.51, 0.8, -1.1, 0.38, 7.6, 3.4, 0.25, 0.25] }, { "type": "double", "attributes": {}, "value": [0.75, 0.82, -1, 0.45, 9.2, 4.7, 0.25, 0.25] }, { "type": "double", "attributes": {}, "value": [0.93, 1, -0.84, 0.7, 24, 16, 0.25, 0.25] } ] } # `tidy()` snapshot value of `mvgam_example6` { "type": "list", "attributes": { "names": { "type": "character", "attributes": {}, "value": ["parameter", "type"] }, "row.names": { "type": "integer", "attributes": {}, "value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65] }, "class": { "type": "character", "attributes": {}, "value": ["tbl_df", "tbl", "data.frame"] } }, "value": [ { "type": "character", "attributes": {}, "value": ["(Intercept)", "speciesspecies_2", "speciesspecies_3", "speciesspecies_4", "ar1[1]", "ar1[2]", "ar1[3]", "ar1[4]", "ar1[5]", "ar1[6]", "ar1[7]", "ar1[8]", "ar1[9]", "ar1[10]", "ar1[11]", "ar1[12]", "alpha_cor", "Sigma_1[1,1]", "Sigma_1[2,1]", "Sigma_1[3,1]", "Sigma_1[4,1]", "Sigma_1[1,2]", "Sigma_1[2,2]", "Sigma_1[3,2]", "Sigma_1[4,2]", "Sigma_1[1,3]", "Sigma_1[2,3]", "Sigma_1[3,3]", "Sigma_1[4,3]", "Sigma_1[1,4]", "Sigma_1[2,4]", "Sigma_1[3,4]", "Sigma_1[4,4]", "Sigma_2[1,1]", "Sigma_2[2,1]", "Sigma_2[3,1]", "Sigma_2[4,1]", "Sigma_2[1,2]", "Sigma_2[2,2]", "Sigma_2[3,2]", "Sigma_2[4,2]", "Sigma_2[1,3]", "Sigma_2[2,3]", "Sigma_2[3,3]", "Sigma_2[4,3]", "Sigma_2[1,4]", "Sigma_2[2,4]", "Sigma_2[3,4]", "Sigma_2[4,4]", "Sigma_3[1,1]", "Sigma_3[2,1]", "Sigma_3[3,1]", "Sigma_3[4,1]", "Sigma_3[1,2]", "Sigma_3[2,2]", "Sigma_3[3,2]", "Sigma_3[4,2]", "Sigma_3[1,3]", "Sigma_3[2,3]", "Sigma_3[3,3]", 
"Sigma_3[4,3]", "Sigma_3[1,4]", "Sigma_3[2,4]", "Sigma_3[3,4]", "Sigma_3[4,4]"] }, { "type": "character", "attributes": {}, "value": ["observation_beta", "observation_beta", "observation_beta", "observation_beta", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param", "trend_model_param"] } ] } ================================================ FILE: tests/testthat/setup.R ================================================ # Setup models for tests library("testthat") library("mvgam") expect_match2 <- function(object, regexp) { any(grepl(regexp, object, fixed = TRUE)) } expect_character <- function(object, ...) { testthat::expect_true(is(object, "character"), ...) } expect_list <- function(object, ...) { testthat::expect_true(is(object, "list"), ...) } expect_ggplot <- function(object, ...) { testthat::expect_true(is(object, "ggplot"), ...) } expect_loo <- function(object, ...) { testthat::expect_true(is(object, "psis_loo"), ...) } expect_range <- function(object, lower = -Inf, upper = Inf, ...) { testthat::expect_true(all(object >= lower & object <= upper), ...) 
} SM <- suppressMessages SW <- suppressWarnings set.seed(100) beta_data <- sim_mvgam( family = betar(), trend_model = 'GP', trend_rel = 0.5, T = 60 ) gaus_data <- sim_mvgam( family = gaussian(), T = 60, trend_model = 'AR1', seasonality = 'shared', mu = c(-1, 0, 1), trend_rel = 0.5, prop_missing = 0.2 ) ================================================ FILE: tests/testthat/test-RW.R ================================================ context("RW,AR,VAR,CAR") skip_on_cran() test_that("ma and cor options should work for trends other than VAR", { test <- mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1, trend_model = AR(p = 1, ma = TRUE), data = gaus_data$data_train, family = gaussian(), run_model = FALSE ) expect_true(inherits(test, 'mvgam_prefit')) expect_true(any(grepl( 'vector[n_series]theta;', gsub(' ', '', test$model_file), fixed = TRUE ))) expect_true(attr(test$model_data, 'trend_model') == 'AR1') # Correlation works test <- mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1, trend_model = AR(p = 1, cor = TRUE), data = gaus_data$data_train, family = gaussian(), run_model = FALSE ) expect_true(inherits(test, 'mvgam_prefit')) expect_true(any(grepl( 'error[i]~multi_normal_cholesky(trend_zeros,L_Sigma);', gsub(' ', '', test$model_file), fixed = TRUE ))) expect_true(attr(test$model_data, 'trend_model') == 'AR1') test <- mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1, trend_model = RW(ma = TRUE), data = gaus_data$data_train, family = gaussian(), run_model = FALSE ) expect_true(inherits(test, 'mvgam_prefit')) expect_true(any(grepl( 'vector[n_series]theta;', gsub(' ', '', test$model_file), fixed = TRUE ))) expect_true(attr(test$model_data, 'trend_model') == 'RW') test <- mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1, trend_model = RW(cor = TRUE), data = gaus_data$data_train, family = gaussian(), run_model = FALSE ) expect_true(inherits(test, 'mvgam_prefit')) expect_true(any(grepl( 'error[i]~multi_normal_cholesky(trend_zeros,L_Sigma);', gsub(' ', '', test$model_file), fixed = TRUE ))) expect_true(attr(test$model_data, 'trend_model') == 'RW') }) test_that("VARMAs are set up correctly", { var <- mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1, trend_model = VAR(), data = gaus_data$data_train, family = gaussian(), run_model = FALSE ) expect_true(inherits(var, 'mvgam_prefit')) var <- SW(mvgam( y ~ s(series, bs = 're') + gp(time, c = 5 / 4, k = 20) - 1, trend_model = VAR(), data = gaus_data$data_train, family = gaussian(), run_model = FALSE )) expect_true(inherits(var, 'mvgam_prefit')) varma <- SW(mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1, trend_model = 'VARMA', data = gaus_data$data_train, family = gaussian(), run_model = FALSE )) expect_true(any(grepl( '// unconstrained ma inverse partial autocorrelations', varma$model_file, fixed = TRUE ))) varma <- mvgam( y ~ s(series, bs = 're'), trend_formula = ~ gp(time, by = trend, c = 5 / 4, k = 15), trend_model = VAR(ma = TRUE), data = gaus_data$data_train, family = gaussian(), run_model = FALSE ) expect_true(any(grepl( '// unconstrained ma inverse partial autocorrelations', varma$model_file, fixed = TRUE ))) }) test_that("hierarchical cors are set up correctly", { # Simulate hierarchical data simdat1 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3) ) simdat2 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3) ) simdat3 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 
2, 3) ) all_dat <- rbind( simdat1$data_train %>% dplyr::mutate(region = 'qld'), simdat2$data_train %>% dplyr::mutate(region = 'nsw'), simdat3$data_train %>% dplyr::mutate(region = 'vic') ) %>% dplyr::mutate( species = gsub('series', 'species', series), species = as.factor(species), region = as.factor(region) ) %>% dplyr::arrange(series, time) %>% dplyr::select(-series) # Check that all model types can be constructed # Random Walk mod <- mvgam( formula = y ~ species, trend_model = RW(gr = region, subgr = species), data = all_dat, run_model = FALSE, autoformat = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( '// hierarchical process error correlations', mod$model_file, fixed = TRUE ))) mod <- mvgam( formula = y ~ -1, trend_formula = ~species, trend_model = RW(gr = region, subgr = species), data = all_dat, run_model = FALSE, autoformat = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( '// hierarchical process error correlations', mod$model_file, fixed = TRUE ))) # AR mod <- mvgam( formula = y ~ species, trend_model = AR(gr = region, subgr = species, p = 2), data = all_dat, run_model = FALSE, autoformat = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( '// hierarchical process error correlations', mod$model_file, fixed = TRUE ))) mod <- mvgam( formula = y ~ -1, trend_formula = ~species, trend_model = AR(gr = region, subgr = species, p = 3), data = all_dat, run_model = FALSE, autoformat = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( '// hierarchical process error correlations', mod$model_file, fixed = TRUE ))) # VAR mod <- mvgam( formula = y ~ species, trend_model = VAR(gr = region, subgr = species), data = all_dat, run_model = FALSE, autoformat = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( '// derived group-level VAR covariance matrices', mod$model_file, fixed = TRUE ))) mod <- mvgam( formula = y ~ -1, trend_formula = ~species, trend_model = VAR(gr = region, subgr = species), data = all_dat, run_model = FALSE, autoformat = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'Sigma[group_inds[g], group_inds[g]] = multiply_lower_tri_self_transpose(L_Sigma_group[g]);', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( '// derived group-level VAR covariance matrices', mod$model_file, fixed = TRUE ))) }) test_that("site variable must be numeric for ZMVN", { site_dat <- data.frame( site = as.character(rep(1:10, 4)), species = as.factor(c(sort(rep(letters[1:4], 9)), 'q', 'q', 'q', 'q')), y = rpois(40, 3) ) trend_model <- ZMVN(unit = site, subgr = species) expect_error( mvgam:::validate_series_time(data = site_dat, trend_model = trend_model), 'Variable "site" must be either numeric or integer type' ) }) 
test_that("Each subgroup must exist within each site for ZMVN", { site_dat <- data.frame( site = rep(1:10, 4), species = as.factor(c(sort(rep(letters[1:4], 9)), 'q', 'q', 'q', 'q')), y = rpois(40, 3) ) trend_model <- ZMVN(unit = site, subgr = species) expect_error( mvgam:::validate_series_time(data = site_dat, trend_model = trend_model), 'One or more series in data is missing observations for one or more timepoints' ) # Should work if all species were recorded in all sites (even if NA) site_dat <- data.frame( site = rep(1:10, 4), species = as.factor(c(sort(rep(letters[1:4], 10)))), y = c(NA, rpois(39, 3)) ) trend_model <- ZMVN(unit = site, subgr = species) expect_no_error(mvgam:::validate_series_time( data = site_dat, trend_model = trend_model )) mod <- mvgam( formula = y ~ species, trend_model = ZMVN(unit = site, subgr = species), data = site_dat, run_model = FALSE ) expect_equal(attr(mod$obs_data, 'implicit_vars'), c('series', 'time')) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'trend_raw[i] ~ multi_normal_cholesky(trend_zeros, L_Sigma);', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( 'L_Sigma = diag_pre_multiply(sigma, L_Omega);', mod$model_file, fixed = TRUE ))) mod <- mvgam( formula = y ~ -1, trend_formula = ~species, trend_model = ZMVN(unit = site, subgr = species), data = site_dat, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'LV_raw[i] ~ multi_normal_cholesky(trend_mus[ytimes_trend[i, 1 : n_lv]]', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( 'L_Sigma = diag_pre_multiply(sigma, L_Omega);', mod$model_file, fixed = TRUE ))) }) # Replicate CAR1 example # Function to simulate CAR1 data with seasonality sim_corcar1 = function(n = 120, phi = 0.5, sigma = 1, sigma_obs = 0.75) { # Sample irregularly spaced time intervals time_dis <- c(1, runif(n - 1, 0, 5)) # Set up the latent dynamic process x <- vector(length = n) x[1] <- -0.3 for (i in 2:n) { # zero-distances will cause problems in sampling, so mvgam uses a # minimum threshold; this simulation function emulates that process if (time_dis[i] == 0) { x[i] <- rnorm( 1, mean = (phi^1e-3) * x[i - 1], sd = sigma * (1 - phi^(2 * 1e-3)) / (1 - phi^2) ) } else { x[i] <- rnorm( 1, mean = (phi^time_dis[i]) * x[i - 1], sd = sigma * (1 - phi^(2 * time_dis[i])) / (1 - phi^2) ) } } # Add 12-month seasonality cov1 <- sin(2 * pi * (1:n) / 12) cov2 <- cos(2 * pi * (1:n) / 12) beta1 <- runif(1, 0.3, 0.7) beta2 <- runif(1, 0.2, 0.5) seasonality <- beta1 * cov1 + beta2 * cov2 # Take Gaussian observations with error and return data.frame( y = rnorm(n, mean = x + seasonality, sd = sigma), season = rep(1:12, 20)[1:n], time = cumsum(time_dis) ) } # Sample two time series dat <- rbind( dplyr::bind_cols( sim_corcar1(phi = 0.65, sigma_obs = 0.55), data.frame(series = 'series1') ), dplyr::bind_cols( sim_corcar1(phi = 0.8, sigma_obs = 0.35), data.frame(series = 'series2') ) ) %>% dplyr::mutate(series = as.factor(series)) dat_train <- dat %>% dplyr::group_by(series) %>% dplyr::arrange(time) %>% dplyr::slice_head(n = 110) %>% dplyr::ungroup() dat_test <- dat %>% dplyr::group_by(series) %>% dplyr::arrange(time) %>% dplyr::slice_tail(n = 10) %>% dplyr::ungroup() test_that("CAR1 sets up correctly", { # mvgam with CAR(1) trends and series-level seasonal smooths mod <- mvgam( formula = y ~ s(season, bs = 'cc', k = 5, by = series), trend_model = CAR(), noncentred = FALSE, data = dat_train, family = gaussian(), run_model = FALSE, backend = 'cmdstanr' ) expect_true(inherits(mod, 
'mvgam_prefit')) expect_true(exists('time_dis', mod$model_data)) expect_true(exists('index..time..index', mod$obs_data)) expect_true(attr(mod$model_data, 'trend_model') == 'CAR1') expect_true(any(grepl( 'vector[n_series]ar1;', gsub(' ', '', mod$model_file), fixed = TRUE ))) # Will work for regularly-spaced data as well mod <- mvgam( formula = y ~ s(season, bs = 'cc', k = 5, by = series), trend_model = CAR(), data = gaus_data$data_train, family = gaussian(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(exists('time_dis', mod$model_data)) expect_true(all.equal( mean(mod$model_data$time_dis[2:NROW(mod$model_data$time_dis), ]), max(mod$model_data$time_dis[2:NROW(mod$model_data$time_dis), ]), min(mod$model_data$time_dis[2:NROW(mod$model_data$time_dis), ]), 1L )) expect_true(exists('index..time..index', mod$obs_data)) expect_true(attr(mod$model_data, 'trend_model') == 'CAR1') }) ================================================ FILE: tests/testthat/test-backends.R ================================================ context("backends") skip_on_cran() test_that("variationals are converted to stanfit appropriately", { # dat <- data.frame(y = rnorm(100, 0, 1), # series = as.factor('series1')) # # mod <- SM(mvgam(y ~ 1, # # data = dat, # # family = gaussian(), # # algorithm = 'meanfield', # # silent = 2)) # # expect_true(inherits(mod, 'mvgam')) # # mod <- SM(mvgam(y ~ 1, # data = dat, # family = gaussian(), # algorithm = 'fullrank', # silent = 2)) # expect_true(inherits(mod, 'mvgam')) # # mod <- SM(mvgam(y ~ 1, # data = dat, # family = gaussian(), # algorithm = 'laplace', # silent = 2)) # expect_true(inherits(mod, 'mvgam')) }) ================================================ FILE: tests/testthat/test-binomial.R ================================================ context("binomial") # Simulations take a bit of time to set up skip_on_cran() # Simulate two time series of Binomial trials trials <- sample(c(20:25), 50, replace = TRUE) x <- rnorm(50) detprob1 <- plogis(-0.5 + 0.9 * x) detprob2 <- plogis(-0.1 - 0.7 * x) dat <- rbind( data.frame( y = rbinom(n = 50, size = trials, prob = detprob1), time = 1:50, series = 'series1', x = x, ntrials = trials ), data.frame( y = rbinom(n = 50, size = trials, prob = detprob2), time = 1:50, series = 'series2', x = x, ntrials = trials ) ) %>% dplyr::mutate(series = as.factor(series)) %>% dplyr::arrange(time, series) # Throw in some NAs dat$y[c(1, 5, 9)] <- NA # Training and testing splits dat_train <- dat %>% dplyr::filter(time <= 40) dat_test <- dat %>% dplyr::filter(time > 40) test_that("cbind() syntax required for binomial()", { # Initial warning should be issued when calling binomial or beta-binomial expect_warning(mvgam( formula = cbind(y, ntrials) ~ s(series, bs = 're') + gp(x, by = series, c = 5 / 4, k = 5), family = binomial(), data = dat_train, run_model = FALSE )) expect_error( mvgam( y ~ series + s(x, by = series), family = binomial(), data = dat_train, run_model = FALSE ), 'Binomial family requires cbind() syntax in the formula left-hand side', fixed = TRUE ) # Should work if correctly specified mod <- mvgam( cbind(y, ntrials) ~ s(series, bs = 're') + gp(x, by = series, c = 5 / 4, k = 5), family = binomial(), data = dat_train, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl('flat_ys ~ binomial(', mod$model_file, fixed = TRUE))) # Also with a trend_formula mod <- mvgam( cbind(y, ntrials) ~ series, trend_formula = ~ s(x, by = trend), family = binomial(), trend_model = AR(), data = dat_train, 
run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl('flat_ys ~ binomial(', mod$model_file, fixed = TRUE))) # Also with no predictors mod <- mvgam( cbind(y, ntrials) ~ 1, family = binomial(), trend_model = AR(), data = dat_train, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl('flat_ys ~ binomial(', mod$model_file, fixed = TRUE))) }) test_that("binomial() post-processing works", { mod <- SW(mvgam( cbind(y, ntrials) ~ series, trend_formula = ~ s(x, by = trend), family = binomial(), trend_model = AR(), noncentred = TRUE, data = dat_train, burnin = 500, samples = 200, chains = 2, silent = 2 )) expect_no_error(capture_output(summary(mod))) expect_no_error(capture_output(code(mod))) expect_no_error(capture_output(print(mod))) preds <- predict(mod, summary = FALSE, type = 'response') expect_true(NCOL(preds) == NROW(dat_train)) expect_true(all(preds >= 0L)) preds <- predict(mod, newdata = dat_test, summary = FALSE) expect_true(NCOL(preds) == NROW(dat_test)) expect_no_error(ppc(mod)) expect_no_error(ppc(mod, type = 'density')) expect_no_error(ppc(mod, type = 'mean')) expect_no_error(ppc(mod, type = 'pit')) expect_no_error(ppc(mod, type = 'cdf')) expect_no_error(ppc(mod, type = 'rootogram')) expect_no_error(plot(mod, type = 'residuals')) expect_no_error(plot_mvgam_series(object = mod)) expect_no_error(plot_mvgam_series(object = mod, series = 'all')) expect_no_error(plot(mod, type = 'forecast')) expect_no_error(plot(mod, type = 'forecast', newdata = dat_test)) expect_no_error(plot(mod, type = 'trend')) expect_no_error(plot(mod, type = 'trend', realisations = TRUE)) expect_no_error(plot(mod, type = 'trend', newdata = dat_test)) expect_true(inherits(hindcast(mod), 'mvgam_forecast')) fc <- forecast(mod, newdata = dat_test) expect_true(inherits(fc, 'mvgam_forecast')) expect_no_error(plot(fc)) expect_no_error(plot(fc, realisations = TRUE)) expect_list(score(fc, score = 'drps')) expect_error( score(fc, score = 'brier'), 'cannot evaluate brier scores unless probability predictions are supplied. 
Use "type == expected" when forecasting instead' ) fc <- forecast(mod, newdata = dat_test, type = 'expected') expect_error( score(fc, score = 'brier'), 'brier score only applicable for Bernoulli forecasts' ) expect_no_error(SW(plot(mod, type = 'smooths', trend_effects = TRUE))) expect_no_error(plot( mod, type = 'smooths', realisations = TRUE, trend_effects = TRUE )) expect_no_error(plot( mod, type = 'smooths', residuals = TRUE, trend_effects = TRUE )) expect_no_error(plot(mod, type = 're', trend_effects = TRUE)) expect_no_error(plot(mod, type = 'pterms')) expect_true(inherits( SM(conditional_effects(mod)), 'mvgam_conditional_effects' )) expect_true(inherits( SM(conditional_effects(mod, type = 'link')), 'mvgam_conditional_effects' )) options(mc.cores = 1) expect_loo(SW(loo(mod))) dat_test2 <- dat_test dat_test2$ntrials <- NULL expect_error( plot(mod, type = 'trend', newdata = dat_test2), 'Variable ntrials not found in newdata' ) expect_error( forecast(mod, newdata = dat_test2), 'Variable ntrials not found in newdata' ) mod <- SW(mvgam( cbind(y, ntrials) ~ series, trend_formula = ~ s(x, by = trend), family = binomial(), trend_model = AR(), noncentred = TRUE, data = dat_train, newdata = dat_test, burnin = 200, samples = 200, chains = 2, silent = 2 )) fc <- forecast(mod) expect_true(inherits(fc, 'mvgam_forecast')) expect_error(plot_mvgam_uncertainty(mod)) expect_error(stability(mod)) }) # All tests should apply to beta_binomial as well test_that("cbind() syntax required for beta_binomial()", { expect_error( SW(mvgam( y ~ series + s(x, by = series), family = beta_binomial(), data = dat_train, run_model = FALSE )), 'Binomial family requires cbind() syntax in the formula left-hand side', fixed = TRUE ) # Should work if correctly specified mod <- mvgam( cbind(y, ntrials) ~ s(series, bs = 're') + gp(x, by = series, c = 5 / 4, k = 5), family = beta_binomial(), data = dat_train, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'flat_ys ~ beta_binomial(', mod$model_file, fixed = TRUE ))) # Also with a trend_formula mod <- mvgam( cbind(y, ntrials) ~ series, trend_formula = ~ s(x, by = trend), family = beta_binomial(), trend_model = AR(), data = dat_train, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'flat_ys ~ beta_binomial(', mod$model_file, fixed = TRUE ))) # Also with no predictors and with a prior on phi mod <- mvgam( cbind(y, ntrials) ~ 0, family = beta_binomial(), priors = prior(normal(0, 3), class = phi), trend_model = AR(), data = dat_train, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl('beta_binomial(', mod$model_file, fixed = TRUE))) expect_true(any(grepl("b[1] = 0;", mod$model_file, fixed = TRUE))) expect_true(any(grepl("phi ~ normal(0, 3);", mod$model_file, fixed = TRUE))) }) test_that("trials variable must be in data for binomial()", { expect_error( mvgam( cbind(y, mytrials) ~ series + s(x, by = series), family = binomial(), data = dat_train, run_model = FALSE ), 'variable mytrials not found in data', fixed = TRUE ) }) # Simulate two time series of Bernoulli draws x <- rnorm(50) detprob1 <- plogis(-0.5 + 0.9 * x) detprob2 <- plogis(-0.1 - 0.7 * x) dat <- rbind( data.frame( y = rbinom(n = 50, size = 1, prob = detprob1), time = 1:50, series = 'series1', x = x ), data.frame( y = rbinom(n = 50, size = 1, prob = detprob2), time = 1:50, series = 'series2', x = x ) ) %>% dplyr::mutate(series = as.factor(series)) %>% dplyr::arrange(time, series) # Throw in some NAs dat$y[c(1, 5, 9)] 
<- NA # Training and testing splits dat_train <- dat %>% dplyr::filter(time <= 40) dat_test <- dat %>% dplyr::filter(time > 40) test_that("bernoulli() behaves appropriately", { expect_error( mvgam( y ~ series + s(x, by = series), family = bernoulli(), data = gaus_data$data_train, run_model = FALSE ), 'y values must be 0 <= y <= 1', fixed = TRUE ) mod <- mvgam( y ~ s(series, bs = 're') + gp(x, by = series, c = 5 / 4, k = 5), family = bernoulli(), data = dat_train, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'flat_ys ~ bernoulli_logit_glm(', mod$model_file, fixed = TRUE ))) # Also with a trend_formula mod <- mvgam( y ~ series, trend_formula = ~ gp(x, by = trend, c = 5 / 4, k = 5), trend_model = AR(), family = bernoulli(), data = dat_train, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'flat_ys ~ bernoulli_logit_glm(', mod$model_file, fixed = TRUE ))) }) test_that("bernoulli() post-processing works", { mod <- SW(mvgam( y ~ s(series, bs = 're') + gp(x, by = series, c = 5 / 4, k = 5), trend_model = AR(), priors = prior(normal(0, 0.1), class = ar1), noncentred = TRUE, family = bernoulli(), data = dat_train, burnin = 200, samples = 200, chains = 2, silent = 2 )) expect_no_error(capture_output(summary(mod))) expect_no_error(capture_output(print(mod))) preds <- predict(mod, summary = FALSE, type = 'response') expect_true(NCOL(preds) == NROW(dat_train)) expect_true(all(preds >= 0L)) preds <- predict(mod, newdata = dat_test, summary = FALSE) expect_true(NCOL(preds) == NROW(dat_test)) expect_no_error(ppc(mod)) expect_no_error(ppc(mod, type = 'density')) expect_no_error(ppc(mod, type = 'pit')) expect_no_error(ppc(mod, type = 'cdf')) expect_no_error(plot(mod, type = 'residuals')) expect_no_error(plot_mvgam_series(object = mod, lines = FALSE)) expect_no_error(plot_mvgam_series(object = mod, series = 'all')) expect_no_error(plot(mod, type = 'forecast')) expect_no_error(plot(mod, type = 'forecast', newdata = dat_test)) expect_no_error(plot(mod, type = 'trend')) expect_no_error(plot(mod, type = 'trend', newdata = dat_test)) expect_true(inherits(hindcast(mod), 'mvgam_forecast')) expect_true(inherits(hindcast(mod, type = 'expected'), 'mvgam_forecast')) expect_no_error(plot(mod, type = 're')) expect_no_error(plot(mod, type = 'smooths')) expect_no_error(plot(mod, type = 'smooths', realisations = TRUE)) expect_no_error(plot(mod, type = 'smooths', residuals = TRUE)) expect_true(inherits( SM(conditional_effects(mod)), 'mvgam_conditional_effects' )) expect_true(inherits( SM(conditional_effects(mod, type = 'link')), 'mvgam_conditional_effects' )) options(mc.cores = 1) expect_loo(SW(loo(mod))) }) ================================================ FILE: tests/testthat/test-dynamic.R ================================================ context("dynamic") skip_on_cran() test_that("dynamic to gp spline is working properly", { expect_match( attr( terms(mvgam:::interpret_mvgam( formula = y ~ dynamic(covariate, rho = 1, stationary = FALSE), N = 100, family = gaussian() )), 'term.labels' ), 's(time, by = covariate, bs = "gp", m = c(2, 1, 2), k = 50)', fixed = TRUE ) # k will decrease as rho increases expect_match( attr( terms(mvgam:::interpret_mvgam( formula = y ~ dynamic(covariate, rho = 11), N = 100, family = gaussian() )), 'term.labels' ), 's(time, by = covariate, bs = "gp", m = c(-2, 11, 2), k = 11)', fixed = TRUE ) # k will be fixed at N if N <= 8 expect_match( attr( terms(mvgam:::interpret_mvgam( formula = y ~ dynamic(covariate, rho = 5), N = 
7, family = gaussian() )), 'term.labels' ), 's(time, by = covariate, bs = "gp", m = c(-2, 5, 2), k = 7)', fixed = TRUE ) }) test_that("dynamic to gp Hilbert is working properly", { expect_match( attr( terms(mvgam:::interpret_mvgam( formula = y ~ dynamic(covariate), N = 100, family = gaussian() )), 'term.labels' ), 'gp(time, by = covariate, c = 5/4, k = 40, scale = TRUE)', fixed = TRUE ) # k should come across just fine expect_match( attr( terms(mvgam:::interpret_mvgam( formula = y ~ dynamic(covariate, k = 17), N = 100, family = gaussian() )), 'term.labels' ), 'gp(time, by = covariate, c = 5/4, k = 17, scale = TRUE)', fixed = TRUE ) # k will be fixed at N-1 if N <= 8 expect_match( attr( terms(mvgam:::interpret_mvgam( formula = y ~ dynamic(covariate), N = 7, family = gaussian() )), 'term.labels' ), 'gp(time, by = covariate, c = 5/4, k = 6, scale = TRUE)', fixed = TRUE ) }) test_that("rho argument must be positive numeric", { data = data.frame( out = rnorm(100), temp = rnorm(100), time = 1:100, series = as.factor('series1') ) expect_error( mod <- mvgam( formula = out ~ dynamic(temp, rho = -1), data = data, family = gaussian(), run_model = FALSE ), 'Argument "rho" in dynamic() must be a positive value', fixed = TRUE ) }) test_that("rho argument cannot be larger than N - 1", { data = data.frame( out = rnorm(100), temp = rnorm(100), time = 1:100, series = as.factor('series1') ) expect_error( mod <- mvgam( formula = out ~ dynamic(temp, rho = 110), data = data, family = gaussian(), run_model = FALSE ), 'Argument "rho" in dynamic() cannot be larger than (max(time) - 1)', fixed = TRUE ) expect_error( mvgam:::interpret_mvgam( formula = y ~ dynamic(covariate, rho = 120), N = 100, family = gaussian() ), 'Argument "rho" in dynamic() cannot be larger than (max(time) - 1)', fixed = TRUE ) }) test_that("dynamic to spline works for trend_formulas", { beta_data$data_train$random <- rnorm(NROW(beta_data$data_train)) mod <- mvgam( y ~ dynamic(random, rho = 5), trend_formula = ~ dynamic(random, rho = 15), trend_model = RW(), data = beta_data$data_train, family = betar(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) # trend_idx should be in the model file and in the model data expect_true(any(grepl('trend_idx', mod$model_file))) expect_true(!is.null(mod$model_data$trend_idx1)) }) test_that("dynamic to Hilbert works for trend_formulas", { beta_data$data_train$random <- rnorm(NROW(beta_data$data_train)) mod <- suppressWarnings(mvgam( y ~ dynamic(random), trend_formula = ~ dynamic(random, k = 22), trend_model = RW(), data = beta_data$data_train, family = betar(), run_model = FALSE, autoformat = FALSE )) expect_true(inherits(mod, 'mvgam_prefit')) expect_no_error(code(mod)) # Model file should have prior lines for observationgp terms expect_true(any(grepl( '// prior for gp(time):random...', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( "b[b_idx_gp_time_byrandom] = sqrt(spd_gp_exp_quad(", mod$model_file, fixed = TRUE ))) # Model file should have prior lines for trend gp terms expect_true(any(grepl( '// prior for gp(time):random_trend...', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( "b_trend[b_trend_idx_gp_time_byrandom] = sqrt(spd_gp_exp_quad(", mod$model_file, fixed = TRUE ))) # Observation-level Gp data structures should be in the model_data expect_true("l_gp_time_byrandom" %in% names(mod$model_data)) expect_true("b_idx_gp_time_byrandom" %in% names(mod$model_data)) expect_true("k_gp_time_byrandom" %in% names(mod$model_data)) # Trend-level Gp data structures should be in the 
model_data expect_true("l_gp_trend_time_byrandom" %in% names(mod$model_data)) expect_true("b_trend_idx_gp_time_byrandom" %in% names(mod$model_data)) expect_true("k_gp_trend_time_byrandom" %in% names(mod$model_data)) }) ================================================ FILE: tests/testthat/test-example_processing.R ================================================ context("example post-processing") # Skip example testing as they are a bit time-consuming skip_on_cran() test_that("fitted() gives correct dimensions", { expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train), NROW(fitted(mvgam:::mvgam_example1)) ) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train), NROW(fitted(mvgam:::mvgam_example2)) ) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train), NROW(fitted(mvgam:::mvgam_example3)) ) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train), NROW(fitted(mvgam:::mvgam_example4)) ) }) test_that("residuals() gives correct dimensions", { expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train), NROW(residuals(mvgam:::mvgam_example1)) ) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train), NROW(residuals(mvgam:::mvgam_example2)) ) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train), NROW(residuals(mvgam:::mvgam_example3)) ) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train), NROW(residuals(mvgam:::mvgam_example4, robust = TRUE)) ) }) test_that("stability() gives correct outputs", { metrics <- stability(mvgam:::mvgam_example3) expect_range(metrics$prop_int_offdiag, lower = 0, upper = 1) expect_range(metrics$prop_int_diag, lower = 0, upper = 1) expect_range(metrics$prop_cov_offdiag, lower = 0, upper = 1) expect_range(metrics$prop_cov_diag, lower = 0, upper = 1) }) test_that("irf() gives correct outputs", { irfs <- irf(mvgam:::mvgam_example3, h = 12) expect_true(length(irfs) == 5) expect_true(NROW(irfs[[1]]$process_1) == 12) expect_no_error(plot(irfs)) irfs <- irf( mvgam:::mvgam_example3, h = 6, cumulative = TRUE, orthogonal = TRUE ) expect_true(length(irfs) == 5) expect_true(NROW(irfs[[1]]$process_1) == 6) expect_no_error(plot(irfs)) expect_s3_class(summary(irfs), 'tbl_df') expect_error( summary(irfs, probs = c(1)), "argument 'probs' must be a vector of length 2" ) }) test_that("fevd() gives correct outputs", { fevds <- fevd(mvgam:::mvgam_example3, h = 12) expect_true(length(fevds) == 5) expect_true(NROW(fevds[[1]]$process_1) == 12) expect_no_error(plot(fevds)) }) test_that("variable extraction works correctly", { expect_true(inherits( as.matrix(mvgam:::mvgam_example4, 'rho_gp', regex = TRUE), 'matrix' )) expect_true(inherits( as_draws(mvgam:::mvgam_example4, 'rho_gp', regex = TRUE), 'draws' )) expect_true(inherits( as_draws(mvgam:::mvgam_example1, 'obs_params', regex = TRUE), 'draws' )) expect_true(inherits( as_draws_df(mvgam:::mvgam_example1, 'obs_params', regex = TRUE), 'draws' )) expect_true(inherits( as_draws_matrix(mvgam:::mvgam_example4, 'obs_params'), 'draws' )) expect_true(inherits( as_draws_matrix(mvgam:::mvgam_example4, 'trend_params'), 'draws' )) expect_true(inherits(as_draws_list(mvgam:::mvgam_example2, 'betas'), 'draws')) expect_true(inherits( as_draws_rvars(mvgam:::mvgam_example2, 'trend_betas'), 'draws' )) }) test_that("hindcast() works correctly", { hc <- hindcast(mvgam:::mvgam_example1) expect_true(inherits(hc$hindcasts, 'list')) expect_true(inherits(summary(hc), 'data.frame')) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train) / NCOL(mvgam:::mvgam_example1$ytimes), NCOL(hc$hindcasts$series_1) ) hc <- hindcast(mvgam:::mvgam_example1, type = 'expected') 
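  # Annotation (illustrative sketch, not part of the original test file): each
  # element of hc$hindcasts is a draws x timepoints matrix for one series, so
  # the dimension checks in this test divide the number of training rows by
  # the number of series. Written out with a named intermediate:
  n_series <- NCOL(mvgam:::mvgam_example1$ytimes)
  expect_equal(
    NROW(mvgam:::mvgam_examp_dat$data_train) / n_series,
    NCOL(hc$hindcasts$series_1)
  )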
expect_true(inherits(hc$hindcasts, 'list')) expect_true(inherits(summary(hc), 'data.frame')) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train) / NCOL(mvgam:::mvgam_example1$ytimes), NCOL(hc$hindcasts$series_1) ) hc <- hindcast(mvgam:::mvgam_example4) expect_true(inherits(hc$hindcasts, 'list')) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train) / NCOL(mvgam:::mvgam_example4$ytimes), NCOL(hc$hindcasts$series_1) ) hc <- hindcast(mvgam:::mvgam_example4, type = 'expected') expect_true(inherits(hc$hindcasts, 'list')) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train) / NCOL(mvgam:::mvgam_example4$ytimes), NCOL(hc$hindcasts$series_1) ) hc <- hindcast(mvgam:::mvgam_example3, type = 'trend') expect_true(inherits(hc$hindcasts, 'list')) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_train) / NCOL(mvgam:::mvgam_example3$ytimes), NCOL(hc$hindcasts$series_1) ) }) test_that("predict() works correctly", { expect_equal( dim(predict( mvgam:::mvgam_example1, type = 'expected', process_error = FALSE )), dim(predict( mvgam:::mvgam_example2, type = 'expected', process_error = FALSE )) ) expect_equal( dim(predict( mvgam:::mvgam_example1, type = 'expected', process_error = TRUE )), dim(predict( mvgam:::mvgam_example2, type = 'expected', process_error = TRUE )) ) expect_equal( dim(posterior_linpred( mvgam:::mvgam_example1, type = 'expected', process_error = FALSE )), dim(posterior_linpred( mvgam:::mvgam_example2, type = 'expected', process_error = FALSE )) ) expect_equal( dim(posterior_linpred( mvgam:::mvgam_example1, type = 'expected', process_error = FALSE, ndraws = 33 )), dim(posterior_linpred( mvgam:::mvgam_example2, type = 'expected', process_error = FALSE, ndraws = 33 )) ) expect_equal( dim(predict( mvgam:::mvgam_example3, type = 'expected', process_error = FALSE )), dim(predict( mvgam:::mvgam_example4, type = 'expected', process_error = FALSE )) ) expect_equal( NROW(predict( mvgam:::mvgam_example1, newdata = mvgam:::mvgam_examp_dat$data_test, process_error = FALSE )), NROW(mvgam:::mvgam_examp_dat$data_test) ) expect_equal( NROW(predict( mvgam:::mvgam_example3, newdata = mvgam:::mvgam_examp_dat$data_test, process_error = TRUE )), NROW(mvgam:::mvgam_examp_dat$data_test) ) expect_equal( dim(predict( mvgam:::mvgam_example1, newdata = mvgam:::mvgam_examp_dat$data_test, process_error = FALSE )), dim(predict( mvgam:::mvgam_example2, newdata = mvgam:::mvgam_examp_dat$data_test, process_error = FALSE )) ) expect_error( predict(mvgam:::mvgam_example1, newdata = data.frame(time = 1)), "the following required variables are missing from newdata:\n season, series" ) expect_error( predict(mvgam:::mvgam_example2, newdata = data.frame(season = 1)), "the following required variables are missing from newdata:\n time, series" ) }) test_that("mcmc_plot() works correctly", { expect_ggplot(mcmc_plot(mvgam:::mvgam_example1)) expect_ggplot(mcmc_plot(mvgam:::mvgam_example2)) expect_ggplot(mcmc_plot(mvgam:::mvgam_example3)) expect_ggplot(mcmc_plot(mvgam:::mvgam_example4)) expect_ggplot(mcmc_plot(mvgam:::mvgam_example1, variable = 'trend_params')) expect_ggplot(mcmc_plot(mvgam:::mvgam_example2, variable = 'trend_params')) expect_ggplot(mcmc_plot(mvgam:::mvgam_example3, variable = 'trend_params')) expect_ggplot(mcmc_plot(mvgam:::mvgam_example4, variable = 'trend_params')) }) test_that("marginaleffects works correctly", { expect_ggplot(marginaleffects::plot_slopes( mvgam:::mvgam_example1, variables = 'season', condition = 'season', type = 'link' )) expect_ggplot(marginaleffects::plot_slopes( mvgam:::mvgam_example2, 
variables = 'season', condition = 'season', type = 'link' )) expect_ggplot(marginaleffects::plot_slopes( mvgam:::mvgam_example3, variables = 'season', condition = 'season', type = 'link' )) expect_ggplot(marginaleffects::plot_slopes( mvgam:::mvgam_example4, variables = 'season', condition = 'season', type = 'link' )) expect_ggplot(marginaleffects::plot_predictions( mvgam:::mvgam_example1, condition = 'season', type = 'link' )) expect_ggplot(marginaleffects::plot_predictions( mvgam:::mvgam_example2, condition = 'season', type = 'link' )) expect_ggplot(marginaleffects::plot_predictions( mvgam:::mvgam_example3, condition = 'season', type = 'link' )) expect_ggplot(marginaleffects::plot_predictions( mvgam:::mvgam_example4, condition = 'season', type = 'link' )) }) test_that("plot_mvgam... functions work properly", { expect_no_error(plot_mvgam_fc(mvgam:::mvgam_example1)) expect_no_error(plot_mvgam_fc(mvgam:::mvgam_example2)) expect_no_error(plot(mvgam:::mvgam_example4, type = 'forecast')) expect_no_error(SW(plot(mvgam:::mvgam_example3, type = 'smooths'))) expect_no_error(SW(plot( mvgam:::mvgam_example3, type = 'smooths', realisations = TRUE ))) expect_no_error(plot_mvgam_smooth( mvgam:::mvgam_example1, smooth = 1, derivatives = TRUE )) expect_no_error(plot_mvgam_smooth( mvgam:::mvgam_example1, smooth = 1, residuals = TRUE )) expect_no_error(plot_mvgam_smooth( mvgam:::mvgam_example1, smooth = 1, realisations = TRUE )) expect_error(plot_mvgam_smooth(mvgam:::mvgam_example2, smooth = 1)) expect_no_error(plot_mvgam_smooth( mvgam:::mvgam_example2, smooth = 1, trend_effects = TRUE )) expect_no_error(plot_mvgam_smooth( mvgam:::mvgam_example2, smooth = 1, derivatives = TRUE, trend_effects = TRUE )) expect_no_error(plot_mvgam_smooth( mvgam:::mvgam_example2, derivatives = TRUE, residuals = TRUE, trend_effects = TRUE )) expect_no_error(plot_mvgam_smooth( mvgam:::mvgam_example1, realisations = TRUE )) expect_no_error(plot_mvgam_smooth( mvgam:::mvgam_example4, realisations = TRUE, newdata = mvgam:::mvgam_examp_dat$data_test )) expect_message( plot(mvgam:::mvgam_example3, type = 'pterms'), 'No parametric terms in model formula' ) expect_message(plot(mvgam:::mvgam_example1, type = 're')) expect_error(plot(mvgam:::mvgam_example1, type = 'factors')) expect_no_error(plot(mvgam:::mvgam_example4, type = 'factors')) expect_ggplot(plot_mvgam_trend(mvgam:::mvgam_example1)) expect_ggplot(plot_mvgam_trend(mvgam:::mvgam_example4)) expect_ggplot(plot_mvgam_trend(mvgam:::mvgam_example4, derivatives = TRUE)) expect_ggplot(plot_mvgam_trend( mvgam:::mvgam_example1, realisations = TRUE, n_realisations = 2 )) expect_ggplot(plot_mvgam_trend(mvgam:::mvgam_example3, derivatives = TRUE)) expect_ggplot(plot_mvgam_trend( mvgam:::mvgam_example2, realisations = TRUE, n_realisations = 2 )) expect_ggplot(plot_mvgam_series(object = mvgam:::mvgam_example4)) expect_ggplot( SW(pp_check( object = mvgam:::mvgam_example1, x = "season", type = "resid_ribbon", ndraws = 3 )) ) expect_ggplot( SW(pp_check( object = mvgam:::mvgam_example2, x = "season", type = "resid_ribbon", ndraws = 3 )) ) expect_ggplot( SW(pp_check( object = mvgam:::mvgam_example2, x = "season", group = "series", type = "resid_ribbon_grouped", ndraws = 3 )) ) }) test_that("dynamic factor investigations work", { lvcors <- lv_correlations(mvgam:::mvgam_example4) expect_true(inherits(lvcors, 'list')) expect_true(all.equal( dim(lvcors$mean_correlations), c( nlevels(mvgam:::mvgam_example4$obs_data$series), nlevels(mvgam:::mvgam_example4$obs_data$series) ) )) 
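  # Annotation (illustrative extra check, not in the original test): every
  # posterior draw yields a valid correlation matrix, so the pointwise mean
  # across draws must also lie within [-1, 1]
  expect_true(all(abs(lvcors$mean_correlations) <= 1, na.rm = TRUE))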
expect_true(mvgam:::mvgam_example4$use_lv) expect_no_error(plot_mvgam_factors(mvgam:::mvgam_example4)) facconts <- plot_mvgam_factors(mvgam:::mvgam_example4, plot = FALSE) expect_true(inherits(facconts, 'data.frame')) expect_true(inherits(facconts, 'tbl_df')) lvcors <- residual_cor(mvgam:::mvgam_example4) expect_true(inherits(lvcors, "mvgam_residcor")) expect_no_error(plot(lvcors)) lvcors <- residual_cor(mvgam:::mvgam_example2) expect_true(inherits(lvcors, "mvgam_residcor")) expect_equal(rep(1, 4), as.vector(lvcors$cor)) expect_error( residual_cor(mvgam:::mvgam_example1), 'Cannot compute residual correlations if no latent factors or correlated process errors were modelled' ) }) test_that("evaluate() functions working", { mod <- mvgam:::mvgam_example1 out <- eval_mvgam(mod, fc_horizon = 6, n_samples = 100, n_cores = 1) expect_true(inherits(out, 'list')) expect_true(all(names(out) == levels(mod$obs_data$series))) expect_true(NROW(out[[1]]) == 6) mod <- mvgam:::mvgam_example3 out <- eval_mvgam(mod, fc_horizon = 2, n_samples = 100, n_cores = 1) expect_true(inherits(out, 'list')) expect_true(all(names(out) == levels(mod$obs_data$series))) expect_true(NROW(out[[1]]) == 2) mod <- mvgam:::mvgam_example4 out <- eval_mvgam(mod, fc_horizon = 2, n_samples = 100, n_cores = 1) expect_true(inherits(out, 'list')) expect_true(all(names(out) == levels(mod$obs_data$series))) expect_true(NROW(out[[1]]) == 2) expect_no_error(compare_mvgams( mvgam:::mvgam_example2, mvgam:::mvgam_example4, n_samples = 100, n_evaluations = 2, n_cores = 1 )) }) test_that("lfo_cv() working", { lfs <- SW(lfo_cv( mvgam:::mvgam_example1, min_t = 8, fc_horizon = 1, silent = 2 )) expect_true(inherits(lfs, 'mvgam_lfo')) expect_ggplot(SW(plot(lfs))) lfs <- SW(lfo_cv( mvgam:::mvgam_example4, min_t = 8, fc_horizon = 1, silent = 2 )) expect_true(inherits(lfs, 'mvgam_lfo')) expect_ggplot(SW(plot(lfs))) }) test_that("forecast() works correctly", { fc <- forecast( mvgam:::mvgam_example1, newdata = mvgam:::mvgam_examp_dat$data_test ) expect_true(inherits(fc$hindcasts, 'list')) expect_true(inherits(fc$forecasts, 'list')) fc_summary <- summary(fc) expect_true(inherits(fc_summary, 'data.frame')) expect_equal( NROW(fc_summary), NROW(rbind( mvgam:::mvgam_examp_dat$data_train, mvgam:::mvgam_examp_dat$data_test )) ) expect_equal( c( mvgam:::mvgam_examp_dat$data_train$y[ which(mvgam:::mvgam_examp_dat$data_train$series == 'series_2') ], mvgam:::mvgam_examp_dat$data_test$y[ which(mvgam:::mvgam_examp_dat$data_test$series == 'series_2') ] ), fc_summary$truth[ which(fc_summary$series == 'series_2') ] ) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_test) / NCOL(mvgam:::mvgam_example1$ytimes), NCOL(fc$forecasts$series_1), length(fc$test_observations$series_1) ) sc <- score(fc) expect_true(inherits(sc, 'list')) expect_true(all.equal( names(sc), c(levels(mvgam:::mvgam_examp_dat$data_test$series), 'all_series') )) expect_error(score(fc, score = 'elpd')) expect_no_error(score(fc, score = 'energy')) expect_no_error(score(fc, score = 'variogram')) expect_no_error(score(fc, score = 'sis')) fc <- forecast( mvgam:::mvgam_example1, newdata = mvgam:::mvgam_examp_dat$data_test, type = 'expected' ) expect_true(inherits(fc$hindcasts, 'list')) expect_true(inherits(fc$forecasts, 'list')) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_test) / NCOL(mvgam:::mvgam_example1$ytimes), NCOL(fc$forecasts$series_1), length(fc$test_observations$series_1) ) fc <- forecast( mvgam:::mvgam_example2, newdata = mvgam:::mvgam_examp_dat$data_test, type = 'link' ) 
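  # Annotation (not original test code): link-scale draws are requested here
  # because the 'elpd' score computed just below needs linear-predictor
  # forecasts; the default response-scale forecast errored for 'elpd' earlier
  # in this test
  expect_true(inherits(fc, 'mvgam_forecast'))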
expect_true(inherits(fc$hindcasts, 'list')) expect_true(inherits(fc$forecasts, 'list')) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_test) / NCOL(mvgam:::mvgam_example2$ytimes), NCOL(fc$forecasts$series_1), length(fc$test_observations$series_1) ) sc <- score(fc, score = 'elpd') expect_true(inherits(sc, 'list')) expect_true(all.equal( names(sc), c(levels(mvgam:::mvgam_examp_dat$data_test$series), 'all_series') )) fc <- forecast( mvgam:::mvgam_example2, newdata = mvgam:::mvgam_examp_dat$data_test, type = 'expected' ) expect_true(inherits(fc$hindcasts, 'list')) expect_true(inherits(fc$forecasts, 'list')) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_test) / NCOL(mvgam:::mvgam_example2$ytimes), NCOL(fc$forecasts$series_1), length(fc$test_observations$series_1) ) fc <- forecast( mvgam:::mvgam_example3, newdata = mvgam:::mvgam_examp_dat$data_test ) expect_true(inherits(fc$hindcasts, 'list')) expect_true(inherits(fc$forecasts, 'list')) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_test) / NCOL(mvgam:::mvgam_example3$ytimes), NCOL(fc$forecasts$series_1), length(fc$test_observations$series_1) ) fc <- forecast( mvgam:::mvgam_example3, newdata = mvgam:::mvgam_examp_dat$data_test, type = 'expected' ) expect_true(inherits(fc$hindcasts, 'list')) expect_true(inherits(fc$forecasts, 'list')) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_test) / NCOL(mvgam:::mvgam_example3$ytimes), NCOL(fc$forecasts$series_1), length(fc$test_observations$series_1) ) fc <- forecast( mvgam:::mvgam_example4, newdata = mvgam:::mvgam_examp_dat$data_test ) expect_true(inherits(fc$hindcasts, 'list')) expect_true(inherits(fc$forecasts, 'list')) expect_no_error(plot(fc)) expect_no_error(plot(fc, hide_xlabels = TRUE)) expect_no_error(plot(fc, ylab = 'banana')) expect_no_error(plot(fc, realisations = TRUE, n_realisations = 3)) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_test) / NCOL(mvgam:::mvgam_example4$ytimes), NCOL(fc$forecasts$series_1), length(fc$test_observations$series_1) ) fc <- forecast( mvgam:::mvgam_example4, newdata = mvgam:::mvgam_examp_dat$data_test, type = 'expected' ) expect_true(inherits(fc$hindcasts, 'list')) expect_true(inherits(fc$forecasts, 'list')) expect_no_error(plot(fc, hide_xlabels = TRUE)) expect_no_error(plot(fc, ylab = 'banana')) expect_no_error(plot(fc, realisations = TRUE, n_realisations = 2)) expect_equal( NROW(mvgam:::mvgam_examp_dat$data_test) / NCOL(mvgam:::mvgam_example4$ytimes), NCOL(fc$forecasts$series_1), length(fc$test_observations$series_1) ) }) test_that("loo() works correctly", { options(mc.cores = 1) expect_loo(SW(loo(mvgam:::mvgam_example1))) expect_loo(SW(loo(mvgam:::mvgam_example2))) expect_loo(SW(loo(mvgam:::mvgam_example3))) expect_loo(SW(loo(mvgam:::mvgam_example4))) p <- SW(loo_compare( mvgam:::mvgam_example1, mvgam:::mvgam_example2, model_names = c('banana') )) expect_true(inherits(p, 'compare.loo')) p <- SW(loo_compare( mvgam:::mvgam_example1, mvgam:::mvgam_example2, mvgam:::mvgam_example3, mvgam:::mvgam_example4 )) expect_true(inherits(p, 'compare.loo')) }) test_that("how_to_cite() works correctly", { description <- how_to_cite(mvgam:::mvgam_example3) expect_true( grepl('To encourage stability', description$methods_text) ) description <- how_to_cite(mvgam:::mvgam_example4) expect_true( grepl('Gaussian Process functional', description$methods_text) ) }) ================================================ FILE: tests/testthat/test-families.R ================================================ context("Tests for family functions") test_that("distributions work correctly", { 
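  # Annotation (illustrative sketch under the assumption that these
  # constructors behave like standard stats/mgcv family objects; not part of
  # the original test): family objects also carry a link function, e.g.
  expect_true(is.function(tweedie()$linkfun))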
fam <- tweedie() expect_true(inherits(fam, 'family')) expect_true(inherits(fam, 'extended.family')) expect_true(fam$link == 'log') fam <- student_t() expect_true(inherits(fam, 'family')) expect_true(inherits(fam, 'extended.family')) expect_true(fam$link == 'identity') fam <- nmix() expect_true(inherits(fam, 'family')) expect_true(inherits(fam, 'extended.family')) expect_true(fam$link == 'log') }) test_that("nmix predictions work correctly", { set.seed(1) Xp <- matrix(rnorm(100), ncol = 10, nrow = 10) attr(Xp, 'model.offset') <- 0 family <- 'nmix' betas <- rnorm(10) latent_lambdas <- runif(10, 2, 5) cap <- rep(12, 10) family_pars <- list() expect_true(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = latent_lambdas, cap = cap, type = 'link', family_pars = family_pars ) > 0 )) expect_true(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = latent_lambdas, cap = cap, type = 'expected', family_pars = family_pars ) > 0 )) detects <- mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = latent_lambdas, cap = cap, type = 'detection', family_pars = family_pars ) expect_true(all(detects > 0) & all(detects < 1)) truth <- rpois(10, 6) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = latent_lambdas, cap = cap, type = 'link', density = TRUE, truth = truth, family_pars = family_pars )) # Test that the residual calculation is working preds <- runif(10, 0.1, 10) N <- rpois(10, 20) p <- runif(10, 0.3, 0.6) resids <- mvgam:::ds_resids_nmix( truth = truth, fitted = 1, draw = 1, N = as.vector(N), p = as.vector(p) ) expect_true(all(!is.na(resids))) }) test_that("beta predictions work correctly", { set.seed(1) Xp <- matrix(rnorm(100), ncol = 10, nrow = 10) attr(Xp, 'model.offset') <- 0 family <- 'beta' betas <- rnorm(10) family_pars <- list(phi = rep(1, 10)) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', family_pars = family_pars )) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'variance', family_pars = family_pars )) expecteds <- mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'expected', family_pars = family_pars ) expect_true(all(expecteds >= 0) & all(expecteds <= 1)) preds <- mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'response', family_pars = family_pars ) expect_true(all(preds >= 0) & all(preds <= 1)) truth <- runif(10, 0.01, 0.99) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', density = TRUE, truth = truth, family_pars = family_pars )) preds <- runif(10, 0.01, 0.99) resids <- mvgam:::ds_resids_beta( truth = truth, fitted = as.vector(preds), draw = 1, precision = rep(1, 10) ) expect_true(all(!is.na(resids))) }) test_that("beta-binomial predictions work correctly", { set.seed(1) Xp <- matrix(rnorm(100), ncol = 10, nrow = 10) attr(Xp, 'model.offset') <- 0 family <- 'beta_binomial' betas <- rnorm(10) family_pars <- list(phi = rep(1, 10), trials = rep(20, 10)) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', family_pars = family_pars )) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = 
NULL, cap = NULL, type = 'variance', family_pars = family_pars )) expect_true(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'expected', family_pars = family_pars ) > 0 )) expect_true(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'response', family_pars = family_pars ) >= 0 )) truth <- rpois(10, 8) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', density = TRUE, truth = truth, family_pars = family_pars )) p <- runif(10, 0.3, 0.6) resids <- mvgam:::ds_resids_beta_binomial( truth = truth, fitted = p, draw = 1, N = rep(20, 10), phi = rep(1, 10) ) expect_true(all(!is.na(resids))) }) test_that("negative binomial predictions work correctly", { set.seed(1) Xp <- matrix(rnorm(100), ncol = 10, nrow = 10) attr(Xp, 'model.offset') <- 0 family <- 'negative binomial' betas <- rnorm(10) family_pars <- list(phi = rep(1, 10)) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', family_pars = family_pars )) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'variance', family_pars = family_pars )) expect_true(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'expected', family_pars = family_pars ) > 0 )) expect_true(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'response', family_pars = family_pars ) >= 0 )) truth <- rpois(10, 8) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', density = TRUE, truth = truth, family_pars = family_pars )) preds <- rpois(10, 8) resids <- mvgam:::ds_resids_nb( truth = truth, fitted = preds, draw = 1, size = rep(1, 10) ) expect_true(all(!is.na(resids))) }) test_that("lognormal predictions work correctly", { set.seed(1) Xp <- matrix(rnorm(100), ncol = 10, nrow = 10) attr(Xp, 'model.offset') <- 0 family <- 'lognormal' betas <- rnorm(10) family_pars <- list(sigma_obs = 1) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', family_pars = family_pars )) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'variance', family_pars = family_pars )) expect_true(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'expected', family_pars = family_pars ) > 0 )) expect_true(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'response', family_pars = family_pars ) > 0 )) truth <- runif(10, 0.1, 20) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', density = TRUE, truth = truth, family_pars = family_pars )) preds <- runif(10, 0.1, 20) resids <- mvgam:::ds_resids_lnorm( truth = truth, fitted = preds, draw = 1, sigma = rep(1, 10) ) expect_true(all(!is.na(resids))) }) test_that("gamma predictions work correctly", { set.seed(1) Xp <- matrix(rnorm(100), ncol = 10, nrow = 10) attr(Xp, 'model.offset') <- 0 family <- 'Gamma' betas <- rnorm(10) family_pars <- list(shape = 1) 
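  # Annotation (illustrative, not part of the original test): with shape = 1
  # the Gamma reduces to an exponential distribution, so on the response
  # scale Var = mu^2 / shape collapses to the squared mean. A quick
  # deterministic check of that identity with a hypothetical mean value:
  mu <- 2.5
  expect_equal(mu^2 / family_pars$shape, mu^2)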
expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', family_pars = family_pars )) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'variance', family_pars = family_pars )) expect_true(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'expected', family_pars = family_pars ) > 0 )) expect_true(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'response', family_pars = family_pars ) > 0 )) truth <- runif(10, 0.1, 20) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', density = TRUE, truth = truth, family_pars = family_pars )) preds <- runif(10, 0.1, 20) resids <- mvgam:::ds_resids_gamma( truth = truth, fitted = preds, draw = 1, shape = rep(1, 10) ) expect_true(all(!is.na(resids))) }) test_that("student-t predictions work correctly", { set.seed(1) Xp <- matrix(rnorm(100), ncol = 10, nrow = 10) attr(Xp, 'model.offset') <- 0 family <- 'student' betas <- rnorm(10) family_pars <- list(sigma_obs = 1, nu = 3) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', family_pars = family_pars )) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'variance', family_pars = family_pars )) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'expected', family_pars = family_pars )) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'response', family_pars = family_pars )) truth <- rnorm(10) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', density = TRUE, truth = truth, family_pars = family_pars )) preds <- rnorm(10) resids <- mvgam:::ds_resids_student( truth = truth, fitted = preds, draw = 1, sigma = rep(1, 10), nu = rep(5, 10) ) expect_true(all(!is.na(resids))) }) test_that("tweedie predictions work correctly", { set.seed(1) Xp <- matrix(rnorm(100), ncol = 10, nrow = 10) attr(Xp, 'model.offset') <- 0 family <- 'tweedie' betas <- rnorm(10) family_pars <- list(phi = 1) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', family_pars = family_pars )) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'variance', family_pars = family_pars )) expect_no_error(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'expected', family_pars = family_pars ) > 0 )) expect_no_error(all( mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'response', family_pars = family_pars ) >= 0 )) truth <- rpois(10, 5) expect_no_error(mvgam:::mvgam_predict( Xp = Xp, family = family, betas = betas, latent_lambdas = NULL, cap = NULL, type = 'link', density = TRUE, truth = truth, family_pars = family_pars )) preds <- rpois(10, 5) resids <- mvgam:::ds_resids_tw(truth = truth, fitted = preds, draw = 1) expect_true(all(!is.na(resids))) }) # Skip actual model setups on CRAN as they 
take some time test_that("family setups work correctly", { skip_on_cran() simdat <- sim_mvgam(family = poisson()) mod <- mvgam( y ~ -1, trend_model = PW(), data = simdat$data_train, family = poisson(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_no_error(capture_output(print(mod))) simdat <- sim_mvgam(family = nb(), n_lv = 2) mod <- mvgam( y ~ -1, trend_model = PW(), data = simdat$data_train, family = nb(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_no_error(capture_output(print(mod))) simdat <- sim_mvgam(family = lognormal(), trend_model = VAR()) mod <- mvgam( y ~ -1, trend_model = PW(), data = simdat$data_train, family = lognormal(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_no_error(capture_output(print(mod))) simdat <- sim_mvgam(family = bernoulli(), trend_model = VAR(cor = TRUE)) mod <- mvgam( y ~ -1, trend_model = PW(), data = simdat$data_train, family = bernoulli(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_no_error(capture_output(print(mod))) simdat <- sim_mvgam(family = student_t(), trend_model = GP()) mod <- mvgam( y ~ -1, trend_model = PW(), data = simdat$data_train, family = student_t(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_no_error(capture_output(print(mod))) simdat <- sim_mvgam( family = Gamma(), seasonality = 'shared', trend_model = AR(p = 3) ) mod <- mvgam( y ~ -1, trend_model = PW(), data = simdat$data_train, family = Gamma(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_no_error(capture_output(print(mod))) simdat <- sim_mvgam( family = betar(), seasonality = 'hierarchical', trend_model = AR(p = 2) ) mod <- mvgam( y ~ -1, trend_model = PW(), data = simdat$data_train, family = betar(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_no_error(capture_output(print(mod))) }) ================================================ FILE: tests/testthat/test-gp.R ================================================ context("gp") skip_on_cran() test_that("gp_to_s is working properly for unidimensional gps", { # All true gp() terms should be changed to s() with k = k+1 formula <- y ~ s(series) + gp(banana, k = 3, scale = FALSE) + infect:you + gp(hardcourt, k = 3) dat <- data.frame( y = rnorm(10), series = rnorm(10), banana = rnorm(10), infect = rnorm(10), you = rnorm(10), hardcourt = rnorm(10), gp = rnorm(10) ) # Check that the brms_mock formula is correctly remade gp_atts <- mvgam:::get_gp_attributes(formula, data = dat, family = gaussian()) # scale should be passed; gr is always false and cmc is always true expect_true(identical( attr(terms(attr(gp_atts, 'gp_formula')), 'term.labels')[1], "gp(banana, k = 3, cov = \"exp_quad\", iso = TRUE, scale = FALSE, c = 1.25, gr = FALSE, cmc = TRUE)" )) expect_true(identical( attr(terms(attr(gp_atts, 'gp_formula')), 'term.labels')[2], "gp(hardcourt, k = 3, cov = \"exp_quad\", iso = TRUE, scale = TRUE, c = 1.25, gr = FALSE, cmc = TRUE)" )) expect_equal( attr( terms( mvgam:::gp_to_s(formula, data = dat, family = gaussian()), keep.order = TRUE ), 'term.labels' ), attr( terms( formula( y ~ s(series) + s(banana, k = 4) + infect:you + s(hardcourt, k = 4) ), keep.order = TRUE ), 'term.labels' ) ) # Characters that match to 'gp' should not be changed formula <- y ~ gp(hardcourt, k = 3) + s(gp, k = 3) expect_equal( attr( terms( mvgam:::gp_to_s(formula, data = dat, family = gaussian()), keep.order = TRUE ), 'term.labels' ), attr( terms(formula(y ~ s(hardcourt, k = 4) + s(gp, k 
= 3)), keep.order = TRUE), 'term.labels' ) ) }) test_that("gp_to_s is working properly for multidimensional gps", { # All true gp() terms should be changed to s() with k = k+1 formula <- y ~ s(series) + gp(banana, hardcourt, k = 3, iso = FALSE, c = 1.33, cov = 'matern52') + infect:you dat <- data.frame( y = rnorm(10), series = rnorm(10), banana = rnorm(10), infect = rnorm(10), you = rnorm(10), hardcourt = rnorm(10), gp = rnorm(10) ) gp_atts <- mvgam:::get_gp_attributes(formula, data = dat, family = gaussian()) expect_true(identical( attr(terms(attr(gp_atts, 'gp_formula')), 'term.labels')[1], "gp(banana, hardcourt, k = 3, cov = \"matern52\", iso = FALSE, scale = TRUE, c = 1.33, gr = FALSE, cmc = TRUE)" )) expect_equal( attr( terms( mvgam:::gp_to_s(formula, data = dat, family = gaussian()), keep.order = TRUE ), 'term.labels' ), attr( terms( formula( y ~ s(series) + ti(banana, hardcourt, k = 3, mc = c(0, 0)) + infect:you ), keep.order = TRUE ), 'term.labels' ) ) }) test_that("unidimensional gp for observation models working properly", { gaus_data$data_train$y[is.na(gaus_data$data_train$y)] <- 0 mod <- mvgam( formula = y ~ s(series, bs = 're') + gp(time, by = series, k = 10, c = 5 / 4) + year:season, data = gaus_data$data_train, family = gaussian(), run_model = FALSE, autoformat = FALSE ) expect_true( any(grepl( 'b[b_idx_gp_time_byseriesseries_3] = sqrt(spd_gp_exp_quad(l_gp_time_byseriesseries_3', mod$model_file, fixed = TRUE )) ) # Gp data structures should be in the model_data expect_true("l_gp_time_byseriesseries_1" %in% names(mod$model_data)) expect_true("b_idx_gp_time_byseriesseries_1" %in% names(mod$model_data)) expect_true("k_gp_time_byseriesseries_1" %in% names(mod$model_data)) # These should match to the eigenvalues and eigenfunctions created by # a similar brms call brms_dat <- suppressWarnings(brms::make_standata( y ~ s(series, bs = 're') + gp(time, by = series, k = 10, c = 5 / 4) + year:season, data = gaus_data$data_train, family = gaussian() )) # Eigenvalues should be identical expect_true(all.equal( as.vector(brms_dat$slambda_1_1), as.vector(mod$model_data$l_gp_time_byseriesseries_1) )) # Eigenfunctions will be nearly identical row_s1 <- which( gaus_data$data_train$series == 'series_1' & !is.na(gaus_data$data_train$y) ) col_s1 <- grep( 'gp(time):seriesseries_1', names(coef(mod$mgcv_model)), fixed = TRUE ) expect_true(identical( dim(brms_dat$Xgp_1_1), dim(mod$model_data$X[row_s1, col_s1]) )) expect_true(all( unlist( lapply(seq_len(NCOL(brms_dat$Xgp_1_1)), function(x) { cor(brms_dat$Xgp_1_1[, x], mod$model_data$X[row_s1, col_s1][, x]) }), use.names = FALSE ) > 0.99 )) # The mgcv model formula should contain s() in place of gp() expect_equal( attr(terms(mod$mgcv_model$formula, keep.order = TRUE), 'term.labels'), attr( terms( formula( y ~ s(time, by = series, k = 11) + year:season + s(series, bs = "re") ), keep.order = TRUE ), 'term.labels' ) ) }) test_that("multidimensional gp for observation models working properly", { gaus_data$data_train$y[is.na(gaus_data$data_train$y)] <- 0 mod <- mvgam( y ~ s(series, bs = 're') + gp(time, year, k = 4, cov = 'matern32'), data = gaus_data$data_train, family = gaussian(), run_model = FALSE, autoformat = FALSE ) expect_true( any(grepl( 'b[b_idx_gp_timeby_year_] = sqrt(spd_gp_matern32(l_gp_timeby_year_', mod$model_file, fixed = TRUE )) ) # Gp data structures should be in the model_data expect_true("l_gp_timeby_year_" %in% names(mod$model_data)) expect_true("b_idx_gp_timeby_year_" %in% names(mod$model_data)) expect_true("k_gp_timeby_year_" %in% 
names(mod$model_data)) # These should match to the eigenvalues and eigenfunctions created by # a similar brms call brms_dat <- suppressWarnings(brms::make_standata( y ~ s(series, bs = 're') + gp(time, year, k = 4, gr = FALSE), data = gaus_data$data_train, family = gaussian() )) # Eigenvalues should be identical expect_true(all.equal(brms_dat$slambda_1, mod$model_data$l_gp_timeby_year_)) # Eigenfunctions will be nearly identical col_s1 <- grep('gp(time,year)', names(coef(mod$mgcv_model)), fixed = TRUE) expect_true(identical(dim(brms_dat$Xgp_1), dim(mod$model_data$X[, col_s1]))) expect_true(all( unlist( lapply(seq_len(NCOL(brms_dat$Xgp_1)), function(x) { cor(brms_dat$Xgp_1[, x], mod$model_data$X[, col_s1][, x]) }), use.names = FALSE ) > 0.99 )) # The mgcv model formula should contain s() in place of gp() expect_equal( attr(terms(mod$mgcv_model$formula, keep.order = TRUE), 'term.labels'), attr( terms( formula( y ~ ti(time, year, k = 4, mc = c(0, 0)) + s(series, bs = "re") ), keep.order = TRUE ), 'term.labels' ) ) }) test_that("noncentring with gp terms working properly", { mod <- mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc', k = 8) + gp(time, by = series, k = 10), trend_model = RW(), data = gaus_data$data_train, newdata = gaus_data$data_test, family = gaussian(), noncentred = TRUE, run_model = FALSE ) # Model file should have the non-centred trend parameterisation now expect_true( any(grepl( trimws("trend = trend_raw .* rep_matrix(sigma', rows(trend_raw));"), trimws(mod$model_file), fixed = TRUE )) ) expect_true( any(grepl( trimws("trend[2 : n, s] += trend[1 : (n - 1), s];"), trimws(mod$model_file), fixed = TRUE )) ) # Gp data structures should be in the model_data expect_true("l_gp_time_byseriesseries_1" %in% names(mod$model_data)) expect_true("b_idx_gp_time_byseriesseries_1" %in% names(mod$model_data)) expect_true("k_gp_time_byseriesseries_1" %in% names(mod$model_data)) }) test_that("unidimensional gp for process models working properly", { mod <- mvgam( y ~ s(series, bs = 're'), trend_formula = ~ gp(time, by = trend, k = 10) + year:season, data = beta_data$data_train, family = betar(), trend_model = AR(), run_model = FALSE, autoformat = FALSE ) # Model file should have prior lines for gp terms expect_true(any(grepl( '// prior for gp(time):trendtrend1_trend...', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( "b_trend[b_trend_idx_gp_time_bytrendtrend1] = sqrt(spd_gp_exp_quad(", mod$model_file, fixed = TRUE ))) # Gp data structures should be in the model_data expect_true("l_gp_trend_time_bytrendtrend1" %in% names(mod$model_data)) expect_true("b_trend_idx_gp_time_bytrendtrend1" %in% names(mod$model_data)) expect_true("k_gp_trend_time_bytrendtrend1" %in% names(mod$model_data)) # These should match to the eigenvalues and eigenfunctions created by # a similar brms call brms_dat <- suppressWarnings(brms::make_standata( y ~ gp(time, by = series, k = 10, c = 5 / 4), data = beta_data$data_train, family = gaussian() )) # Eigenvalues should be identical expect_true(all.equal( as.vector(brms_dat$slambda_1_1), as.vector(mod$model_data$l_gp_trend_time_bytrendtrend1) )) # Eigenfunctions will be nearly identical row_s1 <- mod$model_data$ytimes_trend[, 1] col_s1 <- grep( 'gp(time):trendtrend1', names(coef(mod$trend_mgcv_model)), fixed = TRUE ) expect_true(identical( dim(brms_dat$Xgp_1_1), dim(mod$model_data$X_trend[row_s1, col_s1]) )) expect_true( max(abs( brms_dat$Xgp_1_1 - mod$model_data$X_trend[row_s1, col_s1] )) < 0.01 ) # The mgcv model formula should contain s() in place of gp() 
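  # (Annotation, not part of the original test: the gp()-to-s()/ti()
  # translation only exists so that mgcv can construct a design matrix; the
  # approximate Hilbert-space GP basis and its spectral-density priors are
  # injected into the Stan code separately, which is what the spd_gp_*
  # checks above verify.)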
expect_equal( attr(terms(mod$trend_mgcv_model$formula, keep.order = TRUE), 'term.labels'), attr( terms( formula( y ~ s(time, by = series, k = 11) + year:season ), keep.order = TRUE ), 'term.labels' ) ) }) test_that("multidimensional gp for process models working properly", { mod <- mvgam( y ~ s(series, bs = 're'), trend_formula = ~ gp(time, season, k = 10, iso = FALSE), data = beta_data$data_train, family = betar(), trend_model = AR(), run_model = FALSE, autoformat = FALSE ) # Model file should have prior lines for gp terms expect_true(any(grepl( '// prior for gp(time,season)_trend...', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( "b_trend[b_trend_idx_gp_timeby_season_] = sqrt(spd_gp_exp_quad(", mod$model_file, fixed = TRUE ))) expect_true(any(grepl( "array[1] vector[2] rho_gp_trend_timeby_season_;", mod$model_file, fixed = TRUE ))) expect_true(any(grepl( "rho_gp_trend_timeby_season_[1][1] ~ inv_gamma", mod$model_file, fixed = TRUE ))) expect_true(any(grepl( "rho_gp_trend_timeby_season_[1][2] ~ inv_gamma", mod$model_file, fixed = TRUE ))) # Gp data structures should be in the model_data expect_true("l_gp_trend_timeby_season_" %in% names(mod$model_data)) expect_true("b_trend_idx_gp_timeby_season_" %in% names(mod$model_data)) expect_true("k_gp_trend_timeby_season_" %in% names(mod$model_data)) # These should match to the eigenvalues and eigenfunctions created by # a similar brms call brms_dat <- suppressWarnings(brms::make_standata( y ~ gp(time, season, k = 10, c = 5 / 4, cmc = TRUE, gr = FALSE), data = beta_data$data_train, family = gaussian() )) # Eigenvalues should be identical expect_true(all.equal( brms_dat$slambda_1, mod$model_data$l_gp_trend_timeby_season_ )) # Eigenfunctions will be nearly identical col_s1 <- grep( 'gp(time,season)', names(coef(mod$trend_mgcv_model)), fixed = TRUE ) expect_true(identical( dim(brms_dat$Xgp_1), dim(mod$model_data$X_trend[, col_s1]) )) expect_true(all( unlist( lapply(seq_len(NCOL(brms_dat$Xgp_1)), function(x) { cor(brms_dat$Xgp_1[, x], mod$model_data$X_trend[, col_s1][, x]) }), use.names = FALSE ) > 0.99 )) # The mgcv model formula should contain s() in place of gp() expect_equal( attr(terms(mod$trend_mgcv_model$formula, keep.order = TRUE), 'term.labels'), attr( terms( formula(y ~ ti(time, season, k = 10, mc = c(0, 0))), keep.order = TRUE ), 'term.labels' ) ) }) ================================================ FILE: tests/testthat/test-jsdgam.R ================================================ context("jsdgam") # Reconstruct the spider data from mvabund spiderdat <- structure( list( abundance = c( 25L, 0L, 15L, 2L, 1L, 0L, 2L, 0L, 1L, 3L, 15L, 16L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 7L, 17L, 11L, 9L, 3L, 29L, 15L, 10L, 2L, 20L, 6L, 20L, 6L, 7L, 11L, 1L, 0L, 1L, 13L, 43L, 2L, 0L, 3L, 0L, 1L, 1L, 2L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 16L, 15L, 20L, 9L, 6L, 11L, 14L, 0L, 0L, 2L, 1L, 2L, 6L, 12L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L, 7L, 5L, 0L, 18L, 4L, 1L, 4L, 30L, 9L, 24L, 9L, 6L, 16L, 7L, 0L, 0L, 1L, 0L, 18L, 4L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 55L, 0L, 0L, 0L, 0L, 1L, 3L, 6L, 6L, 2L, 5L, 12L, 13L, 16L, 0L, 2L, 0L, 1L, 0L, 0L, 0L, 60L, 1L, 29L, 7L, 2L, 11L, 30L, 2L, 26L, 22L, 95L, 96L, 24L, 14L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 6L, 3L, 11L, 0L, 1L, 6L, 12L, 15L, 18L, 29L, 135L, 
27L, 89L, 2L, 1L, 0L, 0L, 1L, 53L, 15L, 0L, 2L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 6L, 0L, 0L, 0L, 45L, 37L, 45L, 94L, 76L, 24L, 105L, 1L, 1L, 0L, 1L, 8L, 72L, 72L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 57L, 65L, 66L, 86L, 91L, 63L, 118L, 30L, 2L, 1L, 4L, 13L, 97L, 94L, 25L, 28L, 23L, 25L, 22L, 22L, 18L, 1L, 1L, 0L, 16L, 1L, 0L, 2L, 4L, 9L, 1L, 25L, 17L, 34L, 16L, 3L, 0L, 0L, 0L, 0L, 22L, 32L, 3L, 4L, 2L, 0L, 3L, 2L, 2L, 0L, 0L, 0L, 6L, 0L, 0L, 0L ), taxon = structure( c( 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L ), levels = c( "Alopacce", "Alopcune", "Alopfabr", "Arctlute", "Arctperi", "Auloalbi", "Pardlugu", "Pardmont", "Pardnigr", "Pardpull", "Trocterr", "Zoraspin" ), class = "factor" ), site = c( 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L ), soil.dry = c( 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555, 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555, 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555, 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555, 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555, 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555, 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555, 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555, 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555, 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555, 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555, 2.3321, 3.0493, 2.5572, 2.6741, 3.0155, 3.381, 3.1781, 2.6247, 2.4849, 2.1972, 2.2192, 2.2925, 3.5175, 3.0865, 3.2696, 3.0301, 3.3322, 3.1224, 2.9232, 3.1091, 2.9755, 1.2528, 1.1939, 1.6487, 1.8245, 0.9933, 0.9555, 0.9555 ), bare.sand = c( 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434, 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434, 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434, 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434, 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434, 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 
0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434, 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434, 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434, 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434, 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434, 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434, 0, 0, 0, 0, 0, 2.3979, 0, 0, 0, 3.9318, 0, 0, 1.7918, 0, 0, 0, 0, 0, 0, 0, 0, 3.2581, 3.0445, 3.2581, 3.5835, 4.5109, 2.3979, 3.434 ), fallen.leaves = c( 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0, 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0, 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0, 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0, 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0, 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0, 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0, 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0, 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0, 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0, 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0, 0, 1.7918, 0, 0, 0, 3.434, 0, 4.2627, 0, 0, 0, 0, 1.7918, 0, 4.3944, 4.6052, 4.4543, 4.3944, 4.5109, 4.5951, 4.5643, 0, 0, 0, 0, 0, 0, 0 ), moss = c( 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136, 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136, 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136, 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136, 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 
1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136, 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136, 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136, 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136, 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136, 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136, 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136, 3.0445, 1.0986, 2.3979, 2.3979, 0, 2.3979, 0.6931, 1.0986, 4.3307, 3.434, 4.1109, 3.8286, 0.6931, 1.7918, 0.6931, 0.6931, 0.6931, 0, 1.6094, 0.6931, 0.6931, 4.3307, 4.0254, 4.0254, 1.0986, 1.7918, 3.8286, 3.7136 ), herb.layer = c( 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 3.434, 3.434, 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 3.434, 3.434, 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 3.434, 3.434, 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 3.434, 3.434, 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 3.434, 3.434, 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 3.434, 3.434, 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 3.434, 3.434, 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 3.434, 3.434, 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 
3.434, 3.434, 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 3.434, 3.434, 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 3.434, 3.434, 4.4543, 4.5643, 4.6052, 4.6151, 4.6151, 3.434, 4.6151, 3.434, 3.2581, 3.0445, 3.7136, 4.0254, 4.5109, 4.5643, 3.0445, 0.6931, 3.0445, 3.0445, 1.6094, 0.6931, 1.7918, 0.6931, 3.2581, 3.0445, 4.1109, 1.7918, 3.434, 3.434 ), reflection = c( 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889, 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889, 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889, 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889, 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889, 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889, 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889, 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889, 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889, 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889, 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889, 3.912, 1.6094, 3.6889, 2.9957, 2.3026, 0.6931, 2.3026, 0.6931, 3.4012, 3.6889, 3.6889, 3.6889, 3.4012, 1.0986, 0.6931, 0, 1.0986, 1.0986, 0, 0, 0, 3.912, 4.0943, 4.0073, 2.3026, 4.382, 3.6889, 3.6889 ) ), row.names = c(NA, -336L), class = "data.frame" ) test_that("family must be correctly specified", { expect_error( mod <- jsdgam( formula = abundance ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 're', by = reflection), # Each factor estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, by = trend) - 
1, data = spiderdat, unit = site, species = taxon, n_lv = 3, family = 'banana' ), 'family not recognized' ) }) test_that("response variable must be specified", { expect_error( jsdgam( formula = ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 're', by = reflection) , # Each factor estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, by = trend) - 1, data = spiderdat, unit = site, species = taxon, n_lv = 3, family = nb() ), 'Not sure how to deal with this response variable specification' ) }) test_that("unit must exist in data", { expect_error( jsdgam( formula = abundance ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 're', by = reflection), # Each factor estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, by = trend) - 1, data = spiderdat, unit = banana, species = taxon, n_lv = 3, family = nb() ), 'Variable "banana" not found in data' ) }) test_that("species must exist in data", { expect_error( jsdgam( formula = abundance ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 're', by = reflection), # Each factor estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, by = trend) - 1, data = spiderdat, unit = site, species = banana, n_lv = 3, family = nb() ), 'Variable "banana" not found in data' ) }) test_that("species must be a factor in data", { spiderdat$taxon <- as.numeric(spiderdat$taxon) expect_error( jsdgam( formula = abundance ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 're', by = reflection), # Each factor estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, by = trend) - 1, data = spiderdat, unit = site, species = taxon, n_lv = 3, family = nb() ), 'Variable "taxon" must be a factor type' ) }) test_that("unit must be a numeric / integer in data", { spiderdat$site <- as.factor(spiderdat$site) expect_error( jsdgam( formula = abundance ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 're', by = reflection), # Each factor estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, by = trend) - 1, data = spiderdat, unit = site, species = taxon, n_lv = 3, family = nb() ), 'Variable "site" must be either numeric or integer type' ) }) test_that("n_lv must be <= number of species", { expect_error( jsdgam( formula = abundance ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 're', by = reflection), # Each factor estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, by = trend) - 1, data = spiderdat, unit = site, species = taxon, n_lv = 15, family = nb() ), 'Number of factors must be <= number of levels in species' ) }) test_that("knots must be a list", { expect_error( jsdgam( formula = abundance ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 
're', by = reflection), # Each factor estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, by = trend) - 1, # supplying knots as a vector should fail factor_knots = seq( min(spiderdat$soil.dry), max(spiderdat$soil.dry), length.out = 4 ), data = spiderdat, unit = site, species = taxon, n_lv = 3, family = nb(), run_model = FALSE ), 'all "knot" arguments must be supplied as lists' ) })
test_that("errors about knot lengths should be propagated from mgcv", { expect_error( jsdgam( formula = abundance ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 're', by = reflection), # Each factor estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, by = trend, bs = 'cr') - 1, # knot length should be 5 for this CR basis factor_knots = list( soil.dry = seq( min(spiderdat$soil.dry), max(spiderdat$soil.dry), length.out = 4 ) ), data = spiderdat, unit = site, species = taxon, n_lv = 3, family = nb(), run_model = FALSE ), 'number of supplied knots != k for a cr smooth' ) })
test_that("get_mvgam_priors accepts factor_formula", { expect_no_error(get_mvgam_priors( formula = abundance ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 're', by = reflection), # Each factor estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, by = trend, bs = 'cr') - 1, data = spiderdat, unit = site, species = taxon, n_lv = 3, trend_model = 'None' )) })
# Skip the remaining tests on CRAN, as they actually initiate models and may take a few seconds
skip_on_cran()
test_that("jsdgam should initiate correctly", { mod <- jsdgam( formula = abundance ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 're', by = reflection), # Each factor estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, bs = 'cr', by = trend) - 1, # supplying knots should also work if k matches length(knots) factor_knots = list( soil.dry = seq( min(spiderdat$soil.dry), max(spiderdat$soil.dry), length.out = 5 ) ), data = spiderdat, unit = site, species = taxon, n_lv = 3, family = nb(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(identical( attr(mod$trend_mgcv_model, 'knots'), list( soil.dry = seq( min(spiderdat$soil.dry), max(spiderdat$soil.dry), length.out = 5 ) ) )) expect_true(is.null(attr(mod$mgcv_model, 'knots'))) expect_true(any(grepl( '// raw latent factors (with linear predictors)', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( 'matrix[n_series, n_lv] lv_coefs = rep_matrix(0, n_series, n_lv);', mod$model_file, fixed = TRUE ))) expect_true(attr(mod$model_data, 'trend_model') == 'None') expect_true(inherits( attr(mod$model_data, 'prepped_trend_model'), 'mvgam_trend' )) expect_true(attr(mod$model_data, 'prepped_trend_model')$subgr == 'taxon') expect_true(attr(mod$model_data, 'prepped_trend_model')$trend_model == 'ZMVN') })
test_that("jsdgam post-processing works correctly", { # Run a short one to ensure post-processing and update work correctly mod <- SM(jsdgam( formula = abundance ~ # Environmental model includes species-level intercepts # and random slopes for a linear effect of reflection s(taxon, bs = 're') + s(taxon, bs = 're', by = reflection), # Each factor 
estimates a different, possibly nonlinear effect of soil.dry factor_formula = ~ s(soil.dry, k = 5, bs = 'cr', by = trend) - 1, # supplying knots should also work if k matches length(knots) factor_knots = list( soil.dry = seq( min(spiderdat$soil.dry), max(spiderdat$soil.dry), length.out = 5 ) ), data = spiderdat %>% dplyr::filter(site < 7), unit = site, species = taxon, n_lv = 2, family = nb(), run_model = TRUE, residuals = FALSE, chains = 2, silent = 2 )) expect_true(inherits(mod, 'jsdgam')) expect_true(is.null(mod$resids)) # Calculate residual correlations to ensure it works post_cors <- residual_cor(mod, summary = FALSE) expect_equal(dim(post_cors$all_cormat)[1], 1000L) expect_equal(dim(post_cors$all_cormat)[2], nlevels(spiderdat$taxon)) expect_true(all(post_cors$all_cormat <= 1)) expect_true(all(post_cors$all_cormat >= -1)) post_cors <- residual_cor(mod, summary = TRUE) expect_equal(dim(post_cors$cor)[1], nlevels(spiderdat$taxon)) expect_ggplot(plot(post_cors)) # Ensure ordination works expect_ggplot(ordinate(mod)) expect_ggplot(ordinate(mod, biplot = FALSE)) expect_ggplot(ordinate(mod, label_sites = FALSE)) }) ================================================ FILE: tests/testthat/test-marginaleffects.R ================================================ context("marginaleffects") skip_on_cran() test_that("data_grid gives expected output structure", { # Dataframe example simdat <- sim_mvgam( n_series = 3, prop_trend = 1, trend_model = GP(), mu = 1.5 ) out <- SW(mvgam:::data_grid( season = unique, year = mean, newdata = simdat$data_test )) expect_true(all(out$year == mean(simdat$data_test$year))) myfunc = function(x) { c( mean(x, na.rm = TRUE), max(x, na.rm = TRUE) + 22.4, min(x, na.rm = TRUE) - 11.7 ) } out <- mvgam:::data_grid(time = myfunc, newdata = simdat$data_test) expect_true(NROW(out) == 3) # A list example out <- mvgam:::data_grid( season = fivenum, year = mean, newdata = mvgam:::mvgam_example4$obs_data ) expect_true(all.equal(names(out), names(mvgam:::mvgam_example4$obs_data))) expect_true(all(out$year == mean(mvgam:::mvgam_example4$obs_data$year))) }) test_that("get_data gives expected output structure", { plot_data <- insight::get_data(mvgam:::mvgam_example2) obs_data <- mvgam:::mvgam_example2$obs_data # get_data should give the exact same data used for modelling, including # any NAs in the outcome variable expect_true(identical( data.frame( y = plot_data$y, season = plot_data$season, series = plot_data$series ), data.frame( y = obs_data$y, season = obs_data$season, series = obs_data$series ) )) }) test_that("get_predict gives expected output structure", { preds <- marginaleffects::get_predict( mvgam:::mvgam_example4, newdata = mvgam:::mvgam_example4$obs_data ) expect_equal(NROW(preds), length(mvgam:::mvgam_example4$obs_data$y)) preds <- marginaleffects::get_predict( mvgam:::mvgam_example2, newdata = mvgam:::mvgam_example2$obs_data ) expect_equal(NROW(preds), NROW(mvgam:::mvgam_example2$obs_data)) }) test_that("averages give expected output structures", { ems <- marginaleffects::avg_predictions( mvgam:::mvgam_example3 ) expect_equal(NROW(ems), 1) expect_true(all(c("estimate", "conf.low", "conf.high") %in% colnames(ems))) ems <- marginaleffects::avg_predictions( mvgam:::mvgam_example4, variables = list(season = c(1, 6, 12)) ) expect_equal(NROW(ems), 3) expect_true(all( c("season", "estimate", "conf.low", "conf.high") %in% colnames(ems) )) ems <- marginaleffects::avg_predictions( mvgam:::mvgam_example4, variables = list(season = c(1, 6, 12)) ) expect_equal(NROW(ems), 3) 
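# Illustrative sketch, not run (the grid values and the manual aggregation
# below are assumptions for exposition): conceptually, avg_predictions()
# averages unit-level predictions over a grid, so a rough manual check is
# preds <- marginaleffects::predictions(
#   mvgam:::mvgam_example4,
#   newdata = marginaleffects::datagrid(
#     model = mvgam:::mvgam_example4,
#     season = c(1, 6, 12)
#   )
# )
# aggregate(estimate ~ season, data = preds, FUN = mean)
# (agreement is only approximate, since posterior draws are summarised
# after averaging rather than before)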
expect_true(all( c("season", "estimate", "conf.low", "conf.high") %in% colnames(ems) )) ems <- marginaleffects::predictions( mvgam:::mvgam_example2, by = 'series' ) expect_equal(NROW(ems), nlevels(mvgam:::mvgam_example3$obs_data$series)) }) test_that("comparisons give expected output structures", { cmp <- marginaleffects::comparisons( mvgam:::mvgam_example2, variables = 'series', by = 'time' ) expect_equal(levels(as.factor(cmp$contrast)), c("series_2 - series_1")) cmp <- marginaleffects::comparisons( mvgam:::mvgam_example2, variables = list(series = 'pairwise'), by = 'time' ) expect_equal(levels(as.factor(cmp$contrast)), c("series_2 - series_1")) cmp <- marginaleffects::comparisons( mvgam:::mvgam_example2, newdata = marginaleffects::datagrid(time = c(1, 6, 9), series = unique), variables = list(series = 'pairwise'), by = 'time' ) expect_equal(levels(as.factor(cmp$contrast)), c("series_2 - series_1")) expect_equal(NROW(cmp), 3) expect_equal(unique(cmp$time), c(1, 6, 9)) }) ================================================ FILE: tests/testthat/test-monotonic.R ================================================ context("monotonic") # Simulations are a bit time-consuming skip_on_cran() # Simulate data from a monotonically increasing function set.seed(123123) x <- runif(80) * 4 - 1 x <- sort(x) f <- exp(4 * x) / (1 + exp(4 * x)) y <- f + rnorm(80) * 0.1 mod_data <- data.frame(y = y, x = x, z = rnorm(80), time = 1:80) test_that("k must be an even integer for s(bs = 'moi')", { expect_error( mvgam(y ~ s(x, bs = 'moi', k = 11), data = mod_data, family = gaussian()), "Argument 'k(bs = 'moi')' must be an even integer", fixed = TRUE ) expect_error( mvgam(y ~ s(x, bs = 'moi', k = 1), data = mod_data, family = gaussian()), "Basis dimension is too small", fixed = TRUE ) }) test_that("monotonic only works for one dimensional smooths", { expect_error( mvgam( y ~ s(x, z, bs = 'moi', k = 10), data = mod_data, family = gaussian() ), "Monotonic basis only handles 1D smooths", fixed = TRUE ) }) test_that("monotonic for observation models working properly", { mod <- mvgam( y ~ z + s(x, bs = 'moi', k = 18), data = mod_data, family = gaussian(), run_model = FALSE ) # Monotonic indices should be in the model_data expect_true("b_idx_s_x_" %in% names(mod$model_data)) # The smooth should be an MOI class expect_true(inherits(mod$mgcv_model$smooth[[1]], 'moi.smooth')) # The coefficients should be fixed to be non-negative expect_true(any(grepl( 'b[b_idx_s_x_] = abs(b_raw[b_idx_s_x_]) * 1;', mod$model_file, fixed = TRUE ))) # Repeat a check for decreasing functions mod <- mvgam( y ~ z + s(x, bs = 'mod', k = 18), data = mod_data, family = gaussian(), run_model = FALSE ) # The smooth should be an MOD class expect_true(inherits(mod$mgcv_model$smooth[[1]], 'mod.smooth')) # The coefficients should be fixed to be non-positive expect_true(any(grepl( 'b[b_idx_s_x_] = abs(b_raw[b_idx_s_x_]) * -1;', mod$model_file, fixed = TRUE ))) }) test_that("monotonic for process models working properly", { mod <- mvgam( y ~ 0, trend_formula = ~ z + s(x, bs = 'moi', k = 18), trend_model = RW(), data = mod_data, family = gaussian(), run_model = FALSE ) # Monotonic indices should be in the model_data expect_true("b_trend_idx_s_x_" %in% names(mod$model_data)) # The smooth should be an MOI class expect_true(inherits(mod$trend_mgcv_model$smooth[[1]], 'moi.smooth')) # The coefficients should be fixed to be non-negative expect_true(any(grepl( 'b_trend[b_trend_idx_s_x_] = abs(b_raw_trend[b_trend_idx_s_x_]) * 1;', mod$model_file, fixed = TRUE ))) # And for 
decreasing mod <- mvgam( y ~ 0, trend_formula = ~ z + s(x, bs = 'mod', k = 18), trend_model = RW(), data = mod_data, family = gaussian(), run_model = FALSE ) # The smooth should be an MOD class expect_true(inherits(mod$trend_mgcv_model$smooth[[1]], 'mod.smooth')) # The coefficients should be fixed to be non-positive expect_true(any(grepl( 'b_trend[b_trend_idx_s_x_] = abs(b_raw_trend[b_trend_idx_s_x_]) * -1;', mod$model_file, fixed = TRUE ))) }) ================================================ FILE: tests/testthat/test-mvgam-methods.R ================================================ context("class methods") test_that("inverse links working", { expect_true(is(mvgam:::family_invlinks('gaussian'), 'function')) expect_true(is(mvgam:::family_invlinks('Gamma'), 'function')) expect_true(is(mvgam:::family_invlinks('beta_binomial'), 'function')) }) test_that("series_to_mvgam working", { data("sunspots") series <- cbind(sunspots, sunspots) colnames(series) <- c('blood', 'bone') expect_true(inherits( series_to_mvgam(series, frequency(series), 0.85), 'list' )) # An xts object example dates <- seq(as.Date("2001-05-01"), length = 30, by = "quarter") data <- cbind( c(gas = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001)))), c(oil = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001)))) ) series <- xts::xts(x = data, order.by = dates) colnames(series) <- c('gas', 'oil') expect_list(series_to_mvgam(series, freq = 4, train_prop = 0.85)) }) test_that("stancode and standata working properly", { simdat <- sim_mvgam() mod <- mvgam( y ~ s(season) + s(time, by = series), family = poisson(), data = simdat$data_train, run_model = FALSE ) expect_character(stancode(mod)) expect_list(standata(mod)) }) # Skip remaining time-consuming tests on CRAN skip_on_cran() test_that("add_residuals working properly", { mod <- mvgam:::mvgam_example1 oldresids <- mod$resids mod <- add_residuals(mod) expect_true(all(unlist(lapply(seq_along(oldresids), function(x) { all.equal(dim(oldresids[[x]]), dim(mod$resids[[x]])) })))) }) test_that("mcmc diagnostics working properly", { expect_true(inherits(nuts_params(mvgam:::mvgam_example1), 'data.frame')) expect_true(inherits(nuts_params(mvgam:::mvgam_example2), 'data.frame')) expect_true(inherits(rhat(mvgam:::mvgam_example1), 'numeric')) expect_true(inherits(rhat(mvgam:::mvgam_example4), 'numeric')) expect_true(inherits(SW(neff_ratio(mvgam:::mvgam_example1)), 'numeric')) expect_true(inherits(SW(neff_ratio(mvgam:::mvgam_example4)), 'numeric')) }) test_that("compute_edf working properly", { mod <- mvgam:::mvgam_example1 expect_no_error(capture_output(mvgam:::compute_edf( mod$mgcv_model, mod, 'rho', 'sigma_raw', conservative = FALSE ))) mod <- mvgam:::mvgam_example4 expect_no_error(capture_output(mvgam:::compute_edf( mod$trend_mgcv_model, mod, 'rho_trend', 'sigma_raw_trend', conservative = TRUE ))) }) test_that("conditional_effects works properly", { effects <- conditional_effects(mvgam:::mvgam_example1) lapply(effects, expect_ggplot) effects <- conditional_effects(mvgam:::mvgam_example2) lapply(effects, expect_ggplot) effects <- conditional_effects(mvgam:::mvgam_example3) lapply(effects, expect_ggplot) effects <- conditional_effects(mvgam:::mvgam_example4) lapply(effects, expect_ggplot) }) test_that("mcmc_plot works properly", { expect_ggplot(mcmc_plot(mvgam:::mvgam_example1, type = "dens")) expect_ggplot(mcmc_plot( mvgam:::mvgam_example1, type = "scatter", variable = variables(mvgam:::mvgam_example1)$observation_betas[2:3, 1] )) expect_error( mcmc_plot(mvgam:::mvgam_example1, type 
= "density"), "Invalid plot type" ) expect_ggplot(SW(mcmc_plot(mvgam:::mvgam_example2, type = "neff"))) expect_silent(p <- mcmc_plot(mvgam:::mvgam_example3, type = "areas")) expect_error( mcmc_plot(mvgam:::mvgam_example3, type = "hex"), "Exactly 2 parameters must be selected" ) expect_ggplot(mcmc_plot(mvgam:::mvgam_example4)) expect_no_error(SW(pairs(mvgam:::mvgam_example2))) expect_no_error(SW(pairs( mvgam:::mvgam_example4, variable = c('sigma'), regex = TRUE ))) }) test_that("pp_check and ppc work properly", { expect_ggplot(SW(SM(pp_check(mvgam:::mvgam_example1)))) expect_ggplot(SW(SM(pp_check( mvgam:::mvgam_example1, newdata = mvgam:::mvgam_example1$obs_data[1:10, ] )))) expect_ggplot(SW(SM(pp_check(mvgam:::mvgam_example2, "stat", ndraws = 5)))) expect_ggplot(SW(SM(pp_check(mvgam:::mvgam_example3, "error_binned")))) expect_ggplot(SW(SM(pp_check( mvgam:::mvgam_example3, "resid_hist", ndraws = 5 )))) expect_ggplot(SW(SM(pp_check( mvgam:::mvgam_example1, ndraws = 5, type = 'resid_hist_grouped', group = 'series' )))) pp <- SW(SM(pp_check( object = mvgam:::mvgam_example4, type = "ribbon_grouped", group = "series", x = "season" ))) expect_ggplot(pp) pp <- SW(SM(pp_check( mvgam:::mvgam_example2, type = "violin_grouped", group = "season", newdata = mvgam:::mvgam_example2$obs_data[1:10, ] ))) expect_ggplot(pp) expect_ggplot(SW(SM(pp_check(mvgam:::mvgam_example4, prefix = "ppd")))) expect_no_error(ppc(mvgam:::mvgam_example4)) expect_error(ppc(mvgam:::mvgam_example1, type = 'banana')) expect_no_error(ppc(mvgam:::mvgam_example1, type = 'hist')) expect_error( ppc(mvgam:::mvgam_example3, type = 'rootogram'), 'Rootograms not supported for checking non-count data' ) }) test_that("model.frame gives expected output structure", { mod_data <- model.frame(mvgam:::mvgam_example1) # model.frame should give the exact same data used for modelling, including # any NAs in the outcome variable expect_true(identical( data.frame(y = mod_data$y, season = mod_data$season), data.frame( y = mvgam:::mvgam_example1$obs_data$y, season = mvgam:::mvgam_example1$obs_data$season ) )) # The setup mgcv model should have the same number of observations, but with # these NAs imputed mod_data <- model.frame(mvgam:::mvgam_example1$mgcv_model) expect_equal(NROW(mod_data), NROW(mvgam:::mvgam_example1$obs_data)) expect_false(any(is.na(mod_data))) }) test_that("as.data.frame and friends have resonable outputs", { out <- as.data.frame(mvgam:::mvgam_example4, variable = 'betas') expect_s3_class(out, "data.frame") expect_equal( names(out), c( "(Intercept)", "seriesseries_2", "s(season).1", "s(season).2", "s(season).3" ) ) out <- as.data.frame(mvgam:::mvgam_example4, variable = 'trend_params') expect_s3_class(out, "data.frame") expect_equal(names(out)[1], "alpha_gp[1]") out <- as.data.frame(mvgam:::mvgam_example4, variable = 'obs_params') expect_s3_class(out, "data.frame") expect_equal(names(out), c("sigma_obs[1]", "sigma_obs[2]")) out <- as.matrix(mvgam:::mvgam_example2, variable = 'obs_params') expect_true(inherits(out, "matrix")) expect_equal(dimnames(out)[[2]], c("sigma_obs[1]", "sigma_obs[2]")) }) test_that("coef has resonable outputs", { out <- coef(mvgam:::mvgam_example1) expect_equal( rownames(out), c("(Intercept)", "s(season).1", "s(season).2", "s(season).3") ) expect_equal(dim(out), c(4, 5)) }) test_that("logLik has reasonable ouputs", { liks <- logLik(mvgam:::mvgam_example4) expect_equal(dim(liks), c(5, NROW(mvgam:::mvgam_example2$obs_data))) # NAs in observations should propagate for likelihood calculations 
expect_true(all(is.na(liks[, which(is.na( mvgam:::mvgam_example2$obs_data$y ))]))) })
test_that("predict has reasonable outputs", { newdat1 <- data.frame(series = 'series_1') expect_error( predict(mvgam:::mvgam_example1, newdata = newdat1), "the following required variables are missing from newdata:\n season" ) newdat2 <- data.frame(series = factor('series_1'), time = 1) expect_error( predict(mvgam:::mvgam_example1, newdata = newdat2), "the following required variables are missing from newdata:\n season" ) newdat3 <- list(series = factor('series_1'), time = 1) expect_error( predict(mvgam:::mvgam_example4, newdata = newdat3), "the following required variables are missing from newdata:\n season" ) gaus_preds <- predict(mvgam:::mvgam_example4, type = 'link', summary = FALSE) expect_equal(dim(gaus_preds), c(5, NROW(mvgam:::mvgam_example2$obs_data))) gaus_preds <- predict( mvgam:::mvgam_example3, type = 'response', summary = FALSE ) expect_equal(dim(gaus_preds), c(5, NROW(mvgam:::mvgam_example3$obs_data))) expect_error( predict(mvgam:::mvgam_example1, type = 'latent_N'), '"latent_N" type only available for N-mixture models', fixed = TRUE ) preds <- predict(mvgam:::mvgam_example3, type = 'terms') expect_true(inherits(preds, 'list')) expect_true(all.equal(names(preds$obs_effects), c('fit', 'se.fit'))) expect_true(is.null(preds$process_effects)) preds <- predict(mvgam:::mvgam_example2, type = 'terms') expect_true(inherits(preds, 'list')) expect_true(length(preds$obs_effects) == 0) expect_true(!is.null(preds$process_effects)) preds <- predict(mvgam:::mvgam_example4, type = 'terms', summary = FALSE) expect_true(inherits(preds, 'list')) expect_true(is.matrix(preds$obs_effects[[1]])) })
test_that("get_predict has reasonable outputs", { gaus_preds <- predict( mvgam:::mvgam_example1, type = 'link', process_error = FALSE, summary = FALSE ) meffects_preds <- get_predict( mvgam:::mvgam_example1, newdata = mvgam:::mvgam_example1$obs_data, type = 'link' ) expect_true(NROW(meffects_preds) == NCOL(gaus_preds)) expect_true(identical(meffects_preds$estimate, apply(gaus_preds, 2, median))) })
test_that("hindcast has reasonable outputs", { expect_error( forecast(mvgam:::mvgam_example2), 'newdata must be supplied to compute forecasts' ) hc <- hindcast(mvgam:::mvgam_example1) expect_s3_class(hc, 'mvgam_forecast') expect_true(is.null(hc$forecasts)) expect_equal( dim(hc$hindcasts[[1]]), c( 5, NROW(mvgam:::mvgam_example2$obs_data) / nlevels(mvgam:::mvgam_example2$obs_data$series) ) ) expect_equal( hc$train_observations[[1]], mvgam:::mvgam_example2$obs_data$y[ which(mvgam:::mvgam_example2$obs_data$series == 'series_1') ] ) })
test_that("plot_mvgam_resids gives reasonable outputs", { expect_ggplot(plot_mvgam_resids(mvgam:::mvgam_example1)) expect_ggplot(plot_mvgam_resids(mvgam:::mvgam_example2)) })
test_that("plot_mvgam_resids handles NA residuals without error", { # mvgam_example1 has NAs in residuals from missing observations; # quantile() calls must use na.rm = TRUE to avoid errors mod <- mvgam:::mvgam_example1 expect_true(anyNA(mod$resids[[1]])) expect_ggplot(plot_mvgam_resids(mod, series = 1)) })
test_that("residuals method returns NAs for missing observations", { mod <- mvgam:::mvgam_example1 resids <- residuals(mod) # Residuals should be NaN where observations were missing na_obs <- which(is.na(mod$obs_data$y)) expect_true(length(na_obs) > 0) expect_true(all(is.nan(resids[na_obs, "Estimate"]))) })
test_that("plot_mvgam_series gives reasonable outputs", { simdat <- sim_mvgam() expect_ggplot(plot_mvgam_series( data = simdat$data_train, 
newdata = simdat$data_test )) expect_ggplot(plot_mvgam_series( data = simdat$data_train, newdata = simdat$data_test, series = 'all' )) expect_ggplot(plot_mvgam_series( data = simdat$data_train, newdata = simdat$data_test, lines = FALSE )) expect_ggplot(plot_mvgam_series( data = simdat$data_train, newdata = simdat$data_test, lines = FALSE, series = 'all' )) # Should also work for list data dat_train <- list() for (i in 1:NCOL(simdat$data_train)) { dat_train[[i]] <- simdat$data_train[, i] } names(dat_train) <- colnames(simdat$data_train) dat_test <- list() for (i in 1:NCOL(simdat$data_test)) { dat_test[[i]] <- simdat$data_test[, i] } names(dat_test) <- colnames(simdat$data_test) expect_ggplot(plot_mvgam_series(data = dat_train)) expect_ggplot(plot_mvgam_series(data = dat_train, newdata = dat_test)) expect_ggplot(plot_mvgam_series( data = dat_train, newdata = dat_test, series = 'all' )) expect_ggplot(plot_mvgam_series( data = dat_train, newdata = dat_test, lines = FALSE )) expect_ggplot(plot_mvgam_series( data = dat_train, newdata = dat_test, lines = FALSE, series = 'all' )) # And for mvgam objects expect_ggplot(plot_mvgam_series(object = mvgam:::mvgam_example1, series = 1)) expect_no_error(SW(plot(mvgam:::mvgam_example1, type = 'series'))) }) test_that("forecast and ensemble have reasonable outputs", { set.seed(1234) mvgam_examp_dat <- sim_mvgam( family = gaussian(), T = 40, prop_missing = 0.1, n_series = 2 ) newdat <- mvgam_examp_dat$data_test fc <- forecast(object = mvgam:::mvgam_example4, newdata = newdat) expect_s3_class(fc, 'mvgam_forecast') expect_equal( dim(fc$forecasts[[1]]), c( 5, NROW(newdat) / nlevels(newdat$series) ) ) expect_equal( fc$test_observations[[1]], newdat$y[which(newdat$series == 'series_1')] ) # Check that ensemble.mvgam_forecast works fc2 <- forecast(object = mvgam:::mvgam_example3, newdata = newdat) fc_ens <- ensemble(fc, fc2, ndraws = 3000) expect_equal( dim(fc_ens$forecasts[[1]]), c( 3000, NROW(newdat) / nlevels(newdat$series) ) ) expect_equal( fc_ens$test_observations[[1]], newdat$y[which(newdat$series == 'series_1')] ) fc_ens <- ensemble(fc, fc2, ndraws = 19) expect_equal( dim(fc_ens$forecasts[[1]]), c( 19, NROW(newdat) / nlevels(newdat$series) ) ) # ndraws must be positive integer expect_error(ensemble(fc, fc2, ndraws = 0)) }) test_that("ensemble gives equal pooling", { set.seed(1234) mvgam_examp_dat <- sim_mvgam( family = gaussian(), T = 40, prop_missing = 0.1, n_series = 2 ) newdat <- mvgam_examp_dat$data_test fc <- forecast(object = mvgam:::mvgam_example4, newdata = newdat) fc2 <- forecast(object = mvgam:::mvgam_example3, newdata = newdat) # Replace casts with dummy data fc$hindcasts = lapply( fc$hindcasts, \(series_hcs) matrix(4, 1, ncol(series_hcs)) ) fc2$hindcasts = lapply( fc2$hindcasts, \(series_hcs) matrix(5, 10, ncol(series_hcs)) ) fc$forecasts = lapply( fc$forecasts, \(series_fcs) matrix(1, 1, ncol(series_fcs)) ) fc2$forecasts = lapply( fc2$forecasts, \(series_fcs) matrix(2, 10, ncol(series_fcs)) ) n_draws <- 500 fc_ens <- ensemble(fc, fc2, ndraws = n_draws) # Expect that roughly 50% of hindcasts should be a 4 and 50% a 5 four_props <- unlist( lapply(seq_along(fc_ens$hindcasts), function(x) { length(which(fc_ens$hindcasts[[x]] == 4)) / length(fc_ens$hindcasts[[x]]) }), use.names = FALSE ) expect_equal(four_props, rep(0.5, 2), tolerance = 0.03) # Expect that roughly 50% of forecasts should be a 1 and 50% a 2 two_props <- unlist( lapply(seq_along(fc_ens$hindcasts), function(x) { length(which(fc_ens$forecasts[[x]] == 2)) / length(fc_ens$forecasts[[x]]) 
}), use.names = FALSE ) expect_equal(two_props, rep(0.5, 2), tolerance = 0.03) }) ================================================ FILE: tests/testthat/test-mvgam.R ================================================ context("mvgam") test_that("family must be correctly specified", { expect_error( mod <- mvgam( y ~ s(season), trend_model = AR(), data = beta_data$data_train, family = 'besta', run_model = FALSE ), 'family not recognized' ) }) test_that("response variable must be specified", { expect_error( mod <- mvgam( ~ s(season), trend_model = AR(), data = beta_data$data_train, family = betar(), run_model = FALSE ), 'response variable is missing from formula' ) }) test_that("drift deprecation message works", { expect_message( mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc', k = 8) + s(time, bs = 'moi', k = 8), trend_model = RW(ma = TRUE), drift = TRUE, data = gaus_data$data_train, newdata = gaus_data$data_test, family = gaussian(), run_model = FALSE ), 'The "drift" argument is deprecated; use fixed effects of "time" instead' ) }) test_that("id to link smooths not allowed yet", { expect_error( mod <- mvgam( y ~ s(time, id = 1) + s(time, by = series, id = 1), data = beta_data$data_train, family = betar(), run_model = FALSE ), 'smooth terms with the "id" argument not yet supported by mvgam' ) }) test_that("response variable must follow family-specific restrictions", { expect_error( mod <- mvgam( y ~ s(season), trend_model = AR(), data = gaus_data$data_train, family = lognormal(), run_model = FALSE ), 'Values <= 0 not allowed for lognormal responses' ) expect_error( mod <- mvgam( y ~ s(season), trend_model = AR(), data = gaus_data$data_train, family = poisson(), run_model = FALSE ), 'Values < 0 not allowed for count family responses' ) }) test_that("trend_model must be correctly specified", { expect_error(SW( mod <- mvgam( y ~ s(season), trend_model = 'AR11', data = beta_data$data_train, family = betar(), run_model = FALSE ) )) }) test_that("outcome variable must be present in data", { data = data.frame(out = rnorm(100), temp = rnorm(100), time = 1:100) expect_error( mod <- mvgam( formula = y ~ dynamic(temp, rho = 20), data = data, family = gaussian(), run_model = FALSE ), 'variable y not found in data' ) }) test_that("series levels must match unique entries in series", { levels(beta_data$data_train$series) <- paste0('series_', 1:6) expect_error(mvgam( y ~ s(season), trend_model = GP(), data = beta_data$data_train, newdata = beta_data$data_test, family = betar(), run_model = FALSE )) }) test_that("missing values not allowed in predictors", { # Include missing vals in training data simdat <- sim_mvgam() simdat$data_train$season[4] <- NA expect_error(mvgam( y ~ s(season), trend_model = GP(), data = simdat$data_train, newdata = simdat$data_test, run_model = FALSE )) # Include missing vals in testing data simdat <- sim_mvgam() simdat$data_test$season[4] <- NA expect_error(mvgam( y ~ s(season), data = simdat$data_train, newdata = simdat$data_test, run_model = FALSE )) }) test_that("all series must have observations for all unique timepoints", { data <- sim_mvgam() data$data_train <- data$data_train[-2, ] expect_error( mod <- mvgam( y ~ s(season), trend_model = AR(), data = data$data_train, family = poisson(), run_model = FALSE ), 'One or more series in data is missing observations for one or more timepoints' ) data <- sim_mvgam() data$data_test <- data$data_test[-2, ] expect_error( mod <- mvgam( y ~ s(season), trend_model = AR(), data = data$data_train, newdata = data$data_test, family = 
poisson(), run_model = FALSE ), 'One or more series in newdata is missing observations for one or more timepoints' ) }) test_that("rho argument must be positive numeric", { data = data.frame(out = rnorm(100), temp = rnorm(100), time = 1:100) expect_error( mod <- mvgam( formula = out ~ dynamic(temp, rho = -1), data = data, family = gaussian(), run_model = FALSE ), 'Argument "rho" in dynamic() must be a positive value', fixed = TRUE ) }) # Skip remaining tests on CRAN as they are slightly time-consuming skip_on_cran() test_that("JAGS setups should work", { # JAGS setup should work, whether installed or not simdat <- sim_mvgam() mod <- mvgam( y ~ s(season), trend_model = RW(), data = simdat$data_train, family = poisson(), use_stan = FALSE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(mod$drift == FALSE) mod <- mvgam( y ~ s(season), trend_model = AR(), data = simdat$data_train, family = poisson(), use_stan = FALSE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(mod$drift == FALSE) mod <- mvgam( y ~ s(season), trend_model = AR(p = 2), data = simdat$data_train, family = poisson(), use_stan = FALSE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(mod$drift == FALSE) mod <- mvgam( y ~ s(season), trend_model = AR(), data = simdat$data_train, family = nb(), use_stan = FALSE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(mod$drift == FALSE) mod <- mvgam( y ~ s(season), trend_model = AR(p = 3), data = simdat$data_train, family = nb(), use_stan = FALSE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(mod$drift == FALSE) mod <- mvgam( y ~ s(season), trend_model = AR(p = 3), data = simdat$data_train, family = gaussian(), use_stan = FALSE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(mod$drift == FALSE) expect_true(inherits( get_mvgam_priors( y ~ s(season), trend_model = 'RW', data = simdat$data_train, family = gaussian(), use_stan = FALSE ), 'data.frame' )) mod <- mvgam( y ~ s(season), trend_model = RW(), data = simdat$data_train, family = gaussian(), use_stan = FALSE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(mod$drift == FALSE) }) test_that("implicit_vars are added correctly", { simdat <- sim_mvgam(n_series = 1) no_series <- simdat$data_train no_series$series <- NULL mod <- mvgam(y ~ s(season, bs = 'cc'), data = no_series, run_model = FALSE) expect_equal('series', attr(mod$obs_data, 'implicit_vars')) mod <- SM(mvgam( y ~ 1, trend_formula = ~ s(season), data = no_series, run_model = FALSE )) expect_equal('series', attr(mod$obs_data, 'implicit_vars')) no_series$time <- NULL mod <- mvgam(y ~ s(season, bs = 'cc'), data = no_series, run_model = FALSE) expect_equal(c('series', 'time'), attr(mod$obs_data, 'implicit_vars')) }) test_that("trend = 'None' works for State Space", { mod <- mvgam( y ~ s(series, bs = 're'), trend_formula = ~ s(season, bs = 'cc', k = 8) + s(time, bs = 'moi', k = 8), trend_model = 'None', data = gaus_data$data_train, newdata = gaus_data$data_test, family = gaussian(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( trimws("LV[i, j] ~ normal(trend_mus[ytimes_trend[i, j]], sigma[j]);"), trimws(mod$model_file), fixed = TRUE ))) expect_true(any(grepl( trimws("trend[i, s] = dot_product(Z[s, : ], LV[i, : ]);"), trimws(mod$model_file), fixed = TRUE ))) }) test_that("noncentring working properly for a range of models", { # First check messages 
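# (models with moving average terms, correlated process errors or VAR
# dynamics should fall back to the centred parameterisation with an
# informative message rather than an error)
# For reference, a minimal sketch of the non-centred random walk checked
# further below (illustrative only, not run; n, n_series and sigma are
# placeholders): innovations are standard normal, then rescaled and
# accumulated to recover the trend, e.g.
# z <- matrix(rnorm(n * n_series), n, n_series)  # trend_raw ~ std_normal()
# trend <- apply(sweep(z, 2, sigma, `*`), 2, cumsum)  # rescale, then accumulate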
expect_message( mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc', k = 8) + s(time, bs = 'moi', k = 8), trend_model = RW(ma = TRUE), data = gaus_data$data_train, newdata = gaus_data$data_test, family = gaussian(), noncentred = TRUE, run_model = FALSE ), 'Non-centering of trends currently not available for this model' ) expect_message( mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc', k = 8) + s(time, bs = 'moi', k = 8), trend_model = RW(cor = TRUE), data = gaus_data$data_train, newdata = gaus_data$data_test, family = gaussian(), noncentred = TRUE, run_model = FALSE ), 'Non-centering of trends currently not available for this model' ) expect_message( mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc', k = 8) + s(time, bs = 'moi', k = 8), trend_model = AR(p = 2, cor = TRUE), data = gaus_data$data_train, newdata = gaus_data$data_test, family = gaussian(), noncentred = TRUE, run_model = FALSE ), 'Non-centering of trends currently not available for this model' ) expect_message( mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc', k = 8) + s(season, series, bs = 'sz'), trend_model = VAR(), data = gaus_data$data_train, newdata = gaus_data$data_test, family = gaussian(), noncentred = TRUE, run_model = FALSE ), 'Non-centering of trends currently not available for this model' ) # Now check that the non-centering is incorporated properly mod <- mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc', k = 8) + s(time, bs = 'moi', k = 8), trend_model = RW(), data = gaus_data$data_train, newdata = gaus_data$data_test, family = gaussian(), noncentred = TRUE, run_model = FALSE ) # Model file should have the non-centred trend parameterisation now expect_true( any(grepl( trimws("trend = trend_raw .* rep_matrix(sigma', rows(trend_raw));"), trimws(mod$model_file), fixed = TRUE )) ) expect_true( any(grepl( trimws("trend[2 : n, s] += trend[1 : (n - 1), s];"), trimws(mod$model_file), fixed = TRUE )) ) expect_true( any(grepl( trimws("to_vector(trend_raw) ~ std_normal();"), trimws(mod$model_file), fixed = TRUE )) ) expect_true( any(grepl( trimws("b[b_idx_s_time_] = abs(b_raw[b_idx_s_time_]) * 1;"), trimws(mod$model_file), fixed = TRUE )) ) mod <- mvgam( y ~ s(series, bs = 're') + s(season, bs = 'cc', k = 8) + s(season, series, bs = 'sz'), trend_model = AR(p = 3), priors = c( prior(beta(2, 2), class = ar1, lb = 0, ub = 1), prior(exponential(3.466), class = sigma) ), data = gaus_data$data_train, newdata = gaus_data$data_test, family = gaussian(), noncentred = TRUE, run_model = FALSE ) expect_true( any(grepl( trimws("ar1 ~ beta(2, 2);"), trimws(mod$model_file), fixed = TRUE )) ) expect_true( any(grepl( trimws("sigma ~ exponential(3.466);"), trimws(mod$model_file), fixed = TRUE )) ) expect_true( any(grepl( trimws("trend = trend_raw .* rep_matrix(sigma', rows(trend_raw));"), trimws(mod$model_file), fixed = TRUE )) ) expect_true( any(grepl( trimws("trend[2, s] += ar1[s] * trend[1, s];"), trimws(mod$model_file), fixed = TRUE )) ) expect_true( any(grepl( trimws("trend[3, s] += ar1[s] * trend[2, s] + ar2[s] * trend[1, s];"), trimws(mod$model_file), fixed = TRUE )) ) expect_true( any(grepl( trimws("to_vector(trend_raw) ~ std_normal();"), trimws(mod$model_file), fixed = TRUE )) ) }) test_that("prior_only works", { mod <- mvgam( y ~ s(season), trend_model = AR(p = 2), data = gaus_data$data_train, prior_simulation = TRUE, family = gaussian(), threads = 2, run_model = FALSE ) expect_no_error(capture_output(code(mod))) expect_true(!any(grepl('likelihood functions', mod$model_file, fixed = TRUE))) 
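# With prior_simulation = TRUE the generated Stan code should contain no
# likelihood statements, so sampling yields draws from the priors alone
# (e.g. prior predictive draws could then be inspected with
# pp_check(mod, prefix = 'ppd') once the model is actually sampled)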
expect_true(!any(grepl('flat_ys ~ ', mod$model_file, fixed = TRUE))) mod <- mvgam( y ~ 1, trend_formula = ~ s(season) + s(trend, bs = 're'), trend_model = VAR(ma = TRUE), data = gaus_data$data_train, prior_simulation = TRUE, family = gaussian(), threads = 2, run_model = FALSE ) expect_no_error(capture_output(code(mod))) expect_true(!any(grepl('likelihood functions', mod$model_file, fixed = TRUE))) expect_true(!any(grepl('flat_ys ~ ', mod$model_file, fixed = TRUE))) mod <- mvgam( y ~ 1, trend_formula = ~ s(season) + s(trend, bs = 're'), trend_model = CAR(), data = gaus_data$data_train, prior_simulation = TRUE, family = gaussian(), threads = 2, run_model = FALSE ) expect_no_error(capture_output(code(mod))) expect_true(!any(grepl('likelihood functions', mod$model_file, fixed = TRUE))) expect_true(!any(grepl('flat_ys ~ ', mod$model_file, fixed = TRUE))) # trend_map not yet allowed for CAR1 dynamics trend_map <- data.frame( series = unique(gaus_data$data_train$series), trend = c(1, 1, 2) ) expect_error( mvgam( y ~ 1, trend_formula = ~ s(season) + s(trend, bs = 're'), trend_model = CAR(), trend_map = trend_map, data = gaus_data$data_train, prior_simulation = TRUE, family = gaussian(), threads = 2, run_model = FALSE ), 'cannot yet use trend mapping for CAR1 dynamics' ) mod <- mvgam( y ~ s(season), trend_model = AR(p = 3), data = beta_data$data_train, prior_simulation = TRUE, family = betar(), run_model = FALSE ) expect_no_error(capture_output(code(mod))) expect_true(!any(grepl('likelihood functions', mod$model_file, fixed = TRUE))) expect_true(!any(grepl('flat_ys ~ ', mod$model_file, fixed = TRUE))) mod <- mvgam( y ~ 1, trend_formula = ~ s(season) + s(trend, bs = 're'), trend_model = AR(cor = TRUE, ma = TRUE), data = beta_data$data_train, prior_simulation = TRUE, family = betar(), run_model = FALSE ) expect_no_error(capture_output(code(mod))) expect_true(!any(grepl('likelihood functions', mod$model_file, fixed = TRUE))) expect_true(!any(grepl('flat_ys ~ ', mod$model_file, fixed = TRUE))) }) test_that("time not required in data if this is a no trend model", { data <- data.frame(out = rnorm(100), temp = rnorm(100)) mod <- mvgam( formula = out ~ dynamic(temp, rho = 20), data = data, family = gaussian(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_equal(attr(mod$obs_data, 'implicit_vars'), c('series', 'time')) }) test_that("median coefs should be stored in the mgcv object", { expect_true(identical( unname(coef(mvgam:::mvgam_example2$mgcv_model)), coef(mvgam:::mvgam_example2)[, 2] )) }) test_that("empty obs formula is allowed, even if no trend_formula", { mod <- mvgam( formula = y ~ -1, trend_model = AR(), data_train = gaus_data$data_train, family = gaussian(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) }) test_that("empty obs formula allowed if trend_formula supplied", { mod <- mvgam( formula = y ~ -1, trend_formula = ~ s(season), trend_model = AR(), data = gaus_data$data_train, family = gaussian(), run_model = FALSE ) # Check that the intercept coefficient is correctly fixed at zero expect_true(any(grepl( '// (Intercept) fixed at zero', mod$model_file, fixed = TRUE ))) expect_true(any(grepl('b[1] = 0;', mod$model_file, fixed = TRUE))) }) test_that("share_obs_params working", { # Standard beta mod <- mvgam( y ~ s(season, by = series), trend_model = RW(cor = TRUE), family = betar(), data = beta_data$data_train, share_obs_params = TRUE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'realphi;', gsub(' ', '', 
mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'phi_vec[1:n,s]=rep_vector(phi,n);', gsub(' ', '', mod$model_file), fixed = TRUE ))) # State-space beta mod <- mvgam( y ~ -1, trend_formula = ~ s(season, by = trend), trend_model = RW(cor = TRUE), family = betar(), data = beta_data$data_train, share_obs_params = TRUE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'realphi;', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'phi_vec[1:n,s]=rep_vector(phi,n);', gsub(' ', '', mod$model_file), fixed = TRUE ))) # Standard gaussian mod <- mvgam( y ~ s(season, by = series), trend_model = RW(cor = TRUE), family = gaussian(), data = gaus_data$data_train, share_obs_params = TRUE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'realsigma_obs;', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'sigma_obs_vec[1:n,s]=rep_vector(sigma_obs,n);', gsub(' ', '', mod$model_file), fixed = TRUE ))) # State-space gaussian mod <- mvgam( y ~ -1, trend_formula = ~ s(season, by = trend) + s(trend, bs = 're'), trend_model = RW(cor = TRUE), family = gaussian(), data = gaus_data$data_train, share_obs_params = TRUE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'realsigma_obs;', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'sigma_obs_vec[1:n,s]=rep_vector(sigma_obs,n);', gsub(' ', '', mod$model_file), fixed = TRUE ))) # Standard student mod <- mvgam( y ~ s(season, by = series), trend_model = RW(cor = TRUE), family = student_t(), data = gaus_data$data_train, share_obs_params = TRUE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'realsigma_obs;', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'sigma_obs_vec[1:n,s]=rep_vector(sigma_obs,n);', gsub(' ', '', mod$model_file), fixed = TRUE ))) # State-space student mod <- mvgam( y ~ -1, trend_formula = ~ s(season, by = trend), trend_model = RW(cor = TRUE), family = student_t(), data = gaus_data$data_train, share_obs_params = TRUE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'realsigma_obs;', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'sigma_obs_vec[1:n,s]=rep_vector(sigma_obs,n);', gsub(' ', '', mod$model_file), fixed = TRUE ))) # Standard lognormal simdat <- sim_mvgam(family = Gamma()) mod <- mvgam( y ~ s(season, by = series), trend_model = RW(cor = TRUE), family = lognormal(), data = simdat$data_train, share_obs_params = TRUE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'realsigma_obs;', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'sigma_obs_vec[1:n,s]=rep_vector(sigma_obs,n);', gsub(' ', '', mod$model_file), fixed = TRUE ))) # State-space lognormal mod <- mvgam( y ~ -1, trend_formula = ~ s(season, by = trend), trend_model = RW(cor = TRUE), family = lognormal(), data = simdat$data_train, share_obs_params = TRUE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'realsigma_obs;', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'sigma_obs_vec[1:n,s]=rep_vector(sigma_obs,n);', gsub(' ', '', mod$model_file), fixed = TRUE ))) # Standard Gamma mod <- mvgam( y ~ s(season, by = series), trend_model = RW(cor = TRUE), family = Gamma(), data = simdat$data_train, share_obs_params = TRUE, run_model = FALSE ) expect_true(inherits(mod, 
'mvgam_prefit')) expect_true(any(grepl( 'realshape;', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'shape_vec[1:n,s]=rep_vector(shape,n);', gsub(' ', '', mod$model_file), fixed = TRUE ))) # State-space Gamma mod <- mvgam( y ~ -1, trend_formula = ~ s(season, by = trend), trend_model = RW(cor = TRUE), family = Gamma(), data = simdat$data_train, share_obs_params = TRUE, run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(any(grepl( 'realshape;', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'shape_vec[1:n,s]=rep_vector(shape,n);', gsub(' ', '', mod$model_file), fixed = TRUE ))) }) test_that("trend_map is behaving properly", { sim <- sim_mvgam(n_series = 3) mod_data <- sim$data_train trend_map <- data.frame(series = unique(mod_data$series), trend = c(1, 1, 2)) mod_map <- mvgam( y ~ s(season, bs = 'cc'), trend_map = trend_map, trend_model = AR(), data = mod_data, run_model = FALSE ) expect_true(identical( mod_map$model_data$Z, matrix(c(1, 0, 1, 0, 0, 1), ncol = 2, byrow = TRUE) )) expect_true(mod_map$use_lv) # Ill-specified trend_map trend_map <- data.frame( series = c('series_1', 'series_1', 'series_2'), trend = c(1, 1, 2) ) expect_error( mvgam( y ~ s(season, bs = 'cc'), trend_map = trend_map, trend_model = AR(), data = mod_data, run_model = FALSE ), 'Argument "trend_map" must have an entry for every unique time series in "data"', fixed = TRUE ) }) test_that("models with only random effects should work without error", { sim <- sim_mvgam(n_series = 3) mod_data <- sim$data_train mod_map <- mvgam(y ~ s(series, bs = 're'), data = mod_data, run_model = FALSE) expect_true(inherits(mod_map, 'mvgam_prefit')) }) test_that("models with only fs smooths should work without error", { sim <- sim_mvgam(n_series = 3) mod_data <- sim$data_train mod_map <- mvgam( y ~ s(season, series, bs = 'fs'), data = mod_data, run_model = FALSE ) expect_true(inherits(mod_map, 'mvgam_prefit')) }) test_that("trend_formula setup is working properly", { sim <- sim_mvgam(n_series = 3) mod_data <- sim$data_train mod_map <- mvgam( y ~ s(series, bs = 're'), trend_formula = ~ s(season, bs = 'cc'), trend_model = AR(), data = mod_data, run_model = FALSE ) expect_true(identical( mod_map$model_data$Z, matrix(c(1, 0, 0, 0, 1, 0, 0, 0, 1), nrow = 3, byrow = TRUE) )) expect_true(mod_map$use_lv) expect_true(!is.null(mod_map$trend_mgcv_model)) expect_equal( colnames(model.frame(mod_map, trend_effects = TRUE)), c('trend_y', 'season') ) expect_equal(colnames(get_data(mod_map)), c('y', 'series', 'time', 'season')) expect_error( mvgam( y ~ 1, trend_formula = 1 ~ s(series, bs = 're') + s(season, bs = 'cc'), trend_model = AR(), data = mod_data, run_model = FALSE ), 'Argument "trend_formula" should not have a left-hand side', fixed = TRUE ) expect_error( mvgam( y ~ 1, trend_formula = ~ s(series, bs = 're') + s(season, bs = 'cc'), trend_model = AR(), data = mod_data, run_model = FALSE ), 'Argument "trend_formula" should not have the identifier "series" in it.\nUse "trend" instead for varying effects', fixed = TRUE ) }) # Check that parametric effect priors are properly incorporated in the # model for a wide variety of model forms test_that("parametric effect priors correctly incorporated in models", { mod_data <- mvgam:::mvgam_examp_dat mod_data$data_train$x1 <- rnorm(NROW(mod_data$data_train)) mod_data$data_train$x2 <- rnorm(NROW(mod_data$data_train)) mod_data$data_train$x3 <- rnorm(NROW(mod_data$data_train)) # Observation formula; no trend mod <- mvgam( y ~ s(season) +
series:x1 + series:x2 + series:x3, trend_model = 'None', data = mod_data$data_train, family = gaussian(), run_model = FALSE ) expect_true(any(grepl( '// prior for seriesseries_2:x1...', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( '// prior for (Intercept)...', mod$model_file, fixed = TRUE ))) para_names <- paste0(paste0( '// prior for seriesseries_', 1:2, paste0(':x', 1:3, '...') )) for (i in seq_along(para_names)) { expect_true(any(grepl(para_names[i], mod$model_file, fixed = TRUE))) } priors <- get_mvgam_priors( y ~ s(season) + series:x1 + series:x2 + series:x3, trend_model = 'None', data = mod_data$data_train, family = gaussian() ) expect_true(any(grepl('seriesseries_1:x2', priors$param_name))) expect_true(any(grepl('seriesseries_2:x3', priors$param_name))) # Observation formula; complex trend mod <- mvgam( y ~ s(season) + series:x1 + series:x2 + series:x3, trend_model = VAR(ma = TRUE), data = mod_data$data_train, family = gaussian(), run_model = FALSE ) expect_true(any(grepl( '// prior for seriesseries_2:x1...', mod$model_file, fixed = TRUE ))) expect_true(any(grepl( '// prior for (Intercept)...', mod$model_file, fixed = TRUE ))) para_names <- paste0(paste0( '// prior for seriesseries_', 1:2, paste0(':x', 1:3, '...') )) for (i in seq_along(para_names)) { expect_true(any(grepl(para_names[i], mod$model_file, fixed = TRUE))) } priors <- get_mvgam_priors( y ~ s(season) + series:x1 + series:x2 + series:x3, trend_model = VAR(ma = TRUE), data = mod_data$data_train, family = gaussian() ) expect_true(any(grepl('seriesseries_1:x2', priors$param_name))) expect_true(any(grepl('seriesseries_2:x3', priors$param_name))) # Trend formula; RW mod <- mvgam( y ~ 1, trend_formula = ~ s(season) + trend:x1 + trend:x2 + trend:x3, trend_model = RW(), data = mod_data$data_train, noncentred = TRUE, family = gaussian(), run_model = FALSE ) expect_true(any(grepl( '// prior for (Intercept)...', mod$model_file, fixed = TRUE ))) para_names <- paste0(paste0( '// prior for trendtrend', 1:2, paste0(':x', 1:3, '_trend...') )) for (i in seq_along(para_names)) { expect_true(any(grepl(para_names[i], mod$model_file, fixed = TRUE))) } priors <- get_mvgam_priors( y ~ 1, trend_formula = ~ s(season) + trend:x1 + trend:x2 + trend:x3, trend_model = RW(), data = mod_data$data_train, family = gaussian() ) expect_true(any(grepl('trendtrend1:x1_trend', priors$param_name))) expect_true(any(grepl('trendtrend2:x3_trend', priors$param_name))) # Trend formula; VARMA mod <- mvgam( y ~ 1, trend_formula = ~ s(season) + trend:x1 + trend:x2 + trend:x3, trend_model = VAR(ma = TRUE), data = mod_data$data_train, family = gaussian(), run_model = FALSE ) expect_true(any(grepl( '// prior for (Intercept)...', mod$model_file, fixed = TRUE ))) para_names <- paste0(paste0( '// prior for trendtrend', 1:2, paste0(':x', 1:3, '_trend...') )) for (i in seq_along(para_names)) { expect_true(any(grepl(para_names[i], mod$model_file, fixed = TRUE))) } priors <- get_mvgam_priors( y ~ 1, trend_formula = ~ s(season) + trend:x1 + trend:x2 + trend:x3, trend_model = RW(), data = mod_data$data_train, family = gaussian() ) expect_true(any(grepl('trendtrend1:x1_trend', priors$param_name))) expect_true(any(grepl('trendtrend2:x3_trend', priors$param_name))) }) ================================================ FILE: tests/testthat/test-mvgam_priors.R ================================================ context("mvgam_priors") test_that("drift deprecation message works", { expect_message( get_mvgam_priors( y ~ s(series, bs = 're') + s(season, bs = 'cc', k = 8) + s(time, 
bs = 'moi', k = 8), trend_model = RW(ma = TRUE), drift = TRUE, data = gaus_data$data_train, family = gaussian() ), 'The "drift" argument is deprecated; use fixed effects of "time" instead' ) }) # Generating this many model skeletons takes time; CRAN will complain skip_on_cran() test_that("get_mvgam_priors works for a variety of ma and cor trends", { priors <- get_mvgam_priors( y ~ s(season, k = 7), trend_model = RW(), family = gaussian(), data = mvgam:::mvgam_examp_dat$data_train ) expect_true(inherits(priors, 'data.frame')) priors <- get_mvgam_priors( y ~ s(season, k = 7), trend_model = RW(), family = gaussian(), data = mvgam:::mvgam_examp_dat$data_train ) expect_true(inherits(priors, 'data.frame')) priors <- get_mvgam_priors( y ~ s(season, k = 7), trend_model = RW(ma = TRUE), family = gaussian(), data = mvgam:::mvgam_examp_dat$data_train ) expect_true(inherits(priors, 'data.frame')) priors <- get_mvgam_priors( y ~ s(season, k = 7), trend_model = AR(p = 2, ma = TRUE, cor = TRUE), family = gaussian(), data = mvgam:::mvgam_examp_dat$data_train ) expect_true(inherits(priors, 'data.frame')) priors <- get_mvgam_priors( y ~ 1, trend_formula = ~ s(season, k = 7), trend_model = RW(ma = TRUE, cor = TRUE), family = gaussian(), data = mvgam:::mvgam_examp_dat$data_train ) expect_true(inherits(priors, 'data.frame')) }) test_that("get_mvgam_priors finds all classes for which priors can be specified", { beta_data$data_train$cov <- rnorm(NROW(beta_data$data_train)) beta_data$data_train$cov2 <- rnorm(NROW(beta_data$data_train)) expect_equal( get_mvgam_priors( y ~ s(season) + cov + cov2 + cov * cov2, data = beta_data$data_train, family = betar() )$param_info, c( "(Intercept)", "cov fixed effect", "cov2 fixed effect", "cov:cov2 fixed effect", "s(season) smooth parameters", "Beta precision parameter" ) ) }) test_that("family must be correctly specified", { expect_error( get_mvgam_priors( y ~ s(season), trend_model = AR(), data = beta_data$data_train, family = 'besta' ), 'family not recognized' ) }) test_that("trend_model must be correctly specified", { expect_error(get_mvgam_priors( y ~ s(season), trend_model = 'AR11', data = beta_data$data_train, family = betar() )) }) test_that("response variable must follow family-specific restrictions", { expect_error( get_mvgam_priors( y ~ s(season), trend_model = AR(), data = gaus_data$data_train, family = lognormal() ), 'Values <= 0 not allowed for lognormal responses' ) expect_error( get_mvgam_priors( y ~ s(season), trend_model = AR(), data = gaus_data$data_train, family = poisson() ), 'Values < 0 not allowed for count family responses' ) }) test_that("default intercept prior should match brms implementation", { simdat <- sim_mvgam(family = gaussian(), mu = 500) def_prior <- get_mvgam_priors( y ~ s(season), trend_model = AR(), data = simdat$data_train, family = gaussian() )$prior[1] expect_equal( trimws(strsplit(def_prior, "[~]")[[1]][2]), paste0( brms::get_prior( y ~ 1, data = data.frame(y = simdat$data_train$y), family = gaussian() )$prior[1], ';' ) ) # Now try Student def_prior <- get_mvgam_priors( y ~ s(season), trend_model = AR(), data = simdat$data_train, family = student_t() )$prior[1] expect_equal( trimws(strsplit(def_prior, "[~]")[[1]][2]), paste0( brms::get_prior( y ~ 1, data = data.frame(y = simdat$data_train$y), family = student_t() )$prior[1], ';' ) ) # Now Poisson simdat <- sim_mvgam(family = poisson(), mu = 0) def_prior <- get_mvgam_priors( y ~ s(season), trend_model = AR(), data = simdat$data_train, family = poisson() )$prior[1] expect_equal( 
trimws(strsplit(def_prior, "[~]")[[1]][2]), paste0( brms::get_prior( y ~ 1, data = data.frame(y = simdat$data_train$y), family = poisson() )$prior[1], ';' ) ) # Now Beta simdat <- sim_mvgam(family = betar(), mu = 0) def_prior <- get_mvgam_priors( y ~ s(season), trend_model = AR(), data = simdat$data_train, family = betar() )$prior[1] expect_equal( trimws(strsplit(def_prior, "[~]")[[1]][2]), paste0( brms::get_prior( y ~ 1, data = data.frame(y = simdat$data_train$y), family = brms::Beta() )$prior[1], ';' ) ) # Now Negative Binomial simdat <- sim_mvgam(family = nb(), mu = 0) def_prior <- get_mvgam_priors( y ~ s(season), trend_model = AR(), data = simdat$data_train, family = nb() )$prior[1] expect_equal( trimws(strsplit(def_prior, "[~]")[[1]][2]), paste0( brms::get_prior( y ~ 1, data = data.frame(y = simdat$data_train$y), family = brms::negbinomial() )$prior[1], ';' ) ) }) test_that("specified priors appear in the Stan code", { priors <- get_mvgam_priors( formula = y ~ s(season, bs = 'cc'), trend_model = GP(), data = beta_data$data_train, family = betar() ) priors$prior[3] <- "alpha_gp ~ normal(-1, 0.75);" stancode <- mvgam( y ~ s(season, bs = 'cc'), trend_model = GP(), data = beta_data$data_train, family = betar(), priors = priors, run_model = FALSE )$model_file expect_true(expect_match2( gsub(' ', '', stancode), 'alpha_gp~normal(-1,0.75);' )) # Now the same using brms functionality priors <- prior(normal(-1, 0.75), class = alpha_gp) stancode <- mvgam( y ~ s(season, bs = 'cc'), trend_model = GP(), data = beta_data$data_train, family = betar(), priors = priors, run_model = FALSE )$model_file expect_true(expect_match2(stancode, 'alpha_gp ~ normal(-1, 0.75);')) expect_true(expect_match2( gsub(' ', '', stancode), 'vector[n_series]alpha_gp;' )) }) test_that("specified trend_formula priors appear in the Stan code", { priors <- get_mvgam_priors( formula = y ~ 1, trend_formula = ~ s(season, bs = 'cc') + year, trend_model = AR(), data = beta_data$data_train, family = betar() ) priors$prior[5] <- "year_trend ~ uniform(-2, 1);" stancode <- mvgam( formula = y ~ 1, trend_formula = ~ s(season, bs = 'cc') + year, trend_model = AR(), data = beta_data$data_train, family = betar(), priors = priors, run_model = FALSE )$model_file expect_true(expect_match2( gsub(' ', '', stancode), 'b_raw_trend[1]~uniform(-2,1);' )) }) test_that("priors on parametric effects behave correctly", { priors <- get_mvgam_priors( formula = y ~ s(season, bs = 'cc'), trend_model = GP(), data = beta_data$data_train, family = betar() ) priors$prior[1] <- "(Intercept) ~ normal(-1, 0.75);" stancode <- mvgam( y ~ s(season, bs = 'cc'), trend_model = GP(), data = beta_data$data_train, family = betar(), priors = priors, run_model = FALSE )$model_file expect_true(expect_match2( gsub(' ', '', stancode), 'b_raw[1]~normal(-1,0.75);' )) # Now the same using brms functionality priors <- prior(normal(-1, 0.75), class = Intercept) stancode <- mvgam( formula = y ~ s(season, bs = 'cc'), trend_model = GP(), data = beta_data$data_train, family = betar(), priors = priors, run_model = FALSE )$model_file expect_true(expect_match2( gsub(' ', '', stancode), 'b_raw[1]~normal(-1,0.75);' )) # Bounds not allowed on parametric effect priors yet priors <- prior(normal(-1, 0.75), class = `Intercept`, lb = 0) expect_warning(mvgam( formula = y ~ s(season, bs = 'cc'), trend_model = GP(), data = beta_data$data_train, family = betar(), priors = priors, run_model = FALSE )) }) test_that("priors on gp() effects work properly", { dat <- sim_mvgam() priors <- c( prior(normal(0, 
0.5), class = `alpha_gp(time):seriesseries_1`, ub = 1), prior(normal(5, 1.3), class = `rho_gp_trend(season)[1]`, ub = 50) ) expect_warning( mvgam( formula = y ~ gp(time, by = series, scale = FALSE, k = 10), trend_formula = ~ gp(season, scale = FALSE, k = 10), trend_model = AR(), data = dat$data_train, run_model = FALSE, priors = priors ), 'bounds cannot currently be changed for gp parameters' ) priors <- c( prior(normal(0, 0.5), class = `alpha_gp(time):seriesseries_1`), prior(normal(5, 1.3), class = `rho_gp_trend(season)[1]`) ) mod <- mvgam( formula = y ~ gp(time, by = series, scale = FALSE, k = 10), trend_formula = ~ gp(season, scale = FALSE, k = 10), trend_model = AR(), data = dat$data_train, run_model = FALSE, priors = priors ) # Observation model priors working expect_true(any(grepl( 'alpha_gp_time_byseriesseries_1~normal(0,0.5);', gsub(' ', '', mod$model_file), fixed = TRUE ))) # Process model priors working expect_true(any(grepl( 'rho_gp_trend_season_[1]~normal(5,1.3);', gsub(' ', '', mod$model_file), fixed = TRUE ))) # A quick test of multidimensional gp priors dat <- mgcv::gamSim(1, n = 30, scale = 2) mod <- mvgam( y ~ gp(x1, x2, cov = "matern32", k = 10, iso = FALSE, scale = FALSE), data = dat, family = gaussian(), priors = c( prior(exponential(2.5), class = `alpha_gp(x1, x2)`), prior(normal(0.5, 1), class = `rho_gp(x1, x2)[1][1]`), prior(normal(0.75, 2), class = `rho_gp(x1, x2)[1][2]`) ), run_model = FALSE ) expect_true(any(grepl( 'alpha_gp_x1by_x2_~exponential(2.5);', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'rho_gp_x1by_x2_[1][1]~normal(0.5,1);', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(any(grepl( 'rho_gp_x1by_x2_[1][2]~normal(0.75,2);', gsub(' ', '', mod$model_file), fixed = TRUE ))) }) ================================================ FILE: tests/testthat/test-nmixture.R ================================================ context("n_mixture") # Simulations take a bit of time to set up skip_on_cran() set.seed(100) poisdat <- sim_mvgam() test_that("only count data allowed for nmixtures", { gaus_data$data_train$cap <- 100 expect_error( mvgam( y ~ s(season), trend_formula = ~trend, family = nmix(), data = gaus_data$data_train ), 'Values < 0 not allowed for count family responses', fixed = TRUE ) }) test_that("cap must be supplied in data", { expect_error( get_mvgam_priors( formula = y ~ s(season), trend_formula = ~ s(season) + trend, trend_model = 'None', family = nmix(), data = poisdat$data_train ), 'Max abundances must be supplied as a variable named "cap" for N-mixture models', fixed = TRUE ) poisdat$data_train$cap <- rpois(NROW(poisdat$data_train), lambda = 5) + max(poisdat$data_train$y, na.rm = TRUE) expect_error( mvgam( formula = y ~ s(season), trend_formula = ~ s(season) + trend, trend_model = 'None', family = nmix(), data = poisdat$data_train, newdata = poisdat$data_test ), '"data" and "newdata" have different numbers of columns', fixed = TRUE ) poisdat$data_test$emu <- 50 expect_error( mvgam( formula = y ~ s(season), trend_formula = ~ s(season) + trend, trend_model = 'None', family = nmix(), data = poisdat$data_train, newdata = poisdat$data_test ), 'Max abundances must be supplied in test data as a variable named "cap" for N-mixture models', fixed = TRUE ) }) poisdat$data_train$cap <- rpois(NROW(poisdat$data_train), lambda = 5) + max(poisdat$data_train$y, na.rm = TRUE) poisdat$data_test$cap <- rpois(NROW(poisdat$data_test), lambda = 5) + max(poisdat$data_test$y, na.rm = TRUE) test_that("latent process intercept is allowed in 
nmixtures", { prior_df <- get_mvgam_priors( formula = y ~ s(season), trend_formula = ~ s(season) + trend, trend_model = 'None', family = nmix(), data = poisdat$data_train ) expect_true(any(grepl( '(Intercept)_trend', prior_df$param_name, fixed = TRUE ))) mod <- mvgam( formula = y ~ s(season), trend_formula = ~ s(season) + trend, trend_model = 'None', family = nmix(), data = poisdat$data_train, newdata = poisdat$data_test, priors = prior(std_normal(), class = '(Intercept)_trend'), run_model = FALSE ) expect_true(any(grepl('(Intercept)_trend', mod$model_file, fixed = TRUE))) expect_true(any(grepl( 'b_raw_trend[1] ~ std_normal();', mod$model_file, fixed = TRUE ))) # Can also test that 'cap' is properly included in model_data # The caps should be arranged by series and then by time train_cap = poisdat$data_train %>% dplyr::arrange(series, time) %>% dplyr::pull(cap) test_cap = poisdat$data_test %>% dplyr::arrange(series, time) %>% dplyr::pull(cap) expect_true(all(mod$model_data$cap == c(train_cap, test_cap))) }) # Check that the model fits and post-processing works using the # example from the families man page test_that("nmix() post-processing works", { set.seed(0) data.frame( site = 1, # five replicates per year; six years replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = 'sp_1', # true abundance declines nonlinearly truth = c( rep(28, 5), rep(26, 5), rep(23, 5), rep(16, 5), rep(14, 5), rep(14, 5) ), # observations are taken with detection prob = 0.7 obs = c( rbinom(5, 28, 0.7), rbinom(5, 26, 0.7), rbinom(5, 23, 0.7), rbinom(5, 15, 0.7), rbinom(5, 14, 0.7), rbinom(5, 14, 0.7) ) ) %>% # add 'series' information, which is an identifier of site, replicate and species dplyr::mutate( series = paste0('site_', site, '_', species, '_rep_', replicate), time = as.numeric(time), # add a 'cap' variable that defines the maximum latent N to # marginalize over when estimating latent abundance; in other words # how large do we realistically think the true abundance could be? 
cap = 100 ) %>% dplyr::select(-replicate) -> testdat # Now add another species that has a different temporal trend and a smaller # detection probability (0.45 for this species) testdat = testdat %>% dplyr::bind_rows( data.frame( site = 1, replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = 'sp_2', truth = c( rep(4, 5), rep(7, 5), rep(15, 5), rep(16, 5), rep(19, 5), rep(18, 5) ), obs = c( rbinom(5, 4, 0.45), rbinom(5, 7, 0.45), rbinom(5, 15, 0.45), rbinom(5, 16, 0.45), rbinom(5, 19, 0.45), rbinom(5, 18, 0.45) ) ) %>% dplyr::mutate( series = paste0('site_', site, '_', species, '_rep_', replicate), time = as.numeric(time), cap = 50 ) %>% dplyr::select(-replicate) ) # series identifiers testdat$species <- factor(testdat$species, levels = unique(testdat$species)) testdat$series <- factor(testdat$series, levels = unique(testdat$series)) # The trend_map to state how replicates are structured testdat %>% # each unique combination of site*species is a separate process dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>% dplyr::select(trend, series) %>% dplyr::distinct() -> trend_map # Fit a model mod <- SW(mvgam( # the observation formula sets up linear predictors for # detection probability on the logit scale formula = obs ~ species - 1, # the trend_formula sets up the linear predictors for # the latent abundance processes on the log scale trend_formula = ~ s(time, by = trend, k = 4) + species, # the trend_map takes care of the mapping trend_map = trend_map, # nmix() family and data family = nmix(), data = testdat, # priors can be set in the usual way priors = c( prior(std_normal(), class = b), prior(normal(1, 1.5), class = Intercept_trend) ), samples = 300, residuals = FALSE, chains = 2, silent = 2 )) expect_no_error(capture_output(summary(mod))) expect_no_error(capture_output(plot(mod, type = 'pterms'))) expect_no_error(capture_output(plot( mod, type = 'pterms', trend_effects = TRUE ))) expect_no_error(capture_output(print(mod))) expect_true(inherits(hindcast(mod), 'mvgam_forecast')) expect_true(inherits(hindcast(mod, type = 'latent_N'), 'mvgam_forecast')) expect_true(inherits(hindcast(mod, type = 'detection'), 'mvgam_forecast')) preds <- predict(mod, summary = FALSE, type = 'response') expect_true(NCOL(preds) == NROW(testdat)) expect_true(all(preds >= 0L)) preds <- predict(mod, summary = FALSE, type = 'detection') expect_true(NCOL(preds) == NROW(testdat)) expect_true(all(preds <= 1L & preds >= 0L)) preds <- predict(mod, summary = FALSE, type = 'latent_N') expect_true(NCOL(preds) == NROW(testdat)) expect_true(all(preds >= 0L)) expect_no_error(plot(mod, type = 'smooths', trend_effects = TRUE)) expect_no_error(plot( mod, type = 'smooths', realisations = TRUE, trend_effects = TRUE )) expect_no_error(plot( mod, type = 'smooths', residuals = TRUE, trend_effects = TRUE )) options(mc.cores = 1) expect_loo(SW(loo(mod))) }) ================================================ FILE: tests/testthat/test-offset.R ================================================ context("offsets") skip_on_cran() test_that("offset incorporated into link-level linpred for beta", { beta_data$data_train$pop <- as.numeric(beta_data$data_train$series) + 0.5 beta_data$data_test$pop <- as.numeric(beta_data$data_test$series) + 0.5 testmod <- mvgam( y ~ s(season, bs = 'cc') + offset(pop) + s(series, bs = 're'), trend_model = 'GP', data = beta_data$data_train, family = betar(), run_model = FALSE ) stancode <- testmod$model_file # Offset should be recorded in the mgcv model 
expect_true(!is.null(attr(testmod$mgcv_model$terms, 'offset'))) # Offset should be in the linpred calculation expect_true(expect_match2(stancode, 'eta = X * b + off_set;')) # Offset should be inv_logit in the model declaration expect_true(expect_match2( stancode, '* append_row(b, 1.0) + off_set[obs_ind])' )) # Offset should be provided in 'data' expect_true(expect_match2(stancode, 'vector[total_obs] off_set;')) # Data for the offset vector should also be incorporated # in model_data expect_true(!is.null(testmod$model_data$off_set)) }) test_that("offset incorporated into link-level linpred for NB", { data = data.frame( out = rpois(100, lambda = 5), pop = rnorm(100), time = 1:100 ) testmod <- mvgam( out ~ 1 + offset(pop), trend_model = 'GP', data = data, family = nb(), run_model = FALSE ) stancode <- testmod$model_file # Offset should be recorded in the mgcv model expect_true(!is.null(attr(testmod$mgcv_model$terms, 'offset'))) # Offset should be in the linpred calculation expect_true(expect_match2(stancode, 'eta = X * b + off_set;')) # Offset should be exponentiated in the model declaration expect_true(expect_match2( stancode, '* append_row(b, 1.0) + off_set[obs_ind])' )) # Offset should be provided in 'data' expect_true(expect_match2(stancode, 'vector[total_obs] off_set;')) # Data for the offset vector should also be incorporated # in model_data expect_true(!is.null(testmod$model_data$off_set)) }) test_that("offset not allowed in trend_formula", { data = data.frame( out = rpois(100, lambda = 5), x = rnorm(100), pop = rnorm(100), time = 1:100 ) expect_error( mvgam( out ~ 1, trend_formula = ~ x + offset(pop), trend_model = 'AR1', data = data, family = nb(), run_model = FALSE ), 'Offsets not allowed in argument "trend_formula"' ) }) test_that("offset works when no intercept is provided", { simdat <- sim_mvgam() simdat$data_train$offset <- rep(log(100), NROW(simdat$data_train)) mod <- mvgam( formula = y ~ offset(offset) - 1, trend_formula = ~ s(season), trend_model = RW(), data = simdat$data_train, run_model = FALSE ) stancode <- mod$model_file # Offset should be recorded in the mgcv model expect_true(!is.null(attr(mod$mgcv_model$terms, 'offset'))) # Offset should be in the linpred calculation expect_true(expect_match2(stancode, 'eta = X * b + off_set;')) # Offset should be provided in 'data' expect_true(expect_match2(stancode, 'vector[total_obs] off_set;')) # Offset should be in model_data expect_true(!is.null(mod$model_data$off_set)) }) ================================================ FILE: tests/testthat/test-piecewise.R ================================================ context("piecewise") # Simulate data from a piecewise logistic trend ts <- mvgam:::piecewise_logistic( t = 1:100, cap = 8.5, deltas = extraDistr::rlaplace(10, 0, 0.025), k = 0.075, m = 0, changepoint_ts = sample(1:100, 10) ) y <- rnorm(100, ts, 0.75) # Don't put 'cap' variable in dataframe df <- data.frame( y = y, time = 1:100, series = as.factor('series1'), #cap = 8.75, fake = rnorm(100) ) test_that("logistic should error if cap is missing", { expect_error( mvgam( formula = y ~ 0, data = df, trend_model = PW(growth = 'logistic', n_changepoints = 10), # priors = prior(normal(2, 5), class = k_trend), family = gaussian(), run_model = TRUE, return_model_data = TRUE ), 'Capacities must be supplied as a variable named "cap" for logistic growth' ) }) # Now include some missing values in 'cap' df <- data.frame( y = y, time = 1:100, series = as.factor('series1'), cap = sample(c(8.75, NA), 100, TRUE), fake = rnorm(100) ) 
test_that("logistic should error if cap has NAs", { expect_error( mvgam( formula = y ~ 0, data = df, trend_model = PW(growth = 'logistic', n_changepoints = 10), priors = prior(normal(2, 5), class = k_trend), family = gaussian(), run_model = TRUE, return_model_data = TRUE ), 'Missing values found for some "cap" terms' ) }) # Missing values can also happen when transforming to the link scale y <- rpois(100, ts + 5) df <- data.frame( y = y, time = 1:100, series = as.factor('series1'), cap = -1, fake = rnorm(100) ) test_that("logistic should error if cap has NAs after link transformation", { expect_error( mvgam( formula = y ~ 0, data = df, trend_model = PW(growth = 'logistic', n_changepoints = 10), family = poisson(), run_model = TRUE, return_model_data = TRUE ), paste0( 'Missing or infinite values found for some "cap" terms\n', 'after transforming to the log link scale' ) ) }) # Make sure cap is in the right order y <- rpois(100, ts + 5) df <- rbind( data.frame( y = y, time = 1:100, series = as.factor('series1'), cap = y + 20, fake = rnorm(100) ), data.frame( y = y + 2, time = 1:100, series = as.factor('series2'), cap = y + 22, fake = rnorm(100) ) ) test_that("logistic caps should be included in the correct order", { skip_on_cran() mod <- mvgam( formula = y ~ 0, data = df, trend_model = PW(growth = 'logistic', n_changepoints = 10), family = poisson(), run_model = FALSE, return_model_data = TRUE ) # caps should now be logged and in a matrix [1:n_timepoints, 1:n_series] expect_true(all( mod$model_data$cap == log(cbind( df %>% dplyr::filter(series == 'series1') %>% dplyr::arrange(time) %>% dplyr::pull(cap), df %>% dplyr::filter(series == 'series2') %>% dplyr::arrange(time) %>% dplyr::pull(cap) )) )) # Should also work for list data df_list <- list( series = df$series, time = df$time, cap = df$cap, y = df$y, fake = df$fake ) mod <- mvgam( formula = y ~ 0, data = df_list, trend_model = PW(growth = 'logistic', n_changepoints = 10), family = poisson(), run_model = FALSE, return_model_data = TRUE ) # caps should now be logged and in a matrix [1:n_timepoints, 1:n_series] expect_true(all( mod$model_data$cap == log(cbind( df %>% dplyr::filter(series == 'series1') %>% dplyr::arrange(time) %>% dplyr::pull(cap), df %>% dplyr::filter(series == 'series2') %>% dplyr::arrange(time) %>% dplyr::pull(cap) )) )) }) test_that("piecewise models fit and forecast without error", { skip_on_cran() # Example of logistic growth with possible changepoints # Simple logistic growth model dNt = function(r, N, k) { r * N * (k - N) } # Iterate growth through time Nt = function(r, N, t, k) { for (i in 1:(t - 1)) { # population at next time step is current population + growth, # but we introduce several 'shocks' as changepoints if (i %in% c(5, 15, 25, 41, 45, 60, 80)) { N[i + 1] <- max(1, N[i] + dNt(r + runif(1, -0.1, 0.1), N[i], k)) } else { N[i + 1] <- max(1, N[i] + dNt(r, N[i], k)) } } N } # Simulate expected values set.seed(555) expected <- Nt(0.004, 2, 100, 30) # Take Poisson draws y <- rpois(100, expected) # Assemble data into dataframe and model. 
We set a # fixed carrying capacity of 35 for this example, but note that # this value is not required to be fixed at each timepoint mod_data <- data.frame( y = y, time = 1:100, cap = 35, series = as.factor('series_1') ) dat_train <- mod_data %>% dplyr::filter(time <= 90) dat_test <- mod_data %>% dplyr::filter(time > 90) # The intercept is nonidentifiable when using piecewise # trends because the trend functions have their own offset # parameters 'm'; it is recommended to always drop intercepts # when using these trend models mod <- mvgam( y ~ 0, trend_model = PW(growth = 'logistic'), family = poisson(), data = dat_train, chains = 2, silent = 2 ) expect_no_error(capture_output(how_to_cite(mod))) # Compute and plot forecasts fc <- forecast(mod, newdata = dat_test, type = 'trend') expect_no_error(capture_output(plot(fc))) # Should also work for piecewise linear mod <- SW(mvgam( y ~ 0, trend_model = PW(growth = 'linear', n_changepoints = 5), family = poisson(), data = dat_train, chains = 2, silent = 2 )) # Compute and plot forecasts fc <- forecast(mod, newdata = dat_test, type = 'trend') expect_no_error(capture_output(plot(fc))) }) ================================================ FILE: tests/testthat/test-sim_mvgam.R ================================================ context("sim_mvgam") #### Test basic error and warning messages #### test_that("family must be correctly specified", { expect_error( sim_mvgam(family = 'bogan', trend_model = 'RW', trend_rel = 0.5), 'family not recognized' ) }) test_that("trend_model must be correctly specified", { expect_error(sim_mvgam( family = gaussian(), trend_model = 'AR4', trend_rel = 0.5 )) }) test_that("trend_rel must be a valid proportion", { expect_error( sim_mvgam(family = gaussian(), trend_model = 'AR2', trend_rel = -0.1), "Argument 'trend_rel' must be a proportion ranging from 0 to 1, inclusive" ) }) test_that("n_lv must be a positive integer", { expect_error( sim_mvgam( family = gaussian(), trend_model = 'AR2', trend_rel = 0.4, n_lv = 0.5 ), "Argument 'n_lv' must be a positive integer" ) }) test_that("run sim AR and VAR functions for memory checks", { expect_no_error(capture_output(sim_mvgam( family = gaussian(), trend_model = RW() ))) expect_no_error(capture_output(sim_mvgam( family = gaussian(), trend_model = AR(p = 1) ))) expect_no_error(capture_output(sim_mvgam( family = gaussian(), trend_model = AR(p = 2) ))) expect_no_error(capture_output(sim_mvgam( family = gaussian(), trend_model = AR(p = 3) ))) expect_no_error(capture_output(sim_mvgam( family = gaussian(), trend_model = VAR() ))) expect_no_error(capture_output(sim_mvgam( family = gaussian(), trend_model = VAR(cor = TRUE) ))) }) ================================================ FILE: tests/testthat/test-summary-structure.R ================================================ context("summary structure") test_that("summary.mvgam returns structured object", { # Test with internal example object summary_obj <- summary(mvgam:::mvgam_example1) # Check class expect_s3_class(summary_obj, "mvgam_summary") expect_s3_class(summary_obj, "list") # Check structure expect_named( summary_obj, c("model_spec", "parameters", "diagnostics", "sampling_info") ) # Check that each component is a list expect_type(summary_obj$model_spec, "list") expect_type(summary_obj$parameters, "list") expect_type(summary_obj$diagnostics, "list") expect_type(summary_obj$sampling_info, "list") # Check that model_spec has expected components expect_named( summary_obj$model_spec, c( "formulas", "family", "link", "trend_model", 
"upper_bounds", "latent_variables", "dimensions", "is_jsdgam" ) ) }) test_that("print.mvgam_summary works", { summary_obj <- summary(mvgam:::mvgam_example1) # Should print without error expect_no_error(capture_output(print(summary_obj))) # Should return object invisibly result <- capture_output(returned_obj <- print(summary_obj)) expect_identical(summary_obj, returned_obj) }) test_that("summary can be saved and reloaded", { summary_obj <- summary(mvgam:::mvgam_example1) # Save to temporary file temp_file <- tempfile(fileext = ".rds") saveRDS(summary_obj, temp_file) # Reload reloaded_obj <- readRDS(temp_file) # Should be identical expect_identical(summary_obj, reloaded_obj) # Should still print correctly expect_no_error(capture_output(print(reloaded_obj))) # Clean up unlink(temp_file) }) ================================================ FILE: tests/testthat/test-tidier_methods.R ================================================ context("tidier methods") # `tidy()` tests test_that("`tidy()` snapshot value of `mvgam_example1`", { local_edition(3) expect_snapshot_value(tidy.mvgam(mvgam_example1), style = "json2") }) test_that("`tidy()` snapshot value of `mvgam_example2`", { local_edition(3) expect_snapshot_value(tidy.mvgam(mvgam_example2), style = "json2") }) test_that("`tidy()` snapshot value of `mvgam_example3`", { local_edition(3) expect_snapshot_value(tidy.mvgam(mvgam_example3), style = "json2") }) test_that("`tidy()` snapshot value of `mvgam_example4`", { local_edition(3) expect_snapshot_value(tidy.mvgam(mvgam_example4), style = "json2") }) test_that("`tidy()` snapshot value of `mvgam_example6`", { local_edition(3) testthat::skip_on_cran() # Hierarchical dynamics example adapted from RW documentation example. # The difference is that this uses 4 species rather than 3. 
simdat1 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 4, mu = c(1, 2, 3, 4) ) simdat2 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 4, mu = c(1, 2, 3, 4) ) simdat3 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 4, mu = c(1, 2, 3, 4) ) simdat_all <- rbind( simdat1$data_train %>% dplyr::mutate(region = 'qld'), simdat2$data_train %>% dplyr::mutate(region = 'nsw'), simdat3$data_train %>% dplyr::mutate(region = 'vic') ) %>% dplyr::mutate( species = gsub('series', 'species', series), species = as.factor(species), region = as.factor(region) ) %>% dplyr::arrange(series, time) %>% dplyr::select(-series) mvgam_example6 <- suppressWarnings(mvgam( formula = y ~ species, trend_model = AR(gr = region, subgr = species), data = simdat_all, silent = 2 )) tidyout = tidy.mvgam(mvgam_example6) expect_equal(dim(tidyout), c(65, 7)) expect_equal( colnames(tidyout), c("parameter", "type", "mean", "sd", "2.5%", "50%", "97.5%") ) expect_snapshot_value(tidyout[c("parameter", "type")], style = "json2") }) # `augment()` tests test_that("augment doesn't error", { expect_no_error(augment(mvgam:::mvgam_example1)) expect_no_error(augment(mvgam:::mvgam_example4)) }) test_that("augment returns correct types", { out1 <- augment(mvgam:::mvgam_example1) out4 <- augment(mvgam:::mvgam_example4) expect_equal(class(out1)[[1]], "tbl_df") expect_equal(class(out4), "list") # Lengths of augment output and of obs data should be equal expect_equal(NROW(out1), NROW(mvgam:::mvgam_example1$obs_data)) expect_equal(length(out4$y), length(mvgam:::mvgam_example4$obs_data$y)) # NAs in obs data should equal NAs in residuals expect_true(all( which(is.na(mvgam:::mvgam_example1$obs_data$y)) %in% which(is.na(out1$.resid)) )) }) ================================================ FILE: tests/testthat/test-update.R ================================================ context("update.mvgam") skip_on_cran() test_that("update() working correctly", { # Can update trend_model mod <- update( mvgam:::mvgam_example1, trend_model = AR(p = 2), control = list(max_treedepth = 11), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(attr(mod$model_data, 'trend_model') == 'AR2') expect_true(any(grepl( 'ar2~std_normal();', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(mod$call == mvgam:::mvgam_example1$call) # Update trend_model and formula mod <- update( mvgam:::mvgam_example1, formula = y ~ s(season, k = 6) - 1, trend_model = PW(), run_model = FALSE ) expect_true(inherits(mod, 'mvgam_prefit')) expect_true(attr(mod$model_data, 'trend_model') == 'PWlinear') expect_true(any(grepl( 'to_vector(delta_trend)~double_exponential(0,changepoint_scale);', gsub(' ', '', mod$model_file), fixed = TRUE ))) expect_true(mod$call != mvgam:::mvgam_example1$call) expect_true(rlang::f_rhs(mod$call) == 's(season, k = 6) - 1') # Errors should pass from mvgam() expect_error( update( mvgam:::mvgam_example1, formula = y ~ s(season, k = 6) - 1, trend_model = PW(growth = 'logistic'), run_model = FALSE ), 'Capacities must be supplied as a variable named "cap" for logistic growth' ) # Update to include shared observation params mod <- update( mvgam:::mvgam_example1, share_obs_params = TRUE, run_model = FALSE ) expect_true(any(grepl( 'realsigma_obs;', gsub(' ', '', mod$model_file), fixed = TRUE ))) }) test_that("update() passes original knots correctly", { # Simulate some data and fit a Poisson AR1 model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- SM(mvgam( y ~ 
s(season, bs = 'cc'),
    knots = list(season = c(0.5, 12.5)),
    trend_model = AR(),
    noncentred = TRUE,
    data = simdat$data_train,
    chains = 2,
    silent = 2
  ))
  expect_true(identical(
    attr(mod$mgcv_model, 'knots'),
    list(season = c(0.5, 12.5))
  ))

  # Update to an AR2 model
  updated_mod <- update(
    mod,
    trend_model = AR(p = 2),
    noncentred = TRUE,
    run_model = FALSE
  )
  expect_true(identical(
    attr(updated_mod$mgcv_model, 'knots'),
    list(season = c(0.5, 12.5))
  ))
})

================================================
FILE: tests/testthat.R
================================================
library(testthat)
library(mvgam)

test_check("mvgam")

================================================
FILE: vignettes/data_in_mvgam.Rmd
================================================
---
title: "Formatting data for use in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Formatting data for use in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

This vignette gives an example of how to take raw data and format it for use in `mvgam`. This is not an exhaustive example, as data can be recorded and stored in a variety of ways, which requires different approaches to wrangle the data into the necessary format for `mvgam`. For full details on the basic `mvgam` functionality, please see [the introductory vignette](https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html) and [the growing set of walk-through video tutorials on `mvgam` applications](https://www.youtube.com/playlist?list=PLzFHNoUxkCvsFIg6zqogylUfPpaxau_a3&si=lyg7qUrMLbD-tHCB).

## Required *tidy* data format

Manipulating the data into a 'long' format (i.e. *tidy* format) is necessary for modelling in `mvgam`. By 'long' format, we mean that each `series x time` observation needs to have its own entry in the `dataframe` or `list` object that we wish to pass as data to the two primary modelling functions, `mvgam()` and `jsdgam()`. A simple example can be viewed by simulating data using the `sim_mvgam()` function. See `?sim_mvgam` for more details.

```{r}
simdat <- sim_mvgam(
  n_series = 4,
  T = 24,
  prop_missing = 0.2
)
head(simdat$data_train, 16)
```

### `series` as a `factor` variable

Notice how we have four different time series in these simulated data, and we have identified the series-level indicator as a `factor` variable.

```{r}
class(simdat$data_train$series)
levels(simdat$data_train$series)
```

It is important that the number of levels matches the number of unique series in the data to ensure indexing across series works properly in the underlying modelling functions.
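When assembling your own data, it can help to declare the `series` indicator explicitly as a `factor` whose levels exactly match the unique series identifiers, rather than relying on default conversions. A minimal sketch (the `toy_dat` object and its values are hypothetical, included only for illustration):

```{r}
# A hypothetical dataset where 'series' starts out as a character vector
toy_dat <- data.frame(
  series = rep(c("site_a", "site_b"), each = 6),
  time = rep(1:6, times = 2),
  y = rnorm(12)
)

# Declare the factor so its levels match the unique series identifiers
toy_dat$series <- factor(toy_dat$series, levels = sort(unique(toy_dat$series)))
levels(toy_dat$series)
```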
Several of the main workhorse functions in the package (including `mvgam()` and `get_mvgam_priors()`) will give an error if this is not the case, but it may be worth checking anyway:

```{r}
all(levels(simdat$data_train$series) %in% unique(simdat$data_train$series))
```

Note that you can technically supply data that does not have a `series` indicator, and the package will generally assume that you are only using a single time series. There are exceptions to this, for example if you have grouped data and would like to estimate hierarchical dependencies (see an example of hierarchical process error correlations in the `?AR` documentation) or if you would like to set up a Joint Species Distribution Model (JSDM) using a Zero-Mean Multivariate Gaussian distribution for the latent residuals (see examples in the `?ZMVN` documentation).

### A single outcome variable

You may also have noticed that we do not spread the `numeric / integer`-classed outcome variable into different columns. Rather, there is only a single column for the outcome variable, labelled `y` in these simulated data (though the outcome does not have to be labelled `y`). This is another important requirement in `mvgam`, but it shouldn't be too unfamiliar to `R` users who frequently use modelling packages such as `lme4`, `mgcv`, `brms` or the many other regression modelling packages out there. The advantage of this format is that it is now very easy to specify effects that vary among time series:

```{r}
summary(glm(
  y ~ series + time,
  data = simdat$data_train,
  family = poisson()
))
```

```{r}
summary(mgcv::gam(
  y ~ series + s(time, by = series),
  data = simdat$data_train,
  family = poisson()
))
```

Depending on the observation families you plan to use when building models, there may be some restrictions that need to be satisfied within the outcome variable. For example, a Beta regression can only handle proportional data, so values `>= 1` or `<= 0` are not allowed. Likewise, a Poisson regression can only handle non-negative integers. Most regression functions in `R` will assume the user knows all of this and so will not issue any warnings or errors if you choose the wrong distribution, but often this ends up leading to some unhelpful error from an optimizer that is difficult to interpret and diagnose. `mvgam` will attempt to provide some errors if you do something that is simply not allowed. For example, we can simulate data from a zero-centred Gaussian distribution (ensuring that some of our values will be `< 0`) and attempt a Beta regression in `mvgam` using the `betar` family:

```{r}
gauss_dat <- data.frame(
  outcome = rnorm(10),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
gauss_dat
```

A call to `gam()` using the `mgcv` package leads to a model that actually fits (though it does give an unhelpful warning message):

```{r}
mgcv::gam(outcome ~ time,
  family = betar(),
  data = gauss_dat
)
```

But the same call to `mvgam()` gives us something more useful:

```{r error=TRUE}
mvgam(outcome ~ time,
  family = betar(),
  data = gauss_dat
)
```

Please see `?mvgam_families` for more information on the types of responses that the package can handle and their restrictions.

### A `time` variable

The other requirement for most models that can be fit in `mvgam` is a `numeric / integer`-classed variable labelled `time`. This ensures the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models.
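For example, a correctly formatted multivariate dataset keeps one row for every `series x time` combination, using `NA` in the outcome (rather than dropping rows) for any timepoints that were not sampled. A minimal sketch with hypothetical values:

```{r}
# One row per series x time combination; unsampled timepoints are NA
toy_dat <- data.frame(
  time = rep(1:5, times = 2),
  series = factor(rep(c("series_1", "series_2"), each = 5)),
  y = c(1, 3, NA, 2, 4, 0, NA, 1, 1, 2)
)
toy_dat
```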
If you plan to use any of the autoregressive dynamic trend functions available in `mvgam` (see `?mvgam_trends` for details of available dynamic processes), you will need to ensure your time series are entered with a fixed sampling interval (i.e. the time between timesteps 1 and 2 should be the same as the time between timesteps 2 and 3, etc...). But note that you can have missing observations for some (or all) series. `mvgam()` will check this for you, but again it is useful to ensure you have no missing timepoint x series combinations in your data. You can generally do this with a simple `dplyr` call:

```{r}
# A function to check that all timepoints within a sequence are present
all_times_avail <- function(time, min_time, max_time) {
  identical(
    as.numeric(sort(time)),
    as.numeric(seq.int(from = min_time, to = max_time))
  )
}

# Get min and max times from the data
min_time <- min(simdat$data_train$time)
max_time <- max(simdat$data_train$time)

# Check that all times are recorded for each series
data.frame(
  series = simdat$data_train$series,
  time = simdat$data_train$time
) %>%
  dplyr::group_by(series) %>%
  dplyr::summarise(all_there = all_times_avail(
    time,
    min_time,
    max_time
  )) -> checked_times
if (any(checked_times$all_there == FALSE)) {
  warning("One or more series is missing observations for one or more timepoints")
} else {
  cat("All series have observations at all timepoints :)")
}
```

Note that models which use dynamic components will assume that smaller values of `time` are *older* (i.e. `time = 1` came *before* `time = 2`, etc...).

### Irregular sampling intervals?

Most `mvgam` dynamic trend models expect `time` to be measured in discrete, evenly-spaced intervals (i.e. one measurement per week, or one per year, for example; though missing values are allowed). But please note that irregularly sampled time intervals are allowed, in which case the `CAR()` trend model (continuous-time autoregressive) is appropriate. You can see an example of this kind of model in the **Examples** section in `?CAR`. You can also use `trend_model = 'None'` (the default in `mvgam()`) and instead use a Gaussian Process to model temporal variation for irregularly-sampled time series. See `?brms::gp` for details. But to reiterate the point from above, if you do not have time series data (or don't want to estimate latent temporal dynamics) but you would like to estimate correlated latent residuals among multivariate outcomes, you can set up models that use `trend_model = ZMVN(...)` without the need for a `time` variable (see `?ZMVN` for details).

## Checking data with `get_mvgam_priors()`

The `get_mvgam_priors()` function is designed to return information about the parameters in a model whose prior distributions can be modified by the user. But in doing so, it will perform a series of checks to ensure the data are formatted properly. It can therefore be very useful to new users for ensuring there isn't anything strange going on in the data setup. For example, we can replicate the steps taken above (to check factor levels and timepoint x series combinations) with a single call to `get_mvgam_priors()`.
Here we first simulate some data in which some of the timepoints in the `time` variable are not included in the data:

```{r}
bad_times <- data.frame(
  time = seq(1, 16, by = 2),
  series = factor("series_1"),
  outcome = rnorm(8)
)
bad_times
```

Next we call `get_mvgam_priors()` by simply specifying an intercept-only model, which is enough to trigger all the checks:

```{r error = TRUE}
get_mvgam_priors(outcome ~ 1,
  data = bad_times,
  family = gaussian()
)
```

This error is useful as it tells us where the problem is. There are many ways to fill in missing timepoints, so the correct way will have to be left up to the user. But if you don't have any covariates, it should be pretty easy using `expand.grid()`:

```{r}
bad_times %>%
  dplyr::right_join(expand.grid(
    time = seq(
      min(bad_times$time),
      max(bad_times$time)
    ),
    series = factor(unique(bad_times$series),
      levels = levels(bad_times$series)
    )
  )) %>%
  dplyr::arrange(time) -> good_times
good_times
```

Now the call to `get_mvgam_priors()`, using our filled-in data, should work:

```{r error = TRUE}
get_mvgam_priors(outcome ~ 1,
  data = good_times,
  family = gaussian()
)
```

This function should also pick up on misaligned factor levels for the `series` variable. We can check this by again simulating, this time adding an additional factor level that is not included in the data:

```{r}
bad_levels <- data.frame(
  time = 1:8,
  series = factor("series_1",
    levels = c(
      "series_1",
      "series_2"
    )
  ),
  outcome = rnorm(8)
)
levels(bad_levels$series)
```

Another call to `get_mvgam_priors()` brings up a useful error:

```{r error = TRUE}
get_mvgam_priors(outcome ~ 1,
  data = bad_levels,
  family = gaussian()
)
```

Following the message's advice tells us there is a level for `series_2` in the `series` variable, but there are no observations for this series in the data:

```{r}
setdiff(levels(bad_levels$series), unique(bad_levels$series))
```

Re-assigning the levels fixes the issue:

```{r}
bad_levels %>%
  dplyr::mutate(series = droplevels(series)) -> good_levels
levels(good_levels$series)
```

```{r error = TRUE}
get_mvgam_priors(
  outcome ~ 1,
  data = good_levels,
  family = gaussian()
)
```

### Covariates with no `NA`s

Covariates can be used in models just as you would when using `mgcv` (see `?formula.gam` for details of the formula syntax). But although the outcome variable can have `NA`s, covariates cannot. Most regression software will silently drop any rows in the model matrix that have `NA`s, which is not helpful when debugging. Both the `mvgam()` and `get_mvgam_priors()` functions will run some simple checks for you, and will hopefully return useful errors if they find any missing values:

```{r}
miss_dat <- data.frame(
  outcome = rnorm(10),
  cov = c(NA, rnorm(9)),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
miss_dat
```

```{r error = TRUE}
get_mvgam_priors(
  outcome ~ cov,
  data = miss_dat,
  family = gaussian()
)
```

Just like with the `mgcv` package, `mvgam` can also accept data as a `list` object. This is useful if you want to set up linear functional predictors or even distributed lag predictors. The checks run by `mvgam` should still work on these data.
Here we change the `cov` predictor to be a `matrix`:

```{r}
miss_dat <- list(
  outcome = rnorm(10),
  series = factor("series1",
    levels = "series1"
  ),
  time = 1:10
)
miss_dat$cov <- matrix(rnorm(50), ncol = 5, nrow = 10)
miss_dat$cov[2, 3] <- NA
```

A call to `get_mvgam_priors()` returns the same error:

```{r error=TRUE}
get_mvgam_priors(
  outcome ~ cov,
  data = miss_dat,
  family = gaussian()
)
```

## Plotting with `plot_mvgam_series()`

Plotting the data is a useful way to ensure everything looks ok, once you've gone through the above checks on factor levels and timepoint x series combinations. The `plot_mvgam_series()` function will take supplied data and plot either a series of line plots (if you choose `series = 'all'`) or a set of plots to describe the distribution for a single time series. For example, to plot all of the time series in our data, and highlight a single series in each plot, we can use:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(
  data = simdat$data_train,
  y = "y",
  series = "all"
)
```

Or we can look more closely at the distribution for the first time series:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(
  data = simdat$data_train,
  y = "y",
  series = 1
)
```

If you have split your data into training and testing folds (i.e. for forecast evaluation), you can include the test data in your plots:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  y = "y",
  series = 1
)
```

## Example with NEON tick data

To give one example of how data can be reformatted for `mvgam` modelling, we will use observations from the National Ecological Observatory Network (NEON) tick drag cloth samples. *Ixodes scapularis* is a widespread tick species capable of transmitting a diversity of parasites to animals and humans, many of which are zoonotic. Due to the medical and ecological importance of this tick species, a common goal is to understand factors that influence their abundances. The NEON field team carries out standardised [long-term monitoring of tick abundances as well as other important indicators of ecological change](https://www.neonscience.org/data-collection/ticks){target="_blank"}. Nymphal abundance of *I. scapularis* is routinely recorded across NEON plots using a field sampling method called drag cloth sampling, which is a common method for sampling ticks in the landscape. Field researchers sample ticks by dragging a large cloth behind themselves through terrain that is suspected of harboring ticks, usually working in a grid-like pattern. The sites have been sampled since 2014, resulting in a rich dataset of nymph abundance time series. These tick time series show strong seasonality and incorporate many of the challenging features associated with ecological data including overdispersion, high proportions of missingness and irregular sampling in time, making them useful for exploring the utility of dynamic GAMs. We begin by loading NEON tick data for the years 2014 - 2021, which were downloaded from NEON and prepared as described in [Clark & Wells 2022](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13974){target="_blank"}.
You can read a bit about the data using the call `?all_neon_tick_data`:

```{r}
data("all_neon_tick_data")
str(dplyr::ungroup(all_neon_tick_data))
```

For this exercise, we will use the `epiWeek` variable as an index of seasonality, and we will only work with observations from a few sampling plots (labelled in the `plotID` column):

```{r}
plotIDs <- c(
  "SCBI_013", "SCBI_002",
  "SERC_001", "SERC_005",
  "SERC_006", "SERC_012",
  "BLAN_012", "BLAN_005"
)
```

Now we can select the target species we want (*I. scapularis*), filter to the correct plot IDs and convert the `epiWeek` variable from `character` to `numeric`:

```{r}
model_dat <- all_neon_tick_data %>%
  dplyr::ungroup() %>%
  dplyr::mutate(target = ixodes_scapularis) %>%
  dplyr::filter(plotID %in% plotIDs) %>%
  dplyr::select(Year, epiWeek, plotID, target) %>%
  dplyr::mutate(epiWeek = as.numeric(epiWeek))
```

Now is the tricky part: we need to fill in missing observations with `NA`s. The tick data are sparse in that field observers do not go out and sample in each possible `epiWeek`. So there are many particular weeks in which observations are not included in the data. But we can use `expand.grid()` again to take care of this:

```{r}
model_dat %>%
  # Create all possible combos of plotID, Year and epiWeek;
  # missing outcomes will be filled in as NA
  dplyr::full_join(expand.grid(
    plotID = unique(model_dat$plotID),
    Year = unique(model_dat$Year),
    epiWeek = seq(1, 52)
  )) %>%
  # left_join back to original data so plotID and siteID will
  # match up, in case you need the siteID for anything else later on
  dplyr::left_join(all_neon_tick_data %>%
    dplyr::select(siteID, plotID) %>%
    dplyr::distinct()) -> model_dat
```

Create the `series` variable needed for `mvgam` modelling:

```{r}
model_dat %>%
  dplyr::mutate(
    series = plotID,
    y = target
  ) %>%
  dplyr::mutate(
    siteID = factor(siteID),
    series = factor(series)
  ) %>%
  dplyr::select(-target, -plotID) %>%
  dplyr::arrange(Year, epiWeek, series) -> model_dat
```

Now create the `time` variable, which needs to track `Year` and `epiWeek` for each unique series. The `n()` function from `dplyr` is often useful when generating a `time` index for grouped dataframes:

```{r}
model_dat %>%
  dplyr::ungroup() %>%
  dplyr::group_by(series) %>%
  dplyr::arrange(Year, epiWeek) %>%
  dplyr::mutate(time = seq(1, dplyr::n())) %>%
  dplyr::ungroup() -> model_dat
```

Check factor levels for the `series`:

```{r}
levels(model_dat$series)
```

This looks good, as does a more rigorous check using `get_mvgam_priors()`:

```{r error=TRUE}
get_mvgam_priors(
  y ~ 1,
  data = model_dat,
  family = poisson()
)
```

We can also set up a model in `mvgam()` but use `run_model = FALSE` to further ensure all of the necessary steps for creating the modelling code and objects will run. It is recommended that you use the `cmdstanr` backend if possible, as the auto-formatting options available in this package are very useful for checking the package-generated `Stan` code for any inefficiencies that can be fixed to improve sampling performance:

```{r}
testmod <- mvgam(
  y ~ s(epiWeek, by = series, bs = "cc") +
    s(series, bs = "re"),
  trend_model = AR(),
  data = model_dat,
  backend = "cmdstanr",
  run_model = FALSE
)
```

This call runs without issue, and the resulting object now contains the model code and data objects that are needed to initiate sampling:

```{r}
str(testmod$model_data)
```

```{r}
stancode(testmod)
```

## Further reading

The following papers and resources offer useful material about Dynamic GAMs and how they can be applied in practice:

Clark, Nicholas J.
and Wells, K. [Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series](https://doi.org/10.1111/2041-210X.13974). *Methods in Ecology and Evolution* (2023): 14, 771-784.

Clark, Nicholas J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ* (2025): 13, e18929.

de Sousa, Heitor C., et al. [Severe fire regimes decrease resilience of ectothermic populations](https://doi.org/10.1111/1365-2656.14188). *Journal of Animal Ecology* (2024): 93(11), 1656-1669.

Hannaford, Naomi E., et al. [A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant](https://doi.org/10.1016/j.csda.2022.107659). *Computational Statistics & Data Analysis* (2023): 179, 107659.

Karunarathna, K.A.N.K., et al. [Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models](https://doi.org/10.1016/j.ecolmodel.2024.110648). *Ecological Modelling* (2024): 490, 110648.

Zhu, L., et al. [Responses of a widespread pest insect to extreme high temperatures are stage-dependent and divergent among seasonal cohorts](https://doi.org/10.1111/1365-2435.14711). *Functional Ecology* (2025): 39, 165-180.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au).

================================================
FILE: vignettes/forecast_evaluation.Rmd
================================================

---
title: "Forecasting and forecast evaluation in mvgam"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Forecasting and forecast evaluation in mvgam}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

The purpose of this vignette is to show how the `mvgam` package can be used to produce probabilistic forecasts and to evaluate those forecasts using a variety of proper scoring rules.

## Simulating discrete time series

We begin by simulating some data to show how forecasts are computed and evaluated in `mvgam`. The `sim_mvgam()` function can be used to simulate series that come from a variety of response distributions as well as seasonal patterns and/or dynamic temporal patterns. Here we simulate a collection of three count-valued time series. These series all share the same seasonal pattern but have different temporal dynamics. By setting `trend_model = GP()` and `prop_trend = 0.75`, we are generating time series that have smooth underlying temporal trends (evolving as Gaussian Processes with squared exponential kernels) and moderate seasonal patterns.
The observations are Poisson-distributed and we allow 10% of observations to be missing.

```{r}
set.seed(1)
simdat <- sim_mvgam(
  T = 100,
  n_series = 3,
  mu = 2,
  trend_model = GP(),
  prop_trend = 0.75,
  family = poisson(),
  prop_missing = 0.10
)
```

The returned object is a `list` containing training and testing data (`sim_mvgam()` automatically splits the data into these folds for us) together with some other information about the data generating process that was used to simulate the data:

```{r}
str(simdat)
```

Each series in this case has a shared seasonal pattern. The resulting time series are similar to what we might encounter when dealing with count-valued data that can take small counts:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(
  data = simdat$data_train,
  series = "all"
)
```

For individual series, we can plot the training and testing data, as well as some more specific features of the observed data:

```{r, fig.alt = "Plotting time series features for GAM models in mvgam"}
plot_mvgam_series(
  data = simdat$data_train,
  newdata = simdat$data_test,
  series = 1
)
```

### Modelling dynamics with splines

The first model we will fit uses a shared cyclic spline to capture the repeated seasonality, as well as series-specific splines of time to capture the long-term dynamics. We allow the temporal splines to be fairly complex so they can capture as much of the temporal variation as possible:

```{r include=FALSE}
mod1 <- mvgam(
  y ~ s(season, bs = "cc", k = 8) +
    s(time, by = series, bs = "cr", k = 20),
  knots = list(season = c(0.5, 12.5)),
  trend_model = "None",
  data = simdat$data_train
)
```

```{r eval=FALSE}
mod1 <- mvgam(
  y ~ s(season, bs = "cc", k = 8) +
    s(time, by = series, bs = "cr", k = 20),
  knots = list(season = c(0.5, 12.5)),
  trend_model = "None",
  data = simdat$data_train,
  silent = 2
)
```

The model fits without issue:

```{r}
summary(mod1, include_betas = FALSE)
```

And we can plot the conditional effects of the splines (on the link scale) to see that they are estimated to be highly nonlinear:

```{r, fig.alt = "Plotting GAM smooth functions using mvgam"}
conditional_effects(mod1, type = "link")
```

### Modelling dynamics with a correlated AR1

Before showing how to produce and evaluate forecasts, we will fit a second model to these data so the two models can be compared. This model is similar to the above, except we now use a correlated AR(1) process to model series-specific dynamics. See `?AR` for more details.

```{r include=FALSE, message=FALSE}
mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  silent = 1
)
```

```{r eval=FALSE}
mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  silent = 1
)
```

The summary for this model now contains information on the autoregressive and process error parameters for each time series:

```{r}
summary(mod2, include_betas = FALSE)
```

We can plot the posteriors for these parameters, and for any other parameter for that matter, using `bayesplot` routines.
First the autoregressive parameters:

```{r, fig.alt = "Summarising latent Gaussian Process parameters in mvgam"}
mcmc_plot(mod2, variable = "ar", regex = TRUE, type = "areas")
```

And now the variance ($\sigma$) parameters:

```{r, fig.alt = "Summarising latent Gaussian Process parameters in mvgam"}
mcmc_plot(mod2, variable = "sigma", regex = TRUE, type = "areas")
```

We can again plot the conditional seasonal effect:

```{r, fig.alt = "Plotting latent Gaussian Process effects in mvgam and marginaleffects"}
conditional_effects(mod2, type = "link")
```

The estimates for the seasonal component are fairly similar for the two models, but below we will see whether they also produce similar forecasts.

## Forecasting with the `forecast()` function

Probabilistic forecasts can be computed in two main ways in `mvgam`. The first is to take a model that was fit only to training data (as we did above in the two example models) and produce temporal predictions from the posterior predictive distribution by feeding `newdata` to the `forecast()` function. It is crucial that any `newdata` fed to the `forecast()` function follows on sequentially from the data that was used to fit the model (this is not checked internally by the package because doing so reliably is difficult when data are not supplied in a specific time order). When calling the `forecast()` function, you have the option to generate different kinds of predictions (i.e. predicting on the link scale, response scale or to produce expectations; see `?forecast.mvgam` for details). We will use the default and produce forecasts on the response scale, which is the most common way to evaluate forecast distributions:

```{r}
fc_mod1 <- forecast(mod1, newdata = simdat$data_test)
fc_mod2 <- forecast(mod2, newdata = simdat$data_test)
```

The objects we have created are of class `mvgam_forecast`, which contain information on the hindcast distributions, forecast distributions and true observations for each series in the data:

```{r}
str(fc_mod1)
```

We can plot the forecasts for some series from each model using the S3 `plot()` method for objects of this class:

```{r}
plot(fc_mod1, series = 1)
plot(fc_mod2, series = 1)

plot(fc_mod1, series = 2)
plot(fc_mod2, series = 2)
```

Clearly the two models do not produce equivalent forecasts. We will come back to scoring these forecasts in a moment.

## Forecasting with `newdata` in `mvgam()`

The second way we can produce forecasts in `mvgam` is to feed the testing data directly to the `mvgam()` function as `newdata`. This will include the testing data as missing observations so that they are automatically predicted from the posterior predictive distribution using the `generated quantities` block in `Stan`.
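Either way, it is worth confirming for yourself that the testing fold really does follow on sequentially from the training fold, since the package will not check this for you. Below is a minimal sketch, assuming (as with `sim_mvgam()` output) that both folds contain a shared numeric `time` variable:

```{r eval=FALSE}
# Confirm the test fold starts exactly one timestep after the training fold
stopifnot(
  min(simdat$data_test$time) == max(simdat$data_train$time) + 1
)
```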
As an example, we can refit `mod2` but include the testing data for automatic forecasts:

```{r include=FALSE}
mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  newdata = simdat$data_test,
  silent = 2
)
```

```{r eval=FALSE}
mod2 <- mvgam(y ~ 1,
  trend_formula = ~ s(season, bs = "cc", k = 8) - 1,
  trend_knots = list(season = c(0.5, 12.5)),
  trend_model = AR(cor = TRUE),
  noncentred = TRUE,
  data = simdat$data_train,
  newdata = simdat$data_test,
  silent = 2
)
```

Because the model already contains a forecast distribution, we do not need to feed `newdata` to the `forecast()` function:

```{r}
fc_mod2 <- forecast(mod2)
```

The forecasts will be nearly identical to those calculated previously:

```{r warning=FALSE, fig.alt = "Plotting posterior forecast distributions using mvgam and R"}
plot(fc_mod2, series = 1)
```

## Scoring forecast distributions

A primary purpose of the `mvgam_forecast` class is to readily allow forecast evaluations for each series in the data, using a variety of possible scoring functions. See `?mvgam::score.mvgam_forecast` to view the types of scores that are available. A useful scoring metric is the Continuous Ranked Probability Score (CRPS). A CRPS value is similar to what we might get if we calculated a weighted absolute error using the full forecast distribution:

```{r warning=FALSE}
crps_mod1 <- score(fc_mod1, score = "crps")
str(crps_mod1)
crps_mod1$series_1
```

The returned list contains a `data.frame` for each series in the data that shows the CRPS score for each evaluation in the testing data, along with some other useful information about the fit of the forecast distribution. In particular, we are given a binary indicator (1s and 0s) telling us whether the true value was within a pre-specified credible interval (i.e. the coverage of the forecast distribution). The default interval width is 0.9, so we would hope that the values in the `in_interval` column take a 1 approximately 90% of the time. This value can be changed if you wish to compute different coverages, say using a 60% interval:

```{r warning=FALSE}
crps_mod1 <- score(fc_mod1, score = "crps", interval_width = 0.6)
crps_mod1$series_1
```

We can also compare forecasts against out of sample observations using the [Expected Log Predictive Density (ELPD; also known as the log score)](https://link.springer.com/article/10.1007/s11222-016-9696-4){target="_blank"}. The ELPD is a strictly proper scoring rule that can be applied to any distributional forecast, but to compute it we need predictions on the link scale rather than on the outcome scale. This is where it is advantageous to change the type of prediction we can get using the `forecast()` function:

```{r}
link_mod1 <- forecast(mod1, newdata = simdat$data_test, type = "link")
score(link_mod1, score = "elpd")$series_1
```

Finally, when we have multiple time series it may also make sense to use a multivariate proper scoring rule. `mvgam` offers two such options: the Energy score and the Variogram score. The first penalizes forecast distributions that are less well calibrated against the truth, while the second penalizes forecasts that do not capture the observed true correlation structure.
Which score to use depends on your goals, but both are very easy to compute:

```{r}
energy_mod2 <- score(fc_mod2, score = "energy")
str(energy_mod2)
```

The returned object still provides information on interval coverage for each individual series, but there is now only a single score per horizon (which is provided in the `all_series` slot):

```{r}
energy_mod2$all_series
```

You can use your score(s) of choice to compare different models. For example, we can compute and plot the difference in CRPS scores for each series in the data. Here, a negative value means the AR(1) model (`mod2`) is better, while a positive value means the spline model (`mod1`) is better. Rather than repeating the same plotting code for each series, we loop over the three series:

```{r}
crps_mod1 <- score(fc_mod1, score = "crps")
crps_mod2 <- score(fc_mod2, score = "crps")

for (series in c("series_1", "series_2", "series_3")) {
  diff_scores <- crps_mod2[[series]]$score -
    crps_mod1[[series]]$score
  plot(diff_scores,
    pch = 16, cex = 1.25, col = "darkred",
    ylim = c(
      -1 * max(abs(diff_scores), na.rm = TRUE),
      max(abs(diff_scores), na.rm = TRUE)
    ),
    bty = "l",
    xlab = "Forecast horizon",
    ylab = expression(CRPS[AR1] ~ -~ CRPS[spline])
  )
  abline(h = 0, lty = "dashed", lwd = 2)
  ar1_better <- length(which(diff_scores < 0))
  title(main = paste0(
    "AR(1) better in ", ar1_better, " of ",
    length(diff_scores), " evaluations",
    "\nMean difference = ",
    round(mean(diff_scores, na.rm = TRUE), 2)
  ))
}
```

The correlated AR(1) model consistently gives better forecasts, and the difference between scores tends to grow as the forecast horizon increases. This is not unexpected given the way that splines linearly extrapolate outside the range of the training data.

## Further reading

The following papers and resources offer useful material about Bayesian forecasting and proper scoring rules:

Clark N.J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ* (2025): 13, e18929.

Hyndman, Rob J., and George Athanasopoulos. [Forecasting: principles and practice](https://otexts.com/fpp3/distaccuracy.html). *OTexts* (2018).

Gneiting, Tilmann, and Adrian E. Raftery. [Strictly proper scoring rules, prediction, and estimation](https://www.tandfonline.com/doi/abs/10.1198/016214506000001437). *Journal of the American Statistical Association* (2007): 102.477, 359-378.

Simonis, Juniper L., et al.
[Evaluating probabilistic ecological forecasts](https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecy.3431). *Ecology* (2021): 102.8, e03431.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au).

================================================
FILE: vignettes/mvgam_overview.Rmd
================================================

---
title: "Overview of the mvgam package"
author: "Nicholas J Clark"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Overview of the mvgam package}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
params:
  EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true")
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = if (isTRUE(exists("params"))) params$EVAL else FALSE
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  dpi = 100,
  fig.asp = 0.8,
  fig.width = 6,
  out.width = "60%",
  fig.align = "center"
)
library(mvgam)
library(ggplot2)
theme_set(theme_bw(base_size = 12, base_family = "serif"))
```

The purpose of this vignette is to give a general overview of the `mvgam` package and its primary functions.

## Dynamic GAMs

`mvgam` is designed to propagate unobserved temporal processes to capture latent dynamics in the observed time series. This works in a state-space format, with the temporal *trend* evolving independently of the observation process. An introduction to the package and some worked examples are also shown in this seminar: [Ecological Forecasting with Dynamic Generalized Additive Models](https://www.youtube.com/watch?v=0zZopLlomsQ){target="_blank"}. Briefly, assume $\tilde{\boldsymbol{y}}_{i,t}$ is the conditional expectation of response $i$ at time $t$. Assuming $\boldsymbol{y}_i$ is drawn from an exponential family distribution and modelled with an invertible link function $g$, the linear predictor for a multivariate Dynamic GAM can be written as:

$$for~i~in~1:N_{series}~...$$

$$for~t~in~1:N_{timepoints}~...$$

$$g(\tilde{\boldsymbol{y}}_{i,t})=\alpha_{i}+\sum\limits_{j=1}^J\boldsymbol{s}_{i,j}(\boldsymbol{x}_{j,t})+\boldsymbol{Z}\boldsymbol{z}_{t}\,,$$

Here $\alpha$ are the unknown intercepts, the $\boldsymbol{s}$'s are unknown smooth functions of covariates ($\boldsymbol{x}$'s), which can potentially vary among the response series, and $\boldsymbol{z}$ are dynamic latent processes. Each smooth function $\boldsymbol{s_j}$ is composed of basis expansions whose coefficients, which must be estimated, control the functional relationship between $\boldsymbol{x}_{j}$ and $g(\tilde{\boldsymbol{y}})$. The size of the basis expansion limits the smooth's potential complexity: a larger set of basis functions allows greater flexibility. For more information on GAMs and how they can smooth through data, see [this blogpost on how to interpret nonlinear effects from Generalized Additive Models](https://ecogambler.netlify.app/blog/interpreting-gams/){target="_blank"}.
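It can help to see a basis expansion concretely. The sketch below constructs a small spline basis directly with `mgcv::smoothCon()` and plots its basis functions; the cubic regression basis with `k = 6` over a toy covariate is purely an illustrative assumption, not anything `mvgam` requires:

```{r eval=FALSE}
library(mgcv)
# Construct a cubic regression spline basis (6 functions) over a toy covariate
toy_dat <- data.frame(x = seq(0, 10, length.out = 200))
sm <- smoothCon(s(x, bs = "cr", k = 6), data = toy_dat)[[1]]

# Each column of the design matrix is one basis function; a fitted smooth
# is simply a weighted sum of these columns
matplot(toy_dat$x, sm$X,
  type = "l", lty = 1,
  xlab = "x", ylab = "Basis function value"
)
```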
Latent processes are captured with $\boldsymbol{Z}\boldsymbol{z}_{t}$, where $\boldsymbol{Z}$ is an $N_{series} \times K$ matrix of loading coefficients (which can be fixed or a combination of fixed and freely estimated parameters) and $\boldsymbol{z}_{t}$ is a set of $K$ latent factors that can also include their own GAM linear predictors (see the [State-Space models vignette](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html), the [N-mixtures vignette](https://nicholasjclark.github.io/mvgam/articles/nmixtures.html) and the example in [`jsdgam`](https://nicholasjclark.github.io/mvgam/reference/jsdgam.html) to get an idea of how flexible these processes can be). Several advantages of GAMs are that they can model a diversity of response distributions, including discrete ones (e.g. Poisson, Negative Binomial) and skewed continuous ones (e.g. Gamma) that accommodate common ecological features such as zero-inflation or overdispersion, and that they can be formulated to include hierarchical smoothing for multivariate responses. `mvgam` supports a number of different observation families, which are summarized below:

## Supported observation families

|Distribution | Function | Support | Extra parameter(s) |
|:----------------:|:---------------:|:------------------------------------------------:|:--------------------:|
|Gaussian (identity link) | `gaussian()` | Real values in $(-\infty, \infty)$ | $\sigma$ |
|Student's t (identity link) | `student_t()` | Heavy-tailed real values in $(-\infty, \infty)$ | $\sigma$, $\nu$ |
|LogNormal (identity link) | `lognormal()` | Positive real values in $(0, \infty)$ | $\sigma$ |
|Gamma (log link) | `Gamma()` | Positive real values in $(0, \infty)$ | $\alpha$ |
|Beta (logit link) | `betar()` | Proportional real values in $(0,1)$ | $\phi$ |
|Bernoulli (logit link) | `bernoulli()` | Binary data in $\{0,1\}$ | - |
|Poisson (log link) | `poisson()` | Non-negative integers in $\{0,1,2,...\}$ | - |
|Negative Binomial2 (log link)| `nb()` | Non-negative integers in $\{0,1,2,...\}$ | $\phi$ |
|Binomial (logit link) | `binomial()` | Non-negative integers in $\{0,1,2,...\}$ | - |
|Beta-Binomial (logit link) | `beta_binomial()` | Non-negative integers in $\{0,1,2,...\}$ | $\phi$ |
|Poisson Binomial N-mixture (log link)| `nmix()` | Non-negative integers in $\{0,1,2,...\}$ | - |

For all supported observation families, any extra parameters that need to be estimated (i.e. the $\sigma$ in a Gaussian model or the $\phi$ in a Negative Binomial model) are by default estimated independently for each series. However, users can opt to force all series to share extra observation parameters using `share_obs_params = TRUE` in `mvgam()`. Note that default link functions cannot currently be changed.

## Supported temporal dynamic processes

As stated above, the latent processes can take a wide variety of forms, some of which can be multivariate to allow the different observational variables to interact or be correlated. When using the `mvgam()` function, the user chooses between different process models with the `trend_model` argument. Available process models are described in detail below.

### Correlated multivariate processes

If more than one observational unit (usually referred to as 'series') is included in `data` $(N_{series} > 1)$, use `trend_model = ZMVN()` to set up a model where the outcomes for different observational units may be correlated according to:

\begin{align*}
z_{t} & \sim \text{MVNormal}(0, \Sigma)
\end{align*}

The covariance matrix $\Sigma$ will capture potentially correlated process errors.
It is parameterised using a Cholesky factorization, which requires priors on the series-level variances $\sigma$ and on the strength of correlations using `Stan`'s `lkj_corr_cholesky` distribution. Note that this `trend_model` does not assume that measurements occur over *time*, as users can specify which variable in the `data` represents the unit of analysis (i.e. outcomes could be counts of different *species* across different *sites* or *regions*, for example; see [`?ZMVN()`](https://nicholasjclark.github.io/mvgam/reference/ZMVN.html) for guidelines).

### Independent Random Walks

Use `trend_model = 'RW'` or `trend_model = RW()` to set up a model where each series in `data` has independent latent temporal dynamics of the form:

\begin{align*}
z_{i,t} & \sim \text{Normal}(z_{i,t-1}, \sigma_i)
\end{align*}

Process error parameters $\sigma$ are modeled independently for each series. If a moving average process is required, use `trend_model = RW(ma = TRUE)` to set up the following:

\begin{align*}
z_{i,t} & = z_{i,t-1} + \theta_i * error_{i,t-1} + error_{i,t} \\
error_{i,t} & \sim \text{Normal}(0, \sigma_i)
\end{align*}

Moving average coefficients $\theta$ are independently estimated for each series and will be forced to be stationary by default $(|\theta|<1)$. Only moving averages of order $q=1$ are currently allowed.

### Multivariate Random Walks

If more than one series is included in `data` $(N_{series} > 1)$, a multivariate Random Walk can be set up using `trend_model = RW(cor = TRUE)`, resulting in the following:

\begin{align*}
z_{t} & \sim \text{MVNormal}(z_{t-1}, \Sigma)
\end{align*}

where the latent process estimate $z_t$ now takes the form of a vector. The covariance matrix $\Sigma$ will capture contemporaneously correlated process errors. It is parameterised using a Cholesky factorization, which requires priors on the series-level variances $\sigma$ and on the strength of correlations using `Stan`'s `lkj_corr_cholesky` distribution. Moving average terms can also be included for multivariate random walks, in which case the moving average coefficients $\theta$ will be parameterised as an $N_{series} \times N_{series}$ matrix.

### Autoregressive processes

Autoregressive models up to $p=3$, in which the autoregressive coefficients are estimated independently for each series, can be used by specifying `trend_model = 'AR1'`, `trend_model = 'AR2'`, `trend_model = 'AR3'`, or `trend_model = AR(p = 1)`, `AR(p = 2)` or `AR(p = 3)`. For example, a univariate AR(1) model takes the form:

\begin{align*}
z_{i,t} & \sim \text{Normal}(ar1_i * z_{i,t-1}, \sigma_i)
\end{align*}

All options are the same as for Random Walks, but additional options are available for placing priors on the autoregressive coefficients. By default, these coefficients will not be forced into stationarity, but users can impose this restriction by changing the upper and lower bounds on their priors. See `?get_mvgam_priors` for more details.

### Vector Autoregressive processes

A Vector Autoregression of order $p=1$ can be specified if $N_{series} > 1$ using `trend_model = 'VAR1'` or `trend_model = VAR()`. A VAR(1) model takes the form:

\begin{align*}
z_{t} & \sim \text{Normal}(A * z_{t-1}, \Sigma)
\end{align*}

where $A$ is an $N_{series} \times N_{series}$ matrix of autoregressive coefficients in which the diagonals capture lagged self-dependence (i.e. the effect of a process at time $t$ on its own estimate at time $t+1$), while off-diagonals capture lagged cross-dependence (i.e.
the effect of a process at time $t$ on the process for another series at time $t+1$). By default, the covariance matrix $\Sigma$ will assume no process error covariance by fixing the off-diagonals to $0$. To allow for correlated errors, use `trend_model = 'VAR1cor'` or `trend_model = VAR(cor = TRUE)`. A moving average of order $q=1$ can also be included using `trend_model = VAR(ma = TRUE, cor = TRUE)`. Note that for all VAR models, stationarity of the process is enforced with a structured prior distribution that is described in detail in Heaps (2023):

Heaps, Sarah E. [Enforcing stationarity through the prior in vector autoregressions](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648). *Journal of Computational and Graphical Statistics* (2023): 32.1, 74-83.

### Hierarchical processes

Several of the above-mentioned `trend_model` options can be modified to account for grouping structures in `data` by setting up hierarchical latent processes. If an optional grouping variable (`gr`; which must be a `factor` in the supplied `data`) exists, users can model hierarchical residual correlation structures, where the residual correlations for a specific level of `gr` are modelled hierarchically:

\begin{align*}
\Omega_{group} & = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local}
\end{align*}

where $\Omega_{global}$ is a *global* correlation matrix, $\Omega_{group, local}$ is a *local deviation* correlation matrix and $\alpha_{cor}$ is a weighting parameter controlling how strongly the local correlation matrix $\Omega_{group}$ (i.e. the derived correlation matrix that will be used for each level of the grouping factor `gr`) is shrunk towards the global correlation matrix $\Omega_{global}$ (larger values of $\alpha_{cor}$ indicate a greater degree of shrinkage, i.e. a greater degree of partial pooling). This option is valuable for many types of designs where the same observational units (i.e. *financial assets* or *species*, for example) are measured in different strata (i.e. *regions*, *countries* or *experimental units*, for example). Currently hierarchical correlations can be included for the `AR()`, `VAR()` and `ZMVN()` `trend_model` options.

### Gaussian Processes

The final option for modelling temporal dynamics is to use a Gaussian Process with a squared exponential kernel. These are set up independently for each series (there is currently no multivariate GP option), using `trend_model = 'GP'`. The dynamics for each latent process are modelled as:

\begin{align*}
z & \sim \text{MVNormal}(0, \Sigma_{error}) \\
\Sigma_{error}[t_i, t_j] & = \alpha^2 \exp\left(-0.5 \left(\frac{|t_i - t_j|}{\rho}\right)^2\right)
\end{align*}

The latent dynamic process is drawn from a high-dimensional multivariate normal distribution whose covariance depends on $\rho$ (often called the length scale parameter), which controls how quickly the correlations between the model's errors decay as a function of time. For these models, covariance decays exponentially fast with the squared distance (in time) between the observations. The kernel also depends on a parameter $\alpha$, which controls the marginal variability of the temporal function at all points; in other words, it controls how much the GP term contributes to the linear predictor.
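To see what this kernel implies in practice, here is a small standalone sketch of the squared exponential covariance in plain R (the parameter values and the helper name `sq_exp_cov` are illustrative assumptions, not part of the package API):

```{r eval=FALSE}
# Squared exponential covariance between all pairs of timepoints:
# alpha^2 * exp(-0.5 * (|t_i - t_j| / rho)^2)
sq_exp_cov <- function(times, alpha, rho) {
  d <- as.matrix(dist(times)) # |t_i - t_j| for all pairs
  alpha^2 * exp(-0.5 * (d / rho)^2)
}

# Correlations decay smoothly as the time difference grows;
# larger rho values produce slower decay (smoother trends)
round(sq_exp_cov(1:5, alpha = 1, rho = 2), 2)
```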
`mvgam` capitalizes on some advances that allow GPs to be approximated using Hilbert space basis functions, which [considerably speed up computation at little cost to accuracy or prediction performance](https://link.springer.com/article/10.1007/s11222-022-10167-2){target="_blank"}.

### Piecewise logistic and linear trends

Modeling growth for many types of time series is often similar to modeling population growth in natural ecosystems, where the series exhibits nonlinear growth that saturates at some particular carrying capacity. The logistic trend model available in `mvgam` allows for a time-varying capacity $C(t)$ as well as a non-constant growth rate. Changes in the base growth rate $k$ are incorporated by explicitly defining changepoints throughout the training period where the growth rate is allowed to vary. The changepoint vector $a$ is represented as a vector of `1`s and `0`s, and the rate of growth at time $t$ is represented as $k+a(t)^T\delta$. Potential changepoints are selected uniformly across the training period, and the number of changepoints, as well as the flexibility of the potential rate changes at these changepoints, can be controlled using `trend_model = PW()`. The full piecewise logistic growth model is then:

\begin{align*}
z_t & = \frac{C_t}{1 + \exp(-(k+a(t)^T\delta)(t-(m+a(t)^T\gamma)))}
\end{align*}

For time series that do not appear to exhibit saturating growth, a piecewise constant rate of growth can often provide a useful trend model. The piecewise linear trend is defined as:

\begin{align*}
z_t & = (k+a(t)^T\delta)t + (m+a(t)^T\gamma)
\end{align*}

In both trend models, $m$ is an offset parameter that controls the trend intercept. Because of this parameter, you should not include an intercept in your observation formula, as the two will not be jointly identifiable. You can read the full description of piecewise linear and logistic trends in the paper by Taylor and Letham:

Sean J. Taylor and Benjamin Letham. [Forecasting at scale](https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1380080). *The American Statistician* (2018): 72.1, 37-45.

### Continuous time AR(1) processes

Most trend models in the `mvgam()` function expect time to be measured in regularly-spaced, discrete intervals (e.g. one measurement per week, or one per year). But some time series are sampled at irregular intervals, and we may still wish to model their autoregressive properties. Setting `trend_model = CAR()` can be useful for these situations; it currently supports autoregressive processes of order `1` only. The evolution of the latent dynamic process follows the form:

\begin{align*}
z_{i,t} & \sim \text{Normal}(ar1_i^{distance} * z_{i,t-1}, \sigma_i)
\end{align*}

where $distance$ is a vector of non-negative measurements of the time differences between successive observations. These models are perhaps more widely known as Ornstein–Uhlenbeck processes. See the **Examples** section in `?CAR` for an illustration of how to set these models up.

## Regression formulae

`mvgam` supports an observation model regression formula, built off the `mgcv` package, as well as an optional process model regression formula.
The formulae supplied to `mvgam()` are exactly like those supplied to `glm()`, except that smooth terms (`s()`, `te()`, `ti()` and `t2()`), time-varying effects using `dynamic()`, monotonically increasing (using `s(x, bs = 'moi')`) or decreasing splines (using `s(x, bs = 'mod')`; see `?smooth.construct.moi.smooth.spec` for details), as well as Gaussian Process functions using `gp()`, can be added to the right hand side (and `.` is not supported in `mvgam` formulae). See `?mvgam_formulae` for more guidance. For setting up State-Space models, the optional process model formula can be used (see [the State-Space model vignette](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html) and [the shared latent states vignette](https://nicholasjclark.github.io/mvgam/articles/shared_states.html) for guidance on using trend formulae).

## Example time series data

The 'portal_data' object contains time series of rodent captures from the Portal Project, [a long-term monitoring study based near the town of Portal, Arizona](https://portal.weecology.org/){target="_blank"}. Researchers have been operating a standardized set of baited traps within 24 experimental plots at this site since the 1970s. Sampling follows the lunar monthly cycle, with observations occurring on average about 28 days apart. However, missing observations do occur due to difficulties accessing the site (weather events, COVID disruptions etc.). You can read about the full sampling protocol [in this preprint by Ernest et al. on bioRxiv](https://www.biorxiv.org/content/10.1101/332783v3.full){target="_blank"}.

```{r Access time series data}
data("portal_data")
```

As the data come pre-loaded with the `mvgam` package, you can read a little about it in the help page using `?portal_data`. Before working with data, it is important to inspect how the data are structured, first using `head()`:

```{r Inspect data format and structure}
head(portal_data)
```

But the `glimpse()` function in `dplyr` is also useful for understanding how variables are structured:

```{r}
dplyr::glimpse(portal_data)
```

We will focus analyses on the time series of captures for one specific rodent species, the Desert Pocket Mouse *Chaetodipus penicillatus*. This species is interesting in that it goes into a kind of "hibernation" during the colder months, leading to very low captures during the winter period.

## Manipulating data for modelling

Manipulating the data into a 'long' format is necessary for modelling in `mvgam`. By 'long' format, we mean that each `series x time` observation needs to have its own entry in the `dataframe` or `list` object that we wish to use as data for modelling. A simple example can be viewed by simulating data using the `sim_mvgam()` function. See `?sim_mvgam` for more details:

```{r}
data <- sim_mvgam(n_series = 4, T = 24)
head(data$data_train, 12)
```

Notice how we have four different time series in these simulated data, but we do not spread the outcome values into different columns. Rather, there is only a single column for the outcome variable, labelled `y` in these simulated data. We also must supply a variable labelled `time` to ensure the modelling software knows how to arrange the time series when building models. This setup still allows us to formulate multivariate time series models, as you can see in the [State-Space vignette](https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html). Below are the steps needed to shape our `portal_data` object into the correct form.
First, we filter the data to retain only our target species (`PP`), rename the capture column to `count`, add a `year` variable, and select appropriate variables that we can use as predictors:

```{r Wrangle data for modelling}
portal_data %>%
  # Filter the data to only contain captures of species 'PP'
  dplyr::filter(series == 'PP') %>%
  droplevels() %>%
  dplyr::mutate(count = captures) %>%
  # Add a 'year' variable
  dplyr::mutate(year = sort(rep(1:8, 12))[time]) %>%
  # Select the variables of interest to keep in the model_data
  dplyr::select(series, year, time, count, mintemp, ndvi_ma12) -> model_data
```

The data now contain six variables:

- `series`, a factor indexing which time series each observation belongs to
- `year`, the year of sampling
- `time`, the indicator of which time step each observation belongs to
- `count`, the response variable representing the number of captures of the species `PP` in each sampling observation
- `mintemp`, the monthly average minimum temperature at each time step
- `ndvi_ma12`, a 12-month moving average of the monthly Normalized Difference Vegetation Index at each time step

Now check the data structure again:

```{r}
head(model_data)
```

```{r}
dplyr::glimpse(model_data)
```

You can also summarize multiple variables, which is helpful for checking data ranges and identifying missing values:

```{r Summarise variables}
summary(model_data)
```

We have some `NA`s in our response variable `count`. These observations will generally be thrown out by most modelling packages in R. But as you will see when we work through the tutorials, `mvgam` keeps these in the data so that predictions can be automatically returned for the full dataset. The time series and some of its descriptive features can be plotted using `plot_mvgam_series()`:

```{r}
plot_mvgam_series(data = model_data, series = 1, y = "count")
```

## GLMs with temporal random effects

Our first task will be to fit a Generalized Linear Model (GLM) that can adequately capture the features of our `count` observations (integer data, lower bound at zero, missing values) while also attempting to model temporal variation. We are almost ready to fit our first model, which will be a GLM with Poisson observations, a log link function and random (hierarchical) intercepts for `year`. This will allow us to capture our prior belief that, although each year is unique, all years were sampled from the same population of effects and are thus connected, meaning each year might contain valuable information about the others. This will be done by capitalizing on the partial pooling properties of hierarchical models. Hierarchical (also known as random) effects offer many advantages when modelling data with grouping structures (i.e. multiple species, locations, years etc...). The ability to incorporate these in time series models is a huge advantage over traditional models such as ARIMA or Exponential Smoothing. But before we fit the model, we will need to convert `year` to a factor so that we can use a random effect basis in `mvgam`. See `?smooth.terms` and `?smooth.construct.re.smooth.spec` for details about the `re` basis construction that is used by both `mvgam` and `mgcv`:

```{r}
model_data %>%
  # Create a 'year_fac' factor version of 'year'
  dplyr::mutate(year_fac = factor(year)) -> model_data
```

Preview the dataset to ensure `year` is now a factor with a unique factor level for each year in the data:

```{r}
dplyr::glimpse(model_data)
levels(model_data$year_fac)
```

We are now ready for our first `mvgam` model. The syntax will be familiar to users who have previously built models with `mgcv`.
But for a refresher, see `?formula.gam` and the examples in `?gam`. Random effects can be specified using the `s()` wrapper with the `re` basis. Note that we can also suppress the primary intercept using the usual `R` formula syntax `- 1`. `mvgam` has a number of possible observation families that can be used; see `?mvgam_families` for more information. We will use `Stan` as the fitting engine, which deploys Hamiltonian Monte Carlo (HMC) for full Bayesian inference. By default, 4 HMC chains will be run, each using a warmup of 500 iterations and collecting 500 posterior samples. The package will also aim to use the `Cmdstan` backend when possible, so it is recommended that users have an up-to-date installation of `Cmdstan` and the associated `cmdstanr` interface on their machines (note that you can set the backend yourself using the `backend` argument; see `?mvgam` for details). Interested users should consult the [`Stan` user's guide](https://mc-stan.org/docs/stan-users-guide/index.html){target="_blank"} for more information about the software and the enormous variety of models that can be tackled with HMC.

```{r model1, include=FALSE, results='hide'}
model1 <- mvgam(count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data,
  parallel = FALSE
)
```

```{r eval=FALSE}
model1 <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)
```

The model can be described mathematically for each timepoint $t$ as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = \beta_{year[year_t]} \\
\beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year})
\end{align*}

where the $\beta_{year}$ effects are drawn from a *population* distribution that is parameterized by a common mean $(\mu_{year})$ and standard deviation $(\sigma_{year})$. Priors on most of the model parameters can be interrogated and changed using similar functionality to the options available in `brms`. For example, the default priors on $(\mu_{year})$ and $(\sigma_{year})$ can be viewed using the following code:

```{r}
get_mvgam_priors(count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = model_data
)
```

See examples in `?get_mvgam_priors` to find out different ways that priors can be altered. Once the model has finished, the first step is to inspect the `summary()` to ensure no major diagnostic warnings have been produced and to quickly summarise posterior distributions for key parameters:

```{r}
summary(model1)
```

The diagnostic messages at the bottom of the summary show that the HMC sampler did not encounter any problems or difficult posterior spaces. This is a good sign. Posterior distributions for model parameters can be extracted in any way that an object of class `brmsfit` can (see `?mvgam::mvgam_draws` for details). For example, we can extract the coefficients related to the GAM linear predictor (i.e. the $\beta$'s) into a `data.frame` using:

```{r Extract coefficient posteriors}
beta_post <- as.data.frame(model1, variable = "betas")
dplyr::glimpse(beta_post)
```

With any model fitted in `mvgam`, the underlying `Stan` code can be viewed using the `stancode()` function:

```{r}
stancode(model1)
```

### Plotting effects and residuals

Now for interrogating the model. We can get some sense of the variation in yearly intercepts from the summary above, but it is easier to understand them using targeted plots. Plot posterior distributions of the temporal random effects using `plot.mvgam()` with `type = 're'`.
See `?plot.mvgam` for more details about the types of plots that can be produced from fitted `mvgam` objects:

```{r Plot random effect estimates}
plot(model1, type = "re")
```

### `bayesplot` support

We can also capitalize on most of the useful MCMC plotting functions from the `bayesplot` package to visualize posterior distributions and diagnostics (see `?mvgam::mcmc_plot.mvgam` for details):

```{r}
mcmc_plot(
  object = model1,
  variable = "betas",
  type = "areas"
)
```

We can also use the wide range of posterior checking functions available in `bayesplot` (see `?mvgam::pp_check.mvgam` for details):

```{r}
pp_check(object = model1)
```

There is clearly some variation in these yearly intercept estimates. But how do these translate into time-varying predictions? To understand this, we can plot posterior hindcasts from this model for the training period using `plot.mvgam()` with `type = 'forecast'`:

```{r Plot posterior hindcasts}
plot(model1, type = "forecast")
```

If you wish to extract these hindcasts for other downstream analyses, the `hindcast()` function can be used. This will return a list object of class `mvgam_forecast`. In the `hindcasts` slot, a matrix of posterior retrodictions will be returned for each series in the data (only one series in our example):

```{r Extract posterior hindcast}
hc <- hindcast(model1)
str(hc)
```

You can also extract these hindcasts on the linear predictor scale, which in this case is the log scale (our Poisson GLM used a log link function). Sometimes this can be useful for asking more targeted questions about drivers of variation:

```{r Extract hindcasts on the linear predictor scale}
hc <- hindcast(model1, type = "link")
range(hc$hindcasts$PP)
```

In any regression analysis, a key question is whether the residuals show any patterns that could be indicative of un-modelled sources of variation. For GLMs, we can use a modified residual called the [Dunn-Smyth, or randomized quantile, residual](https://www.jstor.org/stable/1390802){target="_blank"}. Inspect Dunn-Smyth residuals from the model using `plot.mvgam()` with `type = 'residuals'`:

```{r Plot posterior residuals}
plot(model1, type = "residuals")
```

## Automatic forecasting for new data

These temporal random effects do not have a sense of "time". Because of this, each yearly random intercept is not constrained to be similar to the previous year's intercept. This drawback becomes evident when we predict for a new year. To do this, we can repeat the exercise above, but this time we will split the data into training and testing sets before re-running the model. We can then supply the test set as `newdata`. For splitting, we will make use of the `filter()` function from `dplyr`:

```{r}
model_data %>%
  dplyr::filter(time <= 70) -> data_train
model_data %>%
  dplyr::filter(time > 70) -> data_test
```

```{r include=FALSE, message=FALSE, warning=FALSE}
model1b <- mvgam(count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)
```

```{r eval=FALSE}
model1b <- mvgam(
  count ~ s(year_fac, bs = "re") - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test
)
```

We can view the test data in the forecast plot to see that the predictions do not capture the temporal variation in the test set:

```{r Plotting predictions against test data}
plot(model1b, type = "forecast", newdata = data_test)
```

As with the `hindcast()` function, we can use the `forecast()` function to automatically extract the posterior distributions for these predictions.
This also returns an object of class `mvgam_forecast`, but now it will contain both the hindcasts and forecasts for each series in the data:

```{r Extract posterior forecasts}
fc <- forecast(model1b)
str(fc)
```

## Adding predictors as "fixed" effects

Any users familiar with GLMs will know that we nearly always wish to include predictor variables that may explain some of the variation in our observations. Predictors are easily incorporated into GLMs / GAMs. Here, we will update the model from above by including a parametric (fixed) effect of `ndvi_ma12` as a linear predictor:

```{r model2, include=FALSE, message=FALSE, warning=FALSE}
model2 <- mvgam(
  count ~ s(year_fac, bs = "re") +
    ndvi_ma12 - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)
```

```{r eval=FALSE}
model2 <- mvgam(
  count ~ s(year_fac, bs = "re") +
    ndvi_ma12 - 1,
  family = poisson(),
  data = data_train,
  newdata = data_test
)
```

The model can be described mathematically as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = \beta_{year[year_t]} + \beta_{ndvi} * \boldsymbol{ndvi}_t \\
\beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}) \\
\beta_{ndvi} & \sim \text{Normal}(0, 1)
\end{align*}

where the $\beta_{year}$ effects are the same as before but we now have another predictor $(\beta_{ndvi})$ that applies to the `ndvi_ma12` value at each timepoint $t$. Inspect the summary of this model:

```{r, class.output="scroll-300"}
summary(model2)
```

Rather than printing the summary each time, we can also quickly look at the posterior empirical quantiles for the fixed effect of `ndvi_ma12` (and other linear predictor coefficients) using `coef()`:

```{r Posterior quantiles of model coefficients}
coef(model2)
```

Look at the estimated effect of `ndvi_ma12` using a histogram. This can be done by first extracting the posterior coefficients:

```{r}
beta_post <- as.data.frame(model2, variable = "betas")
dplyr::glimpse(beta_post)
```

The posterior distribution for the effect of `ndvi_ma12` is stored in the `ndvi_ma12` column. A quick histogram confirms our inference that `log(counts)` respond positively to increases in `ndvi_ma12`:

```{r Histogram of NDVI effects}
hist(beta_post$ndvi_ma12,
  xlim = c(
    -1 * max(abs(beta_post$ndvi_ma12)),
    max(abs(beta_post$ndvi_ma12))
  ),
  col = "darkred",
  border = "white",
  xlab = expression(beta[NDVI]),
  ylab = "",
  yaxt = "n",
  main = "",
  lwd = 2
)
abline(v = 0, lwd = 2.5)
```

### `marginaleffects` support

Given that our model used a nonlinear link function (log link in this example), it can still be difficult to fully understand what relationship our model is estimating between a predictor and the response. Fortunately, the `marginaleffects` package makes this relatively straightforward. Objects of class `mvgam` can be used with `marginaleffects` to inspect contrasts, scenario-based predictions, conditional and marginal effects, all on the outcome scale. Like `brms`, `mvgam` has the simple `conditional_effects()` function to make quick and informative plots for main effects, which relies on `marginaleffects` support. This will likely be your go-to function for quickly understanding patterns from fitted `mvgam` models:

```{r warning=FALSE}
conditional_effects(model2)
```

## Adding predictors as smooths

Smooth functions, using penalized splines, are a major feature of `mvgam`.
Nonlinear splines are commonly viewed as variations of random effects in which the coefficients that control the shape of the spline are drawn from a joint, penalized distribution. This strategy is very often used in ecological time series analysis to capture smooth temporal variation in the processes we seek to study. When we construct smoothing splines, the workhorse package `mgcv` will calculate a set of basis functions that will collectively control the shape and complexity of the resulting spline. It is often helpful to visualize these basis functions to get a better sense of how splines work; imagine, for instance, a set of 6 basis functions built to represent possible variation in the effect of `time` on our outcome. In addition to constructing the basis functions, `mgcv` also creates a penalty matrix $S$, which contains **known** coefficients that work to constrain the wiggliness of the resulting smooth function (see the short sketch after this section's model summary for one way to inspect $S$ directly). When fitting a GAM to data, we must estimate the smoothing parameters ($\lambda$) that will penalize these matrices, resulting in constrained basis coefficients and smoother functions that are less likely to overfit the data. This is the key to fitting GAMs in a Bayesian framework, as we can jointly estimate the $\lambda$'s using informative priors to prevent overfitting and expand the complexity of models we can tackle. To see this in practice, we can now fit a model that replaces the yearly random effects with a smooth function of `time`. We will need a reasonably complex function (large `k`) to try and accommodate the temporal variation in our observations. Following some [useful advice by Gavin Simpson](https://fromthebottomoftheheap.net/2020/06/03/extrapolating-with-gams/){target="_blank"}, we will use a b-spline basis for the temporal smooth. Because we no longer have intercepts for each year, we also retain the primary intercept term in this model (there is no `-1` in the formula now):

```{r model3, include=FALSE, message=FALSE, warning=FALSE}
model3 <- mvgam(
  count ~ s(time, bs = "bs", k = 15) +
    ndvi_ma12,
  family = poisson(),
  data = data_train,
  newdata = data_test,
  parallel = FALSE
)
```

```{r eval=FALSE}
model3 <- mvgam(
  count ~ s(time, bs = "bs", k = 15) +
    ndvi_ma12,
  family = poisson(),
  data = data_train,
  newdata = data_test
)
```

The model can be described mathematically as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = f(\boldsymbol{time})_t + \beta_{ndvi} * \boldsymbol{ndvi}_t \\
f(\boldsymbol{time}) & = \sum_{k=1}^{K}b_k * \beta_{smooth,k} \\
\beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \\
\beta_{ndvi} & \sim \text{Normal}(0, 1)
\end{align*}

where the smooth function $f(\boldsymbol{time})$ is built by summing across a set of weighted basis functions. The basis functions $(b)$ are constructed using a b-spline basis in `mgcv`. The weights $(\beta_{smooth})$ are drawn from a penalized multivariate normal distribution where the precision matrix $(\Omega)$ is multiplied by a smoothing penalty $(\lambda)$. If $\lambda$ becomes large, this acts to *squeeze* the covariances among the weights $(\beta_{smooth})$, leading to a less wiggly spline. Note that sometimes there are multiple smoothing penalties that contribute to the covariance matrix, but I am only showing one here for simplicity. View the summary as before:

```{r}
summary(model3)
```

The summary above now contains posterior estimates for the smoothing parameters as well as the basis coefficients for the nonlinear effect of `time`.
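As promised above, here is a minimal standalone sketch of the basis-plus-penalty machinery, using `mgcv::smoothCon()` directly. The `k = 6` b-spline basis over a toy `time` grid is an illustrative assumption and is not the exact basis used by `model3`:

```{r eval=FALSE}
library(mgcv)
# Construct a small 6-function b-spline basis over a toy time grid
toy_time <- data.frame(time = 1:100)
sm <- smoothCon(s(time, bs = "bs", k = 6), data = toy_time)[[1]]

# The penalty matrix S holds the known coefficients that define the
# wiggliness penalty; the smoothing parameter (lambda) scales it
S <- sm$S[[1]]
round(S, 2)

# The quadratic form t(beta) %*% S %*% beta is what lambda penalizes:
# wigglier coefficient vectors produce larger penalties
beta_wiggly <- c(1, -1, 1, -1, 1, -1)
beta_smooth <- rep(1, 6)
c(
  wiggly = drop(t(beta_wiggly) %*% S %*% beta_wiggly),
  smooth = drop(t(beta_smooth) %*% S %*% beta_smooth)
)
```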
We can visualize `conditional_effects` as before:

```{r warning=FALSE}
conditional_effects(model3, type = "link")
```

Inspect the underlying `Stan` code to gain some idea of how the spline is being penalized:

```{r, class.output="scroll-300"}
stancode(model3)
```

The line below `// prior for s(time)...` shows how the spline basis coefficients are drawn from a zero-centred multivariate normal distribution. The penalty matrix $S$ is scaled by two different smoothing parameters (the $\lambda$'s) to enforce smoothness and reduce overfitting.

## Latent dynamics in `mvgam`

Forecasts from the above model are not ideal:

```{r}
plot(model3, type = "forecast", newdata = data_test)
```

Why is this happening? The forecasts are driven almost entirely by variation in the temporal spline, which is extrapolating linearly *forever* beyond the edge of the training data. Any slight wiggles near the end of the training set will result in wildly different forecasts. To visualize this, we can plot the extrapolated temporal function into the out-of-sample test set. Here is the extrapolated function from `model3`, which used 15 basis functions:

```{r Plot extrapolated temporal functions using newdata}
plot_mvgam_smooth(
  model3,
  smooth = "s(time)",
  # pass newdata to the plot function to generate
  # predictions of the temporal smooth to the end of the
  # testing period
  newdata = data.frame(
    time = 1:max(data_test$time),
    ndvi_ma12 = 0
  )
)
abline(v = max(data_train$time), lty = "dashed", lwd = 2)
```

This model is not doing well. Clearly we need to somehow account for the strong temporal autocorrelation when modelling these data without using a smooth function of `time`. Now onto another prominent feature of `mvgam`: the ability to include (possibly latent) autocorrelated residuals in regression models. To do so, we use the `trend_model` argument (see `?mvgam_trends` for details of the different dynamic trend models that are supported). This model will use a separate sub-model for latent residuals that evolve as an AR1 process (i.e. the error in the current time point is a function of the error in the previous time point, plus some stochastic noise). We also include a smooth function of `ndvi_ma12` in this model, rather than the parametric term that was used above, to showcase that `mvgam` can include combinations of smooths and dynamic components:

```{r model4, include=FALSE}
model4 <- mvgam(count ~ s(ndvi_ma12, k = 6),
  family = poisson(),
  data = data_train,
  newdata = data_test,
  trend_model = AR(),
  parallel = FALSE
)
```

```{r eval=FALSE}
model4 <- mvgam(
  count ~ s(ndvi_ma12, k = 6),
  family = poisson(),
  data = data_train,
  newdata = data_test,
  trend_model = AR()
)
```

The model can be described mathematically as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\
log(\lambda_t) & = f(\boldsymbol{ndvi})_t + z_t \\
z_t & \sim \text{Normal}(ar1 * z_{t-1}, \sigma_{error}) \\
ar1 & \sim \text{Normal}(0, 1)[-1, 1] \\
\sigma_{error} & \sim \text{Exponential}(2) \\
f(\boldsymbol{ndvi}) & = \sum_{k=1}^{K}b_k * \beta_{smooth,k} \\
\beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1})
\end{align*}

Here the term $z_t$ captures autocorrelated latent residuals, which are modelled using an AR1 process. You will also notice that this model estimates autocorrelated errors for the full time period, even though some of these time points have missing observations. This is useful for getting more realistic estimates of the residual autocorrelation parameters.
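If the AR1 notation feels abstract, the latent process is easy to simulate directly, which can help build intuition for what the latent residuals are doing. A minimal sketch with arbitrary parameter values (none of these values come from the fitted model):

```{r eval=FALSE}
set.seed(123)
# Simulate a latent AR1 process: z_t ~ Normal(ar1 * z_(t-1), sigma)
n_time <- 100
ar1 <- 0.7
sigma_error <- 0.5
z <- numeric(n_time)
for (t in 2:n_time) {
  z[t] <- rnorm(1, mean = ar1 * z[t - 1], sd = sigma_error)
}
plot(z, type = "l", xlab = "Time", ylab = "Latent AR1 residual")
```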
Summarise the model to see how it now returns posterior summaries for the latent AR1 process: ```{r Summarise the mvgam autocorrelated error model, class.output="scroll-300"} summary(model4) ``` View posterior hindcasts / forecasts and compare against the out of sample test data ```{r} plot(model4, type = "forecast", newdata = data_test) ``` The trend is evolving as an AR1 process, which we can also view: ```{r} plot(model4, type = "trend", newdata = data_test) ``` In-sample model performance can be interrogated using leave-one-out cross-validation utilities from the `loo` package (a higher value is preferred for this metric): ```{r} loo_compare(model3, model4) ``` The higher expected log predictive density (ELPD) value for the dynamic model suggests it provides a better fit to the in-sample data. Though it should already be obvious that the dynamic model provides better forecasts, we can quantify forecast performance for models 3 and 4 using the `forecast` and `score` functions. Here we will compare models based on their Discrete Ranked Probability Scores (a lower value is preferred for this metric) ```{r} fc_mod3 <- forecast(model3) fc_mod4 <- forecast(model4) score_mod3 <- score(fc_mod3, score = "drps") score_mod4 <- score(fc_mod4, score = "drps") sum(score_mod4$PP$score, na.rm = TRUE) - sum(score_mod3$PP$score, na.rm = TRUE) ``` A strongly negative value here suggests the score for the dynamic model (model 4) is much smaller than the score for the model with a smooth function of time (model 3). ## Further reading The following papers and resources offer useful material about Dynamic GAMs and how they can be applied in practice: Clark, Nicholas J. and Wells, K. [Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series](https://doi.org/10.1111/2041-210X.13974). *Methods in Ecology and Evolution*. (2023): 14, 771-784. Clark, Nicholas J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ*. (2025): 13:e18929 de Sousa, Heitor C., et al. [Severe fire regimes decrease resilience of ectothermic populations](https://doi.org/10.1111/1365-2656.14188). *Journal of Animal Ecology* (2024): 93(11), 1656-1669. Hannaford, Naomi E., et al. [A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant.](https://doi.org/10.1016/j.csda.2022.107659) *Computational Statistics & Data Analysis* (2023): 179, 107659. Karunarathna, K.A.N.K., et al. [Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models](https://doi.org/10.1016/j.ecolmodel.2024.110648). *Ecological Modelling* (2024): 490, 110648. Zhu, L., et al. [Responses of a widespread pest insect to extreme high temperatures are stage-dependent and divergent among seasonal cohorts](https://doi.org/10.1111/1365-2435.14711). *Functional Ecology* (2025): 39, 165–180. ## Interested in contributing? I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`.
Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au) ================================================ FILE: vignettes/nmixtures.Rmd ================================================ --- title: "N-mixtures in mvgam" author: "Nicholas J Clark" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: yes vignette: > %\VignetteIndexEntry{N-mixtures in mvgam} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} params: EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true") --- ```{r, echo = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, eval = if (isTRUE(exists("params"))) params$EVAL else FALSE ) ``` ```{r setup, include=FALSE} knitr::opts_chunk$set( echo = TRUE, dpi = 100, fig.asp = 0.8, fig.width = 6, out.width = "60%", fig.align = "center" ) library(mvgam) library(ggplot2) library(dplyr) # A custom ggplot2 theme theme_set(theme_classic(base_size = 12, base_family = "serif") + theme( axis.line.x.bottom = element_line( colour = "black", size = 1 ), axis.line.y.left = element_line( colour = "black", size = 1 ) )) options( ggplot2.discrete.colour = c( "#A25050", "#00008b", "darkred", "#010048" ), ggplot2.discrete.fill = c( "#A25050", "#00008b", "darkred", "#010048" ) ) ``` The purpose of this vignette is to show how the `mvgam` package can be used to fit and interrogate N-mixture models for population abundance counts made with imperfect detection. ## N-mixture models An N-mixture model is a fairly recent addition to the ecological modeller's toolkit that is designed to make inferences about variation in the abundance of species when observations are imperfect ([Royle 2004](https://onlinelibrary.wiley.com/doi/10.1111/j.0006-341X.2004.00142.x){target="_blank"}). Briefly, assume $\boldsymbol{Y_{i,r}}$ is the number of individuals recorded at site $i$ during replicate sampling observation $r$ (recorded as a non-negative integer). If multiple replicate surveys are done within a short enough period to satisfy the assumption that the population remained closed (i.e. there was no substantial change in true population size between replicate surveys), we can account for the fact that observations aren't perfect. This is done by assuming that these replicate observations are Binomial random variables that are parameterized by the true "latent" abundance $N$ and a detection probability $p$: \begin{align*} \boldsymbol{Y_{i,r}} & \sim \text{Binomial}(N_i, p_r) \\ N_{i} & \sim \text{Poisson}(\lambda_i) \end{align*} Using a set of linear predictors, we can estimate effects of covariates $\boldsymbol{X}$ on the expected latent abundance (with a log link for $\lambda$) and, jointly, effects of possibly different covariates (call them $\boldsymbol{Q}$) on detection probability (with a logit link for $p$): \begin{align*} log(\lambda) & = \beta \boldsymbol{X} \\ logit(p) & = \gamma \boldsymbol{Q}\end{align*} `mvgam` can handle this type of model because it is designed to propagate unobserved temporal processes that evolve independently of the observation process in a State-space format. This setup adapts well to N-mixture models because they can be thought of as State-space models in which the latent state is a discrete variable representing the "true" but unknown population size. This is very convenient because we can incorporate any of the package's diverse effect types (i.e. 
multidimensional splines, time-varying effects, monotonic effects, random effects etc...) into the linear predictors. All that is required for this to work is a marginalization trick that allows `Stan`'s sampling algorithms to handle discrete parameters (see more about how this method of "integrating out" discrete parameters works in [this nice blog post by Maxwell Joseph](https://mbjoseph.github.io/posts/2020-04-28-a-step-by-step-guide-to-marginalizing-over-discrete-parameters-for-ecologists-using-stan/){target="_blank"}). The family `nmix()` is used to set up N-mixture models in `mvgam`, but we still need to do a little bit of data wrangling to ensure the data are set up in the correct format (this is especially true when we have more than one replicate survey per time period). The most important aspects are: (1) how we set up the observation `series` and `trend_map` arguments to ensure replicate surveys are mapped to the correct latent abundance model and (2) the inclusion of a `cap` variable that defines the maximum possible integer value to use for each observation when estimating latent abundance. The two examples below give a reasonable overview of how this can be done. ## Example 1: a two-species system with nonlinear trends First we will use a simple simulation in which multiple replicate observations are taken at each timepoint for two different species. The simulation produces observations at a single site over six years, with five replicate surveys per year. Each species is simulated to have different nonlinear temporal trends and different detection probabilities. For now, detection probability is fixed (i.e. it does not change over time or in association with any covariates). Notice that we add the `cap` variable, which does not need to be static, to define the maximum possible value that we think the latent abundance could be for each timepoint. This simply needs to be large enough that we get a reasonable idea of which latent N values are most likely, without adding too much computational cost: ```{r} set.seed(999) # Simulate observations for species 1, which shows a declining trend and 0.7 detection probability data.frame( site = 1, # five replicates per year; six years replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = "sp_1", # true abundance declines nonlinearly truth = c( rep(28, 5), rep(26, 5), rep(23, 5), rep(16, 5), rep(14, 5), rep(14, 5) ), # observations are taken with detection prob = 0.7 obs = c( rbinom(5, 28, 0.7), rbinom(5, 26, 0.7), rbinom(5, 23, 0.7), rbinom(5, 16, 0.7), rbinom(5, 14, 0.7), rbinom(5, 14, 0.7) ) ) %>% # add 'series' information, which is an identifier of site, replicate and species dplyr::mutate( series = paste0( "site_", site, "_", species, "_rep_", replicate ), time = as.numeric(time), # add a 'cap' variable that defines the maximum latent N to # marginalize over when estimating latent abundance; in other words # how large do we realistically think the true abundance could be?
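# (a cap set too small will truncate the latent abundance
# distribution, while one set far too large only adds computation)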
cap = 100 ) %>% dplyr::select(-replicate) -> testdat # Now add another species that has a different temporal trend and a smaller # detection probability (0.45 for this species) testdat <- testdat %>% dplyr::bind_rows(data.frame( site = 1, replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = "sp_2", truth = c( rep(4, 5), rep(7, 5), rep(15, 5), rep(16, 5), rep(19, 5), rep(18, 5) ), obs = c( rbinom(5, 4, 0.45), rbinom(5, 7, 0.45), rbinom(5, 15, 0.45), rbinom(5, 16, 0.45), rbinom(5, 19, 0.45), rbinom(5, 18, 0.45) ) ) %>% dplyr::mutate( series = paste0( "site_", site, "_", species, "_rep_", replicate ), time = as.numeric(time), cap = 50 ) %>% dplyr::select(-replicate)) ``` This data format isn't too difficult to set up, but it does differ from the traditional multidimensional array setup that is commonly used for fitting N-mixture models in other software packages. Next we ensure that species and series IDs are included as factor variables, in case we'd like to allow certain effects to vary by species ```{r} testdat$species <- factor(testdat$species, levels = unique(testdat$species) ) testdat$series <- factor(testdat$series, levels = unique(testdat$series) ) ``` Preview the dataset to get an idea of how it is structured: ```{r} dplyr::glimpse(testdat) head(testdat, 12) ``` ### Setting up the `trend_map` Finally, we need to set up the `trend_map` object. This is crucial for allowing multiple observations to be linked to the same latent process model (see more information about this argument in the [Shared latent states vignette](https://nicholasjclark.github.io/mvgam/articles/shared_states.html){target="_blank"}). In this case, the mapping operates by species and site to state that each set of replicate observations from the same time point should all share the exact same latent abundance model: ```{r} testdat %>% # each unique combination of site*species is a separate process dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>% dplyr::select(trend, series) %>% dplyr::distinct() -> trend_map trend_map ``` Notice how all of the replicates for species 1 in site 1 share the same process (i.e. the same `trend`). This will ensure that all replicates are Binomial draws of the same latent N. ### Modelling with the `nmix()` family Now we are ready to fit a model using `mvgam()`. This model will allow each species to have different detection probabilities and different temporal trends. 
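Before fitting, it helps to sketch the marginalization trick mentioned earlier. Because `Stan` cannot sample discrete parameters, the likelihood for the replicates at a given site and time point sums over every feasible value of the latent abundance up to `cap` (a conceptual sketch for a single site $i$ with $R$ replicates, not the exact generated code): \begin{align*} Pr(\boldsymbol{Y_{i,1:R}}) & = \sum_{N=\text{max}(\boldsymbol{Y_{i,1:R}})}^{cap} \text{Poisson}(N \mid \lambda_i) \prod_{r=1}^{R} \text{Binomial}(Y_{i,r} \mid N, p_r) \end{align*} The sum starts at the largest observed count (smaller values of $N$ have zero likelihood) and ends at `cap`, which is why `cap` must comfortably exceed any plausible true abundance.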
We will use `Cmdstan` as the backend, which by default will use Hamiltonian Monte Carlo for full Bayesian inference ```{r include = FALSE, results='hide'} mod <- mvgam( # the observation formula sets up linear predictors for # detection probability on the logit scale formula = obs ~ species - 1, # the trend_formula sets up the linear predictors for # the latent abundance processes on the log scale trend_formula = ~ s(time, by = trend, k = 4) + species, # the trend_map takes care of the mapping trend_map = trend_map, # nmix() family and data family = nmix(), data = testdat, # priors can be set in the usual way priors = c( prior(std_normal(), class = b), prior(normal(1, 1.5), class = Intercept_trend) ), samples = 1000 ) ``` ```{r eval = FALSE} mod <- mvgam( # the observation formula sets up linear predictors for # detection probability on the logit scale formula = obs ~ species - 1, # the trend_formula sets up the linear predictors for # the latent abundance processes on the log scale trend_formula = ~ s(time, by = trend, k = 4) + species, # the trend_map takes care of the mapping trend_map = trend_map, # nmix() family and data family = nmix(), data = testdat, # priors can be set in the usual way priors = c( prior(std_normal(), class = b), prior(normal(1, 1.5), class = Intercept_trend) ), samples = 1000 ) ``` View the automatically generated `Stan` code to get a sense of how the marginalization over latent N works ```{r} code(mod) ``` The posterior summary of this model shows that it has converged nicely ```{r} summary(mod) ``` `loo()` functionality works just as it does for all `mvgam` models to aid in model comparison / selection (though note that Pareto K values often give warnings for mixture models so these may not be too helpful) ```{r} loo(mod) ``` Plot the estimated smooths of time from each species' latent abundance process (on the log scale) ```{r} plot(mod, type = "smooths", trend_effects = TRUE) ``` `marginaleffects` support allows for more useful prediction-based interrogations on different scales (though note that at the time of writing this vignette, you must have the development version of `marginaleffects` installed for `nmix()` models to be supported; use `remotes::install_github('vincentarelbundock/marginaleffects')` to install). Objects that use family `nmix()` have a few additional prediction scales that can be used (i.e. `link`, `response`, `detection` or `latent_N`). For example, here are the estimated detection probabilities per species, which show that the model has done a nice job of estimating these parameters: ```{r} marginaleffects::plot_predictions(mod, condition = "species", type = "detection" ) + ylab("Pr(detection)") + ylim(c(0, 1)) + theme_classic() + theme(legend.position = "none") ``` A common goal in N-mixture modelling is to estimate the true latent abundance. The model has automatically generated predictions for the unknown latent abundance that are conditional on the observations.
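Before building a custom plotting function, we can get a quick numerical summary of these latent abundance predictions directly through `marginaleffects` (an illustrative call that assumes the `latent_N` prediction scale described above works with `avg_predictions()`):

```{r eval=FALSE}
# Average posterior predictions of the latent abundances, per species
marginaleffects::avg_predictions(mod, type = "latent_N", by = "species")
```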
We can extract these and produce decent plots using a small function ```{r} hc <- hindcast(mod, type = "latent_N") # Function to plot latent abundance estimates vs truth plot_latentN <- function(hindcasts, data, species = "sp_1") { all_series <- unique(data %>% dplyr::filter(species == !!species) %>% dplyr::pull(series)) # Grab the first replicate that represents this series # so we can get the true simulated values series <- as.numeric(all_series[1]) truths <- data %>% dplyr::arrange(time, series) %>% dplyr::filter(series == !!levels(data$series)[series]) %>% dplyr::pull(truth) # In case some replicates have missing observations, # pull out predictions for ALL replicates and average over them hcs <- do.call(rbind, lapply(all_series, function(x) { ind <- which(names(hindcasts$hindcasts) %in% as.character(x)) hindcasts$hindcasts[[ind]] })) # Calculate posterior empirical quantiles of predictions pred_quantiles <- data.frame(t(apply(hcs, 2, function(x) { quantile(x, probs = c( 0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95 )) }))) pred_quantiles$time <- 1:NROW(pred_quantiles) pred_quantiles$truth <- truths # Grab observations data %>% dplyr::filter(series %in% all_series) %>% dplyr::select(time, obs) -> observations # Plot ggplot(pred_quantiles, aes(x = time, group = 1)) + geom_ribbon(aes(ymin = X5., ymax = X95.), fill = "#DCBCBC") + geom_ribbon(aes(ymin = X30., ymax = X70.), fill = "#B97C7C") + geom_line(aes(x = time, y = truth), colour = "black", linewidth = 1 ) + geom_point(aes(x = time, y = truth), shape = 21, colour = "white", fill = "black", size = 2.5 ) + geom_jitter( data = observations, aes(x = time, y = obs), width = 0.06, shape = 21, fill = "darkred", colour = "white", size = 2.5 ) + labs( y = "Latent abundance (N)", x = "Time", title = species ) } ``` Latent abundance plots vs the simulated truths for each species are shown below. Here, the red points show the imperfect observations, the black line shows the true latent abundance, and the ribbons show credible intervals of our estimates: ```{r} plot_latentN(hc, testdat, species = "sp_1") plot_latentN(hc, testdat, species = "sp_2") ``` We can see that estimates for both species have correctly captured the true temporal variation and magnitudes in abundance ## Example 2: a larger survey with possible nonlinear effects Now for another example with a larger dataset. We will use data from [Jeff Doser's simulation example from the wonderful `spAbundance` package](https://doserlab.com/files/spabundance-web/articles/nmixturemodels){target="_blank"}. The simulated data include one continuous site-level covariate, one factor site-level covariate and two continuous sample-level covariates. This example will allow us to examine how we can include possibly nonlinear effects in the latent process and detection probability models. 
Download the data and grab observations / covariate measurements for one species ```{r} # Data link load(url("https://github.com/doserjef/spAbundance/raw/main/data/dataNMixSim.rda")) data.one.sp <- dataNMixSim # Pull out observations for one species data.one.sp$y <- data.one.sp$y[1, , ] # Abundance covariates that don't change across repeat sampling observations abund.cov <- dataNMixSim$abund.covs[, 1] abund.factor <- as.factor(dataNMixSim$abund.covs[, 2]) # Detection covariates that can change across repeat sampling observations # Note that `NA`s are not allowed for covariates in mvgam, so we randomly # impute them here det.cov <- dataNMixSim$det.covs$det.cov.1[, ] det.cov[is.na(det.cov)] <- rnorm(length(which(is.na(det.cov)))) det.cov2 <- dataNMixSim$det.covs$det.cov.2 det.cov2[is.na(det.cov2)] <- rnorm(length(which(is.na(det.cov2)))) ``` Next we wrangle into the appropriate 'long' data format, adding indicators of `time` and `series` for working in `mvgam`. We also add the `cap` variable to represent the maximum latent N to marginalize over for each observation ```{r} mod_data <- do.call( rbind, lapply(1:NROW(data.one.sp$y), function(x) { data.frame( y = data.one.sp$y[x, ], abund_cov = abund.cov[x], abund_fac = abund.factor[x], det_cov = det.cov[x, ], det_cov2 = det.cov2[x, ], replicate = 1:NCOL(data.one.sp$y), site = paste0("site", x) ) }) ) %>% dplyr::mutate( species = "sp_1", series = as.factor(paste0(site, "_", species, "_", replicate)) ) %>% dplyr::mutate( site = factor(site, levels = unique(site)), species = factor(species, levels = unique(species)), time = 1, cap = max(data.one.sp$y, na.rm = TRUE) + 20 ) ``` The data include observations for 225 sites with three replicates per site, though some observations are missing ```{r} NROW(mod_data) dplyr::glimpse(mod_data) head(mod_data) ``` The final step for data preparation is, of course, the `trend_map`, which sets up the mapping between observation replicates and the latent abundance models. This is done in the same way as in the example above ```{r} mod_data %>% # each unique combination of site*species is a separate process dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>% dplyr::select(trend, series) %>% dplyr::distinct() -> trend_map trend_map %>% dplyr::arrange(trend) %>% head(12) ``` Now we are ready to fit a model using `mvgam()`. Here we will use penalized splines for each of the continuous covariate effects to detect possible nonlinear associations.
We also showcase how `mvgam` can make use of the different approximation algorithms available in `Stan` by using the meanfield variational Bayes approximator (this reduces computation time from around 90 seconds to around 12 seconds for this example) ```{r include = FALSE, results='hide'} mod <- mvgam( # effects of covariates on detection probability; # here we use penalized splines for both continuous covariates formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3), # effects of the covariates on latent abundance; # here we use a penalized spline for the continuous covariate and # hierarchical intercepts for the factor covariate trend_formula = ~ s(abund_cov, k = 3) + s(abund_fac, bs = "re"), # link multiple observations to each site trend_map = trend_map, # nmix() family and supplied data family = nmix(), data = mod_data, # standard normal priors on key regression parameters priors = c( prior(std_normal(), class = "b"), prior(std_normal(), class = "Intercept"), prior(std_normal(), class = "Intercept_trend"), prior(std_normal(), class = "sigma_raw_trend") ), # use Stan's variational inference for quicker results algorithm = "meanfield", # no need to compute "series-level" residuals residuals = FALSE, samples = 1000 ) ``` ```{r eval=FALSE} mod <- mvgam( # effects of covariates on detection probability; # here we use penalized splines for both continuous covariates formula = y ~ s(det_cov, k = 3) + s(det_cov2, k = 3), # effects of the covariates on latent abundance; # here we use a penalized spline for the continuous covariate and # hierarchical intercepts for the factor covariate trend_formula = ~ s(abund_cov, k = 3) + s(abund_fac, bs = "re"), # link multiple observations to each site trend_map = trend_map, # nmix() family and supplied data family = nmix(), data = mod_data, # standard normal priors on key regression parameters priors = c( prior(std_normal(), class = "b"), prior(std_normal(), class = "Intercept"), prior(std_normal(), class = "Intercept_trend"), prior(std_normal(), class = "sigma_raw_trend") ), # use Stan's variational inference for quicker results algorithm = "meanfield", # no need to compute "series-level" residuals residuals = FALSE, samples = 1000 ) ``` Inspect the model summary but don't bother looking at estimates for all individual spline coefficients. Notice how we no longer receive information on convergence because we did not use MCMC sampling for this model ```{r} summary(mod, include_betas = FALSE) ``` Again we can make use of `marginaleffects` support for interrogating the model through targeted predictions.
First, we can inspect the estimated average detection probability ```{r} marginaleffects::avg_predictions(mod, type = "detection") ``` Next, investigate estimated effects of covariates on latent abundance using the `conditional_effects()` function and specifying `type = 'link'`; this will return plots on the expectation scale ```{r} abund_plots <- plot( conditional_effects(mod, type = "link", effects = c( "abund_cov", "abund_fac" ) ), plot = FALSE ) ``` The effect of the continuous covariate on expected latent abundance ```{r} abund_plots[[1]] + ylab("Expected latent abundance") ``` The effect of the factor covariate on expected latent abundance, estimated as a hierarchical random effect ```{r} abund_plots[[2]] + ylab("Expected latent abundance") ``` Now we can investigate estimated effects of covariates on detection probability using `type = 'detection'` ```{r} det_plots <- plot( conditional_effects(mod, type = "detection", effects = c( "det_cov", "det_cov2" ) ), plot = FALSE ) ``` The covariate smooths were estimated to be somewhat nonlinear on the logit scale according to the model summary (based on their approximate significances). But inspecting conditional effects of each covariate on the probability scale is more intuitive and useful ```{r} det_plots[[1]] + ylab("Pr(detection)") det_plots[[2]] + ylab("Pr(detection)") ``` More targeted predictions are also easy with `marginaleffects` support. For example, we can ask: How does detection probability change as we change *both* detection covariates? ```{r} fivenum_round <- function(x) round(fivenum(x, na.rm = TRUE), 2) marginaleffects::plot_predictions(mod, newdata = marginaleffects::datagrid( det_cov = unique, det_cov2 = fivenum_round ), by = c("det_cov", "det_cov2"), type = "detection" ) + theme_classic() + ylab("Pr(detection)") ``` The model has found support for some important covariate effects, but of course we'd want to interrogate how well the model predicts and think about possible spatial effects to capture unmodelled variation in latent abundance (which can easily be incorporated into both linear predictors using spatial smooths). ## Further reading The following papers and resources offer useful material about N-mixture models for ecological population dynamics investigations: Guélat, Jérôme, and Kéry, Marc. "[Effects of Spatial Autocorrelation and Imperfect Detection on Species Distribution Models.](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12983)" *Methods in Ecology and Evolution* 9 (2018): 1614–25. Kéry, Marc, and Royle, Andrew J. "[Applied hierarchical modeling in ecology: Analysis of distribution, abundance and species richness in R and BUGS: Volume 2: Dynamic and advanced models](https://shop.elsevier.com/books/applied-hierarchical-modeling-in-ecology-analysis-of-distribution-abundance-and-species-richness-in-r-and-bugs/kery/978-0-12-809585-0)". London, UK: Academic Press (2020). Royle, Andrew J. "[N‐mixture models for estimating population size from spatially replicated counts.](https://onlinelibrary.wiley.com/doi/full/10.1111/j.0006-341X.2004.00142.x)" *Biometrics* 60.1 (2004): 108-115. ## Interested in contributing? I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`.
Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au) ================================================ FILE: vignettes/shared_states.Rmd ================================================ --- title: "Shared latent states in mvgam" author: "Nicholas J Clark" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: yes vignette: > %\VignetteIndexEntry{Shared latent states in mvgam} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} params: EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true") --- ```{r, echo = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, eval = if (isTRUE(exists("params"))) params$EVAL else FALSE ) ``` ```{r setup, include=FALSE} knitr::opts_chunk$set( echo = TRUE, dpi = 100, fig.asp = 0.8, fig.width = 6, out.width = "60%", fig.align = "center" ) library(mvgam) library(ggplot2) theme_set(theme_bw(base_size = 12, base_family = "serif")) ``` This vignette gives an example of how `mvgam` can be used to estimate models where multiple observed time series share the same latent process model. For full details on the basic `mvgam` functionality, please see [the introductory vignette](https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html). ## The `trend_map` argument The `trend_map` argument in the `mvgam()` function is an optional `data.frame` that can be used to specify which series should depend on which latent process models (called "trends" in `mvgam`). This can be particularly useful if we wish to force multiple observed time series to depend on the same latent trend process, but with different observation processes. If this argument is supplied, a latent factor model is set up by setting `use_lv = TRUE` and using the supplied `trend_map` to set up the shared trends. Users familiar with the `MARSS` family of packages will recognize this as a way of specifying the $Z$ matrix. This `data.frame` needs to have column names `series` and `trend`, with integer values in the `trend` column to state which trend each series should depend on. The `series` column should have a single unique entry for each time series in the data, with names that perfectly match the factor levels of the `series` variable in `data`. For example, if we were to simulate a collection of three integer-valued time series (using `sim_mvgam`), the following `trend_map` would force the first two series to share the same latent trend process: ```{r} set.seed(122) simdat <- sim_mvgam( trend_model = AR(), prop_trend = 0.6, mu = c(0, 1, 2), family = poisson() ) trend_map <- data.frame( series = unique(simdat$data_train$series), trend = c(1, 1, 2) ) trend_map ``` We can see that the factor levels in `trend_map` match those in the data: ```{r} all.equal(levels(trend_map$series), levels(simdat$data_train$series)) ``` ### Checking `trend_map` with `run_model = FALSE` Supplying this `trend_map` to the `mvgam` function for a simple model, but setting `run_model = FALSE`, allows us to inspect the constructed `Stan` code and the data objects that would be used to condition the model. Here we will set up a model in which each series has a different observation process (with only a different intercept per series in this case), and the two latent dynamic process models evolve as independent AR1 processes that also contain a shared nonlinear smooth function to capture repeated seasonality.
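Concretely, this `trend_map` implies a loading matrix $Z$ with one row per observed series and one column per latent trend. A sketch of the structure we should expect (assuming the default series names created by `sim_mvgam`):

```{r eval=FALSE}
# rows = the three observed series; columns = the two latent trends
Z <- matrix(
  c(
    1, 0, # series_1 -> trend 1
    1, 0, # series_2 -> trend 1
    0, 1 # series_3 -> trend 2
  ),
  nrow = 3, byrow = TRUE
)
Z
```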
This model is not too complicated but it does show how we can learn shared and independent effects for collections of time series in the `mvgam` framework: ```{r} fake_mod <- mvgam( y ~ # observation model formula, which has a # different intercept per series series - 1, # process model formula, which has a shared seasonal smooth # (each latent process model shares the SAME smooth) trend_formula = ~ s(season, bs = "cc", k = 6), # AR1 dynamics (each latent process model has DIFFERENT # dynamics); processes are estimated using the noncentred # parameterisation for improved efficiency trend_model = AR(), noncentred = TRUE, # supplied trend_map trend_map = trend_map, # data and observation family family = poisson(), data = simdat$data_train, run_model = FALSE ) ``` Inspecting the `Stan` code shows how this model is a dynamic factor model in which the loadings are constructed to reflect the supplied `trend_map`: ```{r} stancode(fake_mod) ``` Notice the line that states "lv_coefs = Z;". This uses the supplied $Z$ matrix to construct the loading coefficients. The supplied matrix now looks exactly like what you'd use if you were to create a similar model in the `MARSS` package: ```{r} fake_mod$model_data$Z ``` ### Fitting and inspecting the model Though this model doesn't perfectly match the data-generating process (which allowed each series to have different underlying dynamics), we can still fit it to show what the resulting inferences look like: ```{r full_mod, include = FALSE, results='hide'} full_mod <- mvgam( y ~ series - 1, trend_formula = ~ s(season, bs = "cc", k = 6), trend_model = AR(), noncentred = TRUE, trend_map = trend_map, family = poisson(), data = simdat$data_train, silent = 2 ) ``` ```{r eval=FALSE} full_mod <- mvgam( y ~ series - 1, trend_formula = ~ s(season, bs = "cc", k = 6), trend_model = AR(), noncentred = TRUE, trend_map = trend_map, family = poisson(), data = simdat$data_train, silent = 2 ) ``` The summary of this model is informative as it shows that only two latent process models have been estimated, even though we have three observed time series. The model converges well ```{r} summary(full_mod) ``` Both series 1 and 2 share the exact same latent process estimates, while the estimates for series 3 are different: ```{r} plot(full_mod, type = "trend", series = 1) plot(full_mod, type = "trend", series = 2) plot(full_mod, type = "trend", series = 3) ``` However, forecasts for series 1 and 2 will differ because they have different intercepts in the observation model. ## Example: signal detection Now we will explore a more complicated example. Here we simulate a true hidden signal that we are trying to track. This signal depends nonlinearly on some covariate (called `productivity`, which represents a measure of how productive the landscape is).
The signal also demonstrates a fairly large amount of temporal autocorrelation: ```{r} set.seed(123) # simulate a nonlinear relationship using the mgcv function gamSim signal_dat <- mgcv::gamSim(n = 100, eg = 1, scale = 1) # productivity is one of the variables in the simulated data productivity <- signal_dat$x2 # simulate the true signal, which already has a nonlinear relationship # with productivity; we will add in a fairly strong AR1 process to # contribute to the signal true_signal <- as.vector(scale(signal_dat$y) + arima.sim(100, model = list(ar = 0.8, sd = 0.1))) ``` Plot the signal to inspect its evolution over time ```{r} plot( true_signal, type = "l", bty = "l", lwd = 2, ylab = "True signal", xlab = "Time" ) ``` Next we simulate three sensors that are trying to track the same hidden signal. All of these sensors have different observation errors that can depend nonlinearly on a second external covariate, called `temperature` in this example. Again this makes use of `gamSim` ```{r} # Function to simulate a monotonic response to a covariate sim_monotonic <- function(x, a = 2.2, b = 2) { out <- exp(a * x) / (6 + exp(b * x)) * -1 return(2.5 * as.vector(scale(out))) } # Simulated temperature covariate temperature <- runif(100, -2, 2) # Simulate the three series sim_series <- function(n_series = 3, true_signal) { temp_effects <- mgcv::gamSim(n = 100, eg = 7, scale = 0.05) alphas <- rnorm(n_series, sd = 2) do.call(rbind, lapply(seq_len(n_series), function(series) { data.frame( observed = rnorm(length(true_signal), mean = alphas[series] + sim_monotonic(temperature, runif(1, 2.2, 3), runif(1, 2.2, 3)) + true_signal, sd = runif(1, 1, 2) ), series = paste0("sensor_", series), time = 1:length(true_signal), temperature = temperature, productivity = productivity, true_signal = true_signal ) })) } model_dat <- sim_series(true_signal = true_signal) %>% dplyr::mutate(series = factor(series)) ``` Plot the sensor observations ```{r} plot_mvgam_series( data = model_dat, y = "observed", series = "all" ) ``` And now plot the observed relationships between the three sensors and the `temperature` covariate ```{r} plot( observed ~ temperature, data = model_dat %>% dplyr::filter(series == "sensor_1"), pch = 16, bty = "l", ylab = "Sensor 1", xlab = "Temperature" ) plot( observed ~ temperature, data = model_dat %>% dplyr::filter(series == "sensor_2"), pch = 16, bty = "l", ylab = "Sensor 2", xlab = "Temperature" ) plot( observed ~ temperature, data = model_dat %>% dplyr::filter(series == "sensor_3"), pch = 16, bty = "l", ylab = "Sensor 3", xlab = "Temperature" ) ``` ### The shared signal model Now we can formulate and fit a model that allows each sensor's observation error to depend nonlinearly on `temperature` while allowing the true signal to depend nonlinearly on `productivity`. By fixing all of the values in the `trend` column to `1` in the `trend_map`, we are assuming that all observation sensors are tracking the same latent signal.
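This assumption is sensible here: sensors tracking the same hidden signal should produce positively correlated observations through time. A quick empirical check (illustrative only; it assumes `tidyr` is available for reshaping):

```{r eval=FALSE}
# Sensors tracking the same latent signal should show positive
# pairwise correlations in their raw observations
model_dat %>%
  dplyr::select(time, series, observed) %>%
  tidyr::pivot_wider(names_from = series, values_from = observed) %>%
  dplyr::select(-time) %>%
  cor() %>%
  round(2)
```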
We use informative priors on the two variance components (process error and observation error), which reflect our prior belief that the observation error is smaller overall than the true process error ```{r sensor_mod, include = FALSE, results='hide'} mod <- mvgam( formula = # formula for observations, allowing for different # intercepts and hierarchical smooth effects of temperature observed ~ series + s(temperature, k = 10) + s(series, temperature, bs = "sz", k = 8), trend_formula = # formula for the latent signal, which can depend # nonlinearly on productivity ~ s(productivity, k = 8) - 1, trend_model = # in addition to productivity effects, the signal is # assumed to exhibit temporal autocorrelation AR(), noncentred = TRUE, trend_map = # trend_map forces all sensors to track the same # latent signal data.frame( series = unique(model_dat$series), trend = c(1, 1, 1) ), # informative priors on process error # and observation error will help with convergence priors = c( prior(normal(2, 0.5), class = sigma), prior(normal(1, 0.5), class = sigma_obs) ), # Gaussian observations family = gaussian(), burnin = 600, control = list(adapt_delta = 0.95), data = model_dat, silent = 2 ) ``` ```{r eval=FALSE} mod <- mvgam( formula = # formula for observations, allowing for different # intercepts and hierarchical smooth effects of temperature observed ~ series + s(temperature, k = 10) + s(series, temperature, bs = "sz", k = 8), trend_formula = # formula for the latent signal, which can depend # nonlinearly on productivity ~ s(productivity, k = 8) - 1, trend_model = # in addition to productivity effects, the signal is # assumed to exhibit temporal autocorrelation AR(), noncentred = TRUE, trend_map = # trend_map forces all sensors to track the same # latent signal data.frame( series = unique(model_dat$series), trend = c(1, 1, 1) ), # informative priors on process error # and observation error will help with convergence priors = c( prior(normal(2, 0.5), class = sigma), prior(normal(1, 0.5), class = sigma_obs) ), # Gaussian observations family = gaussian(), burnin = 600, control = list(adapt_delta = 0.95), data = model_dat, silent = 2 ) ``` View a reduced version of the model summary because there will be many spline coefficients in this model ```{r} summary(mod, include_betas = FALSE) ``` ### Inspecting effects on both process and observation models Don't pay much attention to the approximate *p*-values of the smooth terms. The calculation for these values is incredibly sensitive to the estimates for the smoothing parameters so I don't tend to find them to be very meaningful. What are meaningful, however, are prediction-based plots of the smooth functions. All main effects can be quickly plotted with `conditional_effects`: ```{r} conditional_effects(mod, type = "link") ``` `conditional_effects` is simply a wrapper around the more flexible `plot_predictions` function from the `marginaleffects` package. We can get more useful plots of these effects using this function for further customisation: ```{r} plot_predictions( mod, condition = c("temperature", "series", "series"), points = 0.5 ) + theme(legend.position = "none") ``` We have successfully estimated effects, some of them nonlinear, that impact the hidden process AND the observations. All in a single joint model. But there can always be challenges with these models, particularly when estimating both process and observation error at the same time. ### Recovering the hidden signal A final but crucial question is whether we can successfully recover the true hidden signal.
The `trend` slot in the returned model parameters has the estimates for this signal, which we can easily plot using the `mvgam` S3 method for `plot`. We can also overlay the true values for the hidden signal, which shows that our model has done a good job of recovering it: ```{r} plot(mod, type = "trend") + ggplot2::geom_point(data = data.frame(time = 1:100, y = true_signal), mapping = ggplot2::aes(x = time, y = y)) ``` ## Further reading The following papers and resources offer a lot of useful material about other types of State-Space models and how they can be applied in practice: Auger‐Méthé, Marie, et al. ["A guide to state–space modeling of ecological time series.](https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecm.1470)" *Ecological Monographs* 91.4 (2021): e01470. Clark, Nicholas J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ*. (2025): 13:e18929 Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. "[MARSS: multivariate autoregressive state-space models for analyzing time-series data.](https://journal.r-project.org/articles/RJ-2012-002/)" *R Journal*. 4.1 (2012): 11. Ward, Eric J., et al. "[Inferring spatial structure from time‐series data: using multivariate state‐space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico.](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2664.2009.01745.x)" *Journal of Applied Ecology* 47.1 (2010): 47-56. ## Interested in contributing? I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please reach out if you are interested (n.clark'at'uq.edu.au) ================================================ FILE: vignettes/time_varying_effects.Rmd ================================================ --- title: "Time-varying effects in mvgam" author: "Nicholas J Clark" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: yes vignette: > %\VignetteIndexEntry{Time-varying effects in mvgam} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} params: EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true") --- ```{r, echo = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, eval = if (isTRUE(exists("params"))) params$EVAL else FALSE ) ``` ```{r setup, include=FALSE} knitr::opts_chunk$set( echo = TRUE, dpi = 100, fig.asp = 0.8, fig.width = 6, out.width = "60%", fig.align = "center" ) library(mvgam) library(ggplot2) theme_set(theme_bw(base_size = 12, base_family = "serif")) ``` The purpose of this vignette is to show how the `mvgam` package can be used to estimate and forecast regression coefficients that vary through time. ## Time-varying effects Dynamic fixed-effect coefficients (often referred to as dynamic linear models) can be readily incorporated into GAMs / DGAMs. In `mvgam`, the `dynamic()` formula wrapper offers a convenient interface to set these up. The plan is to incorporate a range of dynamic options (such as random walk, AR1, etc.) but for the moment only low-rank Gaussian Process (GP) smooths are allowed (making use either of the `gp` basis in `mgcv` or of Hilbert space approximate GPs). These are advantageous over splines or random walk effects for several reasons. First, GPs will force the time-varying effect to be smooth.
This often makes sense in reality, where we would not expect a regression coefficient to change rapidly from one time point to the next. Second, GPs provide information on the 'global' dynamics of a time-varying effect through their length-scale parameters. This means we can use them to provide accurate forecasts of how an effect is expected to change in the future, something that we couldn't do well if we used splines to estimate the effect. An example below illustrates. ### Simulating time-varying effects Simulate a time-varying coefficient using a squared exponential Gaussian Process function with length scale $\rho$=10. We will do this using an internal function from `mvgam` (the `sim_gp` function): ```{r} set.seed(1111) N <- 200 beta_temp <- mvgam:::sim_gp(rnorm(1), alpha_gp = 0.75, rho_gp = 10, h = N ) + 0.5 ``` A plot of the time-varying coefficient shows that it changes smoothly through time: ```{r, fig.alt = "Simulating time-varying effects in mvgam and R"} plot(beta_temp, type = "l", lwd = 3, bty = "l", xlab = "Time", ylab = "Coefficient", col = "darkred" ) box(bty = "l", lwd = 2) ``` Next we need to simulate the values of the covariate, which we will call `temp` (to represent $temperature$). In this case we just use a standard normal distribution to simulate this covariate: ```{r} temp <- rnorm(N, sd = 1) ``` Finally, simulate the outcome variable, which is a Gaussian observation process (with observation error) over the time-varying effect of $temperature$ ```{r, fig.alt = "Simulating time-varying effects in mvgam and R"} out <- rnorm(N, mean = 4 + beta_temp * temp, sd = 0.25 ) time <- seq_along(temp) plot(out, type = "l", lwd = 3, bty = "l", xlab = "Time", ylab = "Outcome", col = "darkred" ) box(bty = "l", lwd = 2) ``` Gather the data into a `data.frame` for fitting models, and split the data into training and testing folds. ```{r} data <- data.frame(out, temp, time) data_train <- data[1:190, ] data_test <- data[191:200, ] ``` ### The `dynamic()` function Time-varying coefficients can be fairly easily set up using the `s()` or `gp()` wrapper functions in `mvgam` formulae by fitting a nonlinear effect of `time` and using the covariate of interest as the numeric `by` variable (see `?mgcv::s` or `?brms::gp` for more details). The `dynamic()` formula wrapper offers a way to automate this process, and will eventually allow for a broader variety of time-varying effects (such as random walk or AR processes). Depending on the arguments that are specified to `dynamic`, it will either set up a low-rank GP smooth function using `s()` with `bs = 'gp'` and a fixed value of the length scale parameter $\rho$, or it will set up a Hilbert space approximate GP using the `gp()` function with `c=5/4` so that $\rho$ is estimated (see `?dynamic` for more details). In this first example we will use the `s()` option, and will mis-specify the $\rho$ parameter here as, in practice, it is never known. 
This call to `dynamic()` will set up the following smooth: `s(time, by = temp, bs = "gp", m = c(-2, 8, 2), k = 40)` ```{r, include=FALSE} mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40), family = gaussian(), data = data_train, silent = 2 ) ``` ```{r, eval=FALSE} mod <- mvgam(out ~ dynamic(temp, rho = 8, stationary = TRUE, k = 40), family = gaussian(), data = data_train, silent = 2 ) ``` Inspect the model summary, which shows how the `dynamic()` wrapper was used to construct a low-rank Gaussian Process smooth function: ```{r} summary(mod, include_betas = FALSE) ``` Because this model used a spline with a `gp` basis, its smooths can be visualised just like any other `gam`. We can plot the estimates for the in-sample and out-of-sample periods to see how the Gaussian Process function produces sensible smooth forecasts. Here we supply the full dataset to the `newdata` argument in `plot_mvgam_smooth()` to inspect posterior forecasts of the time-varying smooth function. Overlay the true simulated function to see that the model has adequately estimated its dynamics in both the training and testing data partitions ```{r} plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 190, lty = "dashed", lwd = 2) lines(beta_temp, lwd = 2.5, col = "white") lines(beta_temp, lwd = 2) ``` We can also use `plot_predictions()` from the `marginaleffects` package to visualise what the time-varying coefficient would be estimated to be at different values of $temperature$: ```{r} require(marginaleffects) range_round <- function(x) { round(range(x, na.rm = TRUE), 2) } plot_predictions(mod, newdata = datagrid( time = unique, temp = range_round ), by = c("time", "temp", "temp"), type = "link" ) ``` This results in sensible forecasts of the observations as well ```{r} fc <- forecast(mod, newdata = data_test) plot(fc) ``` The syntax is very similar if we wish to estimate the parameters of the underlying Gaussian Process, this time using a Hilbert space approximation. We simply omit the `rho` argument in `dynamic()` to make this happen. This will set up a call similar to `gp(time, by = 'temp', c = 5/4, k = 40)`. ```{r include=FALSE} mod <- mvgam(out ~ dynamic(temp, k = 40), family = gaussian(), data = data_train, silent = 2 ) ``` ```{r eval=FALSE} mod <- mvgam(out ~ dynamic(temp, k = 40), family = gaussian(), data = data_train, silent = 2 ) ``` This model summary now contains estimates for the marginal deviation and length scale parameters of the underlying Gaussian Process function: ```{r} summary(mod, include_betas = FALSE) ``` Effects for `gp()` terms can also be plotted as smooths: ```{r} plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 190, lty = "dashed", lwd = 2) lines(beta_temp, lwd = 2.5, col = "white") lines(beta_temp, lwd = 2) ``` ## Salmon survival example Here we will use openly available data on marine survival of Chinook salmon to illustrate how time-varying effects can be used to improve ecological time series models. [Scheuerell and Williams (2005)](https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-2419.2005.00346.x) used a dynamic linear model to examine the relationship between marine survival of Chinook salmon and an index of ocean upwelling strength along the west coast of the USA. The authors hypothesized that stronger upwelling in April should create better growing conditions for phytoplankton, which would then translate into more zooplankton and provide better foraging opportunities for juvenile salmon entering the ocean.
The data on survival is measured as a proportional variable over 42 years (1964–2005) and is available in the `MARSS` package: ```{r} load(url("https://github.com/atsa-es/MARSS/raw/master/data/SalmonSurvCUI.rda")) dplyr::glimpse(SalmonSurvCUI) ``` First we need to prepare the data for modelling. The variable `CUI.apr` will be standardized to make it easier for the sampler to estimate underlying GP parameters for the time-varying effect. We also need to convert the survival back to a proportion, as in its current form it has been logit-transformed (this is because most time series packages cannot handle proportional data). As usual, we also need to create a `time` indicator and a `series` indicator for working in `mvgam`: ```{r} SalmonSurvCUI %>% # create a time variable dplyr::mutate(time = dplyr::row_number()) %>% # create a series variable dplyr::mutate(series = as.factor("salmon")) %>% # z-score the covariate CUI.apr dplyr::mutate(CUI.apr = as.vector(scale(CUI.apr))) %>% # convert logit-transformed survival back to proportional dplyr::mutate(survival = plogis(logit.s)) -> model_data ``` Inspect the data ```{r} dplyr::glimpse(model_data) ``` Plot features of the outcome variable, which shows that it is a proportional variable with particular restrictions that we want to model: ```{r} plot_mvgam_series(data = model_data, y = "survival") ``` ### A State-Space Beta regression `mvgam` can easily handle data that are bounded at 0 and 1 with a Beta observation model (using the `mgcv` function `betar()`, see `?mgcv::betar` for details). First we will fit a simple State-Space model that uses an AR1 dynamic process model with no predictors and a Beta observation model: ```{r include = FALSE} mod0 <- mvgam( formula = survival ~ 1, trend_model = AR(), noncentred = TRUE, priors = prior(normal(-3.5, 0.5), class = Intercept), family = betar(), data = model_data, silent = 2 ) ``` ```{r eval = FALSE} mod0 <- mvgam( formula = survival ~ 1, trend_model = AR(), noncentred = TRUE, priors = prior(normal(-3.5, 0.5), class = Intercept), family = betar(), data = model_data, silent = 2 ) ``` The summary of this model shows good behaviour of the Hamiltonian Monte Carlo sampler and provides useful summaries on the Beta observation model parameters: ```{r} summary(mod0) ``` A plot of the underlying dynamic component shows how it has easily handled the temporal evolution of the time series: ```{r} plot(mod0, type = "trend") ``` ### Including time-varying upwelling effects Now we can increase the complexity of our model by constructing and fitting a State-Space model with a time-varying effect of the coastal upwelling index in addition to the autoregressive dynamics. We again use a Beta observation model to capture the restrictions of our proportional observations, but this time will include a `dynamic()` effect of `CUI.apr` in the latent process model. 
We do not specify the $\rho$ parameter, instead opting to estimate it using a Hilbert space approximate GP: ```{r include=FALSE} mod1 <- mvgam( formula = survival ~ 1, trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1, trend_model = AR(), noncentred = TRUE, priors = prior(normal(-3.5, 0.5), class = Intercept), family = betar(), data = model_data, control = list(adapt_delta = 0.99), silent = 2 ) ``` ```{r eval=FALSE} mod1 <- mvgam( formula = survival ~ 1, trend_formula = ~ dynamic(CUI.apr, k = 25, scale = FALSE) - 1, trend_model = AR(), noncentred = TRUE, priors = prior(normal(-3.5, 0.5), class = Intercept), family = betar(), data = model_data, control = list(adapt_delta = 0.99), silent = 2 ) ``` The summary for this model now includes estimates for the time-varying GP parameters: ```{r} summary(mod1, include_betas = FALSE) ``` The estimates for the underlying dynamic process, and for the hindcasts, haven't changed much: ```{r} plot(mod1, type = "trend") ``` ```{r} plot(mod1, type = "forecast") ``` But the process error parameter $\sigma$ is slightly smaller for this model than for the first model: ```{r} # Extract estimates of the process error 'sigma' for each model mod0_sigma <- as.data.frame(mod0, variable = "sigma", regex = TRUE) %>% dplyr::mutate(model = "Mod0") mod1_sigma <- as.data.frame(mod1, variable = "sigma", regex = TRUE) %>% dplyr::mutate(model = "Mod1") sigmas <- rbind(mod0_sigma, mod1_sigma) # Plot using ggplot2 require(ggplot2) ggplot(sigmas, aes(y = `sigma[1]`, fill = model)) + geom_density(alpha = 0.3, colour = NA) + coord_flip() ``` Why does the process error not need to be as flexible in the second model? Because the estimates of this dynamic process are now informed partly by the time-varying effect of upwelling, which we can visualise on the link scale using `plot()`: ```{r} plot(mod1, type = "smooths", trend_effects = TRUE) ``` ### Comparing model predictive performances A key question when fitting multiple time series models is whether one of them provides better predictions than the other. There are several options in `mvgam` for exploring this quantitatively. First, we can compare models based on in-sample approximate leave-one-out cross-validation as implemented in the popular `loo` package: ```{r} loo_compare(mod0, mod1) ``` The second model has the larger expected log predictive density (ELPD), meaning that it is slightly favoured over the simpler model that did not include the time-varying upwelling effect. However, the two models certainly do not differ by much. But this metric only compares in-sample performance, and we are hoping to use our models to produce reasonable forecasts. Luckily, `mvgam` also has routines for comparing models using approximate leave-future-out cross-validation. Here we refit both models to a reduced training set (starting at time point 30) and produce approximate 1-step ahead forecasts. These forecasts are used to estimate forecast ELPD before expanding the training set one time point at a time. We use Pareto-smoothed importance sampling to reweight posterior predictions, acting as a kind of particle filter so that we don't need to refit the model too often (you can read more about how this process works in Bürkner et al. 2020).
```{r include=FALSE} lfo_mod0 <- lfo_cv(mod0, min_t = 30) lfo_mod1 <- lfo_cv(mod1, min_t = 30) ``` ```{r eval=FALSE} lfo_mod0 <- lfo_cv(mod0, min_t = 30) lfo_mod1 <- lfo_cv(mod1, min_t = 30) ``` The model with the time-varying upwelling effect tends to provide better 1-step ahead forecasts, with a higher total forecast ELPD ```{r} sum(lfo_mod0$elpds) sum(lfo_mod1$elpds) ``` We can also plot the ELPDs for each model as a contrast. Here, values less than zero suggest the time-varying predictor model (Mod1) gives better 1-step ahead forecasts: ```{r, fig.alt = "Comparing forecast skill for dynamic beta regression models in mvgam and R"} plot( x = 1:length(lfo_mod0$elpds) + 30, y = lfo_mod0$elpds - lfo_mod1$elpds, ylab = "ELPDmod0 - ELPDmod1", xlab = "Evaluation time point", pch = 16, col = "darkred", bty = "l" ) abline(h = 0, lty = "dashed") ``` A useful exercise to further expand this model would be to think about what kinds of predictors might impact measurement error, which could easily be implemented into the observation formula in `mvgam()`. But for now, we will leave the model as-is. ## Further reading The following papers and resources offer a lot of useful material about dynamic linear models and how they can be applied / evaluated in practice: Bürkner, P.C., Gabry, J., and Vehtari, A. [Approximate leave-future-out cross-validation for Bayesian time series models](https://www.tandfonline.com/doi/full/10.1080/00949655.2020.1783262). *Journal of Statistical Computation and Simulation* 90:14 (2020): 2499-2523. Herrero, Asier, et al. [From the individual to the landscape and back: time‐varying effects of climate and herbivory on tree sapling growth at distribution limits](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2745.12527). *Journal of Ecology* 104.2 (2016): 430-442. Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. "[MARSS: multivariate autoregressive state-space models for analyzing time-series data.](https://journal.r-project.org/articles/RJ-2012-002/)" *R Journal*. 4.1 (2012): 11. Scheuerell, Mark D., and John G. Williams. [Forecasting climate-induced changes in the survival of Snake River Spring/Summer Chinook Salmon (*Oncorhynchus tshawytscha*)](https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2419.2005.00346.x). *Fisheries Oceanography* 14 (2005): 448–57. ## Interested in contributing? I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`.
Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au) ================================================ FILE: vignettes/trend_formulas.Rmd ================================================ --- title: "State-Space models in mvgam" author: "Nicholas J Clark" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: yes vignette: > %\VignetteIndexEntry{State-Space models in mvgam} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} params: EVAL: !r identical(tolower(Sys.getenv("NOT_CRAN")), "true") --- ```{r, echo = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, eval = if (isTRUE(exists("params"))) params$EVAL else FALSE ) ``` ```{r setup, include=FALSE} knitr::opts_chunk$set( echo = TRUE, dpi = 100, fig.asp = 0.8, fig.width = 6, out.width = "60%", fig.align = "center" ) library(mvgam) library(ggplot2) theme_set(theme_bw(base_size = 12, base_family = "serif")) ``` The purpose of this vignette is to show how the `mvgam` package can be used to fit and interrogate State-Space models with nonlinear effects. ## State-Space Models ![Illustration of a basic State-Space model, which assumes that a latent dynamic *process* (X) can evolve independently from the way we take *observations* (Y) of that process](SS_model.svg){width=85%}
State-Space models allow us to separately make inferences about the underlying dynamic *process model* that we are interested in (i.e. the evolution of a time series or a collection of time series) and the *observation model* (i.e. the way that we survey / measure this underlying process). This is extremely useful in ecology because our observations are always imperfect / noisy measurements of the thing we are interested in measuring. It is also helpful because we often know that some covariates will impact our ability to measure accurately (i.e. we cannot take accurate counts of rodents if there is a thunderstorm happening) while other covariates might impact the underlying process (it is highly unlikely that rodent abundance responds to one storm, but instead probably responds to longer-term weather and climate variation). A State-Space model allows us to model both components in a single unified modelling framework. A major advantage of `mvgam` is that it can include nonlinear effects and random effects in BOTH model components while also capturing dynamic processes.

### Lake Washington plankton data

The data we will use to illustrate how we can fit State-Space models in `mvgam` are from a long-term monitoring study of plankton counts (cells per mL) taken from Lake Washington in Washington, USA. The data are available as part of the `MARSS` package and can be downloaded using the following:

```{r}
load(url("https://github.com/atsa-es/MARSS/raw/master/data/lakeWAplankton.rda"))
```

We will work with five different groups of plankton:

```{r}
outcomes <- c("Greens", "Bluegreens", "Diatoms", "Unicells", "Other.algae")
```

As usual, preparing the data into the correct format for `mvgam` modelling takes a little bit of wrangling in `dplyr`:

```{r}
# loop across each plankton group to create the long dataframe
plankton_data <- do.call(rbind, lapply(outcomes, function(x) {
  # create a group-specific dataframe with counts labelled 'y'
  # and the group name in the 'series' variable
  data.frame(
    year = lakeWAplanktonTrans[, "Year"],
    month = lakeWAplanktonTrans[, "Month"],
    y = lakeWAplanktonTrans[, x],
    series = x,
    temp = lakeWAplanktonTrans[, "Temp"]
  )
})) %>%
  # change the 'series' label to a factor
  dplyr::mutate(series = factor(series)) %>%
  # filter to only include some years in the data
  dplyr::filter(year >= 1965 & year < 1975) %>%
  dplyr::arrange(year, month) %>%
  dplyr::group_by(series) %>%
  # z-score the counts so they are approximately standard normal
  dplyr::mutate(y = as.vector(scale(y))) %>%
  # add the time indicator
  dplyr::mutate(time = dplyr::row_number()) %>%
  dplyr::ungroup()
```

Inspect the data structure:

```{r}
head(plankton_data)
```

```{r}
dplyr::glimpse(plankton_data)
```

Note that we have z-scored the counts in this example as that will make it easier to specify priors (though this is not completely necessary; it is often better to build a model that respects the properties of the actual outcome variables).

```{r}
plot_mvgam_series(data = plankton_data, series = "all")
```

We have some missing observations, but this isn't an issue for modelling in `mvgam`. A useful property to understand about these counts is that they tend to be highly seasonal.
Below are some plots of the z-scored counts against the z-scored temperature measurements in the lake for two of the series:

```{r}
plankton_data %>%
  dplyr::filter(series == "Other.algae") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y),
    col = "white",
    size = 1.3
  ) +
  geom_line(aes(y = y),
    col = "darkred",
    size = 1.1
  ) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Other algae (red)")
```

```{r}
plankton_data %>%
  dplyr::filter(series == "Diatoms") %>%
  ggplot(aes(x = time, y = temp)) +
  geom_line(size = 1.1) +
  geom_line(aes(y = y),
    col = "white",
    size = 1.3
  ) +
  geom_line(aes(y = y),
    col = "darkred",
    size = 1.1
  ) +
  ylab("z-score") +
  xlab("Time") +
  ggtitle("Temperature (black) vs Diatoms (red)")
```

We will have to try and capture this seasonality in our process model, which should be easy to do given the flexibility of GAMs. Next we will split the data into training and testing sets:

```{r}
plankton_train <- plankton_data %>%
  dplyr::filter(time <= 112)
plankton_test <- plankton_data %>%
  dplyr::filter(time > 112)
```

Now it is time to fit some models. This requires a bit of thinking about how we can best tackle the seasonal variation and the likely dependence structure in the data. These algae are interacting as part of a complex system within the same lake, so we certainly expect there to be some lagged cross-dependencies underlying their dynamics. But if we do not capture the seasonal variation, our multivariate dynamic model will be forced to try and capture it, which could lead to poor convergence and unstable results (we could feasibly capture cyclic dynamics with a more complex multi-species Lotka-Volterra model, but ordinary differential equation approaches are beyond the scope of `mvgam`).

### Capturing seasonality

First we will fit a model that does not include a dynamic component, just to see if it can reproduce the seasonal variation in the observations. This model introduces hierarchical multidimensional smooths, where all time series share a "global" tensor product of the `month` and `temp` variables, capturing our expectation that algal seasonality responds to temperature variation. But this response should depend on when in the year these temperatures are recorded (i.e. a response to warm temperatures in Spring should be different to a response to warm temperatures in Autumn). The model also fits series-specific deviation smooths (i.e. one tensor product per series) to capture how each algal group's seasonality differs from the overall "global" seasonality. Note that we do not include series-specific intercepts in this model because each series was z-scored to have a mean of 0.

```{r notrend_mod, include = FALSE, results='hide'}
notrend_mod <- mvgam(
  y ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = series) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = "None"
)
```

```{r eval=FALSE}
notrend_mod <- mvgam(
  y ~
    # tensor of temp and month to capture
    # "global" seasonality
    te(temp, month, k = c(4, 4)) +
      # series-specific deviation tensor products
      te(temp, month, k = c(4, 4), by = series) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = "None"
)
```

The "global" tensor product smooth function can be quickly visualized:

```{r}
plot_mvgam_smooth(notrend_mod, smooth = 1)
```

On this plot, red indicates below-average linear predictors and white indicates above-average.
We can then plot the deviation smooths for a few algal groups to see how they vary from the "global" pattern:

```{r}
plot_mvgam_smooth(notrend_mod, smooth = 2)
```

```{r}
plot_mvgam_smooth(notrend_mod, smooth = 3)
```

These multidimensional smooths have done a good job of capturing the seasonal variation in our observations:

```{r}
plot(notrend_mod, type = "forecast", series = 1)
```

```{r}
plot(notrend_mod, type = "forecast", series = 2)
```

```{r}
plot(notrend_mod, type = "forecast", series = 3)
```

This basic model gives us confidence that we can capture the seasonal variation in the observations. But the model has not captured the remaining temporal dynamics, which is obvious when we inspect Dunn-Smyth residuals for a few series:

```{r}
plot(notrend_mod, type = "residuals", series = 1)
```

```{r}
plot(notrend_mod, type = "residuals", series = 3)
```

### Multiseries dynamics

Now it is time to get into multivariate State-Space models. We will fit two models that can both incorporate lagged cross-dependencies in the latent process models. The first model assumes that the process errors operate independently from one another, while the second assumes that there may be contemporaneous correlations in the process errors. Both models include a Vector Autoregressive component for the process means, and so both can model complex community dynamics. The models can be described mathematically as follows:

\begin{align*}
\boldsymbol{count}_t & \sim \text{Normal}(\mu_{obs[t]}, \sigma_{obs}) \\
\mu_{obs[t]} & = process_t \\
process_t & \sim \text{MVNormal}(\mu_{process[t]}, \Sigma_{process}) \\
\mu_{process[t]} & = A * process_{t-1} +
f_{global}(\boldsymbol{month},\boldsymbol{temp})_t +
f_{series}(\boldsymbol{month},\boldsymbol{temp})_t \\
f_{global}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{global,k} * \beta_{global,k} \\
f_{series}(\boldsymbol{month},\boldsymbol{temp}) & = \sum_{k=1}^{K} b_{series,k} * \beta_{series,k}
\end{align*}

Here the $b_k$ are tensor product basis functions and the $\beta_k$ are their estimated coefficients. You can see that there are no terms in the observation model apart from the underlying process model. But we could easily add covariates into the observation model if we felt that they could explain some of the systematic observation errors. We also assume independent observation processes (there is no covariance structure in the observation errors $\sigma_{obs}$). At present, `mvgam` does not support multivariate observation models, but this feature will be added in future versions. The underlying process model, however, is multivariate, and there is a lot going on here. This component has a Vector Autoregressive part, where the process mean at time $t$ $(\mu_{process[t]})$ is a vector that evolves as a function of where the vector-valued process model was at time $t-1$. The $A$ matrix captures these dynamics with self-dependencies on the diagonal and possibly asymmetric cross-dependencies on the off-diagonals, while the process mean also incorporates the nonlinear smooth functions that capture seasonality for each series. The contemporaneous process errors are modeled by $\Sigma_{process}$, which can be constrained so that process errors are independent (i.e. setting the off-diagonals to 0) or can be fully parameterized using a Cholesky decomposition (using `Stan`'s $LKJcorr$ distribution to place a prior on the strength of inter-species correlations).
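To make the correlated-error option concrete, one common way to parameterize $\Sigma_{process}$ is to separate the error scales from their correlations (this is a sketch of the general approach; the exact parameterization in `mvgam`'s generated `Stan` code may differ):

\begin{align*}
\Sigma_{process} & = D \, \Omega \, D \\
D & = \text{diag}(\sigma_{process}) \\
\Omega & \sim \text{LKJcorr}(\eta)
\end{align*}

Fixing the off-diagonals of the correlation matrix $\Omega$ to zero recovers the independent-error model, while placing an $LKJcorr$ prior on $\Omega$ allows the strength of contemporaneous correlations to be estimated from the data ($\eta = 1$ gives a uniform prior over correlation matrices, while larger values favour weaker correlations).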
For those who are interested in the inner workings, `mvgam` makes use of a recent breakthrough by [Sarah Heaps to enforce stationarity of Bayesian VAR processes](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648). This is advantageous because we often don't expect forecast variance to increase without bound forever into the future, yet many estimated VARs tend to behave this way.
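To illustrate what stationarity means here, a VAR(1) process is stationary when all eigenvalues of the coefficient matrix $A$ lie inside the unit circle. A standalone sketch in base `R` (the matrix below is hypothetical, not part of `mvgam`'s machinery):

```{r}
# hypothetical 2 x 2 VAR(1) coefficient matrix for illustration
A_example <- matrix(
  c(
    0.7, -0.2,
    0.1, 0.5
  ),
  nrow = 2, byrow = TRUE
)

# stationarity holds if the largest eigenvalue modulus is < 1;
# in that case forecast variance converges rather than exploding
max(Mod(eigen(A_example)$values)) < 1
```

For fitted models with VAR dynamics, the `stability()` function computes related metrics from the estimated parameters (see `?stability` for guidance).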
Ok that was a lot to take in. Let's fit some models to try and inspect what is going on and what they assume. But first, we need to update `mvgam`'s default priors for the observation and process errors. By default, `mvgam` uses a fairly wide Student-T prior on these parameters to avoid being overly informative. But our observations are z-scored and so we do not expect very large process or observation errors. However, we also do not expect very small observation errors either, as we know these measurements are not perfect. So let's update the priors for these parameters. In doing so, you will get to see how the formula for the latent process (i.e. trend) model is used in `mvgam`:

```{r}
priors <- get_mvgam_priors(
  # observation formula, which has no terms in it
  y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train
)
```

Get names of all parameters whose priors can be modified:

```{r}
priors[, 3]
```

And their default prior distributions:

```{r}
priors[, 4]
```

Setting priors is easy in `mvgam` as you can use `brms` routines. Here we use more informative Normal priors for both error components, but we impose a lower bound of 0.2 for the observation errors:

```{r}
priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)
```

You may have noticed something else unique about this model: there is no intercept term in the observation formula. This is because a shared intercept parameter can sometimes be unidentifiable with respect to the latent VAR process, particularly if our series have similar long-run averages (which they do in this case because they were z-scored). We will often get better convergence in these State-Space models if we drop this parameter. `mvgam` accomplishes this by fixing the coefficient for the intercept to zero. Now we can fit the first model, which assumes that process errors are contemporaneously uncorrelated:

```{r var_mod, include = FALSE, results='hide'}
var_mod <- mvgam(
  y ~ -1,
  trend_formula = ~
    # tensor of temp and month should capture
    # seasonality
    te(temp, month, k = c(4, 4)) +
      # need to use 'trend' rather than series
      # here
      te(temp, month, k = c(4, 4), by = trend) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = VAR(),
  priors = priors,
  adapt_delta = 0.99,
  burnin = 1000
)
```

```{r eval=FALSE}
var_mod <- mvgam(
  # observation formula, which is empty
  formula = y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with uncorrelated process errors
  trend_model = VAR(),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,

  # include the updated priors
  priors = priors,
  silent = 2
)
```

### Inspecting SS models

This model's summary is a bit different to other `mvgam` summaries. It separates parameters based on whether they belong to the observation model or to the latent process model. This is because we may often have covariates that impact the observations but not the latent process, so we can have fairly complex models for each component. You will notice that some parameters have not fully converged, particularly for the VAR coefficients (called `A` in the output) and for the process errors (`Sigma`).
Note that we set `include_betas = FALSE` to stop the summary from printing output for all of the spline coefficients, which can be dense and hard to interpret:

```{r}
summary(var_mod, include_betas = FALSE)
```

The convergence of this model isn't fabulous (more on this in a moment). But we can again plot the smooth functions, which this time operate on the process model. We can view these plots by supplying `trend_effects = TRUE` to the plotting functions:

```{r}
plot(var_mod, "smooths", trend_effects = TRUE)
```

The autoregressive coefficient matrix is of particular interest here, as it captures lagged dependencies and cross-dependencies in the latent process model. Unfortunately `bayesplot` doesn't know this is a matrix of parameters, so by default we would see the transpose of the VAR matrix. Passing `dir = 'v'` to the `facet_args` argument fills the facets vertically, which displays the matrix in the correct order:

```{r warning=FALSE, message=FALSE}
mcmc_plot(
  var_mod,
  variable = 'A',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)
```

There is a lot happening in this matrix. Each cell captures the lagged effect of the process in the column on the process in the row in the next timestep. So for example, the effect in cell [1,3] shows how an *increase* in the process for series 3 (Greens) at time $t$ is expected to impact the process for series 1 (Bluegreens) at time $t+1$. The latent process model is now capturing these effects and the smooth seasonal effects.

The process error $(\Sigma)$ captures unmodelled variation in the process models. Again, we fixed the off-diagonals to 0, so the histograms for these will look like flat boxes:

```{r warning=FALSE, message=FALSE}
mcmc_plot(
  var_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)
```

The observation error estimates $(\sigma_{obs})$ represent how much the model thinks we might miss the true count when we take our imperfect measurements:

```{r warning=FALSE, message=FALSE}
mcmc_plot(var_mod, variable = "sigma_obs", regex = TRUE, type = "hist")
```

These error parameters are still a bit difficult to identify, especially when we try to estimate both process and observation error. Often we need to make some strong assumptions about which of these is more important for determining unexplained variation in our observations.
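One informal way to inspect this trade-off is to compare posterior draws for the two variance components directly. Here is a minimal sketch for the first series (assuming the parameter naming shown in the plots above, and that the first matching column corresponds to series 1; note that `sigma_obs` is a standard deviation while `Sigma[1,1]` is a variance, so we square the former):

```{r eval=FALSE}
# extract posterior draws of observation error SDs and process error
# (co)variances; take the first column for series 1
obs_sd <- as.matrix(var_mod, variable = "sigma_obs", regex = TRUE)[, 1]
proc_var <- as.matrix(var_mod, variable = "Sigma", regex = TRUE)[, 1]

# compare posterior quantiles of the two variance components
quantile(obs_sd^2, probs = c(0.1, 0.5, 0.9))
quantile(proc_var, probs = c(0.1, 0.5, 0.9))
```

If these posteriors overlap heavily, that is a sign the data alone cannot cleanly separate the two sources of error, which is one reason the informative priors above are so useful.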
### Correlated process errors

Let's see if these estimates improve when we allow the process errors to be correlated. Once again, we first need to update the priors for the observation errors:

```{r}
priors <- c(
  prior(normal(0.5, 0.1), class = sigma_obs, lb = 0.2),
  prior(normal(0.5, 0.25), class = sigma)
)
```

And now we can fit the correlated process error model:

```{r varcor_mod, include = FALSE, results='hide'}
varcor_mod <- mvgam(
  y ~ -1,
  trend_formula = ~
    # tensor of temp and month should capture
    # seasonality
    te(temp, month, k = c(4, 4)) +
      # need to use 'trend' rather than series
      # here
      te(temp, month, k = c(4, 4), by = trend) - 1,
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,
  trend_model = VAR(cor = TRUE),
  burnin = 1000,
  adapt_delta = 0.99,
  priors = priors
)
```

```{r eval=FALSE}
varcor_mod <- mvgam(
  # observation formula, which remains empty
  formula = y ~ -1,

  # process model formula, which includes the smooth functions
  trend_formula = ~ te(temp, month, k = c(4, 4)) +
    te(temp, month, k = c(4, 4), by = trend) - 1,

  # VAR1 model with correlated process errors
  trend_model = VAR(cor = TRUE),
  family = gaussian(),
  data = plankton_train,
  newdata = plankton_test,

  # include the updated priors
  priors = priors,
  silent = 2
)
```

The $(\Sigma)$ matrix now captures any evidence of contemporaneously correlated process errors:

```{r warning=FALSE, message=FALSE}
mcmc_plot(
  varcor_mod,
  variable = 'Sigma',
  regex = TRUE,
  type = 'hist',
  facet_args = list(dir = 'v')
)
```

This symmetric matrix tells us there is support for correlated process errors, as several of the off-diagonal entries are strongly non-zero. But it is easier to interpret these estimates if we convert the covariance matrix to a correlation matrix. Here we compute the posterior median process error correlations:

```{r}
Sigma_post <- as.matrix(
  varcor_mod,
  variable = "Sigma",
  regex = TRUE
)
median_correlations <- cov2cor(
  matrix(apply(Sigma_post, 2, median),
    nrow = 5, ncol = 5
  )
)
rownames(median_correlations) <-
  colnames(median_correlations) <-
  levels(plankton_train$series)

round(median_correlations, 2)
```

### Impulse response functions

Because Vector Autoregressions can capture complex lagged dependencies, it is often difficult to understand how the member time series are thought to interact with one another. A method that is commonly used to directly test for possible interactions is to compute an [Impulse Response Function](https://en.wikipedia.org/wiki/Impulse_response) (IRF). If $h$ represents the simulated forecast horizon, an IRF asks how each of the remaining series might respond over times $1:h$ if a focal series is given an innovation "shock" at time $t = 0$. `mvgam` can compute Generalized and Orthogonalized IRFs from models that include latent VAR dynamics (Generalized IRFs are invariant to the ordering of the series, whereas Orthogonalized IRFs depend on that ordering through a Cholesky factorization of $\Sigma$). We simply feed the fitted model to the `irf()` function and then use the S3 `plot()` function to view the estimated responses. By default, `irf()` will compute IRFs by separately imposing positive shocks of one standard deviation to each series in the VAR process. Here we compute Generalized IRFs over a horizon of 12 timesteps:

```{r}
irfs <- irf(varcor_mod, h = 12)
```

A summary of the IRFs can be computed using the `summary()` function:

```{r}
summary(irfs)
```

But it is easier to understand these responses using plots.
For example, we can plot the expected responses of the remaining series to a positive shock for series 3 (Greens) using the `plot()` function:

```{r}
plot(irfs, series = 3)
```

This series of plots makes it clear that some of the other series would be expected to show both instantaneous responses to a shock for the Greens (due to their correlated process errors) as well as delayed and nonlinear responses over time (due to the complex lagged dependence structure captured by the $A$ matrix). This hopefully makes it clear why IRFs are an important tool in the analysis of multivariate autoregressive models. You can also use these IRFs to calculate a relative contribution from each shock to the forecast error variance for a focal series. This method, known as a [Forecast Error Variance Decomposition](https://en.wikipedia.org/wiki/Variance_decomposition_of_forecast_errors) (FEVD), is useful to get an idea about the amount of information that each series contributes to the evolution of all other series in a Vector Autoregression:

```{r}
fevds <- fevd(varcor_mod, h = 12)
plot(fevds)
```

The plot above shows the median contribution to forecast error variance for each series.

### Comparing forecast scores

But which model is better? We can compute the variogram score for out of sample forecasts to get a sense of which model does a better job of capturing the dependence structure in the true evaluation set:

```{r}
# create forecast objects for each model
fcvar <- forecast(var_mod)
fcvarcor <- forecast(varcor_mod)

# plot the difference in variogram scores; a negative value means the VAR1cor
# model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = "variogram")$all_series$score -
  score(fcvar, score = "variogram")$all_series$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(variogram[VAR1cor] ~ -~ variogram[VAR1])
)
abline(h = 0, lty = "dashed")
```

And we can also compute the energy score for out of sample forecasts to get a sense of which model provides forecasts that are better calibrated:

```{r}
# plot the difference in energy scores; a negative value means the VAR1cor
# model is better, while a positive value means the VAR1 model is better
diff_scores <- score(fcvarcor, score = "energy")$all_series$score -
  score(fcvar, score = "energy")$all_series$score
plot(diff_scores,
  pch = 16, cex = 1.25, col = "darkred",
  ylim = c(
    -1 * max(abs(diff_scores), na.rm = TRUE),
    max(abs(diff_scores), na.rm = TRUE)
  ),
  bty = "l",
  xlab = "Forecast horizon",
  ylab = expression(energy[VAR1cor] ~ -~ energy[VAR1])
)
abline(h = 0, lty = "dashed")
```

The models tend to provide similar forecasts, though the correlated error model does slightly better overall. We would probably need a more extensive rolling forecast evaluation exercise if we needed to choose just one model for production; `mvgam` offers some utilities for doing this (i.e. see `?lfo_cv` for guidance). Alternatively, we could use forecasts from *both* models by creating an evenly-weighted ensemble forecast distribution. This capability is available using the `ensemble()` function in `mvgam` (see `?ensemble` for guidance).
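As a minimal sketch of that ensembling option (argument defaults are assumed here; consult `?ensemble` before relying on this):

```{r eval=FALSE}
# combine draws from the two forecast objects created above into an
# evenly-weighted ensemble distribution, then plot it for series 1
fc_ens <- ensemble(fcvar, fcvarcor)
plot(fc_ens, series = 1)
```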
Using `how_to_cite()` for models with VAR dynamics will give you information on how these models are restricted to remain stationary:

```{r}
description <- how_to_cite(varcor_mod)
```

```{r, eval = FALSE}
description
```

```{r, echo=FALSE}
cat("Methods text skeleton\n")
cat(insight::format_message(description$methods_text))
```

```{r echo=FALSE}
cat("\nPrimary references\n")
for (i in seq_along(description$citations)) {
  cat(insight::format_message(description$citations[[i]]))
  cat('\n')
}
cat("\nOther useful references\n")
for (i in seq_along(description$other_citations)) {
  cat(insight::format_message(description$other_citations[[i]]))
  cat('\n')
}
```

More advanced hierarchical panel VAR models can also be handled by using the `gr` and `subgr` arguments in `VAR()`. These models are useful if you have data for the same set of series (`subgr`) measured in different regions (`gr`), such as species measured in different sampling regions or financial series measured in different countries.

## Further reading

The following papers and resources offer a lot of useful material about multivariate State-Space models and how they can be applied in practice:

Auger-Méthé, Marie, et al. [A guide to state-space modeling of ecological time series](https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecm.1470). *Ecological Monographs* 91.4 (2021): e01470.

Clark, Nicholas J., et al. [Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability](https://peerj.com/articles/18929/). *PeerJ* 13 (2025): e18929.

Heaps, Sarah E. [Enforcing stationarity through the prior in vector autoregressions](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648). *Journal of Computational and Graphical Statistics* 32.1 (2023): 74-83.

Hannaford, Naomi E., et al. [A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant](https://doi.org/10.1016/j.csda.2022.107659). *Computational Statistics & Data Analysis* 179 (2023): 107659.

Holmes, Elizabeth E., Eric J. Ward, and Kellie Wills. [MARSS: multivariate autoregressive state-space models for analyzing time-series data](https://journal.r-project.org/articles/RJ-2012-002/). *R Journal* 4.1 (2012): 11.

Karunarathna, K.A.N.K., et al. [Modelling nonlinear responses of a desert rodent species to environmental change with hierarchical dynamic generalized additive models](https://doi.org/10.1016/j.ecolmodel.2024.110648). *Ecological Modelling* 490 (2024): 110648.

Ward, Eric J., et al. [Inferring spatial structure from time-series data: using multivariate state-space models to detect metapopulation structure of California sea lions in the Gulf of California, Mexico](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2664.2009.01745.x). *Journal of Applied Ecology* 47.1 (2010): 47-56.

## Interested in contributing?

I'm actively seeking PhD students and other researchers to work in the areas of ecological forecasting, multivariate model evaluation and development of `mvgam`. Please see [this small list of opportunities on my website](https://ecogambler.netlify.app/opportunities/) and do reach out if you are interested (n.clark'at'uq.edu.au)