Repository: jamesotto852/ggdensity
Branch: main
Commit: cc94a18d5449
Files: 67
Total size: 247.2 KB
Directory structure:
gitextract_xvq7co1p/
├── .Rbuildignore
├── .github/
│ ├── .gitignore
│ └── workflows/
│ ├── R-CMD-check.yaml
│ ├── pkgdown.yaml
│ └── test-coverage.yaml
├── .gitignore
├── CRAN-SUBMISSION
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── NEWS.md
├── R/
│ ├── attach.R
│ ├── get_hdr.R
│ ├── get_hdr_1d.R
│ ├── ggdensity-package.R
│ ├── hdr.R
│ ├── hdr_fun.R
│ ├── hdr_lines.R
│ ├── hdr_lines_fun.R
│ ├── hdr_points.R
│ ├── hdr_points_fun.R
│ ├── hdr_rug.R
│ ├── hdr_rug_fun.R
│ ├── helpers-ggplot2.R
│ ├── helpers.R
│ ├── method.R
│ └── method_1d.R
├── README.Rmd
├── README.md
├── _pkgdown.yml
├── cran-comments.md
├── ggdensity.Rproj
├── man/
│ ├── geom_hdr.Rd
│ ├── geom_hdr_fun.Rd
│ ├── geom_hdr_points.Rd
│ ├── geom_hdr_points_fun.Rd
│ ├── geom_hdr_rug.Rd
│ ├── geom_hdr_rug_fun.Rd
│ ├── get_hdr.Rd
│ ├── get_hdr_1d.Rd
│ ├── ggdensity.Rd
│ ├── method_freqpoly.Rd
│ ├── method_freqpoly_1d.Rd
│ ├── method_histogram.Rd
│ ├── method_histogram_1d.Rd
│ ├── method_kde.Rd
│ ├── method_kde_1d.Rd
│ ├── method_mvnorm.Rd
│ └── method_norm_1d.Rd
├── revdep/
│ ├── .gitignore
│ ├── README.md
│ ├── cran.md
│ ├── email.yml
│ ├── failures.md
│ └── problems.md
├── tests/
│ ├── testthat/
│ │ ├── fixtures/
│ │ │ └── df_norm.rds
│ │ ├── test-fix_probs.R
│ │ ├── test-get_hdr.R
│ │ ├── test-get_hdr_1d.R
│ │ ├── test-layer-wrappers.R
│ │ ├── test-res_to_df.R
│ │ ├── test-res_to_df_1d.R
│ │ └── test-visual-tests.R
│ └── testthat.R
└── vignettes/
├── .gitignore
└── method.Rmd
================================================
FILE CONTENTS
================================================
================================================
FILE: .Rbuildignore
================================================
^ggdensity\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
^README\.Rmd$
^README\.md$
^README_cache$
^man/figures$
^_pkgdown\.yml$
^docs$
^pkgdown$
^\.github$
^CRAN-SUBMISSION$
^doc$
^Meta$
^revdep$
^cran-comments\.md$
================================================
FILE: .github/.gitignore
================================================
*.html
================================================
FILE: .github/workflows/R-CMD-check.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
branches: [main, master]
pull_request:
name: R-CMD-check.yaml
permissions: read-all
jobs:
R-CMD-check:
runs-on: ${{ matrix.config.os }}
name: ${{ matrix.config.os }} (${{ matrix.config.r }})
strategy:
fail-fast: false
matrix:
config:
- {os: macos-latest, r: 'release'}
- {os: windows-latest, r: 'release'}
- {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
- {os: ubuntu-latest, r: 'release'}
- {os: ubuntu-latest, r: 'oldrel-1'}
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
R_KEEP_PKG_SOURCE: yes
steps:
- uses: actions/checkout@v4
- uses: r-lib/actions/setup-pandoc@v2
- uses: r-lib/actions/setup-r@v2
with:
r-version: ${{ matrix.config.r }}
http-user-agent: ${{ matrix.config.http-user-agent }}
use-public-rspm: true
- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::rcmdcheck
needs: check
- uses: r-lib/actions/check-r-package@v2
with:
upload-snapshots: true
build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'
================================================
FILE: .github/workflows/pkgdown.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
branches: [main, master]
pull_request:
branches: [main, master]
release:
types: [published]
workflow_dispatch:
name: pkgdown
jobs:
pkgdown:
runs-on: ubuntu-latest
# Only restrict concurrency for non-PR jobs
concurrency:
group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
steps:
- uses: actions/checkout@v3
- uses: r-lib/actions/setup-pandoc@v2
- uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true
- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::pkgdown, local::.
needs: website
- name: Build site
run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
shell: Rscript {0}
- name: Deploy to GitHub pages 🚀
if: github.event_name != 'pull_request'
uses: JamesIves/github-pages-deploy-action@v4.4.1
with:
clean: false
branch: gh-pages
folder: docs
================================================
FILE: .github/workflows/test-coverage.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
branches: [main, master]
pull_request:
name: test-coverage.yaml
permissions: read-all
jobs:
test-coverage:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
steps:
- uses: actions/checkout@v4
- uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true
- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::covr, any::xml2
needs: coverage
- name: Test coverage
run: |
cov <- covr::package_coverage(
quiet = FALSE,
clean = FALSE,
install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package")
)
print(cov)
covr::to_cobertura(cov)
shell: Rscript {0}
- uses: codecov/codecov-action@v5
with:
# Fail if error if not on PR, or if on PR and token is given
fail_ci_if_error: ${{ github.event_name != 'pull_request' || secrets.CODECOV_TOKEN }}
files: ./cobertura.xml
plugins: noop
disable_search: true
token: ${{ secrets.CODECOV_TOKEN }}
- name: Show testthat output
if: always()
run: |
## --------------------------------------------------------------------
find '${{ runner.temp }}/package' -name 'testthat.Rout*' -exec cat '{}' \; || true
shell: bash
- name: Upload test results
if: failure()
uses: actions/upload-artifact@v4
with:
name: coverage-test-failures
path: ${{ runner.temp }}/package
================================================
FILE: .gitignore
================================================
.Rproj.user
.Rhistory
.Rdata
.httr-oauth
.DS_Store
docs
inst/doc
/doc/
/Meta/
/README_cache/
revdep/checks
revdep/library
revdep/checks.noindex
revdep/library.noindex
revdep/data.sqlite
revdep/cloud.noindex
================================================
FILE: CRAN-SUBMISSION
================================================
Version: 1.0.0
Date: 2023-02-09 22:57:39 UTC
SHA: 54e4677246f7f7d4e50b02d4a5d61b993900c46f
================================================
FILE: DESCRIPTION
================================================
Package: ggdensity
Title: Interpretable Bivariate Density Visualization with 'ggplot2'
Version: 1.0.1
Authors@R:
c(person(given = "James",
family = "Otto",
role = c("aut", "cre", "cph"),
email = "jamesotto852@gmail.com",
comment = c(ORCID = "0000-0002-0665-2515")),
person(given = "David",
family = "Kahle",
role = c("aut"),
email = "david@kahle.io",
comment = c(ORCID = "0000-0002-9999-1558")))
Description: The 'ggplot2' package provides simple functions for visualizing contours
of 2-d kernel density estimates. 'ggdensity' implements several additional density estimators
as well as more interpretable visualizations based on highest density regions instead of
the traditional height of the estimated density surface.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.3
Depends:
ggplot2
Imports:
isoband,
vctrs,
tibble,
MASS,
stats,
scales
URL: https://jamesotto852.github.io/ggdensity/, https://github.com/jamesotto852/ggdensity/
BugReports: https://github.com/jamesotto852/ggdensity/issues/
Suggests:
vdiffr,
testthat (>= 3.0.0),
knitr,
rmarkdown
Config/testthat/edition: 3
VignetteBuilder: knitr
================================================
FILE: LICENSE
================================================
YEAR: 2021
COPYRIGHT HOLDER: ggdensity authors
================================================
FILE: LICENSE.md
================================================
# MIT License
Copyright (c) 2021 ggdensity authors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: NAMESPACE
================================================
# Generated by roxygen2: do not edit by hand
export(GeomHdr)
export(GeomHdrFun)
export(GeomHdrLines)
export(GeomHdrLinesFun)
export(GeomHdrRug)
export(GeomHdrRugFun)
export(StatHdr)
export(StatHdrFun)
export(StatHdrLines)
export(StatHdrLinesFun)
export(StatHdrPoints)
export(StatHdrPointsFun)
export(StatHdrRug)
export(StatHdrRugFun)
export(geom_hdr)
export(geom_hdr_fun)
export(geom_hdr_lines)
export(geom_hdr_lines_fun)
export(geom_hdr_points)
export(geom_hdr_points_fun)
export(geom_hdr_rug)
export(geom_hdr_rug_fun)
export(get_hdr)
export(get_hdr_1d)
export(method_freqpoly)
export(method_freqpoly_1d)
export(method_histogram)
export(method_histogram_1d)
export(method_kde)
export(method_kde_1d)
export(method_mvnorm)
export(method_norm_1d)
export(stat_hdr)
export(stat_hdr_fun)
export(stat_hdr_lines)
export(stat_hdr_lines_fun)
export(stat_hdr_points)
export(stat_hdr_points_fun)
export(stat_hdr_rug)
export(stat_hdr_rug_fun)
import(ggplot2)
importFrom(MASS,bandwidth.nrd)
importFrom(MASS,kde2d)
importFrom(scales,percent)
importFrom(scales,percent_format)
importFrom(stats,cor)
importFrom(stats,cov)
importFrom(stats,dnorm)
importFrom(stats,pchisq)
importFrom(stats,sd)
importFrom(stats,setNames)
importFrom(stats,uniroot)
================================================
FILE: NEWS.md
================================================
# ggdensity 1.0.1
## Fixes
* Package startup message no longer effects the sessions RNG (Reported by @TimTaylor #34)
* Fixed ordering of probabilities in the plot legend to be independent of order specified in `probs` argument (Reported by @z3tt #32)
# ggdensity 1.0.0
## Features
* Added `get_hdr()` and `get_hdr_1d()` functions,
exporting implementation of HDR computations (Suggested by @eliocamp #28)
* Reworked `method` argument, allowing for either character or function call specification.
Implemented related `method_*()` and `method_*_1d()` functions (e.g. `method_kde()` and `method_kde_1d()`).
See `?get_hdr` or `vignette("method", "ggdensity")` for details (Suggested by @eliocamp #29)
* Added unit tests (Suggested by @eliocamp, #30)
## Breaking Changes
* Removed arguments governing density estimators from `stat_hdr()` and other layer functions--these
are now specified with `method_*()` and `method_*_1d()` functions
## Fixes
* [Added support](https://tidyverse.org/blog/2022/08/ggplot2-3-4-0-size-to-linewidth/) for the new `linewidth` aesthetic (Reported by @eliocamp, #23)
# ggdensity 0.1.1
## Fixes
* Removed **ggplot2** build-time dependencies (Reported by @thomasp85, #21)
* Fixed bug in `stat_hdr_lines_fun()` which drew lines between components of disconnected HDRs (Reported by @afranks86, #20)
# ggdensity 0.1.0
## Features
* Added `geom`/`stat_hdr_rug()` for visualizing marginal HDRs via "rug plot"
style graphics along plot axes (#14)
* Added `geom`/`stat_hdr_points()` and `geom`/`stat_hdr_points_fun()` for
visualizing HDR membership of points via a colored scatterplot (#15)
## Fixes
* Changed name of computed variable in all stat functions from `level` to `probs`
================================================
FILE: R/attach.R
================================================
.onAttach <- function(...) {
random_digit <- function() {
time <- as.character(Sys.time())
digit <- substr(time, nchar(time), nchar(time))
as.integer(digit)
}
if(!interactive() || random_digit() != 1L) return()
packageStartupMessage(' Please cite ggdensity! See citation("ggdensity") for details.')
}
================================================
FILE: R/get_hdr.R
================================================
#' Computing the highest density regions of a 2D density
#'
#' `get_hdr` is used to estimate a 2-dimensional density and compute
#' corresponding HDRs. The estimated density and HDRs are represented in a
#' discrete form as a grid, defined by arguments `rangex`, `rangey`, and `n`.
#' `get_hdr` is used internally by layer functions `stat_hdr()`,
#' `stat_hdr_points()`, `stat_hdr_fun()`, etc.
#'
#' @param method Either a character (`"kde"`, `"mvnorm"`, `"histogram"`,
#' `"freqpoly"`, or `"fun"`) or `method_*()` function. See the "The `method`
#' argument" section below for details.
#' @param data A data frame with columns `x` and `y`.
#' @param probs Probabilities to compute HDRs for.
#' @param rangex,rangey Range of grid representing estimated density and HDRs,
#' along the x- and y-axes.
#' @param n Resolution of grid representing estimated density and HDRs.
#' @param hdr_membership Should HDR membership of data points (`data`) be
#' computed? Defaults to `TRUE`, although it is computationally expensive for
#' large data sets.
#' @param fun Optional, a joint probability density function, must be vectorized
#' in its first two arguments. See the "The `fun` argument" section below for
#' details.
#' @param args Optional, a list of arguments to be provided to `fun`.
#'
#' @section The `method` argument: The density estimator used to estimate the
#' HDRs is specified with the `method` argument. The simplest way to specify
#' an estimator is to provide a character value to `method`, for example
#' `method = "kde"` specifies a kernel density estimator. However, this
#' specification is limited to the default behavior of the estimator.
#'
#' Instead, it is possible to provide a function call, for example: `method =
#' method_kde()`. In many cases, these functions accept parameters governing
#' the density estimation procedure. Here, `method_kde()` accepts parameters
#' `h` and `adjust`, both related to the kernel's bandwidth. For details, see
#' `?method_kde`. Every method of bivariate density estimation implemented has
#' such corresponding `method_*()` function, each with an associated help
#' page.
#'
#' Note: `geom_hdr()` and other layer functions also have `method` arguments
#' which behave in the same way. For more details on the use and
#' implementation of the `method_*()` functions, see `vignette("method",
#' "ggdensity")`.
#'
#' @section The `fun` argument: If `method` is set to `"fun"`, `get_hdr()`
#' expects a bivariate probability density function to be specified with the
#' `fun` argument. It is required that `fun` be a function of at least two
#' arguments (`x` and `y`). Beyond these first two arguments, `fun` can have
#' arbitrarily many arguments; these can be set in `get_hdr()` as a named list
#' via the `args` parameter.
#'
#' Note: `get_hdr()` requires that `fun` be vectorized in `x` and `y`. For an
#' example of an appropriate choice of `fun`, see the final example below.
#'
#' @returns
#'
#' `get_hdr` returns a list with elements `df_est` (`data.frame`), `breaks`
#' (named `numeric`), and `data` (`data.frame`).
#'
#' * `df_est`: the estimated HDRs and density evaluated on the grid defined by `rangex`, `rangey`, and `n`.
#' The column of estimated HDRs (`df_est$hdr`) is a numeric vector with values
#' from `probs`. The columns `df_est$fhat` and `df_est$fhat_discretized`
#' correspond to the estimated density on the original scale and rescaled to sum
#' to 1, respectively.
#'
#' * `breaks`: the heights of the estimated density (`df_est$fhat`) corresponding to the HDRs specified by `probs`.
#' Will always have additional element `Inf` representing the cutoff for the
#' 100% HDR.
#'
#' * `data`: the original data provided in the `data` argument.
#' If `hdr_membership` is set to `TRUE`, this includes a column
#' (`data$hdr_membership`) with the HDR corresponding to each data point.
#'
#' @examples
#' df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
#'
#' # Two ways to specify `method`
#' get_hdr(df, method = "kde")
#' get_hdr(df, method = method_kde())
#'
#' \dontrun{
#'
#' # If parenthesis are omitted, `get_hdr()` errors
#' get_hdr(df, method = method_kde)
#' }
#'
#' # Estimate different HDRs with `probs`
#' get_hdr(df, method = method_kde(), probs = c(.975, .6, .2))
#'
#' # Adjust estimator parameters with arguments to `method_kde()`
#' get_hdr(df, method = method_kde(h = 1))
#'
#' # Parametric normal estimator of density
#' get_hdr(df, method = "mvnorm")
#' get_hdr(df, method = method_mvnorm())
#'
#' # Compute "population" HDRs of specified bivariate pdf with `method = "fun"`
#' f <- function(x, y, sd_x = 1, sd_y = 1) dnorm(x, sd = sd_x) * dnorm(y, sd = sd_y)
#'
#' get_hdr(
#' method = "fun", fun = f,
#' rangex = c(-5, 5), rangey = c(-5, 5)
#' )
#'
#' get_hdr(
#' method = "fun", fun = f,
#' rangex = c(-5, 5), rangey = c(-5, 5),
#' args = list(sd_x = .5, sd_y = .5) # specify additional arguments w/ `args`
#' )
#'
#' @export
get_hdr <- function(data = NULL, method = "kde", probs = c(.99, .95, .8, .5), n = 100, rangex = NULL, rangey = NULL, hdr_membership = TRUE, fun, args = list()) {
# Deal with missing data argument
if (is.null(data)) {
if (!is.character(method) | (is.character(method) && method != "fun")) {
stop('`data` must be provided unless `method = "fun"`')
} else {
if (is.null(rangex) | is.null(rangey)) {
stop('If `data` is unspecified, `rangex` and `rangey` must be provided when `method = "fun"`')
}
}
}
rangex <- rangex %||% range(data$x)
rangey <- rangey %||% range(data$y)
probs <- fix_probs(probs)
# Create df_est (estimated density evaluated on a grid) depending on specified method:
if (is.character(method) && method == "fun") {
df_est <- f_est(method = NULL, n = n, rangex = rangex, rangey = rangey, fun = fun, args = args)
} else {
if (is.character(method)) {
if (!method %in% c("kde", "mvnorm", "histogram", "freqpoly")) stop("Invalid method specified")
# If method is provided as a character, re-assign correct function output:
method <- switch(method,
"kde" = method_kde(),
"histogram" = method_histogram(),
"freqpoly" = method_freqpoly(),
"mvnorm" = method_mvnorm()
)
}
# parse args of method to determine strategy of `method`
method_formals <- names(formals(method))
# If `data` is the only argument to `method`, we know `method`
# is a function factory, returning a closure of pdf in terms of x, y:
if (length(method_formals) == 1 && method_formals == "data") {
df_est <- f_est(method, data, n, rangex, rangey)
# Otherwise `method` computes a grid for us, shortcutting
# representing pdf in terms of x, y:
} else if (length(method_formals) == 4 && all(method_formals == c("data", "n", "rangex", "rangey"))) {
df_est <- method(data, n, rangex, rangey)
} else {
stop("Invalid `method` argument -- did you forget the `()`?")
}
}
# remove unneeded attributes
attr(df_est, "out.attrs") <- NULL
# Manipulate df_est to get information about HDRs:
# force estimate to integrate to 1
df_est$fhat_discretized <- normalize(df_est$fhat)
# temporarily rescale df$fhat for stability
fhat_max <- max(df_est$fhat)
df_est$fhat <- df_est$fhat / fhat_max
# find cutoffs (in terms of rescaled fhat)
breaks <- c(find_cutoff(df_est, probs), Inf)
# find HDRs for points in the grid
df_est$hdr <- vapply(df_est$fhat, get_hdr_val, numeric(1), breaks, probs)
# find hdr membership of points from data
if (!is.null(data) & hdr_membership) {
data$hdr_membership <- mapply(get_hdr_membership, data$x, data$y, MoreArgs = list(df_est, breaks, probs), SIMPLIFY = TRUE)
}
# transforming df_est$fhat and breaks back to original scale:
df_est$fhat <- df_est$fhat * fhat_max
breaks <- breaks * fhat_max
# Give breaks nicely formatted names, corresponding to HDRs:
names(breaks) <- scales::percent_format(accuracy = 1)(probs)
# bundle everything together
list(
df_est = df_est,
breaks = breaks,
data = data
)
}
fix_probs <- function(probs) {
stopifnot("Probabilities must be between 0 and 1, exclusive" = all(probs > 0) & all(probs < 1))
sort(probs, decreasing = TRUE)
}
get_hdr_val <- function(fhat, breaks, probs) {
hdrs <- which(fhat >= breaks)
if (length(hdrs) == 0) return(1)
probs[max(hdrs)]
}
get_hdr_membership <- function(x, y, df_est, breaks, probs) {
df_est$dist <- (x - df_est$x)^2 + (y - df_est$y)^2
fhat <- df_est[which.min(df_est$dist), "fhat"]
get_hdr_val(fhat, breaks, probs)
}
# method is a function of data
# fun is a function of vectors x, y
f_est <- function(method, data, n, rangex, rangey, fun = NULL, args = list()) {
# If `fun` isn't specified, method returns a closure
# representing closed form of density estimate
fun <- fun %||% method(data)
# grid to evaluate fun
df <- expand.grid(
"x" = seq(rangex[1], rangex[2], length.out = n),
"y" = seq(rangey[1], rangey[2], length.out = n)
)
# evaluate method on the grid, f required to be vectorized in x, y:
# (args is only non-empty if fun was specified)
df$fhat <- do.call(fun, c(quote(df$x), quote(df$y), args))
df
}
================================================
FILE: R/get_hdr_1d.R
================================================
#' Computing the highest density regions of a 1D density
#'
#' `get_hdr_1d` is used to estimate a 1-dimensional density and compute corresponding HDRs.
#' The estimated density and HDRs are represented in a discrete form as a grid, defined by arguments `range` and `n`.
#' `get_hdr_1d` is used internally by layer functions `stat_hdr_rug()` and `stat_hdr_rug_fun()`.
#'
#' @inheritParams get_hdr
#' @param method Either a character (`"kde"`, `"norm"`, `"histogram"`, `"freqpoly"`, or `"fun"`) or `method_*_1d()` function.
#' See the "The `method` argument" section below for details.
#' @param x A vector of data
#' @param hdr_membership Should HDR membership of data points (`x`) be computed?
#' @param range Range of grid representing estimated density and HDRs.
#' @param n Resolution of grid representing estimated density and HDRs.
#' @param fun Optional, a probability density function, must be vectorized in its first argument.
#' See the "The `fun` argument" section below for details.
#'
#' @section The `method` argument:
#' The density estimator used to estimate the HDRs is specified with the `method` argument.
#' The simplest way to specify an estimator is to provide a character value to `method`,
#' for example `method = "kde"` specifies a kernel density estimator.
#' However, this specification is limited to the default behavior of the estimator.
#'
#' Instead, it is possible to provide a function call, for example: `method = method_kde_1d()`.
#' This is slightly different from the function calls provided in `get_hdr()`, note the `_1d` suffix.
#' In many cases, these functions accept parameters governing the density estimation procedure.
#' Here, `method_kde_1d()` accepts several parameters related to the choice of kernel.
#' For details, see `?method_kde_1d`.
#' Every method of univariate density estimation implemented has such corresponding `method_*_1d()` function,
#' each with an associated help page.
#'
#' Note: `geom_hdr_rug()` and other layer functions also have `method` arguments which behave in the same way.
#' For more details on the use and implementation of the `method_*_1d()` functions,
#' see `vignette("method", "ggdensity")`.
#'
#' @section The `fun` argument:
#' If `method` is set to `"fun"`, `get_hdr_1d()` expects a univariate probability
#' density function to be specified with the `fun` argument.
#' It is required that `fun` be a function of at least one argument (`x`).
#' Beyond this first argument, `fun` can have arbitrarily many arguments;
#' these can be set in `get_hdr_1d()` as a named list via the `args` parameter.
#'
#' Note: `get_hdr_1d()` requires that `fun` be vectorized in `x`.
#' For an example of an appropriate choice of `fun`, see the final example below.
#'
#' @returns
#'
#' `get_hdr_1d` returns a list with elements `df_est` (`data.frame`), `breaks` (named `numeric`), and `data` (`data.frame`).
#'
#' * `df_est`: the estimated HDRs and density evaluated on the grid defined by `range` and `n`.
#' The column of estimated HDRs (`df_est$hdr`) is a numeric vector with values from `probs`.
#' The columns `df_est$fhat` and `df_est$fhat_discretized` correspond to the estimated density
#' on the original scale and rescaled to sum to 1, respectively.
#'
#' * `breaks`: the heights of the estimated density (`df_est$fhat`) corresponding to the HDRs specified by `probs`.
#' Will always have additional element `Inf` representing the cutoff for the 100% HDR.
#'
#' * `data`: the original data provided in the `data` argument.
#' If `hdr_membership` is set to `TRUE`, this includes a column (`data$hdr_membership`)
#' with the HDR corresponding to each data point.
#'
#' @examples
#' x <- rnorm(1e3)
#'
#' # Two ways to specify `method`
#' get_hdr_1d(x, method = "kde")
#' get_hdr_1d(x, method = method_kde_1d())
#'
#' \dontrun{
#'
#' # If parenthesis are omitted, `get_hdr_1d()` errors
#' get_hdr_1d(df, method = method_kde_1d)
#'
#' # If the `_1d` suffix is omitted, `get_hdr_1d()` errors
#' get_hdr_1d(x, method = method_kde())
#' }
#'
#' # Adjust estimator parameters with arguments to `method_kde_1d()`
#' get_hdr_1d(x, method = method_kde_1d(kernel = "triangular"))
#'
#' # Estimate different HDRs with `probs`
#' get_hdr_1d(x, method = method_kde_1d(), probs = c(.975, .6, .2))
#'
#' # Compute "population" HDRs of specified univariate pdf with `method = "fun"`
#' f <- function(x, sd = 1) dnorm(x, sd = sd)
#' get_hdr_1d(method = "fun", fun = f, range = c(-5, 5))
#' get_hdr_1d(method = "fun", fun = f, range = c(-5, 5), args = list(sd = .5))
#'
#'
#' @export
get_hdr_1d <- function(x = NULL, method = "kde", probs = c(.99, .95, .8, .5), n = 512, range = NULL, hdr_membership = TRUE, fun, args = list()) {
# Deal with missing data argument
if (is.null(x)) {
if (!is.character(method) | (is.character(method) && method != "fun")) {
stop('`x` must be provided unless `method = "fun"`')
} else {
if (is.null(range)) {
stop('If `x` is unspecified, `range` must be provided when `method = "fun"`')
}
}
}
range <- range %||% range(x)
probs <- fix_probs(probs)
# Create df_est (estimated density evaluated on a grid) depending on specified method:
if (is.character(method) && method == "fun") {
df_est <- f_est_1d(method = NULL, x = x, n, range = range, fun = fun, args = args)
} else {
if (is.character(method)) {
if (!method %in% c("kde", "norm", "histogram", "freqpoly")) stop("Invalid method specified")
# If method is provided as a character, re-assign correct function output:
method <- switch(method,
"kde" = method_kde_1d(),
"histogram" = method_histogram_1d(),
"freqpoly" = method_freqpoly_1d(),
"norm" = method_norm_1d()
)
}
# parse args of method to determine strategy of `method`
method_formals <- names(formals(method))
# If `data` is the only argument to `method`, we know `method`
# is a function factory, returning a closure of pdf in terms of x, y:
if (length(method_formals) == 1 && method_formals %in% c("x", "y")) {
df_est <- f_est_1d(method, x, n, range)
# Otherwise `method` computes a grid for us, shortcutting
# representing pdf in terms of x, y:
} else if (length(method_formals) == 3 && method_formals[1] %in% c("x", "y") & all(method_formals[2:3] == c("n", "range"))) {
df_est <- method(x, n, range)
} else if ("data" %in% method_formals) {
stop("Invalid `method` argument -- did you forget the `_1d()`?")
} else {
stop("Invalid `method` argument -- did you forget the `()`?")
}
}
# Manipulate df_est to get information about HDRs:
# force estimate to integrate to 1
df_est$fhat_discretized <- normalize(df_est$fhat)
# temporarily rescale df$fhat for stability
fhat_max <- max(df_est$fhat)
df_est$fhat <- df_est$fhat / fhat_max
# find cutoffs (in terms of rescaled fhat)
breaks <- c(find_cutoff(df_est, probs), Inf)
# find HDRs for points in the grid
df_est$hdr <- vapply(df_est$fhat, get_hdr_val, numeric(1), breaks, probs)
# find hdr membership of points from data
if (!is.null(x) & hdr_membership) {
data <- data.frame(x = x)
if (hdr_membership) {
hdr_membership <- vapply(x, get_hdr_membership_1d, numeric(1), df_est, breaks, probs)
# create data frame w/ input data (x) + HDR membership
data$hdr_membership <- hdr_membership
}
} else {
data <- NULL
}
# transforming df_est$fhat and breaks back to original scale:
df_est$fhat <- df_est$fhat * fhat_max
breaks <- breaks * fhat_max
# Give breaks nicely formatted names, corresponding to HDRs:
names(breaks) <- scales::percent_format(accuracy = 1)(probs)
# bundle everything together
list(
df_est = df_est,
breaks = breaks,
data = data
)
}
get_hdr_membership_1d <- function(x, df_est, breaks, probs) {
df_est$dist <- (x - df_est$x)^2
fhat <- df_est[which.min(df_est$dist), "fhat"]
get_hdr_val(fhat, breaks, probs)
}
# method is a function of data vector x
# fun is a function of vector x -- the grid
# Might need to be more careful w/ axis transformations here
f_est_1d <- function(method, x, n, range, fun = NULL, args = list()) {
# If fun isn't specified, method returns a closure
# representing closed form of density estimate
fun <- fun %||% method(x)
# grid to evaluate fun
df <- data.frame(x = seq(range[1], range[2], length.out = n))
# evaluate method on the grid, f required to be vectorized in x, y:
# (args is only non-empty if fun was specified)
df$fhat <- do.call(fun, c(quote(df$x), args))
df
}
================================================
FILE: R/ggdensity-package.R
================================================
#' ggdensity: Stats and Geoms for Density Estimation with ggplot2
#'
#' A package that allows more flexible computations for visualization of density
#' estimates with ggplot2.
#'
#' @seealso
#'
#' Useful links:
#' * \url{https://jamesotto852.github.io/ggdensity/}
#' * \url{https://github.com/jamesotto852/ggdensity/}
#'
#' @import ggplot2
#' @importFrom MASS bandwidth.nrd kde2d
#' @importFrom stats uniroot cov pchisq setNames sd cor dnorm
#' @name ggdensity
#' @aliases ggdensity package-ggdensity
NULL
================================================
FILE: R/hdr.R
================================================
#' Highest density regions of a 2D density estimate
#'
#' Perform 2D density estimation, compute and plot the resulting highest density regions.
#' `geom_hdr()` draws filled regions and `geom_hdr_lines()` draws lines outlining the regions.
#' Note, the plotted objects have probabilities mapped to the `alpha` aesthetic by default.
#'
#' @section Aesthetics: `geom_hdr()` and `geom_hdr_lines()` understand the following aesthetics (required
#' aesthetics are in bold):
#'
#' - **x**
#' - **y**
#' - alpha
#' - color
#' - fill (only `geom_hdr`)
#' - group
#' - linetype
#' - linewidth
#' - subgroup
#'
#' @section Computed variables:
#'
#' \describe{ \item{probs}{The probability associated with the highest density region, specified
#' by `probs` argument.} }
#'
#' @inheritParams ggplot2::geom_path
#' @inheritParams ggplot2::stat_identity
#' @inheritParams ggplot2::stat_density2d
#' @param method Density estimator to use, accepts character vector:
#' `"kde"`,`"histogram"`, `"freqpoly"`, or `"mvnorm"`.
#' Alternatively accepts functions which return closures corresponding to density estimates,
#' see `?get_hdr` or `vignette("method", "ggdensity")`.
#' @param probs Probabilities to compute highest density regions for.
#' @param xlim,ylim Range to compute and draw regions. If `NULL`, defaults to
#' range of data.
#' @param n Resolution of grid defined by `xlim` and `ylim`.
#' Ignored if `method = "histogram"` or `method = "freqpoly"`.
#' @name geom_hdr
#' @rdname geom_hdr
#' @references Scott, David W. Multivariate Density Estimation (2e), Wiley.
#'
#' @import ggplot2
#'
#' @examples
#' # Basic simulated data with bivariate normal data and various methods
#' df <- data.frame(x = rnorm(1000), y = rnorm(1000))
#' p <- ggplot(df, aes(x, y)) + coord_equal()
#'
#' p + geom_hdr()
#' p + geom_hdr(method = "mvnorm")
#' p + geom_hdr(method = "freqpoly")
#' # p + geom_hdr(method = "histogram")
#'
#' # Adding point layers on top to visually assess region estimates
#' pts <- geom_point(size = .2, color = "red")
#'
#' p + geom_hdr() + pts
#' p + geom_hdr(method = "mvnorm") + pts
#' p + geom_hdr(method = "freqpoly") + pts
#' # p + geom_hdr(method = "histogram") + pts
#'
#' # Highest density region boundary lines
#' p + geom_hdr_lines()
#' p + geom_hdr_lines(method = "mvnorm")
#' p + geom_hdr_lines(method = "freqpoly")
#' # p + geom_hdr_lines(method = "histogram")
#'
#' \dontrun{
#'
#' # 2+ groups - mapping other aesthetics in the geom
#' rdata <- function(n, n_groups = 3, radius = 3) {
#' list_of_dfs <- lapply(0:(n_groups-1), function(k) {
#' mu <- c(cos(2*k*pi/n_groups), sin(2*k*pi/n_groups))
#' m <- MASS::mvrnorm(n, radius*mu, diag(2))
#' structure(data.frame(m, as.character(k)), names = c("x", "y", "c"))
#' })
#' do.call("rbind", list_of_dfs)
#' }
#'
#' dfc <- rdata(1000, n_groups = 5)
#' pf <- ggplot(dfc, aes(x, y, fill = c)) + coord_equal()
#'
#' pf + geom_hdr()
#' pf + geom_hdr(method = "mvnorm")
#' pf + geom_hdr(method = "mvnorm", probs = .90, alpha = .5)
#' pf + geom_hdr(method = "histogram")
#' pf + geom_hdr(method = "freqpoly")
#'
#' pc <- ggplot(dfc, aes(x, y, color = c)) +
#' coord_equal() +
#' theme_minimal() +
#' theme(panel.grid.minor = element_blank())
#'
#' pc + geom_hdr_lines()
#' pc + geom_hdr_lines(method = "mvnorm")
#'
#'
#' # Data with boundaries
#' ggplot(df, aes(x^2)) + geom_histogram(bins = 30)
#' ggplot(df, aes(x^2)) + geom_histogram(bins = 30, boundary = 0)
#' ggplot(df, aes(x^2, y^2)) + geom_hdr(method = "histogram")
#'
#' }
#'
NULL
#' @rdname geom_hdr
#' @export
stat_hdr <- function(mapping = NULL, data = NULL,
geom = "hdr", position = "identity",
...,
method = "kde",
probs = c(.99, .95, .8, .5),
n = 100,
xlim = NULL,
ylim = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = StatHdr,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
method = method,
probs = probs,
n = n,
xlim = xlim,
ylim = ylim,
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr
#' @format NULL
#' @usage NULL
#' @importFrom scales percent
#' @export
StatHdr <- ggproto("StatHdr", Stat,
required_aes = c("x", "y"),
default_aes = aes(order = after_stat(probs), alpha = after_stat(probs)),
output = "bands",
compute_group = function(self, data, scales, na.rm = FALSE,
method = "kde", probs = c(.99, .95, .8, .5),
n = 100, xlim = NULL, ylim = NULL) {
rangex <- xlim %||% scales$x$dimension()
rangey <- ylim %||% scales$y$dimension()
# Only calculate HDR membership if we need to
need_membership <- (self$output == "points")
res <- get_hdr(data, method, probs, n, rangex, rangey, hdr_membership = need_membership)
res_to_df(res, probs, data$group[1], self$output)
}
)
# internal helper function to convert output of `get_hdr[_1d]()` into
# what `GeomHdr*$draw_group()` methods need
res_to_df <- function(res, probs, group, output) {
probs <- fix_probs(probs)
# Need z for xyz_to_isobands/lines()
res$df_est$z <- res$df_est$fhat
if (output == "bands") {
isobands <- xyz_to_isobands(res$df_est, res$breaks)
names(isobands) <- scales::percent_format(accuracy = 1)(probs)
df <- iso_to_polygon(isobands, group)
df$probs <- ordered(df$level, levels = names(isobands))
df$level <- NULL
} else if (output == "lines") {
isolines <- xyz_to_isolines(res$df_est, res$breaks)
names(isolines) <- scales::percent_format(accuracy = 1)(probs)
df <- iso_to_path(isolines, group)
df$probs <- ordered(df$level, levels = names(isolines))
df$level <- NULL
} else if (output == "points") {
df <- res$data
df$hdr_membership <- scales::percent_format(accuracy = 1)(df$hdr_membership)
df$probs <- ordered(df$hdr_membership, levels = scales::percent_format(accuracy = 1)(c(1, probs)))
df$hdr_membership <- NULL
}
df
}
#' @rdname geom_hdr
#' @export
geom_hdr <- function(mapping = NULL, data = NULL,
stat = "hdr", position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomHdr,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr
#' @format NULL
#' @usage NULL
#' @export
GeomHdr <- ggproto("GeomHdr", GeomPolygon)
================================================
FILE: R/hdr_fun.R
================================================
#' Highest density regions of a bivariate pdf
#'
#' Compute and plot the highest density regions (HDRs) of a bivariate pdf.
#' `geom_hdr_fun()` draws filled regions, and `geom_hdr_lines_fun()` draws lines outlining the regions.
#' Note, the plotted objects have probabilities mapped to the `alpha` aesthetic by default.
#'
#' @section Aesthetics: `geom_hdr_fun()` and `geom_hdr_lines_fun()` understand the following aesthetics (required
#' aesthetics are in bold):
#'
#' - x
#' - y
#' - alpha
#' - color
#' - fill (only `geom_hdr_fun`)
#' - group
#' - linetype
#' - linewidth
#' - subgroup
#'
#' @section Computed variables:
#'
#' \describe{ \item{probs}{The probability associated with the highest density region, specified
#' by `probs`.} }
#'
#' @inheritParams ggplot2::geom_path
#' @inheritParams ggplot2::stat_identity
#' @inheritParams ggplot2::stat_density2d
#' @param fun A function, the joint probability density function, must be
#' vectorized in its first two arguments; see examples.
#' @param args Named list of additional arguments passed on to `fun`.
#' @param probs Probabilities to compute highest density regions for.
#' @param n Resolution of grid `fun` is evaluated on.
#' @param xlim,ylim Range to compute and draw regions. If `NULL`, defaults to
#' range of data if present.
#' @name geom_hdr_fun
#' @rdname geom_hdr_fun
#'
#' @import ggplot2
#'
#' @examples
#' # HDRs of the bivariate exponential
#' f <- function(x, y) dexp(x) * dexp(y)
#' ggplot() + geom_hdr_fun(fun = f, xlim = c(0, 10), ylim = c(0, 10))
#'
#'
#' # HDRs of a custom parametric model
#'
#' # generate example data
#' n <- 1000
#' th_true <- c(3, 8)
#'
#' rdata <- function(n, th) {
#' gen_single_obs <- function(th) {
#' rchisq(2, df = th) # can be anything
#' }
#' df <- replicate(n, gen_single_obs(th))
#' setNames(as.data.frame(t(df)), c("x", "y"))
#' }
#' data <- rdata(n, th_true)
#'
#' # estimate unknown parameters via maximum likelihood
#' likelihood <- function(th) {
#' th <- abs(th) # hack to enforce parameter space boundary
#' log_f <- function(v) {
#' x <- v[1]; y <- v[2]
#' dchisq(x, df = th[1], log = TRUE) + dchisq(y, df = th[2], log = TRUE)
#' }
#' sum(apply(data, 1, log_f))
#' }
#' (th_hat <- optim(c(1, 1), likelihood, control = list(fnscale = -1))$par)
#'
#' # plot f for the give model
#' f <- function(x, y, th) dchisq(x, df = th[1]) * dchisq(y, df = th[2])
#'
#' ggplot(data, aes(x, y)) +
#' geom_hdr_fun(fun = f, args = list(th = th_hat)) +
#' geom_point(size = .25, color = "red") +
#' xlim(0, 30) + ylim(c(0, 30))
#'
#' ggplot(data, aes(x, y)) +
#' geom_hdr_lines_fun(fun = f, args = list(th = th_hat)) +
#' geom_point(size = .25, color = "red") +
#' xlim(0, 30) + ylim(c(0, 30))
#'
#'
NULL
#' @rdname geom_hdr_fun
#' @export
stat_hdr_fun <- function(mapping = NULL, data = NULL,
geom = "hdr_fun", position = "identity",
...,
fun, args = list(),
probs = c(.99, .95, .8, .5),
xlim = NULL, ylim = NULL, n = 100,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
if (is.null(data)) data <- ensure_nonempty_data
layer(
data = data,
mapping = mapping,
stat = StatHdrFun,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
fun = fun,
args = args,
probs = probs,
xlim = xlim,
ylim = ylim,
n = n,
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr_fun
#' @format NULL
#' @usage NULL
#' @importFrom scales percent
#' @export
StatHdrFun <- ggproto("StatHdrFun", Stat,
default_aes = aes(order = after_stat(probs), alpha = after_stat(probs)),
output = "bands",
# very similar to StatHdr$compute_group(),
# only difference are the parameters fun + args (vs. method + parameters)
# -- this prevents factoring into one compute_group() method,
# compute_group()'s arguments are different.
compute_group = function(self, data, scales, na.rm = FALSE,
fun, args = list(), probs = c(.99, .95, .8, .5),
n = 100, xlim = NULL, ylim = NULL) {
if ((is.null(xlim) & is.null(scales$x)) | (is.null(ylim) & is.null(scales$y))) {
stop("If no data is provided to StatHdrFun, xlim and ylim must be specified")
}
rangex <- xlim %||% scales$x$dimension()
rangey <- ylim %||% scales$y$dimension()
# Only calculate HDR membership if we need to
need_membership <- (self$output == "points")
res <- get_hdr(data, method = "fun", probs, n, rangex, rangey, hdr_membership = need_membership, fun = fun, args = args)
res_to_df(res, probs, data$group[1], self$output)
}
)
#' @rdname geom_hdr_fun
#' @export
geom_hdr_fun <- function(mapping = NULL, data = NULL,
stat = "hdr_fun", position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
if (is.null(data)) data <- ensure_nonempty_data
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomHdrFun,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr_fun
#' @format NULL
#' @usage NULL
#' @export
GeomHdrFun <- ggproto("GeomHdrFun", GeomHdr)
================================================
FILE: R/hdr_lines.R
================================================
#' @rdname geom_hdr
#' @usage NULL
#' @export
stat_hdr_lines <- function(mapping = NULL, data = NULL,
geom = "hdr_lines", position = "identity",
...,
method = "kde",
probs = c(.99, .95, .8, .5),
n = 100,
xlim = NULL,
ylim = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = StatHdrLines,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
method = method,
probs = probs,
n = n,
xlim = xlim,
ylim = ylim,
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr
#' @format NULL
#' @usage NULL
#' @importFrom scales percent_format
#' @export
StatHdrLines <- ggproto("StatHdrLines", StatHdr,
output = "lines"
)
#' @rdname geom_hdr
#' @usage NULL
#' @export
geom_hdr_lines <- function(mapping = NULL, data = NULL,
stat = "hdr_lines", position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomHdrLines,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr
#' @format NULL
#' @usage NULL
#' @export
GeomHdrLines <- ggproto("GeomHdrLines", GeomPath,
default_aes = aes(
colour = "#000000",
linewidth = 1,
linetype = 1,
alpha = NA
))
================================================
FILE: R/hdr_lines_fun.R
================================================
#' @rdname geom_hdr_fun
#' @usage NULL
#' @export
stat_hdr_lines_fun <- function(mapping = NULL, data = NULL,
geom = "hdr_lines_fun", position = "identity",
...,
fun, args = list(),
probs = c(.99, .95, .8, .5),
xlim = NULL, ylim = NULL, n = 100,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
if (is.null(data)) data <- ensure_nonempty_data
layer(
data = data,
mapping = mapping,
stat = StatHdrLinesFun,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
fun = fun,
args = args,
probs = probs,
xlim = xlim,
ylim = ylim,
n = n,
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr_fun
#' @format NULL
#' @usage NULL
#' @importFrom scales percent
#' @export
StatHdrLinesFun <- ggproto("StatHdrLinesFun", StatHdrFun,
output = "lines"
)
#' @rdname geom_hdr_fun
#' @usage NULL
#' @export
geom_hdr_lines_fun <- function(mapping = NULL, data = NULL,
stat = "hdr_lines_fun", position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
if (is.null(data)) data <- ensure_nonempty_data
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomHdrLinesFun,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr_fun
#' @format NULL
#' @usage NULL
#' @export
GeomHdrLinesFun <- ggproto("GeomHdrlinesfun", GeomHdrLines)
================================================
FILE: R/hdr_points.R
================================================
#' Scatterplot colored by highest density regions of a 2D density estimate
#'
#' Perform 2D density estimation, compute the resulting highest density regions (HDRs),
#' and plot the provided data as a scatterplot with points colored according to
#' their corresponding HDR.
#'
#' @section Aesthetics: geom_hdr_points understands the following aesthetics (required
#' aesthetics are in bold):
#'
#' - **x**
#' - **y**
#' - alpha
#' - color
#' - fill
#' - group
#' - linetype
#' - size
#' - subgroup
#'
#' @section Computed variables:
#'
#' \describe{ \item{probs}{The probability associated with the highest density region, specified
#' by `probs`.} }
#'
#' @inheritParams ggplot2::stat_identity
#' @inheritParams ggplot2::stat_density2d
#' @inheritParams geom_hdr
#'
#' @name geom_hdr_points
#' @rdname geom_hdr_points
#'
#' @import ggplot2
#'
#' @examples
#' set.seed(1)
#' df <- data.frame(x = rnorm(500), y = rnorm(500))
#' p <- ggplot(df, aes(x, y)) +
#' coord_equal()
#'
#' p + geom_hdr_points()
#'
#' # Setting aes(fill = after_stat(probs)), color = "black", and
#' # shape = 21 helps alleviate overplotting:
#' p + geom_hdr_points(aes(fill = after_stat(probs)), color = "black", shape = 21, size = 2)
#'
#' # Also works well with geom_hdr_lines()
#' p +
#' geom_hdr_lines(
#' aes(color = after_stat(probs)), alpha = 1,
#' xlim = c(-5, 5), ylim = c(-5, 5)
#' ) +
#' geom_hdr_points(
#' aes(fill = after_stat(probs)), color = "black", shape = 21, size = 2,
#' xlim = c(-5, 5), ylim = c(-5, 5)
#' )
#'
NULL
#' @export
#' @rdname geom_hdr_points
stat_hdr_points <- function(mapping = NULL, data = NULL,
geom = "point", position = "identity",
...,
method = "kde",
probs = c(.99, .95, .8, .5),
n = 100,
xlim = NULL,
ylim = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = StatHdrPoints,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
method = method,
probs = probs,
n = n,
xlim = xlim,
ylim = ylim,
na.rm = na.rm,
...
)
)
}
#' @export
#' @rdname geom_hdr_points
#' @format NULL
#' @usage NULL
StatHdrPoints <- ggproto("StatHdrPoints", StatHdr,
default_aes = aes(order = after_stat(probs), color = after_stat(probs)),
output = "points"
)
#' @export
#' @rdname geom_hdr_points
geom_hdr_points <- function(mapping = NULL, data = NULL,
stat = "hdr_points", position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomPoint,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
na.rm = na.rm,
...
)
)
}
================================================
FILE: R/hdr_points_fun.R
================================================
#' Scatterplot colored by highest density regions of a bivariate pdf
#'
#' Compute the highest density regions (HDRs) of a bivariate pdf and plot the provided
#' data as a scatterplot with points colored according to their corresponding HDR.
#'
#' @section Aesthetics: geom_hdr_points_fun understands the following aesthetics
#' (required aesthetics are in bold):
#'
#' - **x**
#' - **y**
#' - alpha
#' - color
#' - fill
#' - group
#' - linetype
#' - size
#' - subgroup
#'
#' @section Computed variables:
#'
#' \describe{ \item{probs}{The probability associated with the highest density region, specified
#' by `probs`.} }
#'
#' @inheritParams ggplot2::stat_identity
#' @inheritParams ggplot2::stat_density2d
#' @inheritParams geom_hdr_fun
#'
#' @name geom_hdr_points_fun
#' @rdname geom_hdr_points_fun
#'
#' @import ggplot2
#'
#' @examples
#' # Can plot points colored according to known pdf:
#' set.seed(1)
#' df <- data.frame(x = rexp(1000), y = rexp(1000))
#' f <- function(x, y) dexp(x) * dexp(y)
#'
#' ggplot(df, aes(x, y)) +
#' geom_hdr_points_fun(fun = f, xlim = c(0, 10), ylim = c(0, 10))
#'
#'
#' # Also allows for hdrs of a custom parametric model
#'
#' # generate example data
#' n <- 1000
#' th_true <- c(3, 8)
#'
#' rdata <- function(n, th) {
#' gen_single_obs <- function(th) {
#' rchisq(2, df = th) # can be anything
#' }
#' df <- replicate(n, gen_single_obs(th))
#' setNames(as.data.frame(t(df)), c("x", "y"))
#' }
#' data <- rdata(n, th_true)
#'
#' # estimate unknown parameters via maximum likelihood
#' likelihood <- function(th) {
#' th <- abs(th) # hack to enforce parameter space boundary
#' log_f <- function(v) {
#' x <- v[1]; y <- v[2]
#' dchisq(x, df = th[1], log = TRUE) + dchisq(y, df = th[2], log = TRUE)
#' }
#' sum(apply(data, 1, log_f))
#' }
#' (th_hat <- optim(c(1, 1), likelihood, control = list(fnscale = -1))$par)
#'
#' # plot f for the give model
#' f <- function(x, y, th) dchisq(x, df = th[1]) * dchisq(y, df = th[2])
#'
#' ggplot(data, aes(x, y)) +
#' geom_hdr_points_fun(fun = f, args = list(th = th_hat))
#'
#' ggplot(data, aes(x, y)) +
#' geom_hdr_points_fun(aes(fill = after_stat(probs)), shape = 21, color = "black",
#' fun = f, args = list(th = th_hat), na.rm = TRUE) +
#' geom_hdr_lines_fun(aes(color = after_stat(probs)), alpha = 1, fun = f, args = list(th = th_hat)) +
#' lims(x = c(0, 15), y = c(0, 25))
#'
NULL
#' @export
#' @rdname geom_hdr_points_fun
stat_hdr_points_fun <- function(mapping = NULL, data = NULL,
geom = "point", position = "identity",
...,
fun, args = list(),
probs = c(.99, .95, .8, .5),
xlim = NULL, ylim = NULL, n = 100,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = StatHdrPointsFun,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
fun = fun,
args = args,
probs = probs,
xlim = xlim,
ylim = ylim,
n = n,
na.rm = na.rm,
...
)
)
}
#' @export
#' @format NULL
#' @usage NULL
#' @rdname geom_hdr_points_fun
StatHdrPointsFun <- ggproto("StatHdrPointsFun", StatHdrFun,
default_aes = aes(order = after_stat(probs), color = after_stat(probs)),
output = "points"
)
#' @export
#' @rdname geom_hdr_points_fun
geom_hdr_points_fun <- function(mapping = NULL, data = NULL,
stat = "hdr_points_fun", position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
if (is.null(data)) data <- ensure_nonempty_data
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomPoint,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
na.rm = na.rm,
...
)
)
}
================================================
FILE: R/hdr_rug.R
================================================
#' Rug plots of marginal highest density region estimates
#'
#' Perform 1D density estimation, compute and plot the resulting highest density
#' regions in a way similar to [ggplot2::geom_rug()].
#' Note, the plotted objects have probabilities mapped to the `alpha` aesthetic by default.
#'
#' @section Aesthetics: geom_hdr_rug understands the following aesthetics (required
#' aesthetics are in bold):
#'
#' - x
#' - y
#' - alpha
#' - fill
#' - group
#' - subgroup
#'
#' @section Computed variables:
#'
#' \describe{ \item{probs}{The probability of the highest density region, specified
#' by `probs`, corresponding to each point.} }
#'
#' @inheritParams ggplot2::geom_rug
#' @inheritParams stat_hdr
#' @param method,method_y Density estimator(s) to use.
#' By default `method` is used for both x- and y-axis.
#' If specified, `method_y` will be used for y-axis.
#' Accepts character vector: `"kde"`,`"histogram"`, `"freqpoly"`, or `"norm"`.
#' Alternatively accepts functions which return closures corresponding to density estimates,
#' see `?get_hdr_1d` or `vignette("method", "ggdensity")`.
#' @name geom_hdr_rug
#' @rdname geom_hdr_rug
#'
#' @import ggplot2
#'
#' @examples
#' set.seed(1)
#' df <- data.frame(x = rnorm(100), y = rnorm(100))
#'
#' # Plot marginal HDRs for bivariate data
#' ggplot(df, aes(x, y)) +
#' geom_point() +
#' geom_hdr_rug() +
#' coord_fixed()
#'
#' ggplot(df, aes(x, y)) +
#' geom_hdr() +
#' geom_hdr_rug() +
#' coord_fixed()
#'
#' # Plot HDR for univariate data
#' ggplot(df, aes(x)) +
#' geom_density() +
#' geom_hdr_rug()
#'
#' ggplot(df, aes(y = y)) +
#' geom_density() +
#' geom_hdr_rug()
#'
#' # Specify location of marginal HDRs as in ggplot2::geom_rug()
#' ggplot(df, aes(x, y)) +
#' geom_hdr() +
#' geom_hdr_rug(sides = "tr", outside = TRUE) +
#' coord_fixed(clip = "off")
#'
#' # Can use same methods of density estimation as geom_hdr().
#' # For data with constrained support, we suggest setting method = "histogram":
#' ggplot(df, aes(x^2)) +
#' geom_histogram(bins = 30, boundary = 0) +
#' geom_hdr_rug(method = "histogram")
#'
#' ggplot(df, aes(x^2, y^2)) +
#' geom_hdr(method = "histogram") +
#' geom_hdr_rug(method = "histogram") +
#' coord_fixed()
#'
NULL
#' @rdname geom_hdr_rug
#' @export
stat_hdr_rug <- function(mapping = NULL, data = NULL,
geom = "hdr_rug", position = "identity",
...,
method = "kde",
method_y = "kde",
probs = c(.99, .95, .8, .5),
xlim = NULL,
ylim = NULL,
n = 512,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = StatHdrRug,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
method = method,
method_y = method_y,
probs = probs,
xlim = xlim,
ylim = ylim,
n = n,
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr_rug
#' @format NULL
#' @usage NULL
#' @export
StatHdrRug <- ggproto("StatHdrRug", Stat,
required_aes = c("x|y"),
default_aes = aes(alpha = after_stat(probs)),
compute_group = function(data, scales, na.rm = FALSE,
method = "kde", method_y = NULL,
probs = c(.99, .95, .8, .5),
xlim = NULL, ylim = NULL, n = 512) {
# Recycle for both x, y
if (length(n) == 1) n <- rep(n, 2)
# If no alternative method_y, use method
if (is.null(method_y)) method_y <- method
# Estimate marginal densities
# Initialize dfs for x and y axes,
# in case only x or y are supplied:
df_x <- data.frame()
df_y <- data.frame()
if (!is.null(data$x)) {
rangex <- xlim %||% scales$x$dimension()
res_x <- get_hdr_1d(data$x, method, probs, n[1], rangex, hdr_membership = FALSE)
df_x <- res_to_df_1d(res_x, probs, data$group[1], output = "rug")
# Needs correct name for ggplot2 internals
df_x$axis <- "x"
df_x$y <- NA
}
if (!is.null(data$y)) {
rangey <- ylim %||% scales$y$dimension()
res_y <- get_hdr_1d(data$y, method_y, probs, n[2], rangey, hdr_membership = FALSE)
df_y <- res_to_df_1d(res_y, probs, data$group[1], output = "rug")
# Needs correct name for ggplot2 internals
df_y$axis <- "y"
df_y$y <- df_y$x
df_y$x <- NA
}
df <- rbind(df_x, df_y)
# Need to remove extra col if only plotting x or y rug
if (is.null(data$x)) df$x <- NULL
if (is.null(data$y)) df$y <- NULL
df
}
)
res_to_df_1d <- function(res, probs, group, output) {
probs <- fix_probs(probs)
if (output == "rug") {
probs_formatted <- scales::percent_format(accuracy = 1)(probs)
df <- res$df_est
# alpha will be mapped to df$probs
df$probs <- scales::percent_format(accuracy = 1)(df$hdr)
df$probs <- ordered(df$probs, levels = probs_formatted)
df$hdr <- NULL
# Discard 100% HDR if it's not in probs:
df <- df[!is.na(df$probs),]
}
df
}
#' @rdname geom_hdr_rug
#' @export
geom_hdr_rug <- function(mapping = NULL, data = NULL,
stat = "hdr_rug", position = "identity",
...,
outside = FALSE,
sides = "bl",
length = unit(0.03, "npc"),
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomHdrRug,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
outside = outside,
sides = sides,
length = length,
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr_rug
#' @format NULL
#' @usage NULL
#' @export
GeomHdrRug <- ggproto("GeomHdrRug", Geom,
optional_aes = c("x", "y"),
draw_panel = function(data, panel_params, coord, sides = "bl",
outside = FALSE, length = unit(0.03, "npc")) {
if (!inherits(length, "unit")) {
abort("'length' must be a 'unit' object.")
}
rugs <- list()
# For coord_flip, coord$tranform does not flip the sides where to
# draw the rugs. We have to flip them.
if (inherits(coord, 'CoordFlip')) {
sides <- chartr('tblr', 'rlbt', sides)
}
# move the rug to outside the main plot space
if (outside) length <- -length
# Set up data frames for x and y:
data_x <- data[data$axis == "x",]
data_y <- data[data$axis == "y",]
if (nrow(data_x) > 0) {
data_x <- coord$transform(data_x, panel_params)
data_x$width <- resolution(data_x$x, FALSE)
gp_x <- grid::gpar(
col = alpha(data_x$fill, data_x$alpha),
fill = alpha(data_x$fill, data_x$alpha),
lwd = 0
)
# set up x axis rug rasters
if (grepl("b", sides)) {
rugs$x_b <- grid::rectGrob(
x = unit(data_x$x, "native"),
y = unit(0, "npc"),
width = data_x$width,
height = length,
just = "bottom",
gp = gp_x
)
}
if (grepl("t", sides)) {
rugs$x_t <- grid::rectGrob(
x = unit(data_x$x, "native"),
y = unit(1, "npc"),
width = data_x$width,
height = length,
just = "top",
gp = gp_x
)
}
}
if (nrow(data_y) > 0) {
data_y <- coord$transform(data_y, panel_params)
data_y$height <- resolution(data_y$y, FALSE)
gp_y <- grid::gpar(
col = alpha(data_y$fill, data_y$alpha),
fill = alpha(data_y$fill, data_y$alpha),
lwd = 0
)
# set up y axis rug rasters
if (grepl("l", sides)) {
rugs$y_l <- grid::rectGrob(
x = unit(0, "npc"),
y = unit(data_y$y, "native"),
width = length,
height = data_y$height,
just = "left",
gp = gp_y
)
}
if (grepl("r", sides)) {
rugs$y_r <- grid::rectGrob(
x = unit(1, "npc"),
y = unit(data_y$y, "native"),
width = length,
height = data_y$height,
just = "right",
gp = gp_y
)
}
}
grid::gTree(children = do.call(grid::gList, rugs))
},
default_aes = aes(fill = "grey20", alpha = NA),
draw_key = draw_key_rect
)
================================================
FILE: R/hdr_rug_fun.R
================================================
#' Rug plots of highest density region estimates of univariate pdfs
#'
#' Compute and plot the highest density regions (HDRs) of specified univariate pdf(s).
#' Note, the plotted objects have probabilities mapped to the `alpha` aesthetic by default.
#'
#' @section Aesthetics: `geom_hdr_rug_fun()` understands the following aesthetics (required
#' aesthetics are in bold):
#'
#' - x
#' - y
#' - alpha
#' - fill
#' - group
#' - subgroup
#'
#' @section Computed variables:
#'
#' \describe{ \item{probs}{The probability of the highest density region, specified
#' by `probs`, corresponding to each point.} }
#'
#' @inheritParams ggplot2::geom_rug
#' @inheritParams stat_hdr_rug
#' @param fun_x,fun_y Functions, the univariate probability density function for the x- and/or y-axis.
#' First argument must be vectorized.
#' @param args_x,args_y Named list of additional arguments passed on to `fun_x` and/or `fun_y`.
#' @name geom_hdr_rug_fun
#' @rdname geom_hdr_rug_fun
#'
#' @examples
#' # Plotting data with exponential marginals
#' df <- data.frame(x = rexp(1e3), y = rexp(1e3))
#'
#' ggplot(df, aes(x, y)) +
#' geom_hdr_rug_fun(fun_x = dexp, fun_y = dexp) +
#' geom_point(size = .5) +
#' coord_fixed()
#'
#' # without data/aesthetic mappings
#' ggplot() +
#' geom_hdr_rug_fun(fun_x = dexp, fun_y = dexp, xlim = c(0, 7), ylim = c(0, 7)) +
#' coord_fixed()
#'
#'
#' # Plotting univariate normal data, estimating mean and sd
#' df <- data.frame(x = rnorm(1e4, mean = 1, sd = 3))
#'
#' # estimating parameters
#' mu_hat <- mean(df$x)
#' sd_hat <- sd(df$x)
#'
#' ggplot(df, aes(x)) +
#' geom_hdr_rug_fun(fun_x = dnorm, args_x = list(mean = mu_hat, sd = sd_hat)) +
#' geom_density()
#'
#' # Equivalent to `method_norm_1d()` with `geom_hdr_rug()`
#' ggplot(df, aes(x)) +
#' geom_hdr_rug(method = method_norm_1d()) +
#' geom_density()
NULL
#' @rdname geom_hdr_rug_fun
#' @export
stat_hdr_rug_fun <- function(mapping = NULL, data = NULL,
geom = "hdr_rug_fun", position = "identity",
...,
fun_x = NULL, fun_y = NULL,
args_x = list(), args_y = list(),
probs = c(.99, .95, .8, .5),
xlim = NULL, ylim = NULL, n = 512,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
if (is.null(data)) data <- ensure_nonempty_data
layer(
data = data,
mapping = mapping,
stat = StatHdrRugFun,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
fun_x = fun_x,
fun_y = fun_y,
args_x = args_x,
args_y = args_y,
probs = probs,
xlim = xlim,
ylim = ylim,
n = n,
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr_rug_fun
#' @format NULL
#' @usage NULL
#' @export
StatHdrRugFun <- ggproto("StatHdrRugFun", Stat,
default_aes = aes(order = after_stat(probs), alpha = after_stat(probs)),
# if fun_x or fun_y are unspecified data might be dropped
dropped_aes = c("x", "y"),
# very similar to StatHdrRug$compute_group(),
# only difference are the parameters fun + args (vs. method + parameters)
# -- this prevents factoring into one compute_group() method,
# compute_group()'s arguments are different.
compute_group = function(self, data, scales, na.rm = FALSE,
fun_x = NULL, fun_y = NULL, args_x = list(), args_y = list(),
probs = c(.99, .95, .8, .5),
n = 512, xlim = NULL, ylim = NULL) {
# Recycle for both x, y
if (length(n) == 1) n <- rep(n, 2)
# Estimate marginal densities
# Initialize dfs for x and y axes,
# in case only x or y are supplied:
df_x <- data.frame()
df_y <- data.frame()
if (!is.null(fun_x)) {
if (is.null(xlim) & is.null(scales$x)) {
stop("`xlim` must be specified if `x` aesthetic not provided to `StatHdrRugFun`")
}
rangex <- xlim %||% scales$x$dimension()
res_x <- get_hdr_1d(data$x, method = "fun", probs, n[1], rangex, hdr_membership = FALSE, fun = fun_x, args = args_x)
df_x <- res_to_df_1d(res_x, probs, data$group[1], output = "rug")
# Needs correct name for ggplot2 internals
df_x$axis <- "x"
df_x$y <- NA
}
if (!is.null(fun_y)) {
if (is.null(ylim) & is.null(scales$y)) {
stop("`ylim` must be specified if `y` aesthetic not provided to `StatHdrRugFun`")
}
rangey <- ylim %||% scales$y$dimension()
res_y <- get_hdr_1d(data$y, method = "fun", probs, n[1], rangey, hdr_membership = FALSE, fun = fun_y, args = args_y)
df_y <- res_to_df_1d(res_y, probs, data$group[1], output = "rug")
# Needs correct name for ggplot2 internals
df_y$axis <- "y"
df_y$y <- df_y$x
df_y$x <- NA
}
df <- rbind(df_x, df_y)
# Need to remove extra col if only plotting x or y rug
if (is.null(fun_x)) df$x <- NULL
if (is.null(fun_y)) df$y <- NULL
df
}
)
#' @rdname geom_hdr_rug_fun
#' @export
geom_hdr_rug_fun <- function(mapping = NULL, data = NULL,
stat = "hdr_rug_fun", position = "identity",
...,
outside = FALSE,
sides = "bl",
length = unit(0.03, "npc"),
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
if (is.null(data)) data <- ensure_nonempty_data
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomHdrRugFun,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
outside = outside,
sides = sides,
length = length,
na.rm = na.rm,
...
)
)
}
#' @rdname geom_hdr_rug_fun
#' @format NULL
#' @usage NULL
#' @export
GeomHdrRugFun <- ggproto("GeomHdrRugFun", GeomHdrRug)
================================================
FILE: R/helpers-ggplot2.R
================================================
# unexported functions from ggplot2
`%||%` <- function(x, y) {
if (is.null(x)) y else x
}
tibble0 <- function(...) {
tibble::tibble(..., .name_repair = "minimal")
}
unique0 <- function(x, ...) {
if (is.null(x)) x else vctrs::vec_unique(x, ...)
}
isoband_z_matrix <- function(data) {
x_pos <- as.integer(factor(data$x, levels = sort(unique0(data$x))))
y_pos <- as.integer(factor(data$y, levels = sort(unique0(data$y))))
nrow <- max(y_pos)
ncol <- max(x_pos)
raster <- matrix(NA_real_, nrow = nrow, ncol = ncol)
raster[cbind(y_pos, x_pos)] <- data$z
raster
}
xyz_to_isobands <- function(data, breaks) {
isoband::isobands(x = sort(unique0(data$x)), y = sort(unique0(data$y)),
z = isoband_z_matrix(data), levels_low = breaks[-length(breaks)],
levels_high = breaks[-1])
}
xyz_to_isolines <- function(data, breaks) {
isoband::isolines(x = sort(unique0(data$x)), y = sort(unique0(data$y)),
z = isoband_z_matrix(data), levels = breaks)
}
iso_to_polygon <- function(iso, group = 1) {
lengths <- vapply(iso, function(x) length(x$x), integer(1))
if (all(lengths == 0)) {
warning("Zero contours were generated")
return(tibble0())
}
levels <- names(iso)
xs <- unlist(lapply(iso, "[[", "x"), use.names = FALSE)
ys <- unlist(lapply(iso, "[[", "y"), use.names = FALSE)
ids <- unlist(lapply(iso, "[[", "id"), use.names = FALSE)
item_id <- rep(seq_along(iso), lengths)
groups <- paste(group, sprintf("%03d", item_id), sep = "-")
groups <- factor(groups)
tibble0(level = rep(levels, lengths), x = xs, y = ys,
piece = as.integer(groups), group = groups, subgroup = ids,
.size = length(xs))
}
iso_to_path <- function(iso, group = 1) {
lengths <- vapply(iso, function(x) length(x$x), integer(1))
if (all(lengths == 0)) {
warning("Zero contours were generated")
return(tibble0())
}
levels <- names(iso)
xs <- unlist(lapply(iso, "[[", "x"), use.names = FALSE)
ys <- unlist(lapply(iso, "[[", "y"), use.names = FALSE)
ids <- unlist(lapply(iso, "[[", "id"), use.names = FALSE)
item_id <- rep(seq_along(iso), lengths)
groups <- paste(group, sprintf("%03d", item_id), sprintf("%03d",
ids), sep = "-")
groups <- factor(groups)
tibble0(level = rep(levels, lengths), x = xs, y = ys,
piece = as.integer(groups), group = groups, .size = length(xs))
}
empty <- function(df) {
is.null(df) || nrow(df) == 0 || ncol(df) == 0 || inherits(df, "waiver")
}
ensure_nonempty_data <- function(data) {
if (empty(data)) {
tibble0(group = 1, .size = 1)
}
else {
data
}
}
================================================
FILE: R/helpers.R
================================================
# this script contains several unexported helper functions
# normalization/scaling functions
normalize <- function(v) v / sum(v)
# numerical approximation for finding hdr
# if method = "histogram", don't want to use uniroot, runs into issue if n is small
find_cutoff <- function(df, conf, uniroot = TRUE) {
if (length(conf) > 1) return(vapply(conf, function(x) find_cutoff(df, x, uniroot), numeric(1)))
# sort df rows by fhat
df <- df[order(df$fhat, decreasing = TRUE),]
# compute cumsum of probs
df$cumprob <- cumsum(df$fhat_discretized)
# determine cutoff
max(df[df$cumprob >= conf,]$fhat)
}
================================================
FILE: R/method.R
================================================
# methods that return est pdf as closure ---------------------------------
#' Bivariate parametric normal HDR estimator
#'
#' Function used to specify bivariate normal density estimator
#' for `get_hdr()` and layer functions (e.g. `geom_hdr()`).
#'
#' For more details on the use and implementation of the `method_*()` functions,
#' see `vignette("method", "ggdensity")`.
#'
#' @examples
#' # Normal estimator is useful when an assumption of normality is appropriate
#' set.seed(1)
#' df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
#'
#' ggplot(df, aes(x, y)) +
#' geom_hdr(method = method_mvnorm(), xlim = c(-4, 4), ylim = c(-4, 4)) +
#' geom_point(size = 1)
#'
#' # Can also be used with `get_hdr()` for numerical summary of HDRs
#' res <- get_hdr(df, method = method_mvnorm())
#' str(res)
#'
#' @export
method_mvnorm <- function() {
function(data) {
data_matrix <- with(data, cbind(x, y))
mu_hat <- colMeans(data_matrix)
R <- chol(cov(data_matrix)) # R'R = crossprod(R) = S
function(x, y) {
X <- cbind(x, y)
tmp <- backsolve(R, t(X) - mu_hat, transpose = TRUE)
logretval <- -sum(log(diag(R))) - log(2 * pi) - 0.5 * colSums(tmp^2)
exp( logretval )
}
}
}
# methods that return closures that compute a grid ------------------------
#' Bivariate kernel density HDR estimator
#'
#' Function used to specify bivariate kernel density estimator
#' for `get_hdr()` and layer functions (e.g. `geom_hdr()`).
#'
#' For more details on the use and implementation of the `method_*()` functions,
#' see `vignette("method", "ggdensity")`.
#'
#' @inheritParams ggplot2::stat_density2d
#'
#' @examples
#' set.seed(1)
#' df <- data.frame(x = rnorm(1e3, sd = 3), y = rnorm(1e3, sd = 3))
#'
#' ggplot(df, aes(x, y)) +
#' geom_hdr(method = method_kde()) +
#' geom_point(size = 1)
#'
#' # The defaults of `method_kde()` are the same as the estimator for `ggplot2::geom_density_2d()`
#' ggplot(df, aes(x, y)) +
#' geom_density_2d_filled() +
#' geom_hdr_lines(method = method_kde(), probs = seq(.1, .9, by = .1)) +
#' theme(legend.position = "none")
#'
#' # The bandwidth of the estimator can be set directly with `h` or scaled with `adjust`
#' ggplot(df, aes(x, y)) +
#' geom_hdr(method = method_kde(h = 1)) +
#' geom_point(size = 1)
#'
#' ggplot(df, aes(x, y)) +
#' geom_hdr(method = method_kde(adjust = 1/2)) +
#' geom_point(size = 1)
#'
#' # Can also be used with `get_hdr()` for numerical summary of HDRs
#' res <- get_hdr(df, method = method_kde())
#' str(res)
#'
#' @export
method_kde <- function(h = NULL, adjust = c(1, 1)) {
function(data, n, rangex, rangey) {
if (is.null(h)) {
h <- c(MASS::bandwidth.nrd(data$x), MASS::bandwidth.nrd(data$y))
}
h <- h * adjust
kdeout <- MASS::kde2d(
x = data$x, y = data$y, n = n, h = h,
lims = c(rangex, rangey)
)
df <- with(kdeout, expand.grid("x" = x, "y" = y))
df$fhat <- as.vector(kdeout$z)
df
}
}
#' Bivariate histogram HDR estimator
#'
#' Function used to specify bivariate histogram density estimator
#' for `get_hdr()` and layer functions (e.g. `geom_hdr()`).
#'
#' For more details on the use and implementation of the `method_*()` functions,
#' see `vignette("method", "ggdensity")`.
#'
#' @param bins Number of bins along each axis.
#' Either a vector of length 2 or a scalar value which is recycled for both dimensions.
#' Defaults to normal reference rule (Scott, pg 87).
#' @param smooth If `TRUE`, HDRs are smoothed by the marching squares algorithm.
#' @param nudgex,nudgey Horizontal and vertical rules for choosing witness points when `smooth == TRUE`.
#' Accepts character vector: `"left"`, `"none"`, `"right"` (`nudgex`) or `"down"`, `"none"`, `"up"` (`nudgey`).
#'
#' @references Scott, David W. Multivariate Density Estimation (2e), Wiley.
#'
#' @examples
#' \dontrun{
#'
#' # Histogram estimators can be useful when data has boundary constraints
#' set.seed(1)
#' df <- data.frame(x = rexp(1e3), y = rexp(1e3))
#'
#' ggplot(df, aes(x, y)) +
#' geom_hdr(method = method_histogram()) +
#' geom_point(size = 1)
#'
#' # The resolution of the histogram estimator can be set via `bins`
#' ggplot(df, aes(x, y)) +
#' geom_hdr(method = method_histogram(bins = c(8, 25))) +
#' geom_point(size = 1)
#'
#' # By setting `smooth = TRUE`, we can graphically smooth the "blocky" HDRs
#' ggplot(df, aes(x, y)) +
#' geom_hdr(method = method_histogram(smooth = TRUE)) +
#' geom_point(size = 1)
#'
#' # However, we need to set `nudgex` and `nudgey` to align the HDRs correctly
#' ggplot(df, aes(x, y)) +
#' geom_hdr(method = method_histogram(smooth = TRUE, nudgex = "left", nudgey = "down")) +
#' geom_point(size = 1)
#'
#' # Can also be used with `get_hdr()` for numerical summary of HDRs
#' res <- get_hdr(df, method = method_histogram())
#' str(res)
#' }
#'
#' @export
method_histogram <- function(bins = NULL, smooth = FALSE, nudgex = "none", nudgey = "none") {
# n is an argument, but it is not used
function(data, n, rangex, rangey) {
if (is.null(bins)) {
bins <- numeric(2)
# define histogram mesh according to Scott p. 87
rho <- cor(data$x, data$y)
hx <- 3.504 * sd(data$x) * (1 - rho^2)^(3/8) * nrow(data)^(-1/4)
hy <- 3.504 * sd(data$y) * (1 - rho^2)^(3/8) * nrow(data)^(-1/4)
bins[1] <- round((rangex[2] - rangex[1]) / hx)
bins[2] <- round((rangey[2] - rangey[1]) / hy)
} else if (length(bins == 1)) {
bins <- rep(bins, 2)
}
xvals <- data$x
yvals <- data$y
xbtwn <- (rangex[1] <= xvals & xvals <= rangex[2])
if (!all(xbtwn)) {
xvals <- xvals[xbtwn]
yvals <- yvals[xbtwn]
}
ybtwn <- (rangey[1] <= yvals & yvals <= rangey[2])
if (!all(ybtwn)) {
xvals <- xvals[ybtwn]
yvals <- yvals[ybtwn]
}
sx <- seq(rangex[1], rangex[2], length.out = bins[1] + 1)
sy <- seq(rangey[1], rangey[2], length.out = bins[2] + 1)
de_x <- sx[2] - sx[1]
de_y <- sy[2] - sy[1]
box_area <- de_x * de_y
xbin_mdpts <- sx[-(bins[1]+1)] + de_x/2
ybin_mdpts <- sy[-(bins[2]+1)] + de_y/2
xleft <- sx[-(bins[1]+1)]
xright <- sx[-1]
ybottom <- sy[-(bins[2]+1)]
ytop <- sy[-1]
df_cuts <- data.frame("xbin" = cut(xvals, sx), "ybin" = cut(yvals, sy))
df <- with(df_cuts, expand.grid("xbin" = levels(xbin), "ybin" = levels(ybin)))
df$n <- with(df_cuts, as.vector(table(xbin, ybin)))
df$xbin_midpt <- xbin_mdpts[as.integer(df$xbin)]
df$ybin_midpt <- ybin_mdpts[as.integer(df$ybin)]
df$xmin <- df$xbin_midpt - de_x/2
df$xmax <- df$xbin_midpt + de_x/2
df$de_x <- de_x
df$ymin <- df$ybin_midpt - de_y/2
df$ymax <- df$ybin_midpt + de_y/2
df$de_y <- de_y
df$fhat <- with(df, n / (sum(n) * box_area))
if (smooth) {
if(nudgex == "left") df$x <- df$xmin
if(nudgex == "none") df$x <- df$xbin_midpt
if(nudgex == "right") df$x <- df$xmax
if(nudgey == "down") df$y <- df$ymin
if(nudgey == "none") df$y <- df$ybin_midpt
if(nudgey == "up") df$y <- df$ymax
} else {
# No nudging if not smoothing
df$x <- df$xbin_midpt
df$y <- df$ybin_midpt
# Evaluate histogram on a grid
# For xyz_to_iso* funs, need tightly packed values for good isobands/lines
# k*k points per histogram footprint
# Higher values of k -> better visuals, more computationally expensive
# Currently determining k heuristically - not based on any theoretical results
# The necessary value of k seems to be O((bins[1]*bins[2])^(-1/3))
# found constant which yields k = 50 for bins[1]*bins[2] = 10^2
k <- if (bins[1] * bins[2] > 10^2) max(floor(225/((bins[1] * bins[2])^(1/3))), 5) else 50
bbins <- bins * k
ssx <- seq(rangex[1], rangex[2], length.out = bbins[1])
ssy <- seq(rangey[1], rangey[2], length.out = bbins[2])
ddf <- expand.grid(x = ssx, y = ssy)
# Need fhat repeated in very particular way for grid:
# e.g.
# k = 2
# df$fhat = 1, 2,
# 3, 4
# ddf$fhat = 1, 1, 2, 2,
# 1, 1, 2, 2,
# 3, 3, 4, 4,
# 3, 3, 4, 4
# m <- matrix(df$fhat, nrow = bins[2], byrow = TRUE)
# ddf$fhat <- as.vector(kronecker(m, matrix(1, k, k)))
fhat <- split(df$fhat, factor(rep(1:bins[2], each = bins[1]))) # split into rows
fhat <- lapply(fhat, function(x) rep(x, each = k)) # repeat within rows (horizontal)
fhat <- lapply(fhat, function(x) rep(x, times = k)) # repeat rows (vertical)
fhat <- unlist(fhat) # concatenate
ddf$fhat <- fhat
df <- ddf
}
df[c("x", "y", "fhat")]
}
}
#' Bivariate frequency polygon HDR estimator
#'
#' Function used to specify bivariate frequency polygon density estimator
#' for `get_hdr()` and layer functions (e.g. `geom_hdr()`).
#'
#' For more details on the use and implementation of the `method_*()` functions,
#' see `vignette("method", "ggdensity")`.
#'
#' @inheritParams method_histogram
#'
#' @references Scott, David W. Multivariate Density Estimation (2e), Wiley.
#'
#' @examples
#' set.seed(1)
#' df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
#'
#' ggplot(df, aes(x, y)) +
#' geom_hdr(method = method_freqpoly()) +
#' geom_point(size = 1)
#'
#' # The resolution of the frequency polygon estimator can be set via `bins`
#' ggplot(df, aes(x, y)) +
#' geom_hdr(method = method_freqpoly(bins = c(8, 25))) +
#' geom_point(size = 1)
#'
#' # Can also be used with `get_hdr()` for numerical summary of HDRs
#' res <- get_hdr(df, method = method_freqpoly())
#' str(res)
#'
#' @export
method_freqpoly <- function(bins = NULL) {
# n is an argument, but it is not used
function(data, n, rangex, rangey) {
if (is.null(bins)) {
bins <- numeric(2)
# define histogram mesh according to Scott p. 87
# TODO: fill in with rules for frequency polygons
rho <- cor(data$x, data$y)
hx <- 3.504 * sd(data$x) * (1 - rho^2)^(3/8) * nrow(data)^(-1/4)
hy <- 3.504 * sd(data$y) * (1 - rho^2)^(3/8) * nrow(data)^(-1/4)
bins[1] <- round((rangex[2] - rangex[1]) / hx)
bins[2] <- round((rangey[2] - rangey[1]) / hy)
} else {
if (length(bins == 1)) bins <- rep(bins, 2)
}
xvals <- data$x
yvals <- data$y
xbtwn <- (rangex[1] <= xvals & xvals <= rangex[2])
if (!all(xbtwn)) {
xvals <- xvals[xbtwn]
yvals <- yvals[xbtwn]
}
ybtwn <- (rangey[1] <= yvals & yvals <= rangey[2])
if (!all(ybtwn)) {
xvals <- xvals[ybtwn]
yvals <- yvals[ybtwn]
}
de_x <- (rangex[2] - rangex[1]) / bins[1]
de_y <- (rangey[2] - rangey[1]) / bins[2]
rangex[1] <- rangex[1] - de_x
rangex[2] <- rangex[2] + de_x
rangey[1] <- rangey[1] - de_y
rangey[2] <- rangey[2] + de_y
bins <- bins + 2
sx <- seq(rangex[1], rangex[2], length.out = bins[1] + 1)
sy <- seq(rangey[1], rangey[2], length.out = bins[2] + 1)
box_area <- de_x * de_y
xbin_mdpts <- sx[-(bins[1]+1)] + de_x/2
ybin_mdpts <- sy[-(bins[2]+1)] + de_y/2
xleft <- sx[-(bins[1]+1)]
xright <- sx[-1]
ybottom <- sy[-(bins[2]+1)]
ytop <- sy[-1]
df_cuts <- data.frame("xbin" = cut(xvals, sx), "ybin" = cut(yvals, sy))
df <- with(df_cuts, expand.grid("xbin" = levels(xbin), "ybin" = levels(ybin)))
df$n <- with(df_cuts, as.vector(table(xbin, ybin)))
df$xbin_midpt <- xbin_mdpts[as.integer(df$xbin)]
df$ybin_midpt <- ybin_mdpts[as.integer(df$ybin)]
df$xmin <- df$xbin_midpt - de_x/2
df$xmax <- df$xbin_midpt + de_x/2
df$de_x <- de_x
df$ymin <- df$ybin_midpt - de_y/2
df$ymax <- df$ybin_midpt + de_y/2
df$de_y <- de_y
df$fhat <- with(df, n / (sum(n) * box_area))
df$fhat_discretized <- normalize(df$fhat)
grid <- expand.grid(
x = sx[2:bins[1]],
y = sy[2:bins[2]]
)
x_midpts <- unique(df$xbin_midpt)
y_midpts <- unique(df$ybin_midpt)
find_A <- function(coords) {
x <- coords[[1]]
y <- coords[[2]]
row <- data.frame(
x1 = max(x_midpts[x_midpts - x < 0]),
x2 = min(x_midpts[x_midpts - x >= 0]),
y1 = max(y_midpts[y_midpts - y < 0]),
y2 = min(y_midpts[y_midpts - y >= 0])
)
row$fQ11 <- df[df$xbin_midpt == row$x1 & df$ybin_midpt == row$y1, "fhat"]
row$fQ21 <- df[df$xbin_midpt == row$x2 & df$ybin_midpt == row$y1, "fhat"]
row$fQ12 <- df[df$xbin_midpt == row$x1 & df$ybin_midpt == row$y2, "fhat"]
row$fQ22 <- df[df$xbin_midpt == row$x2 & df$ybin_midpt == row$y2, "fhat"]
xy_mat <- with(row, matrix(c(
x2 * y2, -x2 * y1, -x1 * y2, x1 * y1,
-y2, y1, y2, -y1,
-x2, x2, x1, -x1,
1, -1, -1, 1
), nrow = 4, byrow = TRUE))
A <- with(row,
1 / ((x2 - x1) * (y2 - y1)) * xy_mat %*% c(fQ11, fQ12, fQ21, fQ22)
)
row$a00 <- A[1]
row$a10 <- A[2]
row$a01 <- A[3]
row$a11 <- A[4]
row
}
A_list <- apply(grid, 1, find_A, simplify = FALSE)
df_A <- do.call(rbind, A_list)
coeffs_to_surface <- function(row, k) {
sx <- seq(row[["x1"]], row[["x2"]], length.out = k)[-k]
sy <- seq(row[["y1"]], row[["y2"]], length.out = k)[-k]
fit <- function(x, y) row[["a00"]] + row[["a10"]] * x + row[["a01"]] * y + row[["a11"]] * x * y
df <- expand.grid(x = sx, y = sy)
df$fhat <- fit(df$x, df$y)
df
}
# Currently determining k heuristically - not based on any theoretical results
# The necessary value of k seems to be O((bins[1]*bins[2])^(-1/4))
k <- if (bins[1] * bins[2] > 10^2) max(floor(30/((bins[1] * bins[2])^(1/4))), 3) else 10
surface_list <- apply(df_A, 1, coeffs_to_surface, k, simplify = FALSE)
df <- do.call(rbind, surface_list)
df[c("x","y","fhat")]
}
}
================================================
FILE: R/method_1d.R
================================================
# methods that return est pdf as closure ---------------------------------
#' Univariate parametric normal HDR estimator
#'
#' Function used to specify univariate normal density estimator
#' for `get_hdr_1d()` and layer functions (e.g. `geom_hdr_rug()`).
#'
#' For more details on the use and implementation of the `method_*_1d()` functions,
#' see `vignette("method", "ggdensity")`.
#'
#' @examples
#' # Normal estimators are useful when an assumption of normality is appropriate
#' df <- data.frame(x = rnorm(1e3))
#'
#' ggplot(df, aes(x)) +
#' geom_hdr_rug(method = method_norm_1d()) +
#' geom_density()
#'
#' # Can also be used with `get_hdr_1d()` for numerical summary of HDRs
#' res <- get_hdr_1d(df$x, method = method_norm_1d())
#' str(res)
#'
#' @export
method_norm_1d <- function() {
function(x) {
mu_hat <- mean(x)
sigma_hat <- sd(x)
function(x) dnorm(x, mu_hat, sigma_hat)
}
}
# methods that return closures that compute a grid ------------------------
#' Univariate kernel density HDR estimator
#'
#' Function used to specify univariate kernel density estimator
#' for `get_hdr_1d()` and layer functions (e.g. `geom_hdr_rug()`).
#'
#' For more details on the use and implementation of the `method_*_1d()` functions,
#' see `vignette("method", "ggdensity")`.
#'
#' @inheritParams stats::density
#'
#' @examples
#' df <- data.frame(x = rnorm(1e3, sd = 3))
#'
#' ggplot(df, aes(x)) +
#' geom_hdr_rug(method = method_kde_1d()) +
#' geom_density()
#'
#' # Details of the KDE can be adjusted with arguments to `method_kde_1d()`
#' ggplot(df, aes(x)) +
#' geom_hdr_rug(method = method_kde_1d(adjust = 1/5)) +
#' geom_density(adjust = 1/5)
#'
#' ggplot(df, aes(x)) +
#' geom_hdr_rug(method = method_kde_1d(kernel = "triangular")) +
#' geom_density(kernel = "triangular")
#'
#' # Can also be used with `get_hdr_1d()` for numerical summary of HDRs
#' res <- get_hdr_1d(df$x, method = method_kde_1d())
#' str(res)
#'
#' @export
method_kde_1d <- function(bw = "nrd0", adjust = 1, kernel = "gaussian", weights = NULL, window = kernel) {
function(x, n, range) {
nx <- length(x)
if (is.null(weights)) {
weights <- rep(1 / nx, nx)
} else {
weights <- normalize(weights)
}
dens <- stats::density(
x,
bw = bw,
adjust = adjust,
kernel = kernel,
weights = weights,
window = window,
n = n,
from = range[1],
to = range[2]
)
data.frame(
x = dens$x,
fhat = dens$y
)
}
}
#' Univariate histogram HDR estimator
#'
#' Function used to specify univariate histogram density estimator
#' for `get_hdr_1d()` and layer functions (e.g. `geom_hdr_rug()`).
#'
#' For more details on the use and implementation of the `method_*_1d()` functions,
#' see `vignette("method", "ggdensity")`.
#'
#' @param bins Number of bins. Defaults to normal reference rule (Scott, pg 59).
#'
#' @references Scott, David W. Multivariate Density Estimation (2e), Wiley.
#'
#' @examples
#' # Histogram estimators can be useful when data has boundary constraints
#' df <- data.frame(x = rexp(1e3))
#'
#' # Strip chart to visualize 1-d data
#' p <- ggplot(df, aes(x)) +
#' geom_jitter(aes(y = 0), width = 0, height = 2) +
#' scale_y_continuous(name = NULL, breaks = NULL) +
#' coord_cartesian(ylim = c(-3, 3))
#'
#' p
#'
#' p + geom_hdr_rug(method = method_histogram_1d())
#'
#' # The resolution of the histogram estimator can be set via `bins`
#' p + geom_hdr_rug(method = method_histogram_1d(bins = 5))
#'
#' # Can also be used with `get_hdr_1d()` for numerical summary of HDRs
#' res <- get_hdr_1d(df$x, method = method_histogram_1d())
#' str(res)
#'
#' @export
method_histogram_1d <- function(bins = NULL) {
function(x, n, range) {
nx <- length(x)
# Default to normal reference rule (Scott p. 59)
if (is.null(bins)) {
hx <- 3.504 * stats::sd(x) * nx^(-1/3)
bins <- round((range[2] - range[1]) / hx)
}
sx <- seq(range[1], range[2], length.out = bins + 1)
de_x <- sx[2] - sx[1]
midpts <- sx[-(bins+1)] + de_x/2
n <- as.numeric(table(cut(x, sx)))
data.frame(
x = midpts,
fhat = normalize(n)
)
}
}
#' Univariate frequency polygon HDR estimator
#'
#' Function used to specify univariate frequency polygon density estimator
#' for `get_hdr_1d()` and layer functions (e.g. `geom_hdr_rug()`).
#'
#' For more details on the use and implementation of the `method_*_1d()` functions,
#' see `vignette("method", "ggdensity")`.
#'
#' @inheritParams method_histogram_1d
#'
#' @references Scott, David W. Multivariate Density Estimation (2e), Wiley.
#'
#' @examples
#' df <- data.frame(x = rnorm(1e3))
#'
#' # Strip chart to visualize 1-d data
#' p <- ggplot(df, aes(x)) +
#' geom_jitter(aes(y = 0), width = 0, height = 2) +
#' scale_y_continuous(name = NULL, breaks = NULL) +
#' coord_cartesian(ylim = c(-3, 3))
#'
#' p
#'
#' p + geom_hdr_rug(method = method_freqpoly_1d())
#'
#' # The resolution of the frequency polygon estimator can be set via `bins`
#' p + geom_hdr_rug(method = method_freqpoly_1d(bins = 100))
#'
#' # Can also be used with `get_hdr_1d()` for numerical summary of HDRs
#' res <- get_hdr_1d(df$x, method = method_freqpoly_1d())
#' str(res)
#'
#' @export
method_freqpoly_1d <- function(bins = NULL) {
function(x, n, range) {
# Start with output from method_histogram
df <- method_histogram_1d(bins)(x, n, range)
hx <- df$x[2] - df$x[1]
# need to pad df from hist_marginal() w/ bins that have est prob of 0
# so that we can interpolate
df <- rbind(
# add initial bin w/ est prob of 0
data.frame(
x = min(df$x) - hx,
fhat = 0
),
# include original histogram estimator
df,
# add final bin w/ est prob of 0
data.frame(
x = max(df$x) + hx,
fhat = 0
)
)
sx <- seq(range[1], range[2], length.out = n)
interpolate_fhat <- function(x) {
lower_x <- df$x[max(which(df$x < x))]
upper_x <- df$x[min(which(df$x >= x))]
lower_fhat <- df$fhat[max(which(df$x < x))]
upper_fhat <- df$fhat[min(which(df$x >= x))]
lower_fhat + (x - lower_x) * (upper_fhat - lower_fhat) / (upper_x - lower_x)
}
dens <- vapply(sx, interpolate_fhat, numeric(1))
data.frame(
x = sx,
fhat = dens
)
}
}
================================================
FILE: README.Rmd
================================================
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
dev = "png", dpi = 300,
fig.height = 3.5,
cache = TRUE
)
```
```{r, include = FALSE}
set.seed(1)
```
# ggdensity
[](https://github.com/jamesotto852/ggdensity/actions/workflows/R-CMD-check.yaml)
[](https://app.codecov.io/gh/jamesotto852/ggdensity)
[](https://cran.r-project.org/package=ggdensity)
[](https://cran.r-project.org/package=ggdensity)
**ggdensity** extends [**ggplot2**](https://github.com/tidyverse/ggplot2) providing more interpretable visualizations of density estimates based on highest density regions (HDRs).
**ggdensity** offers drop-in replacements for [**ggplot2**](https://github.com/tidyverse/ggplot2) functions:
- instead of `ggplot2::geom_density_2d_filled()`{.R}, use `ggdensity::geom_hdr()`{.R};
- instead of `ggplot2::geom_density_2d()`{.R}, use `ggdensity::geom_hdr_lines()`{.R}.
Also included are the functions `geom_hdr_fun()` and `geom_hdr_lines_fun()` for plotting HDRs of user-specified bivariate probability density functions.
## Installation
**ggdensity** is available on CRAN and can be installed with:
``` r
install.packages("ggdensity")
```
Alternatively, you can install the latest development version from [GitHub](https://github.com/) with:
``` r
if (!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github("jamesotto852/ggdensity")
```
## `geom_density_2d_filled()`{.R} vs. `geom_hdr()`{.R}
The standard way to visualize the joint distribution of two continuous variables in **ggplot2** is to use `ggplot2::geom_density_2d()`{.R} or `geom_density_2d_filled()`{.R}. Here's an example:
```{r ex0, message=FALSE}
library("ggplot2"); theme_set(theme_minimal())
theme_update(panel.grid.minor = element_blank())
library("ggdensity")
library("patchwork")
df <- data.frame("x" = rnorm(1000), "y" = rnorm(1000))
p <- ggplot(df, aes(x, y)) + coord_equal()
p + geom_density_2d_filled()
```
While it's a nice looking plot, it isn't immediately clear how we should understand it. That's because `geom_density_2d_filled()`{.R} generates its contours as equidistant level sets of the estimated bivariate density, i.e. taking horizontal slices of the 3d surface at equally-spaced heights, and projecting the intersections down into the plane. So you get a general feel of where the density is high, but not much else. To interpret a contour, you would need to multiply its height by the area it bounds, which of course is very challenging to do by just looking at it.
`geom_hdr()`{.R} tries to get around this problem by presenting you with regions of the estimated distribution that are immediately interpretable:
```{r ex1}
p + geom_hdr()
```
`probs` here tells us the probability bounded by the corresponding region, and the regions are computed to be the smallest such regions that bound that level of probability; these are called highest density regions or HDRs. By default, the plotted regions show the $50\%$, $80\%$, $95\%$, and $99\%$ HDRs of the estimated density, but this can be changed with the `probs` argument to `geom_hdr()`{.R}. Notice that your take-away from the plot made with `geom_density_2d_filled()`{.R} is subtlely yet significantly different than that of the plot made by `geom_hdr()`{.R}.
## Visualizing subpopulations and `geom_hdr_lines()`{.R}
**ggdensity**'s functions were designed to be seamlessly consistent with the rest of the **ggplot2** framework. As a consequence, pretty much everything you would expect to just work does. (Well, we hope! [Let us know](https://github.com/jamesotto852/ggdensity/issues/new) if that's not true.)
For example, because `geom_hdr()` maps probability to the `alpha` aesthetic, the `fill` and `color` aesthetics are available for mapping to variables.
You can use them to visualize subpopulations in your data. For example, in the `penguins` data from [**palmerpenguins**](https://github.com/allisonhorst/palmerpenguins) you may want to look at how the relationship between bill length and flipper length changes across different species of penguins. Here's one way you could look at that:
```{r ex_penguins, warning = FALSE}
library("palmerpenguins")
ggplot(penguins, aes(flipper_length_mm, bill_length_mm, fill = species)) +
geom_hdr(xlim = c(160, 240), ylim = c(30, 70)) +
geom_point(shape = 21)
```
[](https://github.com/jamesotto852/ggdensity/actions/workflows/R-CMD-check.yaml)
[](https://app.codecov.io/gh/jamesotto852/ggdensity)
[](https://cran.r-project.org/package=ggdensity)
[](https://cran.r-project.org/package=ggdensity)
**ggdensity** extends
[**ggplot2**](https://github.com/tidyverse/ggplot2) providing more
interpretable visualizations of density estimates based on highest
density regions (HDRs). **ggdensity** offers drop-in replacements for
[**ggplot2**](https://github.com/tidyverse/ggplot2) functions:
- instead of `ggplot2::geom_density_2d_filled()`, use
`ggdensity::geom_hdr()`;
- instead of `ggplot2::geom_density_2d()`, use
`ggdensity::geom_hdr_lines()`.
Also included are the functions `geom_hdr_fun()` and
`geom_hdr_lines_fun()` for plotting HDRs of user-specified bivariate
probability density functions.
## Installation
**ggdensity** is available on CRAN and can be installed with:
``` r
install.packages("ggdensity")
```
Alternatively, you can install the latest development version from
[GitHub](https://github.com/) with:
``` r
if (!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github("jamesotto852/ggdensity")
```
## `geom_density_2d_filled()` vs. `geom_hdr()`
The standard way to visualize the joint distribution of two continuous
variables in **ggplot2** is to use `ggplot2::geom_density_2d()` or
`geom_density_2d_filled()`. Here’s an example:
``` r
library("ggplot2"); theme_set(theme_minimal())
theme_update(panel.grid.minor = element_blank())
library("ggdensity")
library("patchwork")
df <- data.frame("x" = rnorm(1000), "y" = rnorm(1000))
p <- ggplot(df, aes(x, y)) + coord_equal()
p + geom_density_2d_filled()
```
While it’s a nice looking plot, it isn’t immediately clear how we should
understand it. That’s because `geom_density_2d_filled()` generates its
contours as equidistant level sets of the estimated bivariate density,
i.e. taking horizontal slices of the 3d surface at equally-spaced
heights, and projecting the intersections down into the plane. So you
get a general feel of where the density is high, but not much else. To
interpret a contour, you would need to multiply its height by the area
it bounds, which of course is very challenging to do by just looking at
it.
`geom_hdr()` tries to get around this problem by presenting you with
regions of the estimated distribution that are immediately
interpretable:
``` r
p + geom_hdr()
```
`probs` here tells us the probability bounded by the corresponding
region, and the regions are computed to be the smallest such regions
that bound that level of probability; these are called highest density
regions or HDRs. By default, the plotted regions show the $50\%$,
$80\%$, $95\%$, and $99\%$ HDRs of the estimated density, but this can
be changed with the `probs` argument to `geom_hdr()`. Notice that your
take-away from the plot made with `geom_density_2d_filled()` is subtlely
yet significantly different than that of the plot made by `geom_hdr()`.
## Visualizing subpopulations and `geom_hdr_lines()`
**ggdensity**’s functions were designed to be seamlessly consistent with
the rest of the **ggplot2** framework. As a consequence, pretty much
everything you would expect to just work does. (Well, we hope! [Let us
know](https://github.com/jamesotto852/ggdensity/issues/new) if that’s
not true.)
For example, because `geom_hdr()` maps probability to the `alpha`
aesthetic, the `fill` and `color` aesthetics are available for mapping
to variables. You can use them to visualize subpopulations in your data.
For example, in the `penguins` data from
[**palmerpenguins**](https://github.com/allisonhorst/palmerpenguins) you
may want to look at how the relationship between bill length and flipper
length changes across different species of penguins. Here’s one way you
could look at that:
``` r
library("palmerpenguins")
#>
#> Attaching package: 'palmerpenguins'
#> The following objects are masked from 'package:datasets':
#>
#> penguins, penguins_raw
ggplot(penguins, aes(flipper_length_mm, bill_length_mm, fill = species)) +
geom_hdr(xlim = c(160, 240), ylim = c(30, 70)) +
geom_point(shape = 21)
```
Or you could facet the plot:
The main point here is that you should really think of `geom_hdr()` and
`geom_hdr_lines()` as drop-in replacements for functions like
`geom_density_2d_filled()`, `geom_density2d()`, and so on, and you can
expect all of the rest of the **ggplot2** stuff to just work.
## A deeper cut illustrating **ggplot2** integration
The underlying stat used by `geom_hdr()` creates the computed variable
`probs` that can be mapped in the standard way you map computed
variables in **ggplot2**, with `after_stat()`.
For example, `geom_hdr()` and `geom_hdr_lines()` map `probs` to the
`alpha` aesthetic by default. But you can override it like this, just be
sure to override the `alpha` aesthetic by setting `alpha = 1`.
``` r
ggplot(faithful, aes(eruptions, waiting)) +
geom_hdr(
aes(fill = after_stat(probs)),
alpha = 1, xlim = c(0, 8), ylim = c(30, 110)
)
```
``` r
ggplot(faithful, aes(eruptions, waiting)) +
geom_hdr_lines(
aes(color = after_stat(probs)),
alpha = 1, xlim = c(0, 8), ylim = c(30, 110)
)
```
## Statistics details
In addition to trying to make the visuals clean and the functions what
you would expect as a **ggplot2** user, we’ve spent considerable effort
in trying to ensure that the graphics you’re getting with **ggdensity**
are statistically rigorous and provide a range of estimation options for
more detailed control.
To that end, you can pass a `method` argument into `geom_hdr()` and
`geom_hdr_lines()` that allows you to specify various nonparametric and
parametric ways to estimate the underlying bivariate distribution, and
we have plans for even more. Each of the estimators below offers
advantages in certain contexts. For example, histogram estimators result
in HDRs that obey constrained supports. Normal estimators can be helpful
in providing simplified visuals that give the viewer a sense of where
the distributions are, potentially at the expense of over-simplifying
and removing important features of how the variables (co-)vary.
The `method` argument may be specified either as a character vector
(`method = "kde"`) or as a function call (`method = method_kde()`). When
a function call is used, it may be possible to specify parameters
governing the density estimation procedure. For example, `method_kde()`
accepts parameters `h` and `adjust`, both related to the kernel’s
bandwidth. For details see `?method_kde` or
`vignette("method", "ggdensity")`.
## If you know your PDF
The above discussion has focused around densities that are estimated
from data. But in some instances, you have the distribution in the form
of a function that encodes the [joint
PDF](https://en.wikipedia.org/wiki/Probability_density_function). In
those circumstances, you can use `geom_hdr_fun()` and
`geom_hdr_lines_fun()` to make the analogous plots. These functions
behave similarly to `geom_function()` from
[**ggplot2**](https://github.com/tidyverse/ggplot2), accepting the
argument `fun` specifying the pdf to be summarized. Here’s an example:
``` r
f <- function(x, y) dnorm(x) * dgamma(y, 5, 3)
ggplot() +
geom_hdr_fun(fun = f, xlim = c(-4, 4), ylim = c(0, 5))
```
### Visualizing custom parametric density estimates with `geom_hdr_fun()`
In addition to all of the methods of density estimation available with
`geom_hdr()`, one of the perks of having `geom_hdr_fun()` is that it
allows you to plot parametric densities that you estimate outside the
**ggdensity** framework. The basic idea is that you fit your
distribution outside **ggdensity** calls with your method of choice, say
maximum likelihood, and then plug the maximum likelihood estimate into
the density formula to obtain a function to plug into `geom_hdr_fun()`.
Here’s an example of how you can do that that assuming that the
underlying data are independent and exponentially distributed with
unknown rates.
``` r
set.seed(123)
th <- c(3, 5)
df <- data.frame("x" = rexp(1000, th[1]), "y" = rexp(1000, th[2]))
# construct the likelihood function
l <- function(th) {
log_liks <- apply(df, 1, function(xy) {
dexp(xy[1], rate = th[1], log = TRUE) +
dexp(xy[2], rate = th[2], log = TRUE)
})
sum(log_liks)
}
# compute the mle
(th_hat <- optim(c(2, 2), l, control = list(fnscale = -1))$par)
#> [1] 2.912736 5.032125
# construct the parametric density estimate
f <- function(x, y, th) dexp(x, th[1]) * dexp(y, th[2])
# pass estimated density into geom_hdr_fun()
ggplot(df, aes(x, y)) +
geom_hdr_fun(fun = f, args = list(th = th_hat)) +
geom_point(shape = 21, fill = "lightgreen", alpha = .25) +
coord_equal()
```
## Other perks
### `geom_hdr_points()`
Inspired by
[**ggpointdensity**](https://github.com/LKremer/ggpointdensity),
**ggdensity** provides a scatterplot geom whereby the individual data
points can be seen simultaneously with HDRs. This is most useful in
situations with significant overplotting.
``` r
p_points <- ggplot(diamonds, aes(carat, price)) +
geom_point()
p_hdr_points <- ggplot(diamonds, aes(carat, price)) +
geom_hdr_points()
p_points + p_hdr_points
```
### `geom_hdr_rug()`
Rug plots are standard additions to plots with densities:
``` r
ggplot(cars, aes(speed, dist)) +
geom_density_2d() +
geom_point() +
geom_rug()
```
With HDRs, these can be used to visualize joint and marginal HDRs
simultaneously. The marginal HDRs are computed off of only the
corresponding `x` and `y` aesthetic variables. Note that these can be
substantially different: the joint HDR is *not* the
[product](https://en.wikipedia.org/wiki/Cartesian_product) of the
marginal HDRs.
``` r
ggplot(cars, aes(speed, dist)) +
geom_hdr() +
geom_point(color = "red") +
geom_hdr_rug()
```
Like `geom_rug()`, these can be placed on different sides of the object:
``` r
ggplot(cars, aes(speed, dist)) +
geom_hdr() +
geom_point(color = "red") +
geom_hdr_rug(sides = "tr", outside = TRUE) +
coord_cartesian(clip = "off")
```
We sometimes find it easier to view if the rug intervals are colored:
``` r
ggplot(cars, aes(speed, dist)) +
geom_hdr() +
geom_point(color = "red") +
geom_hdr_rug(aes(fill = after_stat(probs)), length = unit(.2, "cm"), alpha = 1) +
scale_fill_viridis_d(option = "magma", begin = .8, end = 0)
```
### Numerical summaries of HDRs
It is possible to access numerical summaries of the estimated densities
and HDRs computed by **ggdensity** with `get_hdr()`:
``` r
df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
res <- get_hdr(df, method = "kde")
str(res)
#> List of 3
#> $ df_est:'data.frame': 10000 obs. of 5 variables:
#> ..$ x : num [1:10000] -3.05 -2.99 -2.93 -2.86 -2.8 ...
#> ..$ y : num [1:10000] -3.13 -3.13 -3.13 -3.13 -3.13 ...
#> ..$ fhat : num [1:10000] 1.58e-09 4.49e-09 1.30e-08 3.66e-08 9.83e-08 ...
#> ..$ fhat_discretized: num [1:10000] 6.43e-12 1.83e-11 5.29e-11 1.49e-10 4.00e-10 ...
#> ..$ hdr : num [1:10000] 1 1 1 1 1 1 1 1 1 1 ...
#> $ breaks: Named num [1:5] 0.00257 0.00887 0.02929 0.07574 Inf
#> ..- attr(*, "names")= chr [1:5] "99%" "95%" "80%" "50%" ...
#> $ data :'data.frame': 1000 obs. of 3 variables:
#> ..$ x : num [1:1000] -0.817 -2.463 -1.343 0.136 0.883 ...
#> ..$ y : num [1:1000] -0.5277 -1.4411 -1.9568 0.0287 1.5382 ...
#> ..$ hdr_membership: num [1:1000] 0.5 0.99 0.95 0.5 0.8 0.99 0.8 0.95 0.5 0.5 ...
```
Similarly, there is `get_hdr_1d()` for univariate data:
``` r
x <- rnorm(1e3)
res <- get_hdr_1d(x, method = "kde")
str(res)
#> List of 3
#> $ df_est:'data.frame': 512 obs. of 4 variables:
#> ..$ x : num [1:512] -2.89 -2.88 -2.86 -2.85 -2.84 ...
#> ..$ fhat : num [1:512] 0.0044 0.00459 0.00478 0.00499 0.00519 ...
#> ..$ fhat_discretized: num [1:512] 5.46e-05 5.70e-05 5.94e-05 6.19e-05 6.45e-05 ...
#> ..$ hdr : num [1:512] 1 1 1 1 1 1 1 1 1 1 ...
#> $ breaks: Named num [1:5] 0.0141 0.0562 0.1756 0.3167 Inf
#> ..- attr(*, "names")= chr [1:5] "99%" "95%" "80%" "50%" ...
#> $ data :'data.frame': 1000 obs. of 2 variables:
#> ..$ x : num [1:1000] -0.4301 -1.5792 0.1929 -0.4973 -0.0859 ...
#> ..$ hdr_membership: num [1:1000] 0.5 0.95 0.5 0.5 0.5 0.5 0.8 0.5 0.5 0.99 ...
```
For details on the objects returned by these functions, see `?get_hdr`
and `?get_hdr_1d`.
## A caveat and recommendation for cropped HDRs
`geom_hdr()` and related functions were written with the intent of
playing nicely with **ggplot2**, so that what the typical **ggplot2**
user would expect from the rest of the **ggplot2** ecosystem would work
in the same way with **ggdensity**.
One place where the effect isn’t ideal is in the limits of the `x` and
`y` scales. Without getting into too much detail, these key off of the
observed points themselves, and not properties of the estimated density.
This is consistent with `geom_density_2d()` and `stat_smooth()`, for
example: computed aesthetics don’t extend past the range of the data.
One potential danger here is that the estimated HDRs are computed based
on not the estimated density directly, but a discretization of it. This
is how all non-parametric density estimation in R works,
e.g. `MASS::kde2d()`, and most parametric density estimation, too. In
other words: the density estimate itself is only known at points on a
grid over the `x`-`y` aesthetic space. As a consequence, if that range
is too small, it’s possible that a probabilistically non-trivial
proportion of the density is excluded from the computations, biasing the
resulting HDRs.
The punch line is that whenever you see an HDR getting truncated by the
window of the plot, it’s probably a good idea to manually increase the
aesthetic limits and use `coord_cartesian()` to zoom in as needed.
Here’s an example using the previously created graphic. The limits given
to `coord_cartesian()` and the call to `scale_y_continuous()` is simply
an effort to make the third plot comparable to the first.
*Note:* The support of the data isn’t respected here-the estimated
density doesn’t know speed can’t go negative. That’s not an artifact of
the effect described above, that’s just because that’s what KDE’s do.
``` r
p1 <- ggplot(cars, aes(speed, dist)) +
geom_hdr() +
geom_point(color = "red") +
guides(alpha = "none") +
ggtitle("Default geom_hdr()")
p2 <- ggplot(cars, aes(speed, dist)) +
geom_hdr(xlim = c(-20, 50), ylim = c(-40, 140)) +
geom_point(color = "red") +
guides(alpha = "none") +
ggtitle("Manually set xlim, ylim")
p3 <- ggplot(cars, aes(speed, dist)) +
geom_hdr(xlim = c(-20, 50), ylim = c(-40, 140)) +
geom_point(color = "red") +
guides(alpha = "none") +
scale_y_continuous(breaks = 25*(0:5)) +
coord_cartesian(xlim = c(4, 25), ylim = c(-1, 120)) +
ggtitle("Zoom with coord_cartesian()")
(p1 / p2 / p3) & theme(title = element_text(size = 9))
```
## Related projects
There are a few other great packages out there you should know about if
you’re interested in **ggdensity**.
The [**ggdist**](https://mjskay.github.io/ggdist/) package provides
several flexible geoms for visualizing distributions of data, mostly
univariate data.
The [**hdrcde**](https://pkg.robjhyndman.com/hdrcde/index.html) package
allows you to make bivariate HDR plots as well. At the surface, the main
difference is that **hdrcde** doesn’t use **ggplot2** graphics; however,
under the hood there are many more differences. (More coming on
explaining these discrepancies.)
The code illustrating the two strategies is quite simple, but trying to
make the graphics more directly comparable requires some effort. Here’s
a pretty good rendition on the `faithful` dataset, which has 272
observations.
``` r
p_hdr_scale <- ggplot(faithful, aes(eruptions, waiting)) +
geom_hdr(
xlim = scales::expand_range(range(faithful$eruptions), mul = .25),
ylim = scales::expand_range(range(faithful$waiting), mul = .25)
) +
geom_point(color = "red") +
scale_x_continuous(breaks = 0:6) +
scale_y_continuous(breaks = (3:10)*10) +
guides(alpha = "none")
den <- with(faithful,
MASS::kde2d(eruptions, waiting, n = 100, lims = c(0,6,30,105))
)
if (!requireNamespace("hdrcde")) install.packages("hdrcde")
#> Loading required namespace: hdrcde
library("hdrcde")
#> This is hdrcde 3.5.0
p_den <- ~ with(faithful,
plot(
hdr.2d(eruptions, waiting, prob = c(50, 80, 95, 99), den = den),
pointcol = "red",
show.points = TRUE,
xlim = c(0, 6),
ylim = c(30, 105)
)
)
par(mar = c(0,1.75,0,0), bg = NA)
p_hdr_scale +
coord_cartesian(xlim = c(0, 6), ylim = c(30, 105), expand = FALSE) +
wrap_elements(panel = p_den, clip = FALSE)
```
These look quite different, and they are. It’s worth noting that even
within **hdrcde** there is variability as well:
``` r
par(mar = c(3, 3, 1, 1) + 0.1, mfrow = c(1, 2))
with(faithful,
plot(
hdr.2d(eruptions, waiting, prob = c(50, 80, 95, 99),
kde.package = "ash", xextend = .20),
pointcol = "red",
show.points = TRUE,
xlim = c(0, 6),
ylim = c(30, 105)
)
)
with(faithful,
plot(
hdr.2d(eruptions, waiting, prob = c(50, 80, 95, 99),
kde.package = "ks", xextend = .20),
pointcol = "red",
show.points = TRUE,
xlim = c(0, 6),
ylim = c(30, 105)
)
)
```
[**gghdr**](https://sayani07.github.io/gghdr/) is somewhat of a
**ggplot2** port of **hdrcde**, developed by some of the same team
members. In some ways, it’s very similar to **ggdensity**. For example,
it contains a function `gghdr::geom_hdr_rug()` that does effectively the
same as `ggdensity::geom_hdr_rug()`; it implements a kind of
`ggdensity::geom_hdr_pointdensity()` via a function `gghdr::hdr_bin()`
plus the color aesthetic to `geom_point()`; and it provides a boxplot
alternative `gghdr::geom_hdr_boxplot()`. To the extent the similarities
between **ggdensity** and **hdrcde**/**gghdr** exist (and they obviously
do), they are an example of [convergent
evolution](https://en.wikipedia.org/wiki/Convergent_evolution). The
present authors only discovered those projects after writing most of
**ggdensity**, unfortunately. Interestingly, we also had designs on the
CDE part as well (“conditional density estimation”, think models);
however had not implemented it before seeing **hdrcde**. You can expect
those to come down the road.
Perhaps the most important difference between **ggdensity** and
**gghdr** is that the latter doesn’t implement bivariate HDRs in the
**ggplot2** framework, which was the original motivation of
**ggdensity**. For that purpose, it seems the only project available is
**ggdensity**.
================================================
FILE: _pkgdown.yml
================================================
url: https://jamesotto852.github.io/ggdensity/
template:
bootstrap: 5
reference:
- title: "2-dimensional highest density regions"
- subtitle: "Plotting functions"
contents:
- geom_hdr
- geom_hdr_fun
- geom_hdr_points
- geom_hdr_points_fun
- subtitle: "Bivariate density estimation methods"
contents:
- method_freqpoly
- method_histogram
- method_kde
- method_mvnorm
- subtitle: "Accessing estimated bivariate densities and regions"
desc: "All 2D plotting functions use this function to determine HDRs"
contents:
- get_hdr
- title: "1-dimensional highest density regions"
- subtitle: "Plotting functions"
contents:
- geom_hdr_rug
- geom_hdr_rug_fun
- subtitle: "Univariate density estimation methods"
contents:
- method_kde_1d
- method_norm_1d
- method_freqpoly_1d
- method_histogram_1d
- subtitle: "Accessing estimated univariate densities and regions"
desc: "All 1D plotting functions use this function to determine HDRs"
contents:
- get_hdr_1d
- title: "Package-level documentation"
contents:
- ggdensity
- package-ggdensity
================================================
FILE: cran-comments.md
================================================
## R CMD check results
0 errors | 0 warnings | 0 notes
## revdepcheck results
We checked 6 reverse dependencies (5 from CRAN + 1 from Bioconductor), comparing R CMD check results across CRAN and dev versions of this package.
* We saw 0 new problems
* We failed to check 0 packages
================================================
FILE: ggdensity.Rproj
================================================
Version: 1.0
RestoreWorkspace: No
SaveWorkspace: No
AlwaysSaveHistory: Default
EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8
RnwWeave: knitr
LaTeX: pdfLaTeX
AutoAppendNewline: Yes
StripTrailingWhitespace: Yes
LineEndingConversion: Posix
BuildType: Package
PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
PackageRoxygenize: rd,collate,namespace
================================================
FILE: man/geom_hdr.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hdr.R, R/hdr_lines.R
\docType{data}
\name{geom_hdr}
\alias{geom_hdr}
\alias{stat_hdr}
\alias{StatHdr}
\alias{GeomHdr}
\alias{stat_hdr_lines}
\alias{StatHdrLines}
\alias{geom_hdr_lines}
\alias{GeomHdrLines}
\title{Highest density regions of a 2D density estimate}
\usage{
stat_hdr(
mapping = NULL,
data = NULL,
geom = "hdr",
position = "identity",
...,
method = "kde",
probs = c(0.99, 0.95, 0.8, 0.5),
n = 100,
xlim = NULL,
ylim = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
geom_hdr(
mapping = NULL,
data = NULL,
stat = "hdr",
position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
}
\arguments{
\item{mapping}{Set of aesthetic mappings created by \code{\link[ggplot2:aes]{aes()}}. If specified and
\code{inherit.aes = TRUE} (the default), it is combined with the default mapping
at the top level of the plot. You must supply \code{mapping} if there is no plot
mapping.}
\item{data}{The data to be displayed in this layer. There are three
options:
If \code{NULL}, the default, the data is inherited from the plot
data as specified in the call to \code{\link[ggplot2:ggplot]{ggplot()}}.
A \code{data.frame}, or other object, will override the plot
data. All objects will be fortified to produce a data frame. See
\code{\link[ggplot2:fortify]{fortify()}} for which variables will be created.
A \code{function} will be called with a single argument,
the plot data. The return value must be a \code{data.frame}, and
will be used as the layer data. A \code{function} can be created
from a \code{formula} (e.g. \code{~ head(.x, 10)}).}
\item{geom}{The geometric object to use to display the data for this layer.
When using a \verb{stat_*()} function to construct a layer, the \code{geom} argument
can be used to override the default coupling between stats and geoms. The
\code{geom} argument accepts the following:
\itemize{
\item A \code{Geom} ggproto subclass, for example \code{GeomPoint}.
\item A string naming the geom. To give the geom as a string, strip the
function name of the \code{geom_} prefix. For example, to use \code{geom_point()},
give the geom as \code{"point"}.
\item For more information and other ways to specify the geom, see the
\link[ggplot2:layer_geoms]{layer geom} documentation.
}}
\item{position}{A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The \code{position} argument accepts the following:
\itemize{
\item The result of calling a position function, such as \code{position_jitter()}.
This method allows for passing extra arguments to the position.
\item A string naming the position adjustment. To give the position as a
string, strip the function name of the \code{position_} prefix. For example,
to use \code{position_jitter()}, give the position as \code{"jitter"}.
\item For more information and other ways to specify the position, see the
\link[ggplot2:layer_positions]{layer position} documentation.
}}
\item{...}{Other arguments passed on to \code{\link[ggplot2:layer]{layer()}}'s \code{params} argument. These
arguments broadly fall into one of 4 categories below. Notably, further
arguments to the \code{position} argument, or aesthetics that are required
can \emph{not} be passed through \code{...}. Unknown arguments that are not part
of the 4 categories below are ignored.
\itemize{
\item Static aesthetics that are not mapped to a scale, but are at a fixed
value and apply to the layer as a whole. For example, \code{colour = "red"}
or \code{linewidth = 3}. The geom's documentation has an \strong{Aesthetics}
section that lists the available options. The 'required' aesthetics
cannot be passed on to the \code{params}. Please note that while passing
unmapped aesthetics as vectors is technically possible, the order and
required length is not guaranteed to be parallel to the input data.
\item When constructing a layer using
a \verb{stat_*()} function, the \code{...} argument can be used to pass on
parameters to the \code{geom} part of the layer. An example of this is
\code{stat_density(geom = "area", outline.type = "both")}. The geom's
documentation lists which parameters it can accept.
\item Inversely, when constructing a layer using a
\verb{geom_*()} function, the \code{...} argument can be used to pass on parameters
to the \code{stat} part of the layer. An example of this is
\code{geom_area(stat = "density", adjust = 0.5)}. The stat's documentation
lists which parameters it can accept.
\item The \code{key_glyph} argument of \code{\link[ggplot2:layer]{layer()}} may also be passed on through
\code{...}. This can be one of the functions described as
\link[ggplot2:draw_key]{key glyphs}, to change the display of the layer in the legend.
}}
\item{method}{Density estimator to use, accepts character vector:
\code{"kde"},\code{"histogram"}, \code{"freqpoly"}, or \code{"mvnorm"}.
Alternatively accepts functions which return closures corresponding to density estimates,
see \code{?get_hdr} or \code{vignette("method", "ggdensity")}.}
\item{probs}{Probabilities to compute highest density regions for.}
\item{n}{Resolution of grid defined by \code{xlim} and \code{ylim}.
Ignored if \code{method = "histogram"} or \code{method = "freqpoly"}.}
\item{xlim, ylim}{Range to compute and draw regions. If \code{NULL}, defaults to
range of data.}
\item{na.rm}{If \code{FALSE}, the default, missing values are removed with
a warning. If \code{TRUE}, missing values are silently removed.}
\item{show.legend}{logical. Should this layer be included in the legends?
\code{NA}, the default, includes if any aesthetics are mapped.
\code{FALSE} never includes, and \code{TRUE} always includes.
It can also be a named logical vector to finely select the aesthetics to
display. To include legend keys for all levels, even
when no data exists, use \code{TRUE}. If \code{NA}, all levels are shown in legend,
but unobserved levels are omitted.}
\item{inherit.aes}{If \code{FALSE}, overrides the default aesthetics,
rather than combining with them. This is most useful for helper functions
that define both data and aesthetics and shouldn't inherit behaviour from
the default plot specification, e.g. \code{\link[ggplot2:annotation_borders]{annotation_borders()}}.}
\item{stat}{The statistical transformation to use on the data for this layer.
When using a \verb{geom_*()} function to construct a layer, the \code{stat}
argument can be used to override the default coupling between geoms and
stats. The \code{stat} argument accepts the following:
\itemize{
\item A \code{Stat} ggproto subclass, for example \code{StatCount}.
\item A string naming the stat. To give the stat as a string, strip the
function name of the \code{stat_} prefix. For example, to use \code{stat_count()},
give the stat as \code{"count"}.
\item For more information and other ways to specify the stat, see the
\link[ggplot2:layer_stats]{layer stat} documentation.
}}
}
\description{
Perform 2D density estimation, compute and plot the resulting highest density regions.
\code{geom_hdr()} draws filled regions and \code{geom_hdr_lines()} draws lines outlining the regions.
Note, the plotted objects have probabilities mapped to the \code{alpha} aesthetic by default.
}
\section{Aesthetics}{
\code{geom_hdr()} and \code{geom_hdr_lines()} understand the following aesthetics (required
aesthetics are in bold):
\itemize{
\item \strong{x}
\item \strong{y}
\item alpha
\item color
\item fill (only \code{geom_hdr})
\item group
\item linetype
\item linewidth
\item subgroup
}
}
\section{Computed variables}{
\describe{ \item{probs}{The probability associated with the highest density region, specified
by \code{probs} argument.} }
}
\examples{
# Basic simulated data with bivariate normal data and various methods
df <- data.frame(x = rnorm(1000), y = rnorm(1000))
p <- ggplot(df, aes(x, y)) + coord_equal()
p + geom_hdr()
p + geom_hdr(method = "mvnorm")
p + geom_hdr(method = "freqpoly")
# p + geom_hdr(method = "histogram")
# Adding point layers on top to visually assess region estimates
pts <- geom_point(size = .2, color = "red")
p + geom_hdr() + pts
p + geom_hdr(method = "mvnorm") + pts
p + geom_hdr(method = "freqpoly") + pts
# p + geom_hdr(method = "histogram") + pts
# Highest density region boundary lines
p + geom_hdr_lines()
p + geom_hdr_lines(method = "mvnorm")
p + geom_hdr_lines(method = "freqpoly")
# p + geom_hdr_lines(method = "histogram")
\dontrun{
# 2+ groups - mapping other aesthetics in the geom
rdata <- function(n, n_groups = 3, radius = 3) {
list_of_dfs <- lapply(0:(n_groups-1), function(k) {
mu <- c(cos(2*k*pi/n_groups), sin(2*k*pi/n_groups))
m <- MASS::mvrnorm(n, radius*mu, diag(2))
structure(data.frame(m, as.character(k)), names = c("x", "y", "c"))
})
do.call("rbind", list_of_dfs)
}
dfc <- rdata(1000, n_groups = 5)
pf <- ggplot(dfc, aes(x, y, fill = c)) + coord_equal()
pf + geom_hdr()
pf + geom_hdr(method = "mvnorm")
pf + geom_hdr(method = "mvnorm", probs = .90, alpha = .5)
pf + geom_hdr(method = "histogram")
pf + geom_hdr(method = "freqpoly")
pc <- ggplot(dfc, aes(x, y, color = c)) +
coord_equal() +
theme_minimal() +
theme(panel.grid.minor = element_blank())
pc + geom_hdr_lines()
pc + geom_hdr_lines(method = "mvnorm")
# Data with boundaries
ggplot(df, aes(x^2)) + geom_histogram(bins = 30)
ggplot(df, aes(x^2)) + geom_histogram(bins = 30, boundary = 0)
ggplot(df, aes(x^2, y^2)) + geom_hdr(method = "histogram")
}
}
\references{
Scott, David W. Multivariate Density Estimation (2e), Wiley.
}
\keyword{datasets}
================================================
FILE: man/geom_hdr_fun.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hdr_fun.R, R/hdr_lines_fun.R
\docType{data}
\name{geom_hdr_fun}
\alias{geom_hdr_fun}
\alias{stat_hdr_fun}
\alias{StatHdrFun}
\alias{GeomHdrFun}
\alias{stat_hdr_lines_fun}
\alias{StatHdrLinesFun}
\alias{geom_hdr_lines_fun}
\alias{GeomHdrLinesFun}
\title{Highest density regions of a bivariate pdf}
\usage{
stat_hdr_fun(
mapping = NULL,
data = NULL,
geom = "hdr_fun",
position = "identity",
...,
fun,
args = list(),
probs = c(0.99, 0.95, 0.8, 0.5),
xlim = NULL,
ylim = NULL,
n = 100,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
geom_hdr_fun(
mapping = NULL,
data = NULL,
stat = "hdr_fun",
position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
}
\arguments{
\item{mapping}{Set of aesthetic mappings created by \code{\link[ggplot2:aes]{aes()}}. If specified and
\code{inherit.aes = TRUE} (the default), it is combined with the default mapping
at the top level of the plot. You must supply \code{mapping} if there is no plot
mapping.}
\item{data}{The data to be displayed in this layer. There are three
options:
If \code{NULL}, the default, the data is inherited from the plot
data as specified in the call to \code{\link[ggplot2:ggplot]{ggplot()}}.
A \code{data.frame}, or other object, will override the plot
data. All objects will be fortified to produce a data frame. See
\code{\link[ggplot2:fortify]{fortify()}} for which variables will be created.
A \code{function} will be called with a single argument,
the plot data. The return value must be a \code{data.frame}, and
will be used as the layer data. A \code{function} can be created
from a \code{formula} (e.g. \code{~ head(.x, 10)}).}
\item{geom}{The geometric object to use to display the data for this layer.
When using a \verb{stat_*()} function to construct a layer, the \code{geom} argument
can be used to override the default coupling between stats and geoms. The
\code{geom} argument accepts the following:
\itemize{
\item A \code{Geom} ggproto subclass, for example \code{GeomPoint}.
\item A string naming the geom. To give the geom as a string, strip the
function name of the \code{geom_} prefix. For example, to use \code{geom_point()},
give the geom as \code{"point"}.
\item For more information and other ways to specify the geom, see the
\link[ggplot2:layer_geoms]{layer geom} documentation.
}}
\item{position}{A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The \code{position} argument accepts the following:
\itemize{
\item The result of calling a position function, such as \code{position_jitter()}.
This method allows for passing extra arguments to the position.
\item A string naming the position adjustment. To give the position as a
string, strip the function name of the \code{position_} prefix. For example,
to use \code{position_jitter()}, give the position as \code{"jitter"}.
\item For more information and other ways to specify the position, see the
\link[ggplot2:layer_positions]{layer position} documentation.
}}
\item{...}{Other arguments passed on to \code{\link[ggplot2:layer]{layer()}}'s \code{params} argument. These
arguments broadly fall into one of 4 categories below. Notably, further
arguments to the \code{position} argument, or aesthetics that are required
can \emph{not} be passed through \code{...}. Unknown arguments that are not part
of the 4 categories below are ignored.
\itemize{
\item Static aesthetics that are not mapped to a scale, but are at a fixed
value and apply to the layer as a whole. For example, \code{colour = "red"}
or \code{linewidth = 3}. The geom's documentation has an \strong{Aesthetics}
section that lists the available options. The 'required' aesthetics
cannot be passed on to the \code{params}. Please note that while passing
unmapped aesthetics as vectors is technically possible, the order and
required length is not guaranteed to be parallel to the input data.
\item When constructing a layer using
a \verb{stat_*()} function, the \code{...} argument can be used to pass on
parameters to the \code{geom} part of the layer. An example of this is
\code{stat_density(geom = "area", outline.type = "both")}. The geom's
documentation lists which parameters it can accept.
\item Inversely, when constructing a layer using a
\verb{geom_*()} function, the \code{...} argument can be used to pass on parameters
to the \code{stat} part of the layer. An example of this is
\code{geom_area(stat = "density", adjust = 0.5)}. The stat's documentation
lists which parameters it can accept.
\item The \code{key_glyph} argument of \code{\link[ggplot2:layer]{layer()}} may also be passed on through
\code{...}. This can be one of the functions described as
\link[ggplot2:draw_key]{key glyphs}, to change the display of the layer in the legend.
}}
\item{fun}{A function, the joint probability density function, must be
vectorized in its first two arguments; see examples.}
\item{args}{Named list of additional arguments passed on to \code{fun}.}
\item{probs}{Probabilities to compute highest density regions for.}
\item{xlim, ylim}{Range to compute and draw regions. If \code{NULL}, defaults to
range of data if present.}
\item{n}{Resolution of grid \code{fun} is evaluated on.}
\item{na.rm}{If \code{FALSE}, the default, missing values are removed with
a warning. If \code{TRUE}, missing values are silently removed.}
\item{show.legend}{logical. Should this layer be included in the legends?
\code{NA}, the default, includes if any aesthetics are mapped.
\code{FALSE} never includes, and \code{TRUE} always includes.
It can also be a named logical vector to finely select the aesthetics to
display. To include legend keys for all levels, even
when no data exists, use \code{TRUE}. If \code{NA}, all levels are shown in legend,
but unobserved levels are omitted.}
\item{inherit.aes}{If \code{FALSE}, overrides the default aesthetics,
rather than combining with them. This is most useful for helper functions
that define both data and aesthetics and shouldn't inherit behaviour from
the default plot specification, e.g. \code{\link[ggplot2:annotation_borders]{annotation_borders()}}.}
\item{stat}{The statistical transformation to use on the data for this layer.
When using a \verb{geom_*()} function to construct a layer, the \code{stat}
argument can be used to override the default coupling between geoms and
stats. The \code{stat} argument accepts the following:
\itemize{
\item A \code{Stat} ggproto subclass, for example \code{StatCount}.
\item A string naming the stat. To give the stat as a string, strip the
function name of the \code{stat_} prefix. For example, to use \code{stat_count()},
give the stat as \code{"count"}.
\item For more information and other ways to specify the stat, see the
\link[ggplot2:layer_stats]{layer stat} documentation.
}}
}
\description{
Compute and plot the highest density regions (HDRs) of a bivariate pdf.
\code{geom_hdr_fun()} draws filled regions, and \code{geom_hdr_lines_fun()} draws lines outlining the regions.
Note, the plotted objects have probabilities mapped to the \code{alpha} aesthetic by default.
}
\section{Aesthetics}{
\code{geom_hdr_fun()} and \code{geom_hdr_lines_fun()} understand the following aesthetics (required
aesthetics are in bold):
\itemize{
\item x
\item y
\item alpha
\item color
\item fill (only \code{geom_hdr_fun})
\item group
\item linetype
\item linewidth
\item subgroup
}
}
\section{Computed variables}{
\describe{ \item{probs}{The probability associated with the highest density region, specified
by \code{probs}.} }
}
\examples{
# HDRs of the bivariate exponential
f <- function(x, y) dexp(x) * dexp(y)
ggplot() + geom_hdr_fun(fun = f, xlim = c(0, 10), ylim = c(0, 10))
# HDRs of a custom parametric model
# generate example data
n <- 1000
th_true <- c(3, 8)
rdata <- function(n, th) {
gen_single_obs <- function(th) {
rchisq(2, df = th) # can be anything
}
df <- replicate(n, gen_single_obs(th))
setNames(as.data.frame(t(df)), c("x", "y"))
}
data <- rdata(n, th_true)
# estimate unknown parameters via maximum likelihood
likelihood <- function(th) {
th <- abs(th) # hack to enforce parameter space boundary
log_f <- function(v) {
x <- v[1]; y <- v[2]
dchisq(x, df = th[1], log = TRUE) + dchisq(y, df = th[2], log = TRUE)
}
sum(apply(data, 1, log_f))
}
(th_hat <- optim(c(1, 1), likelihood, control = list(fnscale = -1))$par)
# plot f for the give model
f <- function(x, y, th) dchisq(x, df = th[1]) * dchisq(y, df = th[2])
ggplot(data, aes(x, y)) +
geom_hdr_fun(fun = f, args = list(th = th_hat)) +
geom_point(size = .25, color = "red") +
xlim(0, 30) + ylim(c(0, 30))
ggplot(data, aes(x, y)) +
geom_hdr_lines_fun(fun = f, args = list(th = th_hat)) +
geom_point(size = .25, color = "red") +
xlim(0, 30) + ylim(c(0, 30))
}
\keyword{datasets}
================================================
FILE: man/geom_hdr_points.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hdr_points.R
\docType{data}
\name{geom_hdr_points}
\alias{geom_hdr_points}
\alias{stat_hdr_points}
\alias{StatHdrPoints}
\title{Scatterplot colored by highest density regions of a 2D density estimate}
\usage{
stat_hdr_points(
mapping = NULL,
data = NULL,
geom = "point",
position = "identity",
...,
method = "kde",
probs = c(0.99, 0.95, 0.8, 0.5),
n = 100,
xlim = NULL,
ylim = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
geom_hdr_points(
mapping = NULL,
data = NULL,
stat = "hdr_points",
position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
}
\arguments{
\item{mapping}{Set of aesthetic mappings created by \code{\link[ggplot2:aes]{aes()}}. If specified and
\code{inherit.aes = TRUE} (the default), it is combined with the default mapping
at the top level of the plot. You must supply \code{mapping} if there is no plot
mapping.}
\item{data}{The data to be displayed in this layer. There are three
options:
If \code{NULL}, the default, the data is inherited from the plot
data as specified in the call to \code{\link[ggplot2:ggplot]{ggplot()}}.
A \code{data.frame}, or other object, will override the plot
data. All objects will be fortified to produce a data frame. See
\code{\link[ggplot2:fortify]{fortify()}} for which variables will be created.
A \code{function} will be called with a single argument,
the plot data. The return value must be a \code{data.frame}, and
will be used as the layer data. A \code{function} can be created
from a \code{formula} (e.g. \code{~ head(.x, 10)}).}
\item{geom}{The geometric object to use to display the data for this layer.
When using a \verb{stat_*()} function to construct a layer, the \code{geom} argument
can be used to override the default coupling between stats and geoms. The
\code{geom} argument accepts the following:
\itemize{
\item A \code{Geom} ggproto subclass, for example \code{GeomPoint}.
\item A string naming the geom. To give the geom as a string, strip the
function name of the \code{geom_} prefix. For example, to use \code{geom_point()},
give the geom as \code{"point"}.
\item For more information and other ways to specify the geom, see the
\link[ggplot2:layer_geoms]{layer geom} documentation.
}}
\item{position}{A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The \code{position} argument accepts the following:
\itemize{
\item The result of calling a position function, such as \code{position_jitter()}.
This method allows for passing extra arguments to the position.
\item A string naming the position adjustment. To give the position as a
string, strip the function name of the \code{position_} prefix. For example,
to use \code{position_jitter()}, give the position as \code{"jitter"}.
\item For more information and other ways to specify the position, see the
\link[ggplot2:layer_positions]{layer position} documentation.
}}
\item{...}{Other arguments passed on to \code{\link[ggplot2:layer]{layer()}}'s \code{params} argument. These
arguments broadly fall into one of 4 categories below. Notably, further
arguments to the \code{position} argument, or aesthetics that are required
can \emph{not} be passed through \code{...}. Unknown arguments that are not part
of the 4 categories below are ignored.
\itemize{
\item Static aesthetics that are not mapped to a scale, but are at a fixed
value and apply to the layer as a whole. For example, \code{colour = "red"}
or \code{linewidth = 3}. The geom's documentation has an \strong{Aesthetics}
section that lists the available options. The 'required' aesthetics
cannot be passed on to the \code{params}. Please note that while passing
unmapped aesthetics as vectors is technically possible, the order and
required length is not guaranteed to be parallel to the input data.
\item When constructing a layer using
a \verb{stat_*()} function, the \code{...} argument can be used to pass on
parameters to the \code{geom} part of the layer. An example of this is
\code{stat_density(geom = "area", outline.type = "both")}. The geom's
documentation lists which parameters it can accept.
\item Inversely, when constructing a layer using a
\verb{geom_*()} function, the \code{...} argument can be used to pass on parameters
to the \code{stat} part of the layer. An example of this is
\code{geom_area(stat = "density", adjust = 0.5)}. The stat's documentation
lists which parameters it can accept.
\item The \code{key_glyph} argument of \code{\link[ggplot2:layer]{layer()}} may also be passed on through
\code{...}. This can be one of the functions described as
\link[ggplot2:draw_key]{key glyphs}, to change the display of the layer in the legend.
}}
\item{method}{Density estimator to use, accepts character vector:
\code{"kde"},\code{"histogram"}, \code{"freqpoly"}, or \code{"mvnorm"}.
Alternatively accepts functions which return closures corresponding to density estimates,
see \code{?get_hdr} or \code{vignette("method", "ggdensity")}.}
\item{probs}{Probabilities to compute highest density regions for.}
\item{n}{Number of grid points in each direction.}
\item{xlim, ylim}{Range to compute and draw regions. If \code{NULL}, defaults to
range of data.}
\item{na.rm}{If \code{FALSE}, the default, missing values are removed with
a warning. If \code{TRUE}, missing values are silently removed.}
\item{show.legend}{logical. Should this layer be included in the legends?
\code{NA}, the default, includes if any aesthetics are mapped.
\code{FALSE} never includes, and \code{TRUE} always includes.
It can also be a named logical vector to finely select the aesthetics to
display. To include legend keys for all levels, even
when no data exists, use \code{TRUE}. If \code{NA}, all levels are shown in legend,
but unobserved levels are omitted.}
\item{inherit.aes}{If \code{FALSE}, overrides the default aesthetics,
rather than combining with them. This is most useful for helper functions
that define both data and aesthetics and shouldn't inherit behaviour from
the default plot specification, e.g. \code{\link[ggplot2:annotation_borders]{annotation_borders()}}.}
\item{stat}{The statistical transformation to use on the data for this layer.
When using a \verb{geom_*()} function to construct a layer, the \code{stat}
argument can be used to override the default coupling between geoms and
stats. The \code{stat} argument accepts the following:
\itemize{
\item A \code{Stat} ggproto subclass, for example \code{StatCount}.
\item A string naming the stat. To give the stat as a string, strip the
function name of the \code{stat_} prefix. For example, to use \code{stat_count()},
give the stat as \code{"count"}.
\item For more information and other ways to specify the stat, see the
\link[ggplot2:layer_stats]{layer stat} documentation.
}}
}
\description{
Perform 2D density estimation, compute the resulting highest density regions (HDRs),
and plot the provided data as a scatterplot with points colored according to
their corresponding HDR.
}
\section{Aesthetics}{
geom_hdr_points understands the following aesthetics (required
aesthetics are in bold):
\itemize{
\item \strong{x}
\item \strong{y}
\item alpha
\item color
\item fill
\item group
\item linetype
\item size
\item subgroup
}
}
\section{Computed variables}{
\describe{ \item{probs}{The probability associated with the highest density region, specified
by \code{probs}.} }
}
\examples{
set.seed(1)
df <- data.frame(x = rnorm(500), y = rnorm(500))
p <- ggplot(df, aes(x, y)) +
coord_equal()
p + geom_hdr_points()
# Setting aes(fill = after_stat(probs)), color = "black", and
# shape = 21 helps alleviate overplotting:
p + geom_hdr_points(aes(fill = after_stat(probs)), color = "black", shape = 21, size = 2)
# Also works well with geom_hdr_lines()
p +
geom_hdr_lines(
aes(color = after_stat(probs)), alpha = 1,
xlim = c(-5, 5), ylim = c(-5, 5)
) +
geom_hdr_points(
aes(fill = after_stat(probs)), color = "black", shape = 21, size = 2,
xlim = c(-5, 5), ylim = c(-5, 5)
)
}
\keyword{datasets}
================================================
FILE: man/geom_hdr_points_fun.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hdr_points_fun.R
\docType{data}
\name{geom_hdr_points_fun}
\alias{geom_hdr_points_fun}
\alias{stat_hdr_points_fun}
\alias{StatHdrPointsFun}
\title{Scatterplot colored by highest density regions of a bivariate pdf}
\usage{
stat_hdr_points_fun(
mapping = NULL,
data = NULL,
geom = "point",
position = "identity",
...,
fun,
args = list(),
probs = c(0.99, 0.95, 0.8, 0.5),
xlim = NULL,
ylim = NULL,
n = 100,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
geom_hdr_points_fun(
mapping = NULL,
data = NULL,
stat = "hdr_points_fun",
position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
}
\arguments{
\item{mapping}{Set of aesthetic mappings created by \code{\link[ggplot2:aes]{aes()}}. If specified and
\code{inherit.aes = TRUE} (the default), it is combined with the default mapping
at the top level of the plot. You must supply \code{mapping} if there is no plot
mapping.}
\item{data}{The data to be displayed in this layer. There are three
options:
If \code{NULL}, the default, the data is inherited from the plot
data as specified in the call to \code{\link[ggplot2:ggplot]{ggplot()}}.
A \code{data.frame}, or other object, will override the plot
data. All objects will be fortified to produce a data frame. See
\code{\link[ggplot2:fortify]{fortify()}} for which variables will be created.
A \code{function} will be called with a single argument,
the plot data. The return value must be a \code{data.frame}, and
will be used as the layer data. A \code{function} can be created
from a \code{formula} (e.g. \code{~ head(.x, 10)}).}
\item{geom}{The geometric object to use to display the data for this layer.
When using a \verb{stat_*()} function to construct a layer, the \code{geom} argument
can be used to override the default coupling between stats and geoms. The
\code{geom} argument accepts the following:
\itemize{
\item A \code{Geom} ggproto subclass, for example \code{GeomPoint}.
\item A string naming the geom. To give the geom as a string, strip the
function name of the \code{geom_} prefix. For example, to use \code{geom_point()},
give the geom as \code{"point"}.
\item For more information and other ways to specify the geom, see the
\link[ggplot2:layer_geoms]{layer geom} documentation.
}}
\item{position}{A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The \code{position} argument accepts the following:
\itemize{
\item The result of calling a position function, such as \code{position_jitter()}.
This method allows for passing extra arguments to the position.
\item A string naming the position adjustment. To give the position as a
string, strip the function name of the \code{position_} prefix. For example,
to use \code{position_jitter()}, give the position as \code{"jitter"}.
\item For more information and other ways to specify the position, see the
\link[ggplot2:layer_positions]{layer position} documentation.
}}
\item{...}{Other arguments passed on to \code{\link[ggplot2:layer]{layer()}}'s \code{params} argument. These
arguments broadly fall into one of 4 categories below. Notably, further
arguments to the \code{position} argument, or aesthetics that are required
can \emph{not} be passed through \code{...}. Unknown arguments that are not part
of the 4 categories below are ignored.
\itemize{
\item Static aesthetics that are not mapped to a scale, but are at a fixed
value and apply to the layer as a whole. For example, \code{colour = "red"}
or \code{linewidth = 3}. The geom's documentation has an \strong{Aesthetics}
section that lists the available options. The 'required' aesthetics
cannot be passed on to the \code{params}. Please note that while passing
unmapped aesthetics as vectors is technically possible, the order and
required length is not guaranteed to be parallel to the input data.
\item When constructing a layer using
a \verb{stat_*()} function, the \code{...} argument can be used to pass on
parameters to the \code{geom} part of the layer. An example of this is
\code{stat_density(geom = "area", outline.type = "both")}. The geom's
documentation lists which parameters it can accept.
\item Inversely, when constructing a layer using a
\verb{geom_*()} function, the \code{...} argument can be used to pass on parameters
to the \code{stat} part of the layer. An example of this is
\code{geom_area(stat = "density", adjust = 0.5)}. The stat's documentation
lists which parameters it can accept.
\item The \code{key_glyph} argument of \code{\link[ggplot2:layer]{layer()}} may also be passed on through
\code{...}. This can be one of the functions described as
\link[ggplot2:draw_key]{key glyphs}, to change the display of the layer in the legend.
}}
\item{fun}{A function, the joint probability density function, must be
vectorized in its first two arguments; see examples.}
\item{args}{Named list of additional arguments passed on to \code{fun}.}
\item{probs}{Probabilities to compute highest density regions for.}
\item{xlim, ylim}{Range to compute and draw regions. If \code{NULL}, defaults to
range of data if present.}
\item{n}{Number of grid points in each direction.}
\item{na.rm}{If \code{FALSE}, the default, missing values are removed with
a warning. If \code{TRUE}, missing values are silently removed.}
\item{show.legend}{logical. Should this layer be included in the legends?
\code{NA}, the default, includes if any aesthetics are mapped.
\code{FALSE} never includes, and \code{TRUE} always includes.
It can also be a named logical vector to finely select the aesthetics to
display. To include legend keys for all levels, even
when no data exists, use \code{TRUE}. If \code{NA}, all levels are shown in legend,
but unobserved levels are omitted.}
\item{inherit.aes}{If \code{FALSE}, overrides the default aesthetics,
rather than combining with them. This is most useful for helper functions
that define both data and aesthetics and shouldn't inherit behaviour from
the default plot specification, e.g. \code{\link[ggplot2:annotation_borders]{annotation_borders()}}.}
\item{stat}{The statistical transformation to use on the data for this layer.
When using a \verb{geom_*()} function to construct a layer, the \code{stat}
argument can be used to override the default coupling between geoms and
stats. The \code{stat} argument accepts the following:
\itemize{
\item A \code{Stat} ggproto subclass, for example \code{StatCount}.
\item A string naming the stat. To give the stat as a string, strip the
function name of the \code{stat_} prefix. For example, to use \code{stat_count()},
give the stat as \code{"count"}.
\item For more information and other ways to specify the stat, see the
\link[ggplot2:layer_stats]{layer stat} documentation.
}}
}
\description{
Compute the highest density regions (HDRs) of a bivariate pdf and plot the provided
data as a scatterplot with points colored according to their corresponding HDR.
}
\section{Aesthetics}{
geom_hdr_points_fun understands the following aesthetics
(required aesthetics are in bold):
\itemize{
\item \strong{x}
\item \strong{y}
\item alpha
\item color
\item fill
\item group
\item linetype
\item size
\item subgroup
}
}
\section{Computed variables}{
\describe{ \item{probs}{The probability associated with the highest density region, specified
by \code{probs}.} }
}
\examples{
# Can plot points colored according to known pdf:
set.seed(1)
df <- data.frame(x = rexp(1000), y = rexp(1000))
f <- function(x, y) dexp(x) * dexp(y)
ggplot(df, aes(x, y)) +
geom_hdr_points_fun(fun = f, xlim = c(0, 10), ylim = c(0, 10))
# Also allows for hdrs of a custom parametric model
# generate example data
n <- 1000
th_true <- c(3, 8)
rdata <- function(n, th) {
gen_single_obs <- function(th) {
rchisq(2, df = th) # can be anything
}
df <- replicate(n, gen_single_obs(th))
setNames(as.data.frame(t(df)), c("x", "y"))
}
data <- rdata(n, th_true)
# estimate unknown parameters via maximum likelihood
likelihood <- function(th) {
th <- abs(th) # hack to enforce parameter space boundary
log_f <- function(v) {
x <- v[1]; y <- v[2]
dchisq(x, df = th[1], log = TRUE) + dchisq(y, df = th[2], log = TRUE)
}
sum(apply(data, 1, log_f))
}
(th_hat <- optim(c(1, 1), likelihood, control = list(fnscale = -1))$par)
# plot f for the give model
f <- function(x, y, th) dchisq(x, df = th[1]) * dchisq(y, df = th[2])
ggplot(data, aes(x, y)) +
geom_hdr_points_fun(fun = f, args = list(th = th_hat))
ggplot(data, aes(x, y)) +
geom_hdr_points_fun(aes(fill = after_stat(probs)), shape = 21, color = "black",
fun = f, args = list(th = th_hat), na.rm = TRUE) +
geom_hdr_lines_fun(aes(color = after_stat(probs)), alpha = 1, fun = f, args = list(th = th_hat)) +
lims(x = c(0, 15), y = c(0, 25))
}
\keyword{datasets}
================================================
FILE: man/geom_hdr_rug.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hdr_rug.R
\docType{data}
\name{geom_hdr_rug}
\alias{geom_hdr_rug}
\alias{stat_hdr_rug}
\alias{StatHdrRug}
\alias{GeomHdrRug}
\title{Rug plots of marginal highest density region estimates}
\usage{
stat_hdr_rug(
mapping = NULL,
data = NULL,
geom = "hdr_rug",
position = "identity",
...,
method = "kde",
method_y = "kde",
probs = c(0.99, 0.95, 0.8, 0.5),
xlim = NULL,
ylim = NULL,
n = 512,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
geom_hdr_rug(
mapping = NULL,
data = NULL,
stat = "hdr_rug",
position = "identity",
...,
outside = FALSE,
sides = "bl",
length = unit(0.03, "npc"),
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
}
\arguments{
\item{mapping}{Set of aesthetic mappings created by \code{\link[ggplot2:aes]{aes()}}. If specified and
\code{inherit.aes = TRUE} (the default), it is combined with the default mapping
at the top level of the plot. You must supply \code{mapping} if there is no plot
mapping.}
\item{data}{The data to be displayed in this layer. There are three
options:
If \code{NULL}, the default, the data is inherited from the plot
data as specified in the call to \code{\link[ggplot2:ggplot]{ggplot()}}.
A \code{data.frame}, or other object, will override the plot
data. All objects will be fortified to produce a data frame. See
\code{\link[ggplot2:fortify]{fortify()}} for which variables will be created.
A \code{function} will be called with a single argument,
the plot data. The return value must be a \code{data.frame}, and
will be used as the layer data. A \code{function} can be created
from a \code{formula} (e.g. \code{~ head(.x, 10)}).}
\item{geom}{The geometric object to use to display the data for this layer.
When using a \verb{stat_*()} function to construct a layer, the \code{geom} argument
can be used to override the default coupling between stats and geoms. The
\code{geom} argument accepts the following:
\itemize{
\item A \code{Geom} ggproto subclass, for example \code{GeomPoint}.
\item A string naming the geom. To give the geom as a string, strip the
function name of the \code{geom_} prefix. For example, to use \code{geom_point()},
give the geom as \code{"point"}.
\item For more information and other ways to specify the geom, see the
\link[ggplot2:layer_geoms]{layer geom} documentation.
}}
\item{position}{A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The \code{position} argument accepts the following:
\itemize{
\item The result of calling a position function, such as \code{position_jitter()}.
This method allows for passing extra arguments to the position.
\item A string naming the position adjustment. To give the position as a
string, strip the function name of the \code{position_} prefix. For example,
to use \code{position_jitter()}, give the position as \code{"jitter"}.
\item For more information and other ways to specify the position, see the
\link[ggplot2:layer_positions]{layer position} documentation.
}}
\item{...}{Other arguments passed on to \code{\link[ggplot2:layer]{layer()}}'s \code{params} argument. These
arguments broadly fall into one of 4 categories below. Notably, further
arguments to the \code{position} argument, or aesthetics that are required
can \emph{not} be passed through \code{...}. Unknown arguments that are not part
of the 4 categories below are ignored.
\itemize{
\item Static aesthetics that are not mapped to a scale, but are at a fixed
value and apply to the layer as a whole. For example, \code{colour = "red"}
or \code{linewidth = 3}. The geom's documentation has an \strong{Aesthetics}
section that lists the available options. The 'required' aesthetics
cannot be passed on to the \code{params}. Please note that while passing
unmapped aesthetics as vectors is technically possible, the order and
required length is not guaranteed to be parallel to the input data.
\item When constructing a layer using
a \verb{stat_*()} function, the \code{...} argument can be used to pass on
parameters to the \code{geom} part of the layer. An example of this is
\code{stat_density(geom = "area", outline.type = "both")}. The geom's
documentation lists which parameters it can accept.
\item Inversely, when constructing a layer using a
\verb{geom_*()} function, the \code{...} argument can be used to pass on parameters
to the \code{stat} part of the layer. An example of this is
\code{geom_area(stat = "density", adjust = 0.5)}. The stat's documentation
lists which parameters it can accept.
\item The \code{key_glyph} argument of \code{\link[ggplot2:layer]{layer()}} may also be passed on through
\code{...}. This can be one of the functions described as
\link[ggplot2:draw_key]{key glyphs}, to change the display of the layer in the legend.
}}
\item{method, method_y}{Density estimator(s) to use.
By default \code{method} is used for both x- and y-axis.
If specified, \code{method_y} will be used for y-axis.
Accepts character vector: \code{"kde"},\code{"histogram"}, \code{"freqpoly"}, or \code{"norm"}.
Alternatively accepts functions which return closures corresponding to density estimates,
see \code{?get_hdr_1d} or \code{vignette("method", "ggdensity")}.}
\item{probs}{Probabilities to compute highest density regions for.}
\item{xlim, ylim}{Range to compute and draw regions. If \code{NULL}, defaults to
range of data.}
\item{n}{Resolution of grid defined by \code{xlim} and \code{ylim}.
Ignored if \code{method = "histogram"} or \code{method = "freqpoly"}.}
\item{na.rm}{If \code{FALSE}, the default, missing values are removed with
a warning. If \code{TRUE}, missing values are silently removed.}
\item{show.legend}{logical. Should this layer be included in the legends?
\code{NA}, the default, includes if any aesthetics are mapped.
\code{FALSE} never includes, and \code{TRUE} always includes.
It can also be a named logical vector to finely select the aesthetics to
display. To include legend keys for all levels, even
when no data exists, use \code{TRUE}. If \code{NA}, all levels are shown in legend,
but unobserved levels are omitted.}
\item{inherit.aes}{If \code{FALSE}, overrides the default aesthetics,
rather than combining with them. This is most useful for helper functions
that define both data and aesthetics and shouldn't inherit behaviour from
the default plot specification, e.g. \code{\link[ggplot2:annotation_borders]{annotation_borders()}}.}
\item{stat}{The statistical transformation to use on the data for this layer.
When using a \verb{geom_*()} function to construct a layer, the \code{stat}
argument can be used to override the default coupling between geoms and
stats. The \code{stat} argument accepts the following:
\itemize{
\item A \code{Stat} ggproto subclass, for example \code{StatCount}.
\item A string naming the stat. To give the stat as a string, strip the
function name of the \code{stat_} prefix. For example, to use \code{stat_count()},
give the stat as \code{"count"}.
\item For more information and other ways to specify the stat, see the
\link[ggplot2:layer_stats]{layer stat} documentation.
}}
\item{outside}{logical that controls whether to move the rug tassels outside of the plot area. Default is off (FALSE). You will also need to use \code{coord_cartesian(clip = "off")}. When set to TRUE, also consider changing the sides argument to "tr". See examples.}
\item{sides}{A string that controls which sides of the plot the rugs appear on.
It can be set to a string containing any of \code{"trbl"}, for top, right,
bottom, and left.}
\item{length}{A \code{\link[grid:unit]{grid::unit()}} object that sets the length of the rug lines. Use scale expansion to avoid overplotting of data.}
}
\description{
Perform 1D density estimation, compute and plot the resulting highest density
regions in a way similar to \code{\link[ggplot2:geom_rug]{ggplot2::geom_rug()}}.
Note, the plotted objects have probabilities mapped to the \code{alpha} aesthetic by default.
}
\section{Aesthetics}{
geom_hdr_rug understands the following aesthetics (required
aesthetics are in bold):
\itemize{
\item x
\item y
\item alpha
\item fill
\item group
\item subgroup
}
}
\section{Computed variables}{
\describe{ \item{probs}{The probability of the highest density region, specified
by \code{probs}, corresponding to each point.} }
}
\examples{
set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100))
# Plot marginal HDRs for bivariate data
ggplot(df, aes(x, y)) +
geom_point() +
geom_hdr_rug() +
coord_fixed()
ggplot(df, aes(x, y)) +
geom_hdr() +
geom_hdr_rug() +
coord_fixed()
# Plot HDR for univariate data
ggplot(df, aes(x)) +
geom_density() +
geom_hdr_rug()
ggplot(df, aes(y = y)) +
geom_density() +
geom_hdr_rug()
# Specify location of marginal HDRs as in ggplot2::geom_rug()
ggplot(df, aes(x, y)) +
geom_hdr() +
geom_hdr_rug(sides = "tr", outside = TRUE) +
coord_fixed(clip = "off")
# Can use same methods of density estimation as geom_hdr().
# For data with constrained support, we suggest setting method = "histogram":
ggplot(df, aes(x^2)) +
geom_histogram(bins = 30, boundary = 0) +
geom_hdr_rug(method = "histogram")
ggplot(df, aes(x^2, y^2)) +
geom_hdr(method = "histogram") +
geom_hdr_rug(method = "histogram") +
coord_fixed()
}
\keyword{datasets}
================================================
FILE: man/geom_hdr_rug_fun.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hdr_rug_fun.R
\docType{data}
\name{geom_hdr_rug_fun}
\alias{geom_hdr_rug_fun}
\alias{stat_hdr_rug_fun}
\alias{StatHdrRugFun}
\alias{GeomHdrRugFun}
\title{Rug plots of highest density region estimates of univariate pdfs}
\usage{
stat_hdr_rug_fun(
mapping = NULL,
data = NULL,
geom = "hdr_rug_fun",
position = "identity",
...,
fun_x = NULL,
fun_y = NULL,
args_x = list(),
args_y = list(),
probs = c(0.99, 0.95, 0.8, 0.5),
xlim = NULL,
ylim = NULL,
n = 512,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
geom_hdr_rug_fun(
mapping = NULL,
data = NULL,
stat = "hdr_rug_fun",
position = "identity",
...,
outside = FALSE,
sides = "bl",
length = unit(0.03, "npc"),
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
}
\arguments{
\item{mapping}{Set of aesthetic mappings created by \code{\link[ggplot2:aes]{aes()}}. If specified and
\code{inherit.aes = TRUE} (the default), it is combined with the default mapping
at the top level of the plot. You must supply \code{mapping} if there is no plot
mapping.}
\item{data}{The data to be displayed in this layer. There are three
options:
If \code{NULL}, the default, the data is inherited from the plot
data as specified in the call to \code{\link[ggplot2:ggplot]{ggplot()}}.
A \code{data.frame}, or other object, will override the plot
data. All objects will be fortified to produce a data frame. See
\code{\link[ggplot2:fortify]{fortify()}} for which variables will be created.
A \code{function} will be called with a single argument,
the plot data. The return value must be a \code{data.frame}, and
will be used as the layer data. A \code{function} can be created
from a \code{formula} (e.g. \code{~ head(.x, 10)}).}
\item{geom}{The geometric object to use to display the data for this layer.
When using a \verb{stat_*()} function to construct a layer, the \code{geom} argument
can be used to override the default coupling between stats and geoms. The
\code{geom} argument accepts the following:
\itemize{
\item A \code{Geom} ggproto subclass, for example \code{GeomPoint}.
\item A string naming the geom. To give the geom as a string, strip the
function name of the \code{geom_} prefix. For example, to use \code{geom_point()},
give the geom as \code{"point"}.
\item For more information and other ways to specify the geom, see the
\link[ggplot2:layer_geoms]{layer geom} documentation.
}}
\item{position}{A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The \code{position} argument accepts the following:
\itemize{
\item The result of calling a position function, such as \code{position_jitter()}.
This method allows for passing extra arguments to the position.
\item A string naming the position adjustment. To give the position as a
string, strip the function name of the \code{position_} prefix. For example,
to use \code{position_jitter()}, give the position as \code{"jitter"}.
\item For more information and other ways to specify the position, see the
\link[ggplot2:layer_positions]{layer position} documentation.
}}
\item{...}{Other arguments passed on to \code{\link[ggplot2:layer]{layer()}}'s \code{params} argument. These
arguments broadly fall into one of 4 categories below. Notably, further
arguments to the \code{position} argument, or aesthetics that are required
can \emph{not} be passed through \code{...}. Unknown arguments that are not part
of the 4 categories below are ignored.
\itemize{
\item Static aesthetics that are not mapped to a scale, but are at a fixed
value and apply to the layer as a whole. For example, \code{colour = "red"}
or \code{linewidth = 3}. The geom's documentation has an \strong{Aesthetics}
section that lists the available options. The 'required' aesthetics
cannot be passed on to the \code{params}. Please note that while passing
unmapped aesthetics as vectors is technically possible, the order and
required length is not guaranteed to be parallel to the input data.
\item When constructing a layer using
a \verb{stat_*()} function, the \code{...} argument can be used to pass on
parameters to the \code{geom} part of the layer. An example of this is
\code{stat_density(geom = "area", outline.type = "both")}. The geom's
documentation lists which parameters it can accept.
\item Inversely, when constructing a layer using a
\verb{geom_*()} function, the \code{...} argument can be used to pass on parameters
to the \code{stat} part of the layer. An example of this is
\code{geom_area(stat = "density", adjust = 0.5)}. The stat's documentation
lists which parameters it can accept.
\item The \code{key_glyph} argument of \code{\link[ggplot2:layer]{layer()}} may also be passed on through
\code{...}. This can be one of the functions described as
\link[ggplot2:draw_key]{key glyphs}, to change the display of the layer in the legend.
}}
\item{fun_x, fun_y}{Functions, the univariate probability density function for the x- and/or y-axis.
First argument must be vectorized.}
\item{args_x, args_y}{Named list of additional arguments passed on to \code{fun_x} and/or \code{fun_y}.}
\item{probs}{Probabilities to compute highest density regions for.}
\item{xlim, ylim}{Range to compute and draw regions. If \code{NULL}, defaults to
range of data.}
\item{n}{Resolution of grid defined by \code{xlim} and \code{ylim}.
Ignored if \code{method = "histogram"} or \code{method = "freqpoly"}.}
\item{na.rm}{If \code{FALSE}, the default, missing values are removed with
a warning. If \code{TRUE}, missing values are silently removed.}
\item{show.legend}{logical. Should this layer be included in the legends?
\code{NA}, the default, includes if any aesthetics are mapped.
\code{FALSE} never includes, and \code{TRUE} always includes.
It can also be a named logical vector to finely select the aesthetics to
display. To include legend keys for all levels, even
when no data exists, use \code{TRUE}. If \code{NA}, all levels are shown in legend,
but unobserved levels are omitted.}
\item{inherit.aes}{If \code{FALSE}, overrides the default aesthetics,
rather than combining with them. This is most useful for helper functions
that define both data and aesthetics and shouldn't inherit behaviour from
the default plot specification, e.g. \code{\link[ggplot2:annotation_borders]{annotation_borders()}}.}
\item{stat}{The statistical transformation to use on the data for this layer.
When using a \verb{geom_*()} function to construct a layer, the \code{stat}
argument can be used to override the default coupling between geoms and
stats. The \code{stat} argument accepts the following:
\itemize{
\item A \code{Stat} ggproto subclass, for example \code{StatCount}.
\item A string naming the stat. To give the stat as a string, strip the
function name of the \code{stat_} prefix. For example, to use \code{stat_count()},
give the stat as \code{"count"}.
\item For more information and other ways to specify the stat, see the
\link[ggplot2:layer_stats]{layer stat} documentation.
}}
\item{outside}{logical that controls whether to move the rug tassels outside of the plot area. Default is off (FALSE). You will also need to use \code{coord_cartesian(clip = "off")}. When set to TRUE, also consider changing the sides argument to "tr". See examples.}
\item{sides}{A string that controls which sides of the plot the rugs appear on.
It can be set to a string containing any of \code{"trbl"}, for top, right,
bottom, and left.}
\item{length}{A \code{\link[grid:unit]{grid::unit()}} object that sets the length of the rug lines. Use scale expansion to avoid overplotting of data.}
}
\description{
Compute and plot the highest density regions (HDRs) of specified univariate pdf(s).
Note, the plotted objects have probabilities mapped to the \code{alpha} aesthetic by default.
}
\section{Aesthetics}{
\code{geom_hdr_rug_fun()} understands the following aesthetics (required
aesthetics are in bold):
\itemize{
\item x
\item y
\item alpha
\item fill
\item group
\item subgroup
}
}
\section{Computed variables}{
\describe{ \item{probs}{The probability of the highest density region, specified
by \code{probs}, corresponding to each point.} }
}
\examples{
# Plotting data with exponential marginals
df <- data.frame(x = rexp(1e3), y = rexp(1e3))
ggplot(df, aes(x, y)) +
geom_hdr_rug_fun(fun_x = dexp, fun_y = dexp) +
geom_point(size = .5) +
coord_fixed()
# without data/aesthetic mappings
ggplot() +
geom_hdr_rug_fun(fun_x = dexp, fun_y = dexp, xlim = c(0, 7), ylim = c(0, 7)) +
coord_fixed()
# Plotting univariate normal data, estimating mean and sd
df <- data.frame(x = rnorm(1e4, mean = 1, sd = 3))
# estimating parameters
mu_hat <- mean(df$x)
sd_hat <- sd(df$x)
ggplot(df, aes(x)) +
geom_hdr_rug_fun(fun_x = dnorm, args_x = list(mean = mu_hat, sd = sd_hat)) +
geom_density()
# Equivalent to `method_norm_1d()` with `geom_hdr_rug()`
ggplot(df, aes(x)) +
geom_hdr_rug(method = method_norm_1d()) +
geom_density()
}
\keyword{datasets}
================================================
FILE: man/get_hdr.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/get_hdr.R
\name{get_hdr}
\alias{get_hdr}
\title{Computing the highest density regions of a 2D density}
\usage{
get_hdr(
data = NULL,
method = "kde",
probs = c(0.99, 0.95, 0.8, 0.5),
n = 100,
rangex = NULL,
rangey = NULL,
hdr_membership = TRUE,
fun,
args = list()
)
}
\arguments{
\item{data}{A data frame with columns \code{x} and \code{y}.}
\item{method}{Either a character (\code{"kde"}, \code{"mvnorm"}, \code{"histogram"},
\code{"freqpoly"}, or \code{"fun"}) or \verb{method_*()} function. See the "The \code{method}
argument" section below for details.}
\item{probs}{Probabilities to compute HDRs for.}
\item{n}{Resolution of grid representing estimated density and HDRs.}
\item{rangex, rangey}{Range of grid representing estimated density and HDRs,
along the x- and y-axes.}
\item{hdr_membership}{Should HDR membership of data points (\code{data}) be
computed? Defaults to \code{TRUE}, although it is computationally expensive for
large data sets.}
\item{fun}{Optional, a joint probability density function, must be vectorized
in its first two arguments. See the "The \code{fun} argument" section below for
details.}
\item{args}{Optional, a list of arguments to be provided to \code{fun}.}
}
\value{
\code{get_hdr} returns a list with elements \code{df_est} (\code{data.frame}), \code{breaks}
(named \code{numeric}), and \code{data} (\code{data.frame}).
\itemize{
\item \code{df_est}: the estimated HDRs and density evaluated on the grid defined by \code{rangex}, \code{rangey}, and \code{n}.
The column of estimated HDRs (\code{df_est$hdr}) is a numeric vector with values
from \code{probs}. The columns \code{df_est$fhat} and \code{df_est$fhat_discretized}
correspond to the estimated density on the original scale and rescaled to sum
to 1, respectively.
\item \code{breaks}: the heights of the estimated density (\code{df_est$fhat}) corresponding to the HDRs specified by \code{probs}.
Will always have additional element \code{Inf} representing the cutoff for the
100\% HDR.
\item \code{data}: the original data provided in the \code{data} argument.
If \code{hdr_membership} is set to \code{TRUE}, this includes a column
(\code{data$hdr_membership}) with the HDR corresponding to each data point.
}
}
\description{
\code{get_hdr} is used to estimate a 2-dimensional density and compute
corresponding HDRs. The estimated density and HDRs are represented in a
discrete form as a grid, defined by arguments \code{rangex}, \code{rangey}, and \code{n}.
\code{get_hdr} is used internally by layer functions \code{stat_hdr()},
\code{stat_hdr_points()}, \code{stat_hdr_fun()}, etc.
}
\section{The \code{method} argument}{
The density estimator used to estimate the
HDRs is specified with the \code{method} argument. The simplest way to specify
an estimator is to provide a character value to \code{method}, for example
\code{method = "kde"} specifies a kernel density estimator. However, this
specification is limited to the default behavior of the estimator.
Instead, it is possible to provide a function call, for example: \code{method = method_kde()}. In many cases, these functions accept parameters governing
the density estimation procedure. Here, \code{method_kde()} accepts parameters
\code{h} and \code{adjust}, both related to the kernel's bandwidth. For details, see
\code{?method_kde}. Every method of bivariate density estimation implemented has
such corresponding \verb{method_*()} function, each with an associated help
page.
Note: \code{geom_hdr()} and other layer functions also have \code{method} arguments
which behave in the same way. For more details on the use and
implementation of the \verb{method_*()} functions, see \code{vignette("method", "ggdensity")}.
}
\section{The \code{fun} argument}{
If \code{method} is set to \code{"fun"}, \code{get_hdr()}
expects a bivariate probability density function to be specified with the
\code{fun} argument. It is required that \code{fun} be a function of at least two
arguments (\code{x} and \code{y}). Beyond these first two arguments, \code{fun} can have
arbitrarily many arguments; these can be set in \code{get_hdr()} as a named list
via the \code{args} parameter.
Note: \code{get_hdr()} requires that \code{fun} be vectorized in \code{x} and \code{y}. For an
example of an appropriate choice of \code{fun}, see the final example below.
}
\examples{
df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
# Two ways to specify `method`
get_hdr(df, method = "kde")
get_hdr(df, method = method_kde())
\dontrun{
# If parenthesis are omitted, `get_hdr()` errors
get_hdr(df, method = method_kde)
}
# Estimate different HDRs with `probs`
get_hdr(df, method = method_kde(), probs = c(.975, .6, .2))
# Adjust estimator parameters with arguments to `method_kde()`
get_hdr(df, method = method_kde(h = 1))
# Parametric normal estimator of density
get_hdr(df, method = "mvnorm")
get_hdr(df, method = method_mvnorm())
# Compute "population" HDRs of specified bivariate pdf with `method = "fun"`
f <- function(x, y, sd_x = 1, sd_y = 1) dnorm(x, sd = sd_x) * dnorm(y, sd = sd_y)
get_hdr(
method = "fun", fun = f,
rangex = c(-5, 5), rangey = c(-5, 5)
)
get_hdr(
method = "fun", fun = f,
rangex = c(-5, 5), rangey = c(-5, 5),
args = list(sd_x = .5, sd_y = .5) # specify additional arguments w/ `args`
)
}
================================================
FILE: man/get_hdr_1d.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/get_hdr_1d.R
\name{get_hdr_1d}
\alias{get_hdr_1d}
\title{Computing the highest density regions of a 1D density}
\usage{
get_hdr_1d(
x = NULL,
method = "kde",
probs = c(0.99, 0.95, 0.8, 0.5),
n = 512,
range = NULL,
hdr_membership = TRUE,
fun,
args = list()
)
}
\arguments{
\item{x}{A vector of data}
\item{method}{Either a character (\code{"kde"}, \code{"norm"}, \code{"histogram"}, \code{"freqpoly"}, or \code{"fun"}) or \verb{method_*_1d()} function.
See the "The \code{method} argument" section below for details.}
\item{probs}{Probabilities to compute HDRs for.}
\item{n}{Resolution of grid representing estimated density and HDRs.}
\item{range}{Range of grid representing estimated density and HDRs.}
\item{hdr_membership}{Should HDR membership of data points (\code{x}) be computed?}
\item{fun}{Optional, a probability density function, must be vectorized in its first argument.
See the "The \code{fun} argument" section below for details.}
\item{args}{Optional, a list of arguments to be provided to \code{fun}.}
}
\value{
\code{get_hdr_1d} returns a list with elements \code{df_est} (\code{data.frame}), \code{breaks} (named \code{numeric}), and \code{data} (\code{data.frame}).
\itemize{
\item \code{df_est}: the estimated HDRs and density evaluated on the grid defined by \code{range} and \code{n}.
The column of estimated HDRs (\code{df_est$hdr}) is a numeric vector with values from \code{probs}.
The columns \code{df_est$fhat} and \code{df_est$fhat_discretized} correspond to the estimated density
on the original scale and rescaled to sum to 1, respectively.
\item \code{breaks}: the heights of the estimated density (\code{df_est$fhat}) corresponding to the HDRs specified by \code{probs}.
Will always have additional element \code{Inf} representing the cutoff for the 100\% HDR.
\item \code{data}: the original data provided in the \code{data} argument.
If \code{hdr_membership} is set to \code{TRUE}, this includes a column (\code{data$hdr_membership})
with the HDR corresponding to each data point.
}
}
\description{
\code{get_hdr_1d} is used to estimate a 1-dimensional density and compute corresponding HDRs.
The estimated density and HDRs are represented in a discrete form as a grid, defined by arguments \code{range} and \code{n}.
\code{get_hdr_1d} is used internally by layer functions \code{stat_hdr_rug()} and \code{stat_hdr_rug_fun()}.
}
\section{The \code{method} argument}{
The density estimator used to estimate the HDRs is specified with the \code{method} argument.
The simplest way to specify an estimator is to provide a character value to \code{method},
for example \code{method = "kde"} specifies a kernel density estimator.
However, this specification is limited to the default behavior of the estimator.
Instead, it is possible to provide a function call, for example: \code{method = method_kde_1d()}.
This is slightly different from the function calls provided in \code{get_hdr()}, note the \verb{_1d} suffix.
In many cases, these functions accept parameters governing the density estimation procedure.
Here, \code{method_kde_1d()} accepts several parameters related to the choice of kernel.
For details, see \code{?method_kde_1d}.
Every method of univariate density estimation implemented has such corresponding \verb{method_*_1d()} function,
each with an associated help page.
Note: \code{geom_hdr_rug()} and other layer functions also have \code{method} arguments which behave in the same way.
For more details on the use and implementation of the \verb{method_*_1d()} functions,
see \code{vignette("method", "ggdensity")}.
}
\section{The \code{fun} argument}{
If \code{method} is set to \code{"fun"}, \code{get_hdr_1d()} expects a univariate probability
density function to be specified with the \code{fun} argument.
It is required that \code{fun} be a function of at least one argument (\code{x}).
Beyond this first argument, \code{fun} can have arbitrarily many arguments;
these can be set in \code{get_hdr_1d()} as a named list via the \code{args} parameter.
Note: \code{get_hdr_1d()} requires that \code{fun} be vectorized in \code{x}.
For an example of an appropriate choice of \code{fun}, see the final example below.
}
\examples{
x <- rnorm(1e3)
# Two ways to specify `method`
get_hdr_1d(x, method = "kde")
get_hdr_1d(x, method = method_kde_1d())
\dontrun{
# If parenthesis are omitted, `get_hdr_1d()` errors
get_hdr_1d(df, method = method_kde_1d)
# If the `_1d` suffix is omitted, `get_hdr_1d()` errors
get_hdr_1d(x, method = method_kde())
}
# Adjust estimator parameters with arguments to `method_kde_1d()`
get_hdr_1d(x, method = method_kde_1d(kernel = "triangular"))
# Estimate different HDRs with `probs`
get_hdr_1d(x, method = method_kde_1d(), probs = c(.975, .6, .2))
# Compute "population" HDRs of specified univariate pdf with `method = "fun"`
f <- function(x, sd = 1) dnorm(x, sd = sd)
get_hdr_1d(method = "fun", fun = f, range = c(-5, 5))
get_hdr_1d(method = "fun", fun = f, range = c(-5, 5), args = list(sd = .5))
}
================================================
FILE: man/ggdensity.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ggdensity-package.R
\name{ggdensity}
\alias{ggdensity}
\alias{package-ggdensity}
\title{ggdensity: Stats and Geoms for Density Estimation with ggplot2}
\description{
A package that allows more flexible computations for visualization of density
estimates with ggplot2.
}
\seealso{
Useful links:
\itemize{
\item \url{https://jamesotto852.github.io/ggdensity/}
\item \url{https://github.com/jamesotto852/ggdensity/}
}
}
================================================
FILE: man/method_freqpoly.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/method.R
\name{method_freqpoly}
\alias{method_freqpoly}
\title{Bivariate frequency polygon HDR estimator}
\usage{
method_freqpoly(bins = NULL)
}
\arguments{
\item{bins}{Number of bins along each axis.
Either a vector of length 2 or a scalar value which is recycled for both dimensions.
Defaults to normal reference rule (Scott, pg 87).}
}
\description{
Function used to specify bivariate frequency polygon density estimator
for \code{get_hdr()} and layer functions (e.g. \code{geom_hdr()}).
}
\details{
For more details on the use and implementation of the \verb{method_*()} functions,
see \code{vignette("method", "ggdensity")}.
}
\examples{
set.seed(1)
df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
ggplot(df, aes(x, y)) +
geom_hdr(method = method_freqpoly()) +
geom_point(size = 1)
# The resolution of the frequency polygon estimator can be set via `bins`
ggplot(df, aes(x, y)) +
geom_hdr(method = method_freqpoly(bins = c(8, 25))) +
geom_point(size = 1)
# Can also be used with `get_hdr()` for numerical summary of HDRs
res <- get_hdr(df, method = method_freqpoly())
str(res)
}
\references{
Scott, David W. Multivariate Density Estimation (2e), Wiley.
}
================================================
FILE: man/method_freqpoly_1d.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/method_1d.R
\name{method_freqpoly_1d}
\alias{method_freqpoly_1d}
\title{Univariate frequency polygon HDR estimator}
\usage{
method_freqpoly_1d(bins = NULL)
}
\arguments{
\item{bins}{Number of bins. Defaults to normal reference rule (Scott, pg 59).}
}
\description{
Function used to specify univariate frequency polygon density estimator
for \code{get_hdr_1d()} and layer functions (e.g. \code{geom_hdr_rug()}).
}
\details{
For more details on the use and implementation of the \verb{method_*_1d()} functions,
see \code{vignette("method", "ggdensity")}.
}
\examples{
df <- data.frame(x = rnorm(1e3))
# Strip chart to visualize 1-d data
p <- ggplot(df, aes(x)) +
geom_jitter(aes(y = 0), width = 0, height = 2) +
scale_y_continuous(name = NULL, breaks = NULL) +
coord_cartesian(ylim = c(-3, 3))
p
p + geom_hdr_rug(method = method_freqpoly_1d())
# The resolution of the frequency polygon estimator can be set via `bins`
p + geom_hdr_rug(method = method_freqpoly_1d(bins = 100))
# Can also be used with `get_hdr_1d()` for numerical summary of HDRs
res <- get_hdr_1d(df$x, method = method_freqpoly_1d())
str(res)
}
\references{
Scott, David W. Multivariate Density Estimation (2e), Wiley.
}
================================================
FILE: man/method_histogram.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/method.R
\name{method_histogram}
\alias{method_histogram}
\title{Bivariate histogram HDR estimator}
\usage{
method_histogram(bins = NULL, smooth = FALSE, nudgex = "none", nudgey = "none")
}
\arguments{
\item{bins}{Number of bins along each axis.
Either a vector of length 2 or a scalar value which is recycled for both dimensions.
Defaults to normal reference rule (Scott, pg 87).}
\item{smooth}{If \code{TRUE}, HDRs are smoothed by the marching squares algorithm.}
\item{nudgex, nudgey}{Horizontal and vertical rules for choosing witness points when \code{smooth == TRUE}.
Accepts character vector: \code{"left"}, \code{"none"}, \code{"right"} (\code{nudgex}) or \code{"down"}, \code{"none"}, \code{"up"} (\code{nudgey}).}
}
\description{
Function used to specify bivariate histogram density estimator
for \code{get_hdr()} and layer functions (e.g. \code{geom_hdr()}).
}
\details{
For more details on the use and implementation of the \verb{method_*()} functions,
see \code{vignette("method", "ggdensity")}.
}
\examples{
\dontrun{
# Histogram estimators can be useful when data has boundary constraints
set.seed(1)
df <- data.frame(x = rexp(1e3), y = rexp(1e3))
ggplot(df, aes(x, y)) +
geom_hdr(method = method_histogram()) +
geom_point(size = 1)
# The resolution of the histogram estimator can be set via `bins`
ggplot(df, aes(x, y)) +
geom_hdr(method = method_histogram(bins = c(8, 25))) +
geom_point(size = 1)
# By setting `smooth = TRUE`, we can graphically smooth the "blocky" HDRs
ggplot(df, aes(x, y)) +
geom_hdr(method = method_histogram(smooth = TRUE)) +
geom_point(size = 1)
# However, we need to set `nudgex` and `nudgey` to align the HDRs correctly
ggplot(df, aes(x, y)) +
geom_hdr(method = method_histogram(smooth = TRUE, nudgex = "left", nudgey = "down")) +
geom_point(size = 1)
# Can also be used with `get_hdr()` for numerical summary of HDRs
res <- get_hdr(df, method = method_histogram())
str(res)
}
}
\references{
Scott, David W. Multivariate Density Estimation (2e), Wiley.
}
================================================
FILE: man/method_histogram_1d.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/method_1d.R
\name{method_histogram_1d}
\alias{method_histogram_1d}
\title{Univariate histogram HDR estimator}
\usage{
method_histogram_1d(bins = NULL)
}
\arguments{
\item{bins}{Number of bins. Defaults to normal reference rule (Scott, pg 59).}
}
\description{
Function used to specify univariate histogram density estimator
for \code{get_hdr_1d()} and layer functions (e.g. \code{geom_hdr_rug()}).
}
\details{
For more details on the use and implementation of the \verb{method_*_1d()} functions,
see \code{vignette("method", "ggdensity")}.
}
\examples{
# Histogram estimators can be useful when data has boundary constraints
df <- data.frame(x = rexp(1e3))
# Strip chart to visualize 1-d data
p <- ggplot(df, aes(x)) +
geom_jitter(aes(y = 0), width = 0, height = 2) +
scale_y_continuous(name = NULL, breaks = NULL) +
coord_cartesian(ylim = c(-3, 3))
p
p + geom_hdr_rug(method = method_histogram_1d())
# The resolution of the histogram estimator can be set via `bins`
p + geom_hdr_rug(method = method_histogram_1d(bins = 5))
# Can also be used with `get_hdr_1d()` for numerical summary of HDRs
res <- get_hdr_1d(df$x, method = method_histogram_1d())
str(res)
}
\references{
Scott, David W. Multivariate Density Estimation (2e), Wiley.
}
================================================
FILE: man/method_kde.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/method.R
\name{method_kde}
\alias{method_kde}
\title{Bivariate kernel density HDR estimator}
\usage{
method_kde(h = NULL, adjust = c(1, 1))
}
\arguments{
\item{h}{Bandwidth (vector of length two). If \code{NULL}, estimated
using \code{\link[MASS:bandwidth.nrd]{MASS::bandwidth.nrd()}}.}
\item{adjust}{A multiplicative bandwidth adjustment to be used if 'h' is
'NULL'. This makes it possible to adjust the bandwidth while still
using the a bandwidth estimator. For example, \code{adjust = 1/2} means
use half of the default bandwidth.}
}
\description{
Function used to specify bivariate kernel density estimator
for \code{get_hdr()} and layer functions (e.g. \code{geom_hdr()}).
}
\details{
For more details on the use and implementation of the \verb{method_*()} functions,
see \code{vignette("method", "ggdensity")}.
}
\examples{
set.seed(1)
df <- data.frame(x = rnorm(1e3, sd = 3), y = rnorm(1e3, sd = 3))
ggplot(df, aes(x, y)) +
geom_hdr(method = method_kde()) +
geom_point(size = 1)
# The defaults of `method_kde()` are the same as the estimator for `ggplot2::geom_density_2d()`
ggplot(df, aes(x, y)) +
geom_density_2d_filled() +
geom_hdr_lines(method = method_kde(), probs = seq(.1, .9, by = .1)) +
theme(legend.position = "none")
# The bandwidth of the estimator can be set directly with `h` or scaled with `adjust`
ggplot(df, aes(x, y)) +
geom_hdr(method = method_kde(h = 1)) +
geom_point(size = 1)
ggplot(df, aes(x, y)) +
geom_hdr(method = method_kde(adjust = 1/2)) +
geom_point(size = 1)
# Can also be used with `get_hdr()` for numerical summary of HDRs
res <- get_hdr(df, method = method_kde())
str(res)
}
================================================
FILE: man/method_kde_1d.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/method_1d.R
\name{method_kde_1d}
\alias{method_kde_1d}
\title{Univariate kernel density HDR estimator}
\usage{
method_kde_1d(
bw = "nrd0",
adjust = 1,
kernel = "gaussian",
weights = NULL,
window = kernel
)
}
\arguments{
\item{bw}{the smoothing bandwidth to be used. The kernels are scaled
such that this is the standard deviation of the smoothing kernel.
(Note this differs from the reference books cited below.)
\code{bw} can also be a character string giving a rule to choose the
bandwidth. See \code{\link[stats]{bw.nrd}}. \cr The default,
\code{"nrd0"}, has remained the default for historical and
compatibility reasons, rather than as a general recommendation,
where e.g., \code{"SJ"} would rather fit, see also
Venables and Ripley (2002).
The specified (or computed) value of \code{bw} is multiplied by
\code{adjust}.
}
\item{adjust}{the bandwidth used is actually \code{adjust*bw}.
This makes it easy to specify values like \sQuote{half the default}
bandwidth.}
\item{kernel, window}{a character string giving the smoothing kernel
to be used. This must partially match one of \code{"gaussian"},
\code{"rectangular"}, \code{"triangular"}, \code{"epanechnikov"},
\code{"biweight"}, \code{"cosine"} or \code{"optcosine"}, with default
\code{"gaussian"}, and may be abbreviated to a unique prefix (single
letter).
\code{"cosine"} is smoother than \code{"optcosine"}, which is the
usual \sQuote{cosine} kernel in the literature and almost MSE-efficient.
However, \code{"cosine"} is the version used by S.
}
\item{weights}{numeric vector of non-negative observation weights,
hence of same length as \code{x}. The default \code{NULL} is
equivalent to \code{weights = rep(1/nx, nx)} where \code{nx} is the
length of (the finite entries of) \code{x[]}. If \code{na.rm = TRUE}
and there are \code{NA}'s in \code{x}, they \emph{and} the
corresponding weights are removed before computations. In that case,
when the original weights have summed to one, they are re-scaled to
keep doing so.
Note that weights are \emph{not} taken into account for automatic
bandwidth rules, i.e., when \code{bw} is a string. When the weights
are proportional to true counts \code{cn}, \code{density(x = rep(x, cn))}
may be used instead of \code{weights}.
}
}
\description{
Function used to specify univariate kernel density estimator
for \code{get_hdr_1d()} and layer functions (e.g. \code{geom_hdr_rug()}).
}
\details{
For more details on the use and implementation of the \verb{method_*_1d()} functions,
see \code{vignette("method", "ggdensity")}.
}
\examples{
df <- data.frame(x = rnorm(1e3, sd = 3))
ggplot(df, aes(x)) +
geom_hdr_rug(method = method_kde_1d()) +
geom_density()
# Details of the KDE can be adjusted with arguments to `method_kde_1d()`
ggplot(df, aes(x)) +
geom_hdr_rug(method = method_kde_1d(adjust = 1/5)) +
geom_density(adjust = 1/5)
ggplot(df, aes(x)) +
geom_hdr_rug(method = method_kde_1d(kernel = "triangular")) +
geom_density(kernel = "triangular")
# Can also be used with `get_hdr_1d()` for numerical summary of HDRs
res <- get_hdr_1d(df$x, method = method_kde_1d())
str(res)
}
================================================
FILE: man/method_mvnorm.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/method.R
\name{method_mvnorm}
\alias{method_mvnorm}
\title{Bivariate parametric normal HDR estimator}
\usage{
method_mvnorm()
}
\description{
Function used to specify bivariate normal density estimator
for \code{get_hdr()} and layer functions (e.g. \code{geom_hdr()}).
}
\details{
For more details on the use and implementation of the \verb{method_*()} functions,
see \code{vignette("method", "ggdensity")}.
}
\examples{
# Normal estimator is useful when an assumption of normality is appropriate
set.seed(1)
df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
ggplot(df, aes(x, y)) +
geom_hdr(method = method_mvnorm(), xlim = c(-4, 4), ylim = c(-4, 4)) +
geom_point(size = 1)
# Can also be used with `get_hdr()` for numerical summary of HDRs
res <- get_hdr(df, method = method_mvnorm())
str(res)
}
================================================
FILE: man/method_norm_1d.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/method_1d.R
\name{method_norm_1d}
\alias{method_norm_1d}
\title{Univariate parametric normal HDR estimator}
\usage{
method_norm_1d()
}
\description{
Function used to specify univariate normal density estimator
for \code{get_hdr_1d()} and layer functions (e.g. \code{geom_hdr_rug()}).
}
\details{
For more details on the use and implementation of the \verb{method_*_1d()} functions,
see \code{vignette("method", "ggdensity")}.
}
\examples{
# Normal estimators are useful when an assumption of normality is appropriate
df <- data.frame(x = rnorm(1e3))
ggplot(df, aes(x)) +
geom_hdr_rug(method = method_norm_1d()) +
geom_density()
# Can also be used with `get_hdr_1d()` for numerical summary of HDRs
res <- get_hdr_1d(df$x, method = method_norm_1d())
str(res)
}
================================================
FILE: revdep/.gitignore
================================================
checks
library
checks.noindex
library.noindex
cloud.noindex
data.sqlite
*.html
================================================
FILE: revdep/README.md
================================================
# Platform
|field |value |
|:--------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|version |R version 4.5.2 (2025-10-31 ucrt) |
|os |Windows 11 x64 (build 26200) |
|system |x86_64, mingw32 |
|ui |RStudio |
|language |(EN) |
|collate |English_United States.utf8 |
|ctype |English_United States.utf8 |
|tz |America/Chicago |
|date |2026-02-24 |
|rstudio |2026.01.1+403 Apple Blossom (desktop) |
|pandoc |NA |
|quarto |ERROR: Unknown command "TMPDIR=C:/Users/james/AppData/Local/Temp/Rtmpon9F2w/file48108fa5255". Did you mean command "create"? @ C:\PROGRA~1\RStudio\RESOUR~1\app\bin\quarto\bin\quarto.exe |
# Dependencies
|package |old |new |Δ |
|:------------|:------|:------|:--|
|ggdensity |1.0.0 |1.0.1 |* |
|cli |3.6.5 |3.6.5 | |
|cpp11 |0.5.3 |0.5.3 | |
|farver |2.1.2 |2.1.2 | |
|ggplot2 |4.0.2 |4.0.2 | |
|glue |1.8.0 |1.8.0 | |
|gtable |0.3.6 |0.3.6 | |
|isoband |0.3.0 |0.3.0 | |
|labeling |0.4.3 |0.4.3 | |
|lifecycle |1.0.5 |1.0.5 | |
|magrittr |2.0.4 |2.0.4 | |
|pillar |1.11.1 |1.11.1 | |
|pkgconfig |2.0.3 |2.0.3 | |
|R6 |2.6.1 |2.6.1 | |
|RColorBrewer |1.1-3 |1.1-3 | |
|rlang |1.1.7 |1.1.7 | |
|S7 |0.2.1 |0.2.1 | |
|scales |1.4.0 |1.4.0 | |
|tibble |3.3.1 |3.3.1 | |
|utf8 |1.2.6 |1.2.6 | |
|vctrs |0.7.1 |0.7.1 | |
|viridisLite |0.4.3 |0.4.3 | |
|withr |3.0.2 |3.0.2 | |
# Revdeps
================================================
FILE: revdep/cran.md
================================================
## revdepcheck results
We checked 6 reverse dependencies (5 from CRAN + 1 from Bioconductor), comparing R CMD check results across CRAN and dev versions of this package.
* We saw 0 new problems
* We failed to check 0 packages
================================================
FILE: revdep/email.yml
================================================
release_date: ???
rel_release_date: ???
my_news_url: ???
release_version: ???
release_details: ???
================================================
FILE: revdep/failures.md
================================================
*Wow, no problems at all. :)*
================================================
FILE: revdep/problems.md
================================================
*Wow, no problems at all. :)*
================================================
FILE: tests/testthat/test-fix_probs.R
================================================
test_that("fix_probs() works as intended", {
# Check defaults
expect_equal(fix_probs(c(.99, .95, .80, .50)), c(.99, .95, .80, .50))
# Reorders probabilities correctly
expect_equal(fix_probs(c(.80, .50, .99, .95)), c(.99, .95, .80, .50))
# Works with vectors of length 1
expect_equal(fix_probs(.50), .5)
# Issues error if any probabilites are outside (0, 1)
expect_error(fix_probs(c(1.1, .80, .5)), regexp = "must be between")
expect_error(fix_probs(c(.80, .5, -1)), regexp = "must be between")
expect_error(fix_probs(c(1)), regexp = "must be between")
expect_error(fix_probs(c(0)), regexp = "must be between")
})
================================================
FILE: tests/testthat/test-get_hdr.R
================================================
test_that("structure of get_hdr() return value is as expected", {
data <- data.frame(
x = 1:10,
y = rep(1:5, each = 2)
)
res <- get_hdr(data)
# Checking the top level of res
expect_type(res, "list")
expect_equal(length(res), 3)
expect_equal(names(res), c("df_est", "breaks", "data"))
# Checking res$df_est:
expect_type(res$df_est, "list")
expect_equal(ncol(res$df_est), 5)
expect_equal(colnames(res$df_est), c("x", "y", "fhat", "fhat_discretized", "hdr"))
# Checking res$data
expect_type(res$data, "list")
expect_equal(ncol(res$data), 3)
expect_equal(nrow(res$data), 10)
expect_equal(colnames(res$data), c("x", "y", "hdr_membership"))
# Checking res$breaks
expect_type(res$breaks, "double")
expect_equal(length(res$breaks), 5)
expect_equal(names(res$breaks), c("99%", "95%", "80%", "50%", NA))
# Now with non-default args -----------------------------------------
res <- get_hdr(data, probs = c(.989, .878, .67, .43, .21), hdr_membership = FALSE)
# Checking res$data
expect_equal(ncol(res$data), 2)
# Checking res$breaks
expect_type(res$breaks, "double")
expect_equal(length(res$breaks), 6)
expect_equal(names(res$breaks), c("99%", "88%", "67%", "43%", "21%", NA))
})
test_that("`method` can be provided as a character vector or function", {
data <- data.frame(
x = 1:10,
y = rep(1:5, each = 2)
)
expect_equal(get_hdr(data, "kde"), get_hdr(data, method_kde()))
expect_equal(get_hdr(data, "mvnorm"), get_hdr(data, method_mvnorm()))
expect_equal(get_hdr(data, "freqpoly"), get_hdr(data, method_freqpoly()))
expect_equal(get_hdr(data, "histogram"), get_hdr(data, method_histogram()))
})
test_that("get_hdr() errors informatively if bad `method` argument", {
data <- data.frame(
x = 1:10,
y = rep(1:5, each = 2)
)
expect_error(get_hdr(data, method = "not-a-method"), regexp = "Invalid method specified")
expect_error(get_hdr(data, method = method_kde), regexp = "did you forget")
})
# # The data used for tests:
#
# set.seed(1)
# df <- data.frame(
# x = rnorm(5e3),
# y = rnorm(5e3)
# )
#
# write_rds(df, here::here("tests/testthat/fixtures/df_norm.rds"))
test_that("get_hdr(method = method_kde()) calculations are consistent", {
data <- readRDS(test_path("fixtures", "df_norm.rds"))
res <- get_hdr(data, method_kde())
# fhat_discretized should be normalized to sum to 1
expect_equal(sum(res$df_est$fhat_discretized), 1)
# By default, estimate is evaluated on the same range as original data
expect_equal(range(res$df_est$x), range(data$x))
expect_equal(range(res$df_est$y), range(data$y))
# default grid is 100 x 100:
expect_equal(nrow(res$df_est), 100 * 100)
# Checksums:
expect_equal(round(sum(res$df_est$fhat)), 185)
expect_equal(as.numeric(round(res$breaks, 4)), c(0.0017, 0.0083, 0.0303, 0.0731, Inf))
expect_equal(sort(unique(res$df_est$hdr)), c(.5, .8, .95, .99, 1))
expect_equal(as.numeric(table(res$df_est$hdr)), c(858, 1149, 1597, 1771, 4625))
# Checking non-default args ------------------------
res <- get_hdr(data, method_kde(adjust = .4), probs = c(.97, .85, .4, .1), n = c(100, 200), rangex = c(-3, 2), rangey = c(-1, 3))
# fhat_discretized should be normalized to sum to 1
expect_equal(sum(res$df_est$fhat_discretized), 1)
# Was the custom range used
expect_equal(range(res$df_est$x), c(-3, 2))
expect_equal(range(res$df_est$y), c(-1, 3))
# default grid is 100 x 100:
expect_equal(nrow(res$df_est), 100 * 200)
# Checksums:
expect_equal(round(sum(res$df_est$fhat)), 808)
expect_equal(as.numeric(round(res$breaks, 4)), c(0.0105, 0.0352, 0.1036, 0.1522, Inf))
expect_equal(sort(unique(res$df_est$hdr)), c(.1, .4, .85, .97, 1))
expect_equal(as.numeric(table(res$df_est$hdr)), c(495, 1923, 5584, 4524, 7474))
})
# TODO: above, for other methods
test_that("get_hdr() works with custom function factory supplied to `method`", {
data <- readRDS(test_path("fixtures", "df_norm.rds"))
method_mvnorm_ind <- function() {
function(data) {
mean_x <- mean(data$x); s_x <- sd(data$x)
mean_y <- mean(data$y); s_y <- sd(data$y)
function(x, y) dnorm(x, mean = mean_x, sd = s_x) * dnorm(y, mean = mean_y, sd = s_y)
}
}
res <- get_hdr(data, method = method_mvnorm_ind())
# fhat_discretized should be normalized to sum to 1
expect_equal(sum(res$df_est$fhat_discretized), 1)
# By default, estimate is evaluated on the same range as original data
expect_equal(range(res$df_est$x), range(data$x))
expect_equal(range(res$df_est$y), range(data$y))
# default grid is 100 x 100:
expect_equal(nrow(res$df_est), 100 * 100)
# Checksums:
expect_equal(round(sum(res$df_est$fhat)), 185)
expect_equal(as.numeric(round(res$breaks, 4)), c(0.0017, 0.0078, 0.031, 0.078, Inf))
expect_equal(sort(unique(res$df_est$hdr)), c(.5, .8, .95, .99, 1))
expect_equal(as.numeric(table(res$df_est$hdr)), c(826, 1090, 1642, 1863, 4579))
})
test_that("get_hdr() works with custom function factory supplied to `method`", {
data <- readRDS(test_path("fixtures", "df_norm.rds"))
method_fixed_grid <- function() {
function(data, n, rangex, rangey) {
df_grid <- expand.grid(
x = seq(rangex[1], rangex[2], length.out = n),
y = seq(rangey[1], rangey[2], length.out = n)
)
df_grid$fhat <- dnorm(df_grid$x) * dnorm(df_grid$y)
df_grid
}
}
res <- get_hdr(data, method = method_fixed_grid())
# fhat_discretized should be normalized to sum to 1
expect_equal(sum(res$df_est$fhat_discretized), 1)
# By default, estimate is evaluated on the same range as original data
expect_equal(range(res$df_est$x), range(data$x))
expect_equal(range(res$df_est$y), range(data$y))
# default grid is 100 x 100:
expect_equal(nrow(res$df_est), 100 * 100)
# Checksums:
expect_equal(round(sum(res$df_est$fhat)), 185)
expect_equal(as.numeric(round(res$breaks, 4)), c(0.0017, 0.008, 0.0321, 0.0796, Inf))
expect_equal(sort(unique(res$df_est$hdr)), c(.5, .8, .95, .99, 1))
expect_equal(as.numeric(table(res$df_est$hdr)), c(806, 1065, 1603, 1824, 4702))
})
test_that("get_hdr() fails if `method != 'fun' and `data` isn't provided", {
expect_error(get_hdr(method = method_kde()), regexp = ".data. must be provided")
})
test_that("fun argument of get_hdr() requires rangex/y", {
expect_error(get_hdr(method = "fun", fun = function(x, y) dexp(x) * dexp(y)), regexp = ".rangey. must be provided")
})
test_that("fun argument of get_hdr() works", {
res <- get_hdr(method = "fun", fun = function(x, y) dexp(x) * dexp(y), rangex = c(0, 10), rangey = c(0, 10))
# Structure of res is as expected
expect_type(res, "list")
expect_equal(length(res), 3)
expect_equal(names(res), c("df_est", "breaks", "data"))
expect_null(res$data)
# fhat_discretized should be normalized to sum to 1
expect_equal(sum(res$df_est$fhat_discretized), 1)
expect_equal(range(res$df_est$x), c(0, 10))
expect_equal(range(res$df_est$y), c(0, 10))
# default grid is 100 x 100:
expect_equal(nrow(res$df_est), 100 * 100)
# Checksums:
expect_equal(round(sum(res$df_est$fhat)), 108)
expect_equal(as.numeric(round(res$breaks, 4)), c(0.0014, 0.0096, 0.0534, 0.1987, Inf))
expect_equal(sort(unique(res$df_est$hdr)), c(.5, .8, .95, .99, 1))
expect_equal(as.numeric(table(res$df_est$hdr)), c(145, 306, 669, 1045, 7835))
})
================================================
FILE: tests/testthat/test-get_hdr_1d.R
================================================
test_that("structure of get_hdr_1d() return value is as expected", {
x <- 1:10
res <- get_hdr_1d(x)
# Checking the top level of res
expect_type(res, "list")
expect_equal(length(res), 3)
expect_equal(names(res), c("df_est", "breaks", "data"))
# Checking res$df_est:
expect_type(res$df_est, "list")
expect_equal(ncol(res$df_est), 4)
expect_equal(colnames(res$df_est), c("x", "fhat", "fhat_discretized", "hdr"))
# Checking res$data
expect_type(res$data, "list")
expect_equal(ncol(res$data), 2)
expect_equal(nrow(res$data), 10)
expect_equal(colnames(res$data), c("x", "hdr_membership"))
# Checking res$breaks
expect_type(res$breaks, "double")
expect_equal(length(res$breaks), 5)
expect_equal(names(res$breaks), c("99%", "95%", "80%", "50%", NA))
})
test_that("`method` can be provided as a character vector or function", {
x <- 1:10
expect_equal(get_hdr_1d(x, "kde"), get_hdr_1d(x, method_kde_1d()))
expect_equal(get_hdr_1d(x, "norm"), get_hdr_1d(x, method_norm_1d()))
expect_equal(get_hdr_1d(x, "freqpoly"), get_hdr_1d(x, method_freqpoly_1d()))
expect_equal(get_hdr_1d(x, "histogram"), get_hdr_1d(x, method_histogram_1d()))
})
test_that("get_hdr() errors informatively if bad `method` argument", {
x <- 1:10
expect_error(get_hdr_1d(x, method = "not-a-method"), regexp = "Invalid method specified")
expect_error(get_hdr_1d(x, method = method_kde_1d), regexp = "did you forget")
expect_error(get_hdr_1d(x, method = method_kde()), regexp = "1d")
})
test_that("get_hdr_1d() fails if `method != 'fun' and `x` isn't provided", {
expect_error(get_hdr_1d(method = method_kde_1d()), regexp = ".x. must be provided")
})
test_that("fun argument of get_hdr_1d() requires range", {
expect_error(get_hdr_1d(method = "fun", fun = dexp), regexp = ".range. must be provided")
})
test_that("fun argument of get_hdr_1d() works", {
res <- get_hdr_1d(method = "fun", fun = dexp, range = c(0, 10))
# Structure of res is as expected
expect_type(res, "list")
expect_equal(length(res), 3)
expect_equal(names(res), c("df_est", "breaks", "data"))
expect_null(res$data)
# fhat_discretized should be normalized to sum to 1
expect_equal(sum(res$df_est$fhat_discretized), 1)
expect_equal(range(res$df_est$x), c(0, 10))
# default grid is 512:
expect_equal(nrow(res$df_est), 512)
# Checksums:
expect_equal(round(sum(res$df_est$fhat)), 52)
expect_equal(as.numeric(round(res$breaks, 4)), c(0.0101, 0.0501, 0.201, 0.5041, Inf))
expect_equal(sort(unique(res$df_est$hdr)), c(.5, .8, .95, .99, 1))
expect_equal(as.numeric(table(res$df_est$hdr)), c(36, 47, 71, 82, 276))
})
================================================
FILE: tests/testthat/test-layer-wrappers.R
================================================
test_that("wrapper functions for `layer()` are passing arguments on as expected", {
df <- readRDS(test_path("fixtures", "df_norm.rds"))
check_layer <- function(layer_fun, Geom, Stat, mapping = aes(x, y), data = df, ...) {
hdr_layer <- layer_fun(data = data, mapping = mapping, ...)
expect_type(hdr_layer, "environment")
expect_identical(hdr_layer$geom, Geom)
expect_identical(hdr_layer$stat, Stat)
expect_identical(hdr_layer$mapping, mapping)
}
# 2-d layer functions -----------------------------------------------------
# geom/stat_hdr()
check_layer(geom_hdr, GeomHdr, StatHdr)
check_layer(stat_hdr, GeomHdr, StatHdr)
# geom/stat_hdr_lines()
check_layer(geom_hdr_lines, GeomHdrLines, StatHdrLines)
check_layer(stat_hdr_lines, GeomHdrLines, StatHdrLines)
# geom/stat_hdr_points()
check_layer(geom_hdr_points, GeomPoint, StatHdrPoints)
check_layer(stat_hdr_points, GeomPoint, StatHdrPoints)
# geom/stat_hdr_lines_fun()
# -- stat_hdr_points_fun needs to have a `fun` arg provided
check_layer(geom_hdr_points_fun, GeomPoint, StatHdrPointsFun)
check_layer(stat_hdr_points_fun, GeomPoint, StatHdrPointsFun, fun = function(x, y) dnorm(x) * dnorm(y))
# geom/stat_hdr_fun()
# (stat_hdr_fun needs to have a `fun` arg provided)
check_layer(geom_hdr_fun, GeomHdrFun, StatHdrFun)
check_layer(stat_hdr_fun, GeomHdrFun, StatHdrFun, fun = function(x, y) dnorm(x) * dnorm(y))
# -- checking that data doesn't need to be provided
check_layer(geom_hdr_fun, GeomHdrFun, StatHdrFun, data = NULL, mapping = NULL)
check_layer(stat_hdr_fun, GeomHdrFun, StatHdrFun, data = NULL, mapping = NULL, fun = function(x, y) dnorm(x) * dnorm(y))
# geom/stat_hdr_lines_fun()
# -- stat_hdr_lines_fun needs to have a `fun` arg provided
check_layer(geom_hdr_lines_fun, GeomHdrLinesFun, StatHdrLinesFun)
check_layer(stat_hdr_lines_fun, GeomHdrLinesFun, StatHdrLinesFun, fun = function(x, y) dnorm(x) * dnorm(y))
# -- checking that data doesn't need to be provided
check_layer(geom_hdr_lines_fun, GeomHdrLinesFun, StatHdrLinesFun, data = NULL, mapping = NULL)
check_layer(stat_hdr_lines_fun, GeomHdrLinesFun, StatHdrLinesFun, data = NULL, mapping = NULL, fun = function(x, y) dnorm(x) * dnorm(y))
# 1-d layer functions -----------------------------------------------------
# geom/stat_hdr_rug()
check_layer(geom_hdr_rug, GeomHdrRug, StatHdrRug)
check_layer(stat_hdr_rug, GeomHdrRug, StatHdrRug)
# -- checking that single x/y aesthetics are allowed:
check_layer(geom_hdr_rug, GeomHdrRug, StatHdrRug, mapping = aes(x))
check_layer(stat_hdr_rug, GeomHdrRug, StatHdrRug, mapping = aes(x))
check_layer(geom_hdr_rug, GeomHdrRug, StatHdrRug, mapping = aes(y))
check_layer(stat_hdr_rug, GeomHdrRug, StatHdrRug, mapping = aes(y))
# geom/stat_hdr_rug_fun()
check_layer(geom_hdr_rug_fun, GeomHdrRugFun, StatHdrRugFun)
check_layer(stat_hdr_rug_fun, GeomHdrRugFun, StatHdrRugFun)
# -- checking that single x/y aesthetics are allowed:
check_layer(geom_hdr_rug_fun, GeomHdrRugFun, StatHdrRugFun, mapping = aes(x))
check_layer(stat_hdr_rug_fun, GeomHdrRugFun, StatHdrRugFun, mapping = aes(x))
check_layer(geom_hdr_rug_fun, GeomHdrRugFun, StatHdrRugFun, mapping = aes(y))
check_layer(stat_hdr_rug_fun, GeomHdrRugFun, StatHdrRugFun, mapping = aes(y))
})
================================================
FILE: tests/testthat/test-res_to_df.R
================================================
test_that("res_to_df returns correct structure for each value of output", {
data <- readRDS(test_path("fixtures", "df_norm.rds"))
probs <- c(.99, .95, .80, .50)
res <- get_hdr(data, method_kde(), probs)
# Checking output == "bands"
df_bands <- res_to_df(res, probs, group = 1, output = "bands")
expect_type(df_bands, "list")
expect_equal(colnames(df_bands), c("x", "y", "piece", "group", "subgroup", ".size", "probs"))
expect(is.ordered(df_bands$probs), "probs is an ordered object")
expect_equal(levels(df_bands$probs), scales::percent_format(accuracy = 1)(probs))
# Checking output == "lines"
df_lines <- res_to_df(res, probs, group = 1, output = "lines")
expect_type(df_lines, "list")
expect_equal(colnames(df_lines), c("x", "y", "piece", "group", ".size", "probs"))
expect(is.ordered(df_lines$probs), "probs is an ordered object")
expect_equal(levels(df_lines$probs), scales::percent_format(accuracy = 1)(probs))
# Checking output == "points"
df_points <- res_to_df(res, probs, group = 1, output = "points")
expect_type(df_points, "list")
expect_equal(colnames(df_points), c("x", "y", "probs"))
expect(is.ordered(df_points$probs), "probs is an ordered object")
expect_equal(levels(df_points$probs), scales::percent_format(accuracy = 1)(c(1, probs)))
})
================================================
FILE: tests/testthat/test-res_to_df_1d.R
================================================
test_that("res_to_df_1d returns correct structure for each value of output", {
data <- readRDS(test_path("fixtures", "df_norm.rds"))
probs <- c(.99, .95, .80, .50)
res <- get_hdr_1d(data$x, method_kde_1d(), probs)
# Checking output == "rug"
df_rug <- res_to_df_1d(res, probs, group = 1, output = "rug")
expect_type(df_rug, "list")
expect_equal(colnames(df_rug), c("x", "fhat", "fhat_discretized", "probs"))
expect(is.ordered(df_rug$probs), "probs is an ordered object")
expect_equal(levels(df_rug$probs), scales::percent_format(accuracy = 1)(probs))
})
================================================
FILE: tests/testthat/test-visual-tests.R
================================================
## Checking basic plots with vdiffr::expect_doppelganger()
test_that("Basic 2d HDRs render consistently", {
# platform-dependent snapshots for GitHub actions
# in non-CI context, use default directory for snapshots
if (Sys.getenv("CI") == "true") {
snapshot_variant <- Sys.getenv("RUNNER_OS")
} else {
snapshot_variant <- NULL
}
data <- readRDS(test_path("fixtures", "df_norm.rds"))
# geom/stat_hdr
geom_hdr_ggplot <- ggplot(data, aes(x, y)) + geom_hdr()
stat_hdr_ggplot <- ggplot(data, aes(x, y)) + stat_hdr()
vdiffr::expect_doppelganger("geom-hdr-ggplot", geom_hdr_ggplot, variant = snapshot_variant)
vdiffr::expect_doppelganger("stat-hdr-ggplot", stat_hdr_ggplot, variant = snapshot_variant)
# geom/stat_hdr_lines
geom_hdr_lines_ggplot <- ggplot(data, aes(x, y)) + geom_hdr_lines()
stat_hdr_lines_ggplot <- ggplot(data, aes(x, y)) + stat_hdr_lines()
vdiffr::expect_doppelganger("geom-hdr_lines-ggplot", geom_hdr_lines_ggplot, variant = snapshot_variant)
vdiffr::expect_doppelganger("stat-hdr_lines-ggplot", stat_hdr_lines_ggplot, variant = snapshot_variant)
# geom/stat_hdr_points
geom_hdr_points_ggplot <- ggplot(data, aes(x, y)) + geom_hdr_points()
stat_hdr_points_ggplot <- ggplot(data, aes(x, y)) + stat_hdr_points()
vdiffr::expect_doppelganger("geom-hdr-points-ggplot", geom_hdr_points_ggplot, variant = snapshot_variant)
vdiffr::expect_doppelganger("stat-hdr-points-ggplot", stat_hdr_points_ggplot, variant = snapshot_variant)
# geom/stat_hdr_points_fun
geom_hdr_points_fun_ggplot <- ggplot(data, aes(x, y)) + geom_hdr_points_fun(fun = function(x, y) dnorm(x) * dnorm(y))
stat_hdr_points_fun_ggplot <- ggplot(data, aes(x, y)) + stat_hdr_points_fun(fun = function(x, y) dnorm(x) * dnorm(y))
vdiffr::expect_doppelganger("geom-hdr-points-fun-ggplot", geom_hdr_points_fun_ggplot, variant = snapshot_variant)
vdiffr::expect_doppelganger("stat-hdr-points-fun-ggplot", stat_hdr_points_fun_ggplot, variant = snapshot_variant)
# geom/stat_hdr_fun
geom_hdr_fun_ggplot <- ggplot() +
geom_hdr_fun(fun = function(x, y) dnorm(x) * dnorm(y), xlim = c(-5, 5), ylim = c(-5, 5))
stat_hdr_fun_ggplot <- ggplot() +
stat_hdr_fun(fun = function(x, y) dnorm(x) * dnorm(y), xlim = c(-5, 5), ylim = c(-5, 5))
vdiffr::expect_doppelganger("geom-hdr-fun-ggplot", geom_hdr_fun_ggplot, variant = snapshot_variant)
vdiffr::expect_doppelganger("stat-hdr-fun-ggplot", stat_hdr_fun_ggplot, variant = snapshot_variant)
})
test_that("Basic 1d HDRs render consistently", {
# platform-dependent snapshots for GitHub actions
# in non-CI context, use default directory for snapshots
if (Sys.getenv("CI") == "true") {
snapshot_variant <- Sys.getenv("RUNNER_OS")
} else {
snapshot_variant <- NULL
}
data <- readRDS(test_path("fixtures", "df_norm.rds"))
# geom/stat_hdr_rug
geom_hdr_rug_ggplot <- ggplot(data, aes(x, y)) + geom_hdr_rug()
stat_hdr_rug_ggplot <- ggplot(data, aes(x, y)) + stat_hdr_rug()
vdiffr::expect_doppelganger("geom-hdr-rug-ggplot", geom_hdr_rug_ggplot, variant = snapshot_variant)
vdiffr::expect_doppelganger("stat-hdr-rug-ggplot", stat_hdr_rug_ggplot, variant = snapshot_variant)
# geom/stat_hdr_rug_fun
geom_hdr_rug_fun_ggplot <- ggplot() +
geom_hdr_rug_fun(fun_x = dnorm, fun_y = dexp, xlim = c(-5, 5), ylim = c(0, 10))
stat_hdr_rug_fun_ggplot <- ggplot() +
stat_hdr_rug_fun(fun_x = dnorm, fun_y = dexp, xlim = c(-5, 5), ylim = c(0, 10))
vdiffr::expect_doppelganger("geom-hdr-rug-fun-ggplot", geom_hdr_rug_fun_ggplot, variant = snapshot_variant)
vdiffr::expect_doppelganger("stat-hdr-rug-fun-ggplot", stat_hdr_rug_fun_ggplot, variant = snapshot_variant)
})
test_that("Specified order of probabilities doesn't impact legend ordering", {
# platform-dependent snapshots for GitHub actions
# in non-CI context, use default directory for snapshots
if (Sys.getenv("CI") == "true") {
snapshot_variant <- Sys.getenv("RUNNER_OS")
} else {
snapshot_variant <- NULL
}
data <- readRDS(test_path("fixtures", "df_norm.rds"))
geom_hdr_prob_order_ggplot <- ggplot(data, aes(x, y)) +
geom_hdr(probs = c(.25, .5, .75, .95))
geom_hdr_rug_prob_order_ggplot <- ggplot(data, aes(x, y)) +
geom_hdr_rug(probs = c(.25, .5, .75, .95))
vdiffr::expect_doppelganger("geom_hdr_prob_order_ggplot", geom_hdr_prob_order_ggplot, variant = snapshot_variant)
vdiffr::expect_doppelganger("geom_hdr_rug_prob_order_ggplot", geom_hdr_rug_prob_order_ggplot, variant = snapshot_variant)
})
================================================
FILE: tests/testthat.R
================================================
# This file is part of the standard setup for testthat.
# It is recommended that you do not modify it.
#
# Where should you do additional test configuration?
# Learn more about the roles of various files in:
# * https://r-pkgs.org/tests.html
# * https://testthat.r-lib.org/reference/test_package.html#special-files
library(testthat)
library(ggdensity)
test_check("ggdensity")
================================================
FILE: vignettes/.gitignore
================================================
*.html
*.R
================================================
FILE: vignettes/method.Rmd
================================================
---
title: "The method argument"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{The method argument}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
fig.align = "center",
dpi = 160,
out.width = "80%",
collapse = TRUE,
comment = "#>"
)
```
Almost every function in **ggdensity** accepts a `method` argument---this is true for `geom_hdr()` and other layer functions
(`geom_hdr_lines()`, `geom_hdr_points()`, ...),
as well as `get_hdr()` and `get_hdr_1d()`.
This vignette summarizes the many ways in which the `method` argument can be specified;
first looking at it from a more basic perspective,
then from the perspective of a developer wanting to implement additional estimators.
## Using **ggdensity**'s `method_*()` functions
First, let's load the necessary packages and generate some sample data.
```{r setup}
library("ggdensity"); theme_set(theme_minimal(8))
theme_update(legend.position = "none") # Suppressing legends for readability
```
```{r}
set.seed(1)
df <- data.frame(x = rnorm(500), y = rnorm(500))
p <- ggplot(df, aes(x, y))
p + geom_point()
```
The easiest way to plot HDRs with `geom_hdr()` (or any other layer function from **ggdensity**) with a specified density estimator
is to provide a character object to the `method` argument:
```{r, fig.show="hold", out.width="45%", fig.align = "default"}
p + geom_hdr(method = "kde")
p + geom_hdr(method = "mvnorm")
p + geom_hdr(method = "histogram")
p + geom_hdr(method = "freqpoly")
```
However, as of **ggdensity** v1.0.0 there is an alternative approach---providing a `method_*()` function call:
```{r, fig.show="hold", out.width="45%", fig.align = "default"}
p + geom_hdr(method = method_kde())
p + geom_hdr(method = method_mvnorm())
p + geom_hdr(method = method_histogram())
p + geom_hdr(method = method_freqpoly())
```
The default behaviors of these two approaches are the same and always will be---in this way, they are completely interchangeable.
However, the `method_*()` function call is required to estimate HDRs with non-default estimator parameters.
For example, we can set the `adjust` parameter to apply a multiplicative adjustment to the heuristically determined bandwidth in `method_kde()` (which itself uses the one computed by `MASS::bandwidth.nrd()`):
```{r}
p + geom_hdr(method = method_kde(adjust = 1/2))
```
The relevant parameters for each method are documented in their respective `?method_*` help pages.
Note that these parameters can not be provided to `geom_hdr()` or `stat_hdr()` and thus are not accessible if
a character value is provided to `method`.
The `method` argument of `get_hdr()` functions in the same way:
```{r}
res <- get_hdr(df, method = method_kde(adjust = 1/2))
str(res)
```
For details on the output of `get_hdr()`, see `?get_hdr`.
### `method_*_1d()` functions
In **ggdensity**, it is possible to estimate and plot 1-dimensional HDRs with `geom_hdr_rug()` and `get_hdr_1d()`.
These functions also accept a `method` argument, but they do not accept the previously discussed `method_*()` functions.
Instead they accept the 1-dimensional analogues: `method_*_1d()`.
```{r, fig.show="hold", out.width="45%", fig.align = "default"}
p +
geom_point() +
geom_hdr_rug(method = method_kde_1d())
p +
geom_point() +
geom_hdr_rug(method = method_norm_1d())
p +
geom_point() +
geom_hdr_rug(method = method_histogram_1d())
p +
geom_point() +
geom_hdr_rug(method = method_freqpoly_1d())
```
Just like we saw with `geom_hdr()`, `geom_hdr_rug()` also accepts character values for `method`:
```{r, fig.show="hold", out.width="45%", fig.align = "default"}
p +
geom_point() +
geom_hdr_rug(method = "kde")
p +
geom_point() +
geom_hdr_rug(method = "norm")
p +
geom_point() +
geom_hdr_rug(method = "histogram")
p +
geom_point() +
geom_hdr_rug(method = "freqpoly")
```
Because the return values of the `method_*()` functions are incompatible with the 1-dimensional
HDR estimation procedure, if a 2-dimensional method is specified the following error message is issued:
```{r, fig.show = "hide"}
p +
geom_point() +
geom_hdr_rug(method = method_kde())
```
Lastly, we see that the `method` argument of `get_hdr_1d()` behaves similarly.
```{r}
res <- get_hdr_1d(df$x, method = method_kde_1d())
str(res)
```
Again, for details on the above output of `get_hdr_1d()`, see `?get_hdr_1d`.
## A detailed look at `method_*()` functions
Now that we understand the ways in which `method` can be specified
let's look at the internals of the `method_*()` functions.
Note: the implementations discussed in this section depend heavily on topics in functional programming,
especially
[closures](https://adv-r.hadley.nz/environments.html?q=closures#function-environments) and
[function factories](https://adv-r.hadley.nz/function-factories.html).
While not necessary, a good understanding of these ideas is helpful---the
linked chapters from Hadley Wickham's *Advanced R* are a great place to start.
Looking at the definition of `method_kde()`, we see that it is a function of `h` and `adjust`,
returning a closure with arguments `data`, `n`, `rangex`, and `rangey`.
The closure passes the `x` and `y` columns of `data` to `MASS::kde2d()`,
returning the estimated density evaluated on a grid with columns `x`, `y`, and `fhat`.
This closure is what `geom_hdr()` expects as its `method` argument,
and is how the HDRs are estimated (via `get_hdr()`).
```{r, collapse = TRUE, comment = ""}
method_kde
```
Both `method_histogram()` and `method_freqpoly()` behave similarly,
accepting parameters governing the density estimation procedure and returning a closure with arguments
`data`, `n`, `rangex`, and `rangey`.
However, these functions are significantly more complicated as
the density estimation procedures are implemented entirely in **ggdensity**.
`method_mvnorm()` is different in a few ways.
The closure it returns is a function of just one argument: `data`.
This is because it does not return the estimated density evaluated on a grid.
Instead, it returns yet another closure with (vectorized) arguments `x` and `y`.
As in `method_kde()`, the return value of the closure is a representation of the estimated pdf.
The difference is the manner in which the pdf is represented.
Whereas before we had a pdf defined by a discrete approximation on a grid,
we now have an explicit definition of the pdf in terms of `x` and `y`.
```{r, collapse = TRUE, comment = ""}
method_mvnorm
```
To summarize each of the above cases:
in the first example, the `method_*()` function returned a closure with arguments
`data`, `n`, `rangex`, and `rangey` which itself returned the estimated density evaluated on a grid;
in the second, the `method_*()` function returned a closure with a single argument, `data`,
which itself returned a closure with arguments `x` and `y`, representing the estimated density explicitly.
In both cases, the `method_*()` function can have any number of parameters governing the density estimation procedure.
These are the two ways the `method` argument may be specified.
The first is necessary for cases in which an explicit definition of the estimated density is not computationally feasible
(for example, KDEs).
The second is an easier option for the cases in which a closed form of the estimated density is available
(for example, parametric estimators).
Let's look at how we might define our own `method_*()` functions in each case,
beginning with a simple parametric estimator.
### Implementing a method returning a PDF
In **ggdensity**, `method_mvnorm()` estimates HDRs based on the parametric multivariate normal model.
If we wanted to fit a simpler model in which the data is further assumed to be independent,
we could implement `method_mvnorm_ind()`.
```{r}
method_mvnorm_ind <- function() {
function(data) {
xbar <- mean(data$x)
ybar <- mean(data$y)
sx <- sd(data$x)
sy <- sd(data$y)
# joint pdf is simply the product of the marginals
function(x, y) dnorm(x, xbar, sx) * dnorm(y, ybar, sy)
}
}
```
To use our `method_mvnorm_ind()`, we just need to supply it to `geom_hdr()`'s `method` argument.
```{r}
ggplot(df, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind())
```
If we transform our data to have non-zero covariance
we still see the major and minor axes of the contours coincide with the plot axes---exactly
what we would expect with this (incorrectly) constrained model.
```{r}
A <- matrix(c(
2*cos(pi/6), -2*sin(pi/6),
1*sin(pi/6), 1*cos(pi/6)
), byrow = TRUE, ncol = 2)
df_rot <- as.data.frame(as.matrix(df) %*% A)
colnames(df_rot) <- c("x", "y")
ggplot(df_rot, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind()) +
geom_point(size = .4) +
coord_fixed(xlim = c(-6, 6), ylim = c(-6, 6))
```
Notice, `method_mvnorm_ind()` accepts no arguments.
The density estimation procedure is so simple that there are no parameters to govern it.
To allow for circular models in which the fitted variances are required to be equal,
we can implement a `circular` argument.
```{r}
method_mvnorm_ind <- function(circular = FALSE) {
function(data) {
xbar <- mean(data$x)
ybar <- mean(data$y)
if (circular) {
sx <- sd(c(data$x - xbar, data$y - ybar))
sy <- sx
} else {
sx <- sd(data$x)
sy <- sd(data$y)
}
function(x, y) dnorm(x, xbar, sx) * dnorm(y, ybar, sy)
}
}
```
Now, the contours are perfectly circular.
```{r}
ggplot(df_rot, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind(circular = TRUE)) +
geom_point(size = .4) +
coord_fixed(xlim = c(-6, 6), ylim = c(-6, 6))
```
In the above plot, the upper and lower portions of the HDRs are cut off.
This is because the default behavior of **ggdensity** is to not draw HDRs outside of the "bounding box" around observed data.
This is *not* because we are using a custom `method_*()` function.
To fix this, we need to either set a better `ylim` value for `geom_hdr()` or specify a larger range in `scale_y_continuous()`.
```{r, fig.show="hold", out.width="45%", fig.align = "default"}
ggplot(df_rot, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind(circular = TRUE), ylim = c(-6, 6)) +
geom_point(size = .4) +
coord_fixed(xlim = c(-6, 6), ylim = c(-6, 6))
ggplot(df_rot, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind(circular = TRUE)) +
geom_point(size = .4) +
scale_y_continuous(limits = c(-6, 6)) +
coord_fixed(xlim = c(-6, 6), ylim = c(-6, 6))
```
Notice, neither of these approaches involve arguments to `method_mvnorm_ind()`.
Internally, the closure returned by `method_mvnorm_ind()` is used by `get_hdr()`,
along with information from the `scales` associated with the `ggplot` object.
It is the `scales` that need adjusting, not anything related to the `method` argument.
### Implementing a method returning an evaluated PDF
To illustrate the other case, in which the object returned by the closure is the estimated density evaluated on a grid, we implement `method_mvnorm_ind_grid()`.
This estimates the same independent normal density as `method_mvnorm_ind()`,
the only difference is the behavior of the returned closure.
```{r}
method_mvnorm_ind_grid <- function() {
function(data, n, rangex, rangey) {
# First, we estimate the density -----------------------------
xbar <- mean(data$x)
ybar <- mean(data$y)
sx <- sd(data$x)
sy <- sd(data$y)
f_est <- function(x, y) dnorm(x, xbar, sx) * dnorm(y, ybar, sy)
# Return the density evaluated on a grid ---------------------
# df_grid defined by rangex, rangey, and n
df_grid <- expand.grid(
x = seq(rangex[1], rangex[2], length.out = n),
y = seq(rangey[1], rangey[2], length.out = n)
)
df_grid$fhat <- f_est(df_grid$x, df_grid$y)
df_grid
}
}
```
See that returned closure has additional arguments `n`, `rangex`, and `rangey` which define the grid.
Also, the grid is represented a `data.frame` with columns `x`, `y`, and `fhat`, where `fhat` is the (potentially unnormalized) density estimate.
Again, to use our `method_mvnorm_ind_grid()` we provide it to `geom_hdr()`’s method argument.
```{r}
ggplot(df, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind_grid())
```
Like we saw in the previous example, we could prevent the HDRs from being "cut off" by specifying either the `x/ylim` arguments in `geom_hdr()` or
by setting a larger range in `scale_x/y_continuous()`.
## The `method_*_1d()` functions
We saw before that **ggdensity** uses `method_*_1d()` functions for the estimation of 1-dimensional densities.
The internals of these functions are very similar to the `method_*()` functions,
the only differences are slight changes to the arguments and return values of the returned closures.
Looking at the definition of `method_kde_1d()`, we see the returned closure has arguments `x`, `n`, and `range`.
This is very similar to `method_kde()`, the only difference is we are now dealing with univariate data:
the vector argument `x` is used instead of `data`, and we have a single `range` parameter instead of `rangex` and `rangey`.
Similarly, the closure now returns the estimated density evaluated on a univariate grid, with columns `x` and `fhat` instead of the bivariate grid with columns `x`, `y`, and `fhat`.
Finally, see that `method_kde_1d()` accepts several arguments governing the density estimation procedure just like `method_kde()`.
```{r, collapse = TRUE, comment = ""}
method_kde_1d
```
Estimated univariate densities can also be represented explicitly, as illustrated by `method_norm_1d()`.
Comparing this to the previously discussed `method_mvnorm()` we see that little has changed:
the closure is now a function of a vector `x` instead of `data` and returns a function of one variable (`x`) instead of two (`x` and `y`).
```{r, collapse = TRUE, comment = ""}
method_norm_1d
```
Additional `method_*_1d()` functions can be implemented in the same way as the 2-dimensional `method_*()` functions,
so long as the returned closure is structured in one of the two ways we have seen here.