} (character versus factor).
See \code{vignette("types")} for an overview of common
type abbreviations.
}
Printing can be tweaked for a one-off call by calling \code{print()} explicitly
and setting arguments like \code{n} and \code{width}. More persistent control is
available by setting the options described in \link[pillar:pillar_options]{pillar::pillar_options}.
See also \code{vignette("digits")} for a comparison to base options,
and \code{vignette("numbers")} that showcases \code{\link[tibble:num]{num()}} and \code{\link[tibble:char]{char()}}
for creating columns with custom formatting options.
As of tibble 3.1.0, printing is handled entirely by the \pkg{pillar} package.
If you implement a package that extends tibble,
the printed output can be customized in various ways.
See \code{vignette("extending", package = "pillar")} for details,
and \link[pillar:pillar_options]{pillar::pillar_options} for options that control the display in the console.
}
\examples{
data(pbmc_small)
print(pbmc_small)
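# A one-off tweak of the printed output; n and width are the
# print() arguments described above (a minimal illustration)
print(pbmc_small, n = 5)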
}
================================================
FILE: man/fragments/intro.Rmd
================================================
**Brings Seurat to the tidyverse!**
website: [stemangiola.github.io/tidyseurat/](https://stemangiola.github.io/tidyseurat/)
Please also have a look at
- [tidyseurat](https://stemangiola.github.io/tidyseurat/) for tidy single-cell RNA sequencing analysis
- [tidySummarizedExperiment](https://tidyomics.github.io/tidySummarizedExperiment/) for tidy bulk RNA sequencing analysis
- [tidybulk](https://tidyomics.github.io/tidybulk/) for tidy bulk RNA-seq analysis
- [tidygate](https://github.com/stemangiola/tidygate/) for adding custom gate information to your tibble
- [tidyHeatmap](https://stemangiola.github.io/tidyHeatmap/) for heatmaps produced with tidy principles
```{r, echo=FALSE, include=FALSE}
library(knitr)
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```

# Introduction
tidyseurat provides a bridge between the Seurat single-cell package [@butler2018integrating; @stuart2019comprehensive] and the tidyverse [@wickham2019welcome]. It creates an invisible layer that enables viewing the
Seurat object as a tidyverse tibble, and provides Seurat-compatible *dplyr*, *tidyr*, *ggplot* and *plotly* functions.
## Functions/utilities available
Seurat-compatible Functions | Description
------------ | -------------
`all` |
tidyverse Packages | Description
------------ | -------------
`dplyr` | All `dplyr` APIs, as for any tibble
`tidyr` | All `tidyr` APIs, as for any tibble
`ggplot2` | `ggplot()`, as for any tibble
`plotly` | `plot_ly()`, as for any tibble
Utilities | Description
------------ | -------------
`tidy` | Add `tidyseurat` invisible layer over a Seurat object
`as_tibble` | Convert cell-wise information to a `tbl_df`
`join_features` | Add feature-wise information, returns a `tbl_df`
`aggregate_cells` | Aggregate cell gene-transcription abundance as pseudobulk tissue
## Installation
From CRAN
```{r eval=FALSE}
install.packages("tidyseurat")
```
From Github (development)
```{r, eval=FALSE}
devtools::install_github("stemangiola/tidyseurat")
```
```{r}
library(dplyr)
library(tidyr)
library(purrr)
library(magrittr)
library(ggplot2)
library(Seurat)
library(tidyseurat)
```
## Create `tidyseurat`, the best of both worlds!
This is a Seurat object, but it is evaluated as a tibble, so it is fully compatible with both Seurat and tidyverse APIs.
```{r}
pbmc_small <- SeuratObject::pbmc_small
```
**It looks like a tibble**
```{r}
pbmc_small
```
**But it is a Seurat object after all**
```{r}
pbmc_small@assays
```
# Preliminary plots
Set colours and theme for plots.
```{r}
# Use colourblind-friendly colours
friendly_cols <- c("#88CCEE", "#CC6677", "#DDCC77", "#117733", "#332288", "#AA4499", "#44AA99", "#999933", "#882255", "#661100", "#6699CC")
# Set theme
my_theme <-
list(
scale_fill_manual(values = friendly_cols),
scale_color_manual(values = friendly_cols),
theme_bw() +
theme(
panel.border = element_blank(),
axis.line = element_line(),
panel.grid.major = element_line(linewidth = 0.2),
panel.grid.minor = element_line(linewidth = 0.1),
text = element_text(size = 12),
legend.position = "bottom",
aspect.ratio = 1,
strip.background = element_blank(),
axis.title.x = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
axis.title.y = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10))
)
)
```
We can treat `pbmc_small` effectively as a normal tibble for plotting.
Here we plot the number of features per cell.
```{r plot1}
pbmc_small %>%
ggplot(aes(nFeature_RNA, fill = groups)) +
geom_histogram() +
my_theme
```
Here we plot the total counts per cell.
```{r plot2}
pbmc_small %>%
ggplot(aes(groups, nCount_RNA, fill = groups)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.1) +
my_theme
```
Here we plot the abundance of two features for each group.
```{r}
pbmc_small %>%
join_features(features = c("HLA-DRA", "LYZ"), shape = "long") %>%
ggplot(aes(groups, .abundance_RNA + 1, fill = groups)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(aes(size = nCount_RNA), alpha = 0.5, width = 0.2) +
scale_y_log10() +
my_theme
```
# Preprocess the dataset
You can also treat the object as a Seurat object and proceed with data processing.
```{r preprocess}
pbmc_small_pca <-
pbmc_small %>%
SCTransform(verbose = FALSE) %>%
FindVariableFeatures(verbose = FALSE) %>%
RunPCA(verbose = FALSE)
pbmc_small_pca
```
If a tool is not included in the tidyseurat collection, we can use `as_tibble` to permanently convert the `tidyseurat` object into a tibble.
```{r pc_plot}
pbmc_small_pca %>%
as_tibble() %>%
select(contains("PC"), everything()) %>%
GGally::ggpairs(columns = 1:5, ggplot2::aes(colour = groups)) +
my_theme
```
# Identify clusters
We proceed with cluster identification with Seurat.
```{r cluster}
pbmc_small_cluster <-
pbmc_small_pca %>%
FindNeighbors(verbose = FALSE) %>%
FindClusters(method = "igraph", verbose = FALSE)
pbmc_small_cluster
```
Now we can interrogate the object as if it were a regular tibble data frame.
```{r cluster count}
pbmc_small_cluster %>%
count(groups, seurat_clusters)
```
We can identify cluster markers using Seurat.
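For example, Seurat's `FindAllMarkers()` can be piped in directly, and its output manipulated with dplyr. A minimal sketch (the thresholds are illustrative, and the `avg_log2FC` column name assumes Seurat >= 4):

```{r markers, eval=FALSE}
pbmc_small_cluster %>%
  FindAllMarkers(only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25) %>%
  # Keep the two strongest markers per cluster
  group_by(cluster) %>%
  slice_max(avg_log2FC, n = 2)
```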
# Reduce dimensions
We can calculate the first 3 UMAP dimensions using the Seurat framework.
```{r umap, eval=FALSE}
pbmc_small_UMAP <-
pbmc_small_cluster %>%
RunUMAP(reduction = "pca", dims = 1:15, n.components = 3L)
```
And we can plot them in 3D using plotly.
```{r umap plot, eval=FALSE}
pbmc_small_UMAP %>%
plot_ly(
x = ~`UMAP_1`,
y = ~`UMAP_2`,
z = ~`UMAP_3`,
color = ~seurat_clusters,
colors = friendly_cols[1:4]
)
```

## Cell type prediction
We can infer cell type identities using *SingleR* [@aran2019reference] and manipulate the output using tidyverse.
```{r eval=FALSE}
# Get cell type reference data
blueprint <- celldex::BlueprintEncodeData()
# Infer cell identities
cell_type_df <-
GetAssayData(pbmc_small_UMAP, slot = 'counts', assay = "SCT") %>%
log1p() %>%
Matrix::Matrix(sparse = TRUE) %>%
SingleR::SingleR(
ref = blueprint,
labels = blueprint$label.main,
method = "single"
) %>%
as.data.frame() %>%
as_tibble(rownames = "cell") %>%
select(cell, first.labels)
```
```{r, eval=FALSE}
# Join UMAP and cell type info
pbmc_small_cell_type <-
pbmc_small_UMAP %>%
left_join(cell_type_df, by = "cell")
# Reorder columns
pbmc_small_cell_type %>%
select(cell, first.labels, everything())
```
We can easily summarise the results. For example, we can see how cell type classification overlaps with cluster classification.
```{r, eval=FALSE}
pbmc_small_cell_type %>%
count(seurat_clusters, first.labels)
```
We can easily reshape the data for building information-rich faceted plots.
```{r eval=FALSE}
pbmc_small_cell_type %>%
# Reshape and add classifier column
pivot_longer(
cols = c(seurat_clusters, first.labels),
names_to = "classifier", values_to = "label"
) %>%
# UMAP plots for cell type and cluster
ggplot(aes(UMAP_1, UMAP_2, color = label)) +
geom_point() +
facet_wrap(~classifier) +
my_theme
```
We can easily plot gene correlation per cell category, adding multi-layer annotations.
```{r eval=FALSE}
pbmc_small_cell_type %>%
# Add some mitochondrial abundance values
mutate(mitochondrial = rnorm(n())) %>%
# Plot correlation
join_features(features = c("CST3", "LYZ"), shape = "wide") %>%
ggplot(aes(CST3 + 1, LYZ + 1, color = groups, size = mitochondrial)) +
geom_point() +
facet_wrap(~first.labels, scales = "free") +
scale_x_log10() +
scale_y_log10() +
my_theme
```
# Nested analyses
A powerful tool we can use with tidyseurat is `nest`. We can easily perform independent analyses on subsets of the dataset. First, we classify cell types into lymphoid and myeloid; then we nest based on the new classification.
```{r eval=FALSE}
pbmc_small_nested <-
pbmc_small_cell_type %>%
filter(first.labels != "Erythrocytes") %>%
mutate(cell_class = if_else(`first.labels` %in% c("Macrophages", "Monocytes"), "myeloid", "lymphoid")) %>%
nest(data = -cell_class)
pbmc_small_nested
```
Now, independently for the lymphoid and myeloid subsets, we can (i) find variable features, (ii) reduce dimensions, and (iii) cluster, using both tidyverse and Seurat seamlessly.
```{r eval=FALSE}
pbmc_small_nested_reanalysed <-
pbmc_small_nested %>%
mutate(data = map(
data, ~ .x %>%
FindVariableFeatures(verbose = FALSE) %>%
RunPCA(npcs = 10, verbose = FALSE) %>%
FindNeighbors(verbose = FALSE) %>%
FindClusters(method = "igraph", verbose = FALSE) %>%
RunUMAP(reduction = "pca", dims = 1:10, n.components = 3L, verbose = FALSE)
))
pbmc_small_nested_reanalysed
```
Now we can unnest and plot the new classification.
```{r eval=FALSE}
pbmc_small_nested_reanalysed %>%
# Convert to tibble otherwise Seurat drops reduced dimensions when unifying data sets.
mutate(data = map(data, ~ .x %>% as_tibble())) %>%
unnest(data) %>%
# Define unique clusters
unite("cluster", c(cell_class, seurat_clusters), remove = FALSE) %>%
# Plotting
ggplot(aes(UMAP_1, UMAP_2, color = cluster)) +
geom_point() +
facet_wrap(~cell_class) +
my_theme
```
# Aggregating cells
Sometimes it is necessary to aggregate the gene-transcript abundance from a group of cells into a single value, for example when comparing groups of cells across different samples with fixed-effect models.
In tidyseurat, cell aggregation can be achieved using the `aggregate_cells` function.
```{r, eval=FALSE}
pbmc_small %>%
aggregate_cells(groups, assays = "RNA")
```
================================================
FILE: man/full_join.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{full_join}
\alias{full_join}
\alias{full_join.Seurat}
\title{Mutating joins}
\usage{
\method{full_join}{Seurat}(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
}
\arguments{
\item{x, y}{A pair of data frames, data frame extensions (e.g. a tibble), or
lazy data frames (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{by}{A join specification created with \code{\link[dplyr:join_by]{join_by()}}, or a character
vector of variables to join by.
If \code{NULL}, the default, \verb{*_join()} will perform a natural join, using all
variables in common across \code{x} and \code{y}. A message lists the variables so
that you can check they're correct; suppress the message by supplying \code{by}
explicitly.
To join on different variables between \code{x} and \code{y}, use a \code{\link[dplyr:join_by]{join_by()}}
specification. For example, \code{join_by(a == b)} will match \code{x$a} to \code{y$b}.
To join by multiple variables, use a \code{\link[dplyr:join_by]{join_by()}} specification with
multiple expressions. For example, \code{join_by(a == b, c == d)} will match
\code{x$a} to \code{y$b} and \code{x$c} to \code{y$d}. If the column names are the same between
\code{x} and \code{y}, you can shorten this by listing only the variable names, like
\code{join_by(a, c)}.
\code{\link[dplyr:join_by]{join_by()}} can also be used to perform inequality, rolling, and overlap
joins. See the documentation at \link[dplyr:join_by]{?join_by} for details on
these types of joins.
For simple equality joins, you can alternatively specify a character vector
of variable names to join by. For example, \code{by = c("a", "b")} joins \code{x$a}
to \code{y$a} and \code{x$b} to \code{y$b}. If variable names differ between \code{x} and \code{y},
use a named character vector like \code{by = c("x_a" = "y_a", "x_b" = "y_b")}.
To perform a cross-join, generating all combinations of \code{x} and \code{y}, see
\code{\link[dplyr:cross_join]{cross_join()}}.}
\item{copy}{If \code{x} and \code{y} are not from the same data source,
and \code{copy} is \code{TRUE}, then \code{y} will be copied into the
same src as \code{x}. This allows you to join tables across srcs, but
it is a potentially expensive operation so you must opt into it.}
\item{suffix}{If there are non-joined duplicate variables in \code{x} and
\code{y}, these suffixes will be added to the output to disambiguate them.
Should be a character vector of length 2.}
\item{...}{Other parameters passed onto methods.}
}
\value{
An object of the same type as \code{x} (including the same groups). The order of
the rows and columns of \code{x} is preserved as much as possible. The output has
the following properties:
\itemize{
\item The rows are affected by the join type.
\itemize{
\item \code{inner_join()} returns matched \code{x} rows.
\item \code{left_join()} returns all \code{x} rows.
\item \code{right_join()} returns matched \code{x} rows, followed by unmatched \code{y} rows.
\item \code{full_join()} returns all \code{x} rows, followed by unmatched \code{y} rows.
}
\item Output columns include all columns from \code{x} and all non-key columns from
\code{y}. If \code{keep = TRUE}, the key columns from \code{y} are included as well.
\item If non-key columns in \code{x} and \code{y} have the same name, \code{suffix}es are added
to disambiguate. If \code{keep = TRUE} and key columns in \code{x} and \code{y} have
the same name, \code{suffix}es are added to disambiguate these as well.
\item If \code{keep = FALSE}, output columns included in \code{by} are coerced to their
common type between \code{x} and \code{y}.
}
}
\description{
Mutating joins add columns from \code{y} to \code{x}, matching observations based on
the keys. There are four mutating joins: the inner join, and the three outer
joins.
\subsection{Inner join}{
An \code{inner_join()} only keeps observations from \code{x} that have a matching key
in \code{y}.
The most important property of an inner join is that unmatched rows in either
input are not included in the result. This means that generally inner joins
are not appropriate in most analyses, because it is too easy to lose
observations.
}
\subsection{Outer joins}{
The three outer joins keep observations that appear in at least one of the
data frames:
\itemize{
\item A \code{left_join()} keeps all observations in \code{x}.
\item A \code{right_join()} keeps all observations in \code{y}.
\item A \code{full_join()} keeps all observations in \code{x} and \code{y}.
}
}
}
\section{Many-to-many relationships}{
By default, dplyr guards against many-to-many relationships in equality joins
by throwing a warning. These occur when both of the following are true:
\itemize{
\item A row in \code{x} matches multiple rows in \code{y}.
\item A row in \code{y} matches multiple rows in \code{x}.
}
This is typically surprising, as most joins involve a relationship of
one-to-one, one-to-many, or many-to-one, and is often the result of an
improperly specified join. Many-to-many relationships are particularly
problematic because they can result in a Cartesian explosion of the number of
rows returned from the join.
If a many-to-many relationship is expected, silence this warning by
explicitly setting \code{relationship = "many-to-many"}.
In production code, it is best to preemptively set \code{relationship} to whatever
relationship you expect to exist between the keys of \code{x} and \code{y}, as this
forces an error to occur immediately if the data doesn't align with your
expectations.
Inequality joins typically result in many-to-many relationships by nature, so
they don't warn on them by default, but you should still take extra care when
specifying an inequality join, because they also have the capability to
return a large number of rows.
Rolling joins don't warn on many-to-many relationships either, but many
rolling joins follow a many-to-one relationship, so it is often useful to
set \code{relationship = "many-to-one"} to enforce this.
Note that in SQL, most database providers won't let you specify a
many-to-many relationship between two tables, instead requiring that you
create a third \emph{junction table} that results in two one-to-many relationships
instead.
}
\section{Methods}{
These functions are \strong{generic}s, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
\itemize{
\item \code{inner_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("inner_join")}.
\item \code{left_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("left_join")}.
\item \code{right_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("right_join")}.
\item \code{full_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("full_join")}.
}
}
\examples{
data(pbmc_small)
tt <- pbmc_small
tt |> full_join(tibble::tibble(groups="g1", other=1:4))
}
\seealso{
Other joins:
\code{\link[dplyr]{cross_join}()},
\code{\link[dplyr]{filter-joins}},
\code{\link[dplyr]{nest_join}()}
}
================================================
FILE: man/get_abundance_sc_long.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utilities.R
\name{get_abundance_sc_long}
\alias{get_abundance_sc_long}
\title{get abundance long}
\usage{
get_abundance_sc_long(
.data,
features = NULL,
all = FALSE,
exclude_zeros = FALSE,
assay = Assays(.data),
slot = "data"
)
}
\arguments{
\item{.data}{A tidyseurat}
\item{features}{A character}
\item{all}{A boolean}
\item{exclude_zeros}{A boolean}
\item{assay}{assay name to extract feature abundance}
\item{slot}{slot in the assay, e.g. `data` and `scale.data`}
}
\value{
A Seurat object
}
\description{
get abundance long
}
\examples{
data(pbmc_small)
pbmc_small \%>\%
get_abundance_sc_long(features=c("HLA-DRA", "LYZ"))
}
\keyword{internal}
================================================
FILE: man/get_abundance_sc_wide.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utilities.R
\name{get_abundance_sc_wide}
\alias{get_abundance_sc_wide}
\title{get abundance wide}
\usage{
get_abundance_sc_wide(
.data,
features = NULL,
all = FALSE,
assay = .data@active.assay,
slot = "data",
prefix = ""
)
}
\arguments{
\item{.data}{A tidyseurat}
\item{features}{A character}
\item{all}{A boolean}
\item{assay}{assay name to extract feature abundance}
\item{slot}{slot in the assay, e.g. `data` and `scale.data`}
\item{prefix}{prefix for the feature names}
}
\value{
A Seurat object
}
\description{
get abundance wide
}
\examples{
data(pbmc_small)
pbmc_small \%>\%
get_abundance_sc_wide(features=c("HLA-DRA", "LYZ"))
}
\keyword{internal}
================================================
FILE: man/ggplot.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ggplot2_methods.R
\name{ggplot}
\alias{ggplot}
\alias{ggplot.Seurat}
\title{Create a new \code{ggplot} from a \code{tidyseurat}}
\usage{
\method{ggplot}{Seurat}(data = NULL, mapping = aes(), ..., environment = parent.frame())
}
\arguments{
\item{data}{Default dataset to use for plot. If not already a data.frame,
will be converted to one by \code{\link[ggplot2:fortify]{fortify()}}. If not specified,
must be supplied in each layer added to the plot.}
\item{mapping}{Default list of aesthetic mappings to use for plot.
If not specified, must be supplied in each layer added to the plot.}
\item{...}{Other arguments passed on to methods. Not currently used.}
\item{environment}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} Used prior to tidy
evaluation.}
}
\value{
`ggplot`
}
\description{
\code{ggplot()} initializes a ggplot object. It can be used to
declare the input data frame for a graphic and to specify the
set of aesthetic mappings for the plot, intended to be common throughout all
subsequent layers unless specifically overridden.
}
\details{
\code{ggplot()} is used to construct the initial plot object,
and is almost always followed by a plus sign (\code{+}) to add
components to the plot.
There are three common patterns used to invoke \code{ggplot()}:
\itemize{
\item \verb{ggplot(data = df, mapping = aes(x, y, other aesthetics))}
\item \code{ggplot(data = df)}
\item \code{ggplot()}
}
The first pattern is recommended if all layers use the same
data and the same set of aesthetics, although this method
can also be used when adding a layer using data from another
data frame.
The second pattern specifies the default data frame to use
for the plot, but no aesthetics are defined up front. This
is useful when one data frame is used predominantly for the
plot, but the aesthetics vary from one layer to another.
The third pattern initializes a skeleton \code{ggplot} object, which
is fleshed out as layers are added. This is useful when
multiple data frames are used to produce different layers, as
is often the case in complex graphics.
The \verb{data =} and \verb{mapping =} specifications in the arguments are optional
(and are often omitted in practice), so long as the data and the mapping
values are passed into the function in the right order. In the examples
below, however, they are left in place for clarity.
}
\examples{
library(ggplot2)
data(pbmc_small)
pbmc_small |>
ggplot(aes(groups, nCount_RNA)) +
geom_boxplot()
}
\seealso{
The \href{https://ggplot2-book.org/getting-started}{first steps chapter} of the online ggplot2 book.
}
================================================
FILE: man/glimpse.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tibble_methods.R
\name{glimpse}
\alias{glimpse}
\alias{glimpse.tidyseurat}
\title{Get a glimpse of your data}
\usage{
\method{glimpse}{tidyseurat}(x, width = NULL, ...)
}
\arguments{
\item{x}{An object to glimpse at.}
\item{width}{Width of output: defaults to the setting of the
\code{width} \link[pillar:pillar_options]{option} (if finite)
or the width of the console.}
\item{...}{Unused, for extensibility.}
}
\value{
\code{x}, the original input, is (invisibly) returned, allowing \code{glimpse()} to be
used within a data pipeline.
}
\description{
\code{glimpse()} is like a transposed version of \code{print()}:
columns run down the page, and data runs across.
This makes it possible to see every column in a data frame.
It's a little like \code{\link[=str]{str()}} applied to a data frame
but it tries to show you as much data as possible.
(And it always shows the underlying data, even when applied
to a remote data source.)
See \code{\link[pillar:format_glimpse]{format_glimpse()}} for details on the formatting.
}
\section{S3 methods}{
\code{glimpse} is an S3 generic with a customised method for \code{tbl}s and
\code{data.frames}, and a default method that calls \code{\link[=str]{str()}}.
}
\examples{
data(pbmc_small)
pbmc_small |> glimpse()
}
================================================
FILE: man/group_by.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{group_by}
\alias{group_by}
\alias{group_by.Seurat}
\title{Group by one or more variables}
\usage{
\method{group_by}{Seurat}(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{...}{<\code{\link[rlang:args_data_masking]{data-masking}}> In \code{group_by()},
variables or computations to group by. Computations are always done on the
ungrouped data frame. To perform computations on the grouped data, you need
to use a separate \code{mutate()} step before the \code{group_by()}.
Computations are not allowed in \code{nest_by()}.
In \code{ungroup()}, variables to remove from the grouping.}
\item{.add}{When \code{FALSE}, the default, \code{group_by()} will
override existing groups. To add to the existing groups, use
\code{.add = TRUE}.}
\item{.drop}{Drop groups formed by factor levels that don't appear in the
data? The default is \code{TRUE} except when \code{.data} has been previously
grouped with \code{.drop = FALSE}. See \code{\link[dplyr:group_by_drop_default]{group_by_drop_default()}} for details.}
}
\value{
A grouped data frame with class \code{\link[dplyr]{grouped_df}},
unless the combination of \code{...} and \code{.add} yields an empty set of
grouping columns, in which case a tibble will be returned.
}
\description{
Most data operations are done on groups defined by variables.
\code{group_by()} takes an existing tbl and converts it into a grouped tbl
where operations are performed "by group". \code{ungroup()} removes grouping.
}
\section{Methods}{
These functions are \strong{generic}s, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
\itemize{
\item \code{group_by()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("group_by")}.
\item \code{ungroup()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("ungroup")}.
}
}
\section{Ordering}{
Currently, \code{group_by()} internally orders the groups in ascending order. This
results in ordered output from functions that aggregate groups, such as
\code{\link[dplyr:summarise]{summarise()}}.
When used as grouping columns, character vectors are ordered in the C locale
for performance and reproducibility across R sessions. If the resulting
ordering of your grouped operation matters and is dependent on the locale,
you should follow up the grouped operation with an explicit call to
\code{\link[dplyr:arrange]{arrange()}} and set the \code{.locale} argument. For example:
\if{html}{\out{}}\preformatted{data |>
group_by(chr) |>
summarise(avg = mean(x)) |>
arrange(chr, .locale = "en")
}\if{html}{\out{}}
This is often useful as a preliminary step before generating content intended
for humans, such as an HTML table.
\subsection{Legacy behavior}{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
Prior to dplyr 1.1.0, character vector grouping columns were ordered in the
system locale. Setting the global option \code{dplyr.legacy_locale} to \code{TRUE}
retains this legacy behavior, but this has been deprecated. Update existing
code to explicitly call \code{arrange(.locale = )} instead. Run
\code{Sys.getlocale("LC_COLLATE")} to determine your system locale, and compare
that against the list in \code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}} to find an appropriate
value for \code{.locale}, i.e. for American English, \code{"en_US"}.
}
}
\examples{
data("pbmc_small")
pbmc_small |> group_by(groups)
}
\seealso{
Other grouping functions:
\code{\link[dplyr]{group_map}()},
\code{\link[dplyr]{group_nest}()},
\code{\link[dplyr]{group_split}()},
\code{\link[dplyr]{group_trim}()}
}
================================================
FILE: man/group_split.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{group_split}
\alias{group_split}
\alias{group_split.Seurat}
\title{Split data frame by groups}
\usage{
\method{group_split}{Seurat}(.tbl, ..., .keep = TRUE)
}
\arguments{
\item{.tbl}{A tbl.}
\item{...}{If \code{.tbl} is an ungrouped data frame, a grouping specification,
forwarded to \code{\link[dplyr:group_by]{group_by()}}.}
\item{.keep}{Should the grouping columns be kept?}
}
\value{
A list of tibbles. Each tibble contains the rows of \code{.tbl} for the
associated group and all the columns, including the grouping variables.
Note that this returns a \link[vctrs:list_of]{list_of} which is slightly
stricter than a simple list but is useful for representing lists where
every element has the same type.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}
\code{\link[dplyr:group_split]{group_split()}} works like \code{\link[base:split]{base::split()}} but:
\itemize{
\item It uses the grouping structure from \code{\link[dplyr:group_by]{group_by()}} and therefore is subject
to the data mask
\item It does not name the elements of the list based on the grouping as this
only works well for a single character grouping variable. Instead,
use \code{\link[dplyr:group_keys]{group_keys()}} to access a data frame that defines the groups.
}
\code{group_split()} is primarily designed to work with grouped data frames.
You can pass \code{...} to group and split an ungrouped data frame, but this
is generally not very useful as you won't have easy access to the group
metadata.
}
\section{Lifecycle}{
\code{group_split()} is not stable because you can achieve very similar results by
manipulating the nested column returned from
\code{\link[tidyr:nest]{tidyr::nest(.by =)}}. That also retains the group keys all
within a single data structure. \code{group_split()} may be deprecated in the
future.
}
\examples{
data(pbmc_small)
pbmc_small |> group_split(groups)
}
\seealso{
Other grouping functions:
\code{\link[dplyr]{group_by}()},
\code{\link[dplyr]{group_map}()},
\code{\link[dplyr]{group_nest}()},
\code{\link[dplyr]{group_trim}()}
}
================================================
FILE: man/inner_join.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{inner_join}
\alias{inner_join}
\alias{inner_join.Seurat}
\title{Mutating joins}
\usage{
\method{inner_join}{Seurat}(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
}
\arguments{
\item{x, y}{A pair of data frames, data frame extensions (e.g. a tibble), or
lazy data frames (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{by}{A join specification created with \code{\link[dplyr:join_by]{join_by()}}, or a character
vector of variables to join by.
If \code{NULL}, the default, \verb{*_join()} will perform a natural join, using all
variables in common across \code{x} and \code{y}. A message lists the variables so
that you can check they're correct; suppress the message by supplying \code{by}
explicitly.
To join on different variables between \code{x} and \code{y}, use a \code{\link[dplyr:join_by]{join_by()}}
specification. For example, \code{join_by(a == b)} will match \code{x$a} to \code{y$b}.
To join by multiple variables, use a \code{\link[dplyr:join_by]{join_by()}} specification with
multiple expressions. For example, \code{join_by(a == b, c == d)} will match
\code{x$a} to \code{y$b} and \code{x$c} to \code{y$d}. If the column names are the same between
\code{x} and \code{y}, you can shorten this by listing only the variable names, like
\code{join_by(a, c)}.
\code{\link[dplyr:join_by]{join_by()}} can also be used to perform inequality, rolling, and overlap
joins. See the documentation at \link[dplyr:join_by]{?join_by} for details on
these types of joins.
For simple equality joins, you can alternatively specify a character vector
of variable names to join by. For example, \code{by = c("a", "b")} joins \code{x$a}
to \code{y$a} and \code{x$b} to \code{y$b}. If variable names differ between \code{x} and \code{y},
use a named character vector like \code{by = c("x_a" = "y_a", "x_b" = "y_b")}.
To perform a cross-join, generating all combinations of \code{x} and \code{y}, see
\code{\link[dplyr:cross_join]{cross_join()}}.}
\item{copy}{If \code{x} and \code{y} are not from the same data source,
and \code{copy} is \code{TRUE}, then \code{y} will be copied into the
same src as \code{x}. This allows you to join tables across srcs, but
it is a potentially expensive operation so you must opt into it.}
\item{suffix}{If there are non-joined duplicate variables in \code{x} and
\code{y}, these suffixes will be added to the output to disambiguate them.
Should be a character vector of length 2.}
\item{...}{Other parameters passed onto methods.}
}
\value{
An object of the same type as \code{x} (including the same groups). The order of
the rows and columns of \code{x} is preserved as much as possible. The output has
the following properties:
\itemize{
\item The rows are affected by the join type.
\itemize{
\item \code{inner_join()} returns matched \code{x} rows.
\item \code{left_join()} returns all \code{x} rows.
\item \code{right_join()} returns matched \code{x} rows, followed by unmatched \code{y} rows.
\item \code{full_join()} returns all \code{x} rows, followed by unmatched \code{y} rows.
}
\item Output columns include all columns from \code{x} and all non-key columns from
\code{y}. If \code{keep = TRUE}, the key columns from \code{y} are included as well.
\item If non-key columns in \code{x} and \code{y} have the same name, \code{suffix}es are added
to disambiguate. If \code{keep = TRUE} and key columns in \code{x} and \code{y} have
the same name, \code{suffix}es are added to disambiguate these as well.
\item If \code{keep = FALSE}, output columns included in \code{by} are coerced to their
common type between \code{x} and \code{y}.
}
}
\description{
Mutating joins add columns from \code{y} to \code{x}, matching observations based on
the keys. There are four mutating joins: the inner join, and the three outer
joins.
\subsection{Inner join}{
An \code{inner_join()} only keeps observations from \code{x} that have a matching key
in \code{y}.
The most important property of an inner join is that unmatched rows in either
input are not included in the result. This means that generally inner joins
are not appropriate in most analyses, because it is too easy to lose
observations.
}
\subsection{Outer joins}{
The three outer joins keep observations that appear in at least one of the
data frames:
\itemize{
\item A \code{left_join()} keeps all observations in \code{x}.
\item A \code{right_join()} keeps all observations in \code{y}.
\item A \code{full_join()} keeps all observations in \code{x} and \code{y}.
}
}
}
\section{Many-to-many relationships}{
By default, dplyr guards against many-to-many relationships in equality joins
by throwing a warning. These occur when both of the following are true:
\itemize{
\item A row in \code{x} matches multiple rows in \code{y}.
\item A row in \code{y} matches multiple rows in \code{x}.
}
This is typically surprising, as most joins involve a relationship of
one-to-one, one-to-many, or many-to-one, and is often the result of an
improperly specified join. Many-to-many relationships are particularly
problematic because they can result in a Cartesian explosion of the number of
rows returned from the join.
If a many-to-many relationship is expected, silence this warning by
explicitly setting \code{relationship = "many-to-many"}.
In production code, it is best to preemptively set \code{relationship} to whatever
relationship you expect to exist between the keys of \code{x} and \code{y}, as this
forces an error to occur immediately if the data doesn't align with your
expectations.
Inequality joins typically result in many-to-many relationships by nature, so
they don't warn on them by default, but you should still take extra care when
specifying an inequality join, because they also have the capability to
return a large number of rows.
Rolling joins don't warn on many-to-many relationships either, but many
rolling joins follow a many-to-one relationship, so it is often useful to
set \code{relationship = "many-to-one"} to enforce this.
Note that in SQL, most database providers won't let you specify a
many-to-many relationship between two tables, instead requiring that you
create a third \emph{junction table} that results in two one-to-many relationships
instead.
}
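The Cartesian effect described above can be sketched with plain data frames (hypothetical toy data; base \code{merge()} stands in for \code{inner_join()}):

```r
# Hypothetical toy data: each input has two rows with key a = 1,
# so an equality join is many-to-many and returns 2 x 2 = 4 rows.
x <- data.frame(a = c(1, 1), x_val = c("x1", "x2"))
y <- data.frame(a = c(1, 1), y_val = c("y1", "y2"))
joined <- merge(x, y, by = "a")  # base-R analogue of inner_join()
nrow(joined)  # 4 rows: the Cartesian explosion warned about above
```

With dplyr, the same join would emit the many-to-many warning unless \code{relationship = "many-to-many"} is supplied.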
\section{Methods}{
These functions are \strong{generic}s, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
\itemize{
\item \code{inner_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("inner_join")}.
\item \code{left_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("left_join")}.
\item \code{right_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("right_join")}.
\item \code{full_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("full_join")}.
}
}
\examples{
data(pbmc_small)
tt <- pbmc_small
tt |> inner_join(tt |>
distinct(groups) |>
mutate(new_column=1:2) |>
slice(1))
}
\seealso{
Other joins:
\code{\link[dplyr]{cross_join}()},
\code{\link[dplyr]{filter-joins}},
\code{\link[dplyr]{nest_join}()}
}
================================================
FILE: man/join_features.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/methods.R
\name{join_features}
\alias{join_features}
\alias{join_features,Seurat-method}
\title{join_features}
\usage{
\S4method{join_features}{Seurat}(
.data,
features = NULL,
all = FALSE,
exclude_zeros = FALSE,
shape = "wide",
assay = NULL,
slot = "data",
...
)
}
\arguments{
\item{.data}{A tidyseurat object}
\item{features}{A vector of feature identifiers to join}
\item{all}{If TRUE, return all features.}
\item{exclude_zeros}{If TRUE, exclude zero values.}
\item{shape}{Format of the returned table, either "long" or "wide".}
\item{assay}{Assay name from which to extract feature abundance.}
\item{slot}{Slot name from which to extract feature abundance.}
\item{...}{Additional parameters passed to the wide-format join when shape="wide", e.g., the assay name to extract feature abundance from and the gene prefix.}
}
\value{
A `tidyseurat` object
containing information for the specified features.
}
\description{
join_features() extracts and joins information for specific
features
}
\details{
This function extracts information for specified features and
returns the information in either long or wide format.
}
\examples{
data(pbmc_small)
pbmc_small \%>\% join_features(
features=c("HLA-DRA", "LYZ"))
}
================================================
FILE: man/join_transcripts.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/methods_DEPRECATED.R
\name{join_transcripts}
\alias{join_transcripts}
\title{(DEPRECATED) Extract and join information for transcripts.}
\usage{
join_transcripts(
.data,
transcripts = NULL,
all = FALSE,
exclude_zeros = FALSE,
shape = "wide",
...
)
}
\arguments{
\item{.data}{A tidyseurat object}
\item{transcripts}{A vector of transcript identifiers to join}
\item{all}{If TRUE, return all transcripts.}
\item{exclude_zeros}{If TRUE, exclude zero values.}
\item{shape}{Format of the returned table, either "long" or "wide".}
\item{...}{Additional parameters passed to the wide-format join, e.g., the assay name to extract transcript abundance from.}
}
\value{
A `tbl` containing the information for the specified transcripts.
}
\description{
join_transcripts() extracts and joins information for specified transcripts
}
\details{
DEPRECATED: please use \code{join_features()} instead.
}
\examples{
print("DEPRECATED")
}
================================================
FILE: man/left_join.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{left_join}
\alias{left_join}
\alias{left_join.Seurat}
\title{Mutating joins}
\usage{
\method{left_join}{Seurat}(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
}
\arguments{
\item{x, y}{A pair of data frames, data frame extensions (e.g. a tibble), or
lazy data frames (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{by}{A join specification created with \code{\link[dplyr:join_by]{join_by()}}, or a character
vector of variables to join by.
If \code{NULL}, the default, \verb{*_join()} will perform a natural join, using all
variables in common across \code{x} and \code{y}. A message lists the variables so
that you can check they're correct; suppress the message by supplying \code{by}
explicitly.
To join on different variables between \code{x} and \code{y}, use a \code{\link[dplyr:join_by]{join_by()}}
specification. For example, \code{join_by(a == b)} will match \code{x$a} to \code{y$b}.
To join by multiple variables, use a \code{\link[dplyr:join_by]{join_by()}} specification with
multiple expressions. For example, \code{join_by(a == b, c == d)} will match
\code{x$a} to \code{y$b} and \code{x$c} to \code{y$d}. If the column names are the same between
\code{x} and \code{y}, you can shorten this by listing only the variable names, like
\code{join_by(a, c)}.
\code{\link[dplyr:join_by]{join_by()}} can also be used to perform inequality, rolling, and overlap
joins. See the documentation at \link[dplyr:join_by]{?join_by} for details on
these types of joins.
For simple equality joins, you can alternatively specify a character vector
of variable names to join by. For example, \code{by = c("a", "b")} joins \code{x$a}
to \code{y$a} and \code{x$b} to \code{y$b}. If variable names differ between \code{x} and \code{y},
use a named character vector like \code{by = c("x_a" = "y_a", "x_b" = "y_b")}.
To perform a cross-join, generating all combinations of \code{x} and \code{y}, see
\code{\link[dplyr:cross_join]{cross_join()}}.}
\item{copy}{If \code{x} and \code{y} are not from the same data source,
and \code{copy} is \code{TRUE}, then \code{y} will be copied into the
same src as \code{x}. This allows you to join tables across srcs, but
it is a potentially expensive operation so you must opt into it.}
\item{suffix}{If there are non-joined duplicate variables in \code{x} and
\code{y}, these suffixes will be added to the output to disambiguate them.
Should be a character vector of length 2.}
\item{...}{Other parameters passed onto methods.}
}
\value{
An object of the same type as \code{x} (including the same groups). The order of
the rows and columns of \code{x} is preserved as much as possible. The output has
the following properties:
\itemize{
\item The rows are affected by the join type.
\itemize{
\item \code{inner_join()} returns matched \code{x} rows.
\item \code{left_join()} returns all \code{x} rows.
\item \code{right_join()} returns matched \code{x} rows, followed by unmatched \code{y} rows.
\item \code{full_join()} returns all \code{x} rows, followed by unmatched \code{y} rows.
}
\item Output columns include all columns from \code{x} and all non-key columns from
\code{y}. If \code{keep = TRUE}, the key columns from \code{y} are included as well.
\item If non-key columns in \code{x} and \code{y} have the same name, \code{suffix}es are added
to disambiguate. If \code{keep = TRUE} and key columns in \code{x} and \code{y} have
the same name, \code{suffix}es are added to disambiguate these as well.
\item If \code{keep = FALSE}, output columns included in \code{by} are coerced to their
common type between \code{x} and \code{y}.
}
}
\description{
Mutating joins add columns from \code{y} to \code{x}, matching observations based on
the keys. There are four mutating joins: the inner join, and the three outer
joins.
\subsection{Inner join}{
An \code{inner_join()} only keeps observations from \code{x} that have a matching key
in \code{y}.
The most important property of an inner join is that unmatched rows in either
input are not included in the result. This means that generally inner joins
are not appropriate in most analyses, because it is too easy to lose
observations.
}
\subsection{Outer joins}{
The three outer joins keep observations that appear in at least one of the
data frames:
\itemize{
\item A \code{left_join()} keeps all observations in \code{x}.
\item A \code{right_join()} keeps all observations in \code{y}.
\item A \code{full_join()} keeps all observations in \code{x} and \code{y}.
}
}
}
\section{Many-to-many relationships}{
By default, dplyr guards against many-to-many relationships in equality joins
by throwing a warning. These occur when both of the following are true:
\itemize{
\item A row in \code{x} matches multiple rows in \code{y}.
\item A row in \code{y} matches multiple rows in \code{x}.
}
This is typically surprising, as most joins involve a relationship of
one-to-one, one-to-many, or many-to-one, and is often the result of an
improperly specified join. Many-to-many relationships are particularly
problematic because they can result in a Cartesian explosion of the number of
rows returned from the join.
If a many-to-many relationship is expected, silence this warning by
explicitly setting \code{relationship = "many-to-many"}.
In production code, it is best to preemptively set \code{relationship} to whatever
relationship you expect to exist between the keys of \code{x} and \code{y}, as this
forces an error to occur immediately if the data doesn't align with your
expectations.
Inequality joins typically result in many-to-many relationships by nature, so
they don't warn on them by default, but you should still take extra care when
specifying an inequality join, because they also have the capability to
return a large number of rows.
Rolling joins don't warn on many-to-many relationships either, but many
rolling joins follow a many-to-one relationship, so it is often useful to
set \code{relationship = "many-to-one"} to enforce this.
Note that in SQL, most database providers won't let you specify a
many-to-many relationship between two tables, instead requiring that you
create a third \emph{junction table} that results in two one-to-many relationships
instead.
}
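The difference between matched-only and all-\code{x} rows can be sketched with plain data frames (hypothetical toy data; base \code{merge()} stands in for the dplyr verbs):

```r
# Hypothetical toy data: key k = 3 exists in x but has no match in y.
x <- data.frame(k = 1:3, v = c("a", "b", "c"))
y <- data.frame(k = 1:2, w = c("A", "B"))
inner <- merge(x, y, by = "k")                # 2 rows: matched x rows only
left  <- merge(x, y, by = "k", all.x = TRUE)  # 3 rows: all x rows kept
left$w[left$k == 3]                           # NA for the unmatched row
```

As described above, \code{left_join()} keeps the unmatched \code{x} row and fills its \code{y} columns with \code{NA}, while \code{inner_join()} silently drops it.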
\section{Methods}{
These functions are \strong{generic}s, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
\itemize{
\item \code{inner_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("inner_join")}.
\item \code{left_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("left_join")}.
\item \code{right_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("right_join")}.
\item \code{full_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("full_join")}.
}
}
\examples{
data(pbmc_small)
tt <- pbmc_small
tt |> left_join(tt |>
distinct(groups) |>
mutate(new_column=1:2))
}
\seealso{
Other joins:
\code{\link[dplyr]{cross_join}()},
\code{\link[dplyr]{filter-joins}},
\code{\link[dplyr]{nest_join}()}
}
================================================
FILE: man/mutate.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{mutate}
\alias{mutate}
\alias{mutate.Seurat}
\title{Create, modify, and delete columns}
\usage{
\method{mutate}{Seurat}(.data, ...)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{...}{<\code{\link[rlang:args_data_masking]{data-masking}}> Name-value pairs.
The name gives the name of the column in the output.
The value can be:
\itemize{
\item A vector of length 1, which will be recycled to the correct length.
\item A vector the same length as the current group (or the whole data frame
if ungrouped).
\item \code{NULL}, to remove the column.
\item A data frame or tibble, to create multiple columns in the output.
}}
}
\value{
An object of the same type as \code{.data}. The output has the following
properties:
\itemize{
\item Columns from \code{.data} will be preserved according to the \code{.keep} argument.
\item Existing columns that are modified by \code{...} will always be returned in
their original location.
\item New columns created through \code{...} will be placed according to the
\code{.before} and \code{.after} arguments.
\item The number of rows is not affected.
\item Columns given the value \code{NULL} will be removed.
\item Groups will be recomputed if a grouping variable is mutated.
\item Data frame attributes are preserved.
}
}
\description{
\code{mutate()} creates new columns that are functions of existing variables.
It can also modify (if the name is the same as an existing
column) and delete columns (by setting their value to \code{NULL}).
}
\section{Useful mutate functions}{
\itemize{
\item \code{\link{+}}, \code{\link{-}}, \code{\link[=log]{log()}}, etc., for their usual mathematical meanings
\item \code{\link[dplyr:lead]{lead()}}, \code{\link[dplyr:lag]{lag()}}
\item \code{\link[dplyr:dense_rank]{dense_rank()}}, \code{\link[dplyr:min_rank]{min_rank()}}, \code{\link[dplyr:percent_rank]{percent_rank()}}, \code{\link[dplyr:row_number]{row_number()}},
\code{\link[dplyr:cume_dist]{cume_dist()}}, \code{\link[dplyr:ntile]{ntile()}}
\item \code{\link[=cumsum]{cumsum()}}, \code{\link[dplyr:cummean]{cummean()}}, \code{\link[=cummin]{cummin()}}, \code{\link[=cummax]{cummax()}}, \code{\link[dplyr:cumany]{cumany()}}, \code{\link[dplyr:cumall]{cumall()}}
\item \code{\link[dplyr:na_if]{na_if()}}, \code{\link[dplyr:coalesce]{coalesce()}}
\item \code{\link[dplyr:if_else]{if_else()}}, \code{\link[dplyr:recode]{recode()}}, \code{\link[dplyr:case_when]{case_when()}}
}
}
\section{Grouped tibbles}{
Because mutating expressions are computed within groups, they may
yield different results on grouped tibbles. This will be the case
as soon as an aggregating, lagging, or ranking function is
involved. Compare this ungrouped mutate:
\if{html}{\out{}}\preformatted{starwars |>
select(name, mass, species) |>
mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
}\if{html}{\out{
}}
With the grouped equivalent:
\if{html}{\out{}}\preformatted{starwars |>
select(name, mass, species) |>
group_by(species) |>
mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
}\if{html}{\out{
}}
The former normalises \code{mass} by the global average whereas the
latter normalises by the averages within species levels.
}
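The same contrast can be sketched without Seurat data, using a tiny hypothetical data frame and base \code{ave()} to stand in for the per-group mean:

```r
# Hypothetical toy data with two species groups.
df <- data.frame(species = c("human", "human", "droid"),
                 mass = c(2, 4, 9))
# Ungrouped: normalise by the global mean, mean(df$mass) == 5.
df$global_norm <- df$mass / mean(df$mass)
# Grouped: ave() returns the within-species mean for each row (3, 3, 9).
df$group_norm <- df$mass / ave(df$mass, df$species)
```

The two columns differ exactly as the ungrouped and grouped \code{mutate()} calls above do.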
\section{Methods}{
This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("mutate")}.
}
\examples{
data(pbmc_small)
pbmc_small |> mutate(nFeature_RNA=1)
}
\seealso{
Other single table verbs:
\code{\link{arrange}()},
\code{\link{rename}()},
\code{\link{slice}()},
\code{\link{summarise}()}
}
\concept{single table verbs}
================================================
FILE: man/nest.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tidyr_methods.R
\name{nest}
\alias{nest}
\alias{nest.Seurat}
\title{Nest rows into a list-column of data frames}
\usage{
\method{nest}{Seurat}(.data, ..., .names_sep = NULL)
}
\arguments{
\item{.data}{A data frame.}
\item{...}{<\code{\link[tidyr:tidyr_tidy_select]{tidy-select}}> Columns to nest; these will
appear in the inner data frames.
Specified using name-variable pairs of the form
\code{new_col = c(col1, col2, col3)}. The right hand side can be any valid
tidyselect expression.
If not supplied, then \code{...} is derived as all columns \emph{not} selected by
\code{.by}, and will use the column name from \code{.key}.
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}:
previously you could write \code{df |> nest(x, y, z)}.
Convert to \code{df |> nest(data = c(x, y, z))}.}
\item{.names_sep}{If \code{NULL}, the default, the inner names will come from
the former outer names. If a string, the new inner names will use the
outer names with \code{names_sep} automatically stripped. This makes
\code{names_sep} roughly symmetric between nesting and unnesting.}
}
\value{
`tidyseurat_nested`
}
\description{
Nesting creates a list-column of data frames; unnesting flattens it back out
into regular columns. Nesting is implicitly a summarising operation: you
get one row for each group defined by the non-nested columns. This is useful
in conjunction with other summaries that work with whole datasets, most
notably models.
Learn more in \code{vignette("nest")}.
}
\details{
If neither \code{...} nor \code{.by} are supplied, \code{nest()} will nest all variables,
and will use the column name supplied through \code{.key}.
}
\section{New syntax}{
tidyr 1.0.0 introduced a new syntax for \code{nest()} and \code{unnest()} that's
designed to be more similar to other functions. Converting to the new syntax
should be straightforward (guided by the message you'll receive) but if
you just need to run an old analysis, you can easily revert to the previous
behaviour using \code{\link[tidyr:nest_legacy]{nest_legacy()}} and \code{\link[tidyr:unnest_legacy]{unnest_legacy()}} as follows:
\if{html}{\out{}}\preformatted{library(tidyr)
nest <- nest_legacy
unnest <- unnest_legacy
}\if{html}{\out{
}}
}
\section{Grouped data frames}{
\code{df |> nest(data = c(x, y))} specifies the columns to be nested; i.e. the
columns that will appear in the inner data frame. \code{df |> nest(.by = c(x, y))} specifies the columns to nest \emph{by}; i.e. the columns that will remain in
the outer data frame. An alternative way to achieve the latter is to \code{nest()}
a grouped data frame created by \code{\link[dplyr:group_by]{dplyr::group_by()}}. The grouping variables
remain in the outer data frame and the others are nested. The result
preserves the grouping of the input.
Variables supplied to \code{nest()} will override grouping variables so that
\code{df |> group_by(x, y) |> nest(data = !z)} will be equivalent to
\code{df |> nest(data = !z)}.
You can't supply \code{.by} with a grouped data frame, as the groups already
represent what you are nesting by.
}
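Conceptually, nesting by a column yields one inner data frame per group; a base-R sketch with \code{split()} (hypothetical toy data) captures the idea without a list-column:

```r
# Hypothetical toy data: two groups in g.
df <- data.frame(g = c("a", "a", "b"), x = 1:3)
# split() produces one inner data frame per group, analogous to the
# list-column that nest(data = x) or nest(.by = g) would create.
nested <- split(df["x"], df$g)
length(nested)   # 2 groups
nrow(nested$a)   # 2 rows nested under g == "a"
```

\code{nest()} additionally keeps the group keys as regular columns of the outer data frame, which \code{split()} does not.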
\examples{
data(pbmc_small)
pbmc_small |>
nest(data=-groups) |>
unnest(data)
}
================================================
FILE: man/pbmc_small_nested_interactions.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{pbmc_small_nested_interactions}
\alias{pbmc_small_nested_interactions}
\title{Intercellular ligand-receptor interactions for
38 ligands from a single cell RNA-seq cluster.}
\format{
A `tibble` containing 100 rows and 9 columns.
Cells are a subsample of the PBMC dataset of 2,700 single cells.
Cell interactions were identified with `SingleCellSignalR`.
\describe{
\item{sample}{sample identifier}
\item{ligand}{cluster and ligand identifier}
\item{receptor}{cluster and receptor identifier}
\item{ligand.name}{ligand name}
\item{receptor.name}{receptor name}
\item{origin}{cluster containing ligand}
\item{destination}{cluster containing receptor}
\item{interaction.type}{type of interaction, paracrine or autocrine}
\item{LRscore}{interaction score}
}
}
\source{
\url{https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html}
}
\usage{
data(pbmc_small_nested_interactions)
}
\value{
`tibble`
}
\description{
A dataset containing ligand-receptor interactions within a sample.
There are 38 ligands from a single cell cluster versus 35 receptors
in 6 other clusters.
}
\keyword{datasets}
================================================
FILE: man/pipe.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils-pipe.R
\name{\%>\%}
\alias{\%>\%}
\title{Pipe operator}
\usage{
lhs \%>\% rhs
}
\value{
The result of calling \code{rhs(lhs)}.
}
\description{
See \code{magrittr::\link[magrittr:pipe]{\%>\%}} for details.
}
\examples{
data(pbmc_small)
pbmc_small \%>\% print()
}
\keyword{internal}
================================================
FILE: man/pivot_longer.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tidyr_methods.R
\name{pivot_longer}
\alias{pivot_longer}
\alias{pivot_longer.Seurat}
\title{Pivot data from wide to long}
\usage{
\method{pivot_longer}{Seurat}(
data,
cols,
names_to = "name",
names_prefix = NULL,
names_sep = NULL,
names_pattern = NULL,
names_ptypes = NULL,
names_transform = NULL,
names_repair = "check_unique",
values_to = "value",
values_drop_na = FALSE,
values_ptypes = NULL,
values_transform = NULL,
...
)
}
\arguments{
\item{data}{A data frame to pivot.}
\item{cols}{<\code{\link[tidyr:tidyr_tidy_select]{tidy-select}}> Columns to pivot into
longer format.}
\item{names_to}{A character vector specifying the new column or columns to
create from the information stored in the column names of \code{data} specified
by \code{cols}.
\itemize{
\item If length 0, or if \code{NULL} is supplied, no columns will be created.
\item If length 1, a single column will be created which will contain the
column names specified by \code{cols}.
\item If length >1, multiple columns will be created. In this case, one of
\code{names_sep} or \code{names_pattern} must be supplied to specify how the
column names should be split. There are also two additional character
values you can take advantage of:
\itemize{
\item \code{NA} will discard the corresponding component of the column name.
\item \code{".value"} indicates that the corresponding component of the column
name defines the name of the output column containing the cell values,
overriding \code{values_to} entirely.
}
}}
\item{names_prefix}{A regular expression used to remove matching text
from the start of each variable name.}
\item{names_sep, names_pattern}{If \code{names_to} contains multiple values,
these arguments control how the column name is broken up.
\code{names_sep} takes the same specification as \code{\link[tidyr:separate]{separate()}}, and can either
be a numeric vector (specifying positions to break on), or a single string
(specifying a regular expression to split on).
\code{names_pattern} takes the same specification as \code{\link[tidyr:extract]{extract()}}, a regular
expression containing matching groups (\verb{()}).
If these arguments do not give you enough control, use
\code{pivot_longer_spec()} to create a spec object and process manually as
needed.}
\item{names_ptypes, values_ptypes}{Optionally, a list of column name-prototype
pairs. Alternatively, a single empty prototype can be supplied, which will
be applied to all columns. A prototype (or ptype for short) is a
zero-length vector (like \code{integer()} or \code{numeric()}) that defines the type,
class, and attributes of a vector. Use these arguments if you want to
confirm that the created columns are the types that you expect. Note that
if you want to change (instead of confirm) the types of specific columns,
you should use \code{names_transform} or \code{values_transform} instead.}
\item{names_transform, values_transform}{Optionally, a list of column
name-function pairs. Alternatively, a single function can be supplied,
which will be applied to all columns. Use these arguments if you need to
change the types of specific columns. For example, \code{names_transform = list(week = as.integer)} would convert a character variable called \code{week}
to an integer.
If not specified, the type of the columns generated from \code{names_to} will
be character, and the type of the variables generated from \code{values_to}
will be the common type of the input columns used to generate them.}
\item{names_repair}{What happens if the output has invalid column names?
The default, \code{"check_unique"} is to error if the columns are duplicated.
Use \code{"minimal"} to allow duplicates in the output, or \code{"unique"} to
de-duplicated by adding numeric suffixes. See \code{\link[vctrs:vec_as_names]{vctrs::vec_as_names()}}
for more options.}
\item{values_to}{A string specifying the name of the column to create
from the data stored in cell values. If \code{names_to} is a character
containing the special \code{.value} sentinel, this value will be ignored,
and the name of the value column will be derived from part of the
existing column names.}
\item{values_drop_na}{If \code{TRUE}, will drop rows that contain only \code{NA}s
in the \code{values_to} column. This effectively converts explicit missing values
to implicit missing values, and should generally be used only when missing
values in \code{data} were created by its structure.}
\item{...}{Additional arguments passed on to methods.}
}
\value{
`tidyseurat`
}
\description{
\code{pivot_longer()} "lengthens" data, increasing the number of rows and
decreasing the number of columns. The inverse transformation is
\code{\link[tidyr:pivot_wider]{pivot_wider()}}.
Learn more in \code{vignette("pivot")}.
}
\details{
\code{pivot_longer()} is an updated approach to \code{\link[tidyr:gather]{gather()}}, designed to be both
simpler to use and to handle more use cases. We recommend you use
\code{pivot_longer()} for new code; \code{gather()} isn't going away but is no longer
under active development.
}
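The wide-to-long reshaping itself can be sketched in base R with \code{stats::reshape()} on hypothetical toy data, mirroring what \code{pivot_longer()} does to the selected columns:

```r
# Hypothetical wide data: one row per id, one column per measurement.
wide <- data.frame(id = 1:2, a = c(10, 20), b = c(11, 21))
# Stack columns a and b into name/value pairs, one row per (id, column).
long <- reshape(wide, direction = "long",
                varying = c("a", "b"), v.names = "value",
                times = c("a", "b"), timevar = "name")
nrow(long)  # 4 = 2 rows x 2 pivoted columns
```

\code{pivot_longer(wide, cols = c(a, b))} produces the equivalent long table with tidier defaults and no rowname bookkeeping.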
\examples{
data(pbmc_small)
pbmc_small |> pivot_longer(
cols=c(orig.ident, groups),
names_to="name", values_to="value")
}
================================================
FILE: man/plotly.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/plotly_methods.R
\name{plotly}
\alias{plotly}
\alias{plot_ly}
\alias{plot_ly.tbl_df}
\alias{plot_ly.Seurat}
\title{Initiate a plotly visualization}
\usage{
plot_ly(
data = data.frame(),
...,
type = NULL,
name = NULL,
color = NULL,
colors = NULL,
alpha = NULL,
stroke = NULL,
strokes = NULL,
alpha_stroke = 1,
size = NULL,
sizes = c(10, 100),
span = NULL,
spans = c(1, 20),
symbol = NULL,
symbols = NULL,
linetype = NULL,
linetypes = NULL,
split = NULL,
frame = NULL,
width = NULL,
height = NULL,
source = "A"
)
\method{plot_ly}{tbl_df}(
data = data.frame(),
...,
type = NULL,
name = NULL,
color = NULL,
colors = NULL,
alpha = NULL,
stroke = NULL,
strokes = NULL,
alpha_stroke = 1,
size = NULL,
sizes = c(10, 100),
span = NULL,
spans = c(1, 20),
symbol = NULL,
symbols = NULL,
linetype = NULL,
linetypes = NULL,
split = NULL,
frame = NULL,
width = NULL,
height = NULL,
source = "A"
)
\method{plot_ly}{Seurat}(
data = data.frame(),
...,
type = NULL,
name = NULL,
color = NULL,
colors = NULL,
alpha = NULL,
stroke = NULL,
strokes = NULL,
alpha_stroke = 1,
size = NULL,
sizes = c(10, 100),
span = NULL,
spans = c(1, 20),
symbol = NULL,
symbols = NULL,
linetype = NULL,
linetypes = NULL,
split = NULL,
frame = NULL,
width = NULL,
height = NULL,
source = "A"
)
}
\arguments{
\item{data}{A data frame (optional) or \link[crosstalk:SharedData]{crosstalk::SharedData} object.}
\item{...}{Arguments (i.e., attributes) passed along to the trace \code{type}.
See \code{\link[plotly:schema]{schema()}} for a list of acceptable attributes for a given trace \code{type}
(by going to \code{traces} -> \code{type} -> \code{attributes}). Note that attributes
provided at this level may override other arguments
(e.g. \code{plot_ly(x = 1:10, y = 1:10, color = I("red"), marker = list(color = "blue"))}).}
\item{type}{A character string specifying the trace type (e.g. \code{"scatter"}, \code{"bar"}, \code{"box"}, etc).
If specified, it \emph{always} creates a trace; otherwise, a sensible trace type is inferred from the supplied data.}
\item{name}{Values mapped to the trace's name attribute. Since a trace can
only have one name, this argument acts very much like \code{split} in that it
creates one trace for every unique value.}
\item{color}{Values mapped to relevant 'fill-color' attribute(s)
(e.g. \href{https://plotly.com/r/reference/#scatter-fillcolor}{fillcolor},
\href{https://plotly.com/r/reference/#scatter-marker-color}{marker.color},
\href{https://plotly.com/r/reference/#scatter-textfont-color}{textfont.color}, etc.).
The mapping from data values to color codes may be controlled using
\code{colors} and \code{alpha}, or avoided altogether via \code{\link[=I]{I()}} (e.g., \code{color = I("red")}).
Any color understood by \code{\link[grDevices:col2rgb]{grDevices::col2rgb()}} may be used in this way.}
\item{colors}{Either a colorbrewer2.org palette name (e.g. "YlOrRd" or "Blues"),
or a vector of colors to interpolate in hexadecimal "#RRGGBB" format,
or a color interpolation function like \code{colorRamp()}.}
\item{alpha}{A number between 0 and 1 specifying the alpha channel applied to \code{color}.
Defaults to 0.5 when mapping to \href{https://plotly.com/r/reference/#scatter-fillcolor}{fillcolor} and 1 otherwise.}
\item{stroke}{Similar to \code{color}, but values are mapped to relevant 'stroke-color' attribute(s)
(e.g., \href{https://plotly.com/r/reference/#scatter-marker-line-color}{marker.line.color}
and \href{https://plotly.com/r/reference/#scatter-line-color}{line.color}
for filled polygons). If not specified, \code{stroke} inherits from \code{color}.}
\item{strokes}{Similar to \code{colors}, but controls the \code{stroke} mapping.}
\item{alpha_stroke}{Similar to \code{alpha}, but applied to \code{stroke}.}
\item{size}{(Numeric) values mapped to relevant 'fill-size' attribute(s)
(e.g., \href{https://plotly.com/r/reference/#scatter-marker-size}{marker.size},
\href{https://plotly.com/r/reference/#scatter-textfont-size}{textfont.size},
and \href{https://plotly.com/r/reference/#scatter-error_x-width}{error_x.width}).
The mapping from data values to sizes may be controlled using
\code{sizes}, or avoided altogether via \code{\link[=I]{I()}} (e.g., \code{size = I(30)}).}
\item{sizes}{A numeric vector of length 2 used to scale \code{size} to pixels.}
\item{span}{(Numeric) values mapped to relevant 'stroke-size' attribute(s)
(e.g.,
\href{https://plotly.com/r/reference/#scatter-marker-line-width}{marker.line.width},
\href{https://plotly.com/r/reference/#scatter-line-width}{line.width} for filled polygons,
and \href{https://plotly.com/r/reference/#scatter-error_x-thickness}{error_x.thickness}).
The mapping from data values to spans may be controlled using
\code{spans}, or avoided altogether via \code{\link[=I]{I()}} (e.g., \code{span = I(30)}).}
\item{spans}{A numeric vector of length 2 used to scale \code{span} to pixels.}
\item{symbol}{(Discrete) values mapped to \href{https://plotly.com/r/reference/#scatter-marker-symbol}{marker.symbol}.
The mapping from data values to symbols may be controlled using
\code{symbols}, or avoided altogether via \code{\link[=I]{I()}} (e.g., \code{symbol = I("pentagon")}).
Any \link{pch} value or \href{https://plotly.com/r/reference/#scatter-marker-symbol}{symbol name} may be used in this way.}
\item{symbols}{A character vector of \link{pch} values or \href{https://plotly.com/r/reference/#scatter-marker-symbol}{symbol names}.}
\item{linetype}{(Discrete) values mapped to \href{https://plotly.com/r/reference/#scatter-line-dash}{line.dash}.
The mapping from data values to line types may be controlled using
\code{linetypes}, or avoided altogether via \code{\link[=I]{I()}} (e.g., \code{linetype = I("dash")}).
Any \code{lty} (see \link{par}) value or \href{https://plotly.com/r/reference/#scatter-line-dash}{dash name} may be used in this way.}
\item{linetypes}{A character vector of \code{lty} values or \href{https://plotly.com/r/reference/#scatter-line-dash}{dash names}.}
\item{split}{(Discrete) values used to create multiple traces (one trace per value).}
\item{frame}{(Discrete) values used to create animation frames.}
\item{width}{Width in pixels (optional, defaults to automatic sizing).}
\item{height}{Height in pixels (optional, defaults to automatic sizing).}
\item{source}{A character string of length 1. Match the value of this string
with the source argument in \code{\link[plotly:event_data]{event_data()}} to retrieve the
event data corresponding to a specific plot (shiny apps can have multiple plots).}
}
\value{
A \code{plotly} object.
}
\description{
This function maps R objects to \href{https://plotly.com/javascript/}{plotly.js},
an (MIT licensed) web-based interactive charting library. It provides
abstractions for doing common things (e.g. mapping data values to
fill colors (via \code{color}) or creating \link[plotly]{animation}s (via \code{frame})) and sets
some different defaults to make the interface feel more 'R-like'
(i.e., closer to \code{\link[=plot]{plot()}} and \code{\link[ggplot2:qplot]{ggplot2::qplot()}}).
}
\details{
Unless \code{type} is specified, this function just initiates a plotly
object with 'global' attributes that are passed onto downstream uses of
\code{\link[plotly:add_trace]{add_trace()}} (or similar). A \link{formula} must always be used when
referencing column name(s) in \code{data} (e.g. \code{plot_ly(mtcars, x = ~wt)}).
Formulas are optional when supplying values directly, but they do
help inform default axis/scale titles
(e.g., \code{plot_ly(x = mtcars$wt)} vs \code{plot_ly(x = ~mtcars$wt)}).
}
\examples{
data(pbmc_small)
plot_ly(pbmc_small)
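# A sketch of mapping cell metadata to aesthetics with formulas;
# nCount_RNA and nFeature_RNA are standard Seurat QC columns assumed
# to be present in pbmc_small
pbmc_small |>
    plot_ly(x=~nCount_RNA, y=~nFeature_RNA, type="scatter", mode="markers")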
}
\references{
\url{https://plotly-r.com/overview.html}
}
\seealso{
\itemize{
\item For initializing a plotly-geo object: \code{\link[plotly:plot_geo]{plot_geo()}}
\item For initializing a plotly-mapbox object: \code{\link[plotly:plot_mapbox]{plot_mapbox()}}
\item For translating a ggplot2 object to a plotly object: \code{\link[plotly:ggplotly]{ggplotly()}}
\item For modifying any plotly object: \code{\link[plotly:layout]{layout()}}, \code{\link[plotly:add_trace]{add_trace()}}, \code{\link[plotly:style]{style()}}
\item For linked brushing: \code{\link[plotly:highlight]{highlight()}}
\item For arranging multiple plots: \code{\link[plotly:subplot]{subplot()}}, \code{\link[crosstalk:bscols]{crosstalk::bscols()}}
\item For inspecting plotly objects: \code{\link[plotly:plotly_json]{plotly_json()}}
\item For quick, accurate, and searchable plotly.js reference: \code{\link[plotly:schema]{schema()}}
}
}
\author{
Carson Sievert
}
================================================
FILE: man/pull.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{pull}
\alias{pull}
\alias{pull.Seurat}
\title{Extract a single column}
\usage{
\method{pull}{Seurat}(.data, var = -1, name = NULL, ...)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{var}{A variable specified as:
\itemize{
\item a literal variable name
\item a positive integer, giving the position counting from the left
\item a negative integer, giving the position counting from the right.
}
The default returns the last column (on the assumption that's the
column you've created most recently).
This argument is taken by expression and supports
\link[rlang:topic-inject]{quasiquotation} (you can unquote column
names and column locations).}
\item{name}{An optional parameter that specifies the column to be used
as names for a named vector. Specified in a similar manner as \code{var}.}
\item{...}{For use by methods.}
}
\value{
A vector the same size as \code{.data}.
}
\description{
\code{pull()} is similar to \code{$}. It is mostly useful because it looks a little
nicer in pipes; it also works with remote data frames, and it can optionally
name the output.
}
\section{Methods}{
This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("pull")}.
}
\examples{
data(pbmc_small)
pbmc_small |> pull(groups)
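# Optionally name the output; a sketch using nFeature_RNA for values and
# orig.ident (assumed present in pbmc_small metadata) for names
pbmc_small |> pull(nFeature_RNA, name=orig.ident)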
}
================================================
FILE: man/quo_names.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utilities.R
\name{quo_names}
\alias{quo_names}
\title{Convert a vector of quosures (e.g. c(col_a, col_b)) into a character vector}
\usage{
quo_names(v)
}
\arguments{
\item{v}{A vector of quosures (e.g. c(col_a, col_b)).}
}
\value{
A character vector
}
\description{
Convert a vector of quosures (e.g. c(col_a, col_b)) into a character vector.
}
\keyword{internal}
================================================
FILE: man/rename.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{rename}
\alias{rename}
\alias{rename.Seurat}
\title{Rename columns}
\usage{
\method{rename}{Seurat}(.data, ...)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{...}{For \code{rename()}: <\code{\link[dplyr:dplyr_tidy_select]{tidy-select}}> Use
\code{new_name = old_name} to rename selected variables.
For \code{rename_with()}: additional arguments passed onto \code{.fn}.}
}
\value{
An object of the same type as \code{.data}. The output has the following
properties:
\itemize{
\item Rows are not affected.
\item Column names are changed; column order is preserved.
\item Data frame attributes are preserved.
\item Groups are updated to reflect new names.
}
}
\description{
\code{rename()} changes the names of individual variables using
\code{new_name = old_name} syntax; \code{rename_with()} renames columns using a
function.
}
\section{Methods}{
This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("rename")}.
}
\examples{
data(pbmc_small)
pbmc_small |> rename(s_score=nFeature_RNA)
}
\seealso{
Other single table verbs:
\code{\link{arrange}()},
\code{\link{mutate}()},
\code{\link{slice}()},
\code{\link{summarise}()}
}
\concept{single table verbs}
================================================
FILE: man/return_arguments_of.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utilities.R
\name{return_arguments_of}
\alias{return_arguments_of}
\title{Return the variables used in an expression}
\usage{
return_arguments_of(expression)
}
\arguments{
\item{expression}{An expression.}
}
\value{
A list of symbols.
}
\description{
Returns the variables used in an expression.
}
================================================
FILE: man/right_join.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{right_join}
\alias{right_join}
\alias{right_join.Seurat}
\title{Mutating joins}
\usage{
\method{right_join}{Seurat}(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
}
\arguments{
\item{x, y}{A pair of data frames, data frame extensions (e.g. a tibble), or
lazy data frames (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{by}{A join specification created with \code{\link[dplyr:join_by]{join_by()}}, or a character
vector of variables to join by.
If \code{NULL}, the default, \verb{*_join()} will perform a natural join, using all
variables in common across \code{x} and \code{y}. A message lists the variables so
that you can check they're correct; suppress the message by supplying \code{by}
explicitly.
To join on different variables between \code{x} and \code{y}, use a \code{\link[dplyr:join_by]{join_by()}}
specification. For example, \code{join_by(a == b)} will match \code{x$a} to \code{y$b}.
To join by multiple variables, use a \code{\link[dplyr:join_by]{join_by()}} specification with
multiple expressions. For example, \code{join_by(a == b, c == d)} will match
\code{x$a} to \code{y$b} and \code{x$c} to \code{y$d}. If the column names are the same between
\code{x} and \code{y}, you can shorten this by listing only the variable names, like
\code{join_by(a, c)}.
\code{\link[dplyr:join_by]{join_by()}} can also be used to perform inequality, rolling, and overlap
joins. See the documentation at \link[dplyr:join_by]{?join_by} for details on
these types of joins.
For simple equality joins, you can alternatively specify a character vector
of variable names to join by. For example, \code{by = c("a", "b")} joins \code{x$a}
to \code{y$a} and \code{x$b} to \code{y$b}. If variable names differ between \code{x} and \code{y},
use a named character vector like \code{by = c("x_a" = "y_a", "x_b" = "y_b")}.
To perform a cross-join, generating all combinations of \code{x} and \code{y}, see
\code{\link[dplyr:cross_join]{cross_join()}}.}
\item{copy}{If \code{x} and \code{y} are not from the same data source,
and \code{copy} is \code{TRUE}, then \code{y} will be copied into the
same src as \code{x}. This allows you to join tables across srcs, but
it is a potentially expensive operation so you must opt into it.}
\item{suffix}{If there are non-joined duplicate variables in \code{x} and
\code{y}, these suffixes will be added to the output to disambiguate them.
Should be a character vector of length 2.}
\item{...}{Other parameters passed onto methods.}
}
\value{
An object of the same type as \code{x} (including the same groups). The order of
the rows and columns of \code{x} is preserved as much as possible. The output has
the following properties:
\itemize{
\item The rows are affected by the join type.
\itemize{
\item \code{inner_join()} returns matched \code{x} rows.
\item \code{left_join()} returns all \code{x} rows.
\item \code{right_join()} returns matched \code{x} rows, followed by unmatched \code{y} rows.
\item \code{full_join()} returns all \code{x} rows, followed by unmatched \code{y} rows.
}
\item Output columns include all columns from \code{x} and all non-key columns from
\code{y}. If \code{keep = TRUE}, the key columns from \code{y} are included as well.
\item If non-key columns in \code{x} and \code{y} have the same name, \code{suffix}es are added
to disambiguate. If \code{keep = TRUE} and key columns in \code{x} and \code{y} have
the same name, \code{suffix}es are added to disambiguate these as well.
\item If \code{keep = FALSE}, output columns included in \code{by} are coerced to their
common type between \code{x} and \code{y}.
}
}
\description{
Mutating joins add columns from \code{y} to \code{x}, matching observations based on
the keys. There are four mutating joins: the inner join, and the three outer
joins.
\subsection{Inner join}{
An \code{inner_join()} only keeps observations from \code{x} that have a matching key
in \code{y}.
The most important property of an inner join is that unmatched rows in either
input are not included in the result. This means that inner joins are
generally not appropriate, because it is too easy to lose
observations.
}
\subsection{Outer joins}{
The three outer joins keep observations that appear in at least one of the
data frames:
\itemize{
\item A \code{left_join()} keeps all observations in \code{x}.
\item A \code{right_join()} keeps all observations in \code{y}.
\item A \code{full_join()} keeps all observations in \code{x} and \code{y}.
}
}
}
\section{Many-to-many relationships}{
By default, dplyr guards against many-to-many relationships in equality joins
by throwing a warning. These occur when both of the following are true:
\itemize{
\item A row in \code{x} matches multiple rows in \code{y}.
\item A row in \code{y} matches multiple rows in \code{x}.
}
This is typically surprising, as most joins involve a relationship of
one-to-one, one-to-many, or many-to-one, and is often the result of an
improperly specified join. Many-to-many relationships are particularly
problematic because they can result in a Cartesian explosion of the number of
rows returned from the join.
If a many-to-many relationship is expected, silence this warning by
explicitly setting \code{relationship = "many-to-many"}.
In production code, it is best to preemptively set \code{relationship} to whatever
relationship you expect to exist between the keys of \code{x} and \code{y}, as this
forces an error to occur immediately if the data doesn't align with your
expectations.
Inequality joins typically result in many-to-many relationships by nature, so
they don't warn on them by default, but you should still take extra care when
specifying an inequality join, because they also have the capability to
return a large number of rows.
Rolling joins don't warn on many-to-many relationships either, but many
rolling joins follow a many-to-one relationship, so it is often useful to
set \code{relationship = "many-to-one"} to enforce this.
Note that in SQL, most database providers won't let you specify a
many-to-many relationship between two tables, instead requiring that you
create a third \emph{junction table} that results in two one-to-many relationships
instead.
}
\section{Methods}{
These functions are \strong{generic}s, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
\itemize{
\item \code{inner_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("inner_join")}.
\item \code{left_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("left_join")}.
\item \code{right_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("right_join")}.
\item \code{full_join()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("full_join")}.
}
}
\examples{
data(pbmc_small)
tt <- pbmc_small
tt |> right_join(tt |>
distinct(groups) |>
mutate(new_column=1:2) |>
slice(1))
}
\seealso{
Other joins:
\code{\link[dplyr]{cross_join}()},
\code{\link[dplyr]{filter-joins}},
\code{\link[dplyr]{nest_join}()}
}
================================================
FILE: man/rowwise.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{rowwise}
\alias{rowwise}
\alias{rowwise.Seurat}
\title{Group input by rows}
\usage{
\method{rowwise}{Seurat}(data, ...)
}
\arguments{
\item{data}{Input data frame.}
\item{...}{<\code{\link[dplyr:dplyr_tidy_select]{tidy-select}}> Variables to be preserved
when calling \code{\link[dplyr:summarise]{summarise()}}. This is typically a set of variables whose
combination uniquely identify each row.
\strong{NB}: unlike \code{group_by()} you cannot create new variables here but
instead you can select multiple variables with (e.g.) \code{everything()}.}
}
\value{
A row-wise data frame with class \code{rowwise_df}. Note that a
\code{rowwise_df} is implicitly grouped by row, but is not a \code{grouped_df}.
}
\description{
\code{rowwise()} allows you to compute on a data frame one row at a time.
This is most useful when a vectorised function doesn't exist.
Most dplyr verbs preserve row-wise grouping. The exception is \code{\link[dplyr:summarise]{summarise()}},
which returns a \link[dplyr]{grouped_df}. You can explicitly ungroup with \code{\link[dplyr:ungroup]{ungroup()}}
or \code{\link[dplyr:as_tibble]{as_tibble()}}, or convert to a \link[dplyr]{grouped_df} with \code{\link[dplyr:group_by]{group_by()}}.
}
\section{List-columns}{
Because a row-wise data frame has exactly one row per group, it offers a small
convenience for working with list-columns. Normally, \code{summarise()} and
\code{mutate()} extract a group's worth of data with \code{[}. But when you index
a list in this way, you get back another list. When you're working with
a \code{rowwise} tibble, then dplyr will use \code{[[} instead of \code{[} to make your
life a little easier.
}
\examples{
data(pbmc_small)
pbmc_small |> rowwise()
}
\seealso{
\code{\link[dplyr:nest_by]{nest_by()}} for a convenient way of creating rowwise data frames
with nested data.
}
================================================
FILE: man/sample_n.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{sample_n}
\alias{sample_n}
\alias{sample_n.Seurat}
\alias{sample_frac}
\alias{sample_frac.Seurat}
\title{Sample n rows from a table}
\usage{
\method{sample_n}{Seurat}(tbl, size, replace = FALSE, weight = NULL, .env = NULL, ...)
\method{sample_frac}{Seurat}(tbl, size = 1, replace = FALSE, weight = NULL, .env = NULL, ...)
}
\arguments{
\item{tbl}{A data.frame.}
\item{size}{<\code{\link[dplyr:dplyr_tidy_select]{tidy-select}}>
For \code{sample_n()}, the number of rows to select.
For \code{sample_frac()}, the fraction of rows to select.
If \code{tbl} is grouped, \code{size} applies to each group.}
\item{replace}{Sample with or without replacement?}
\item{weight}{<\code{\link[dplyr:dplyr_tidy_select]{tidy-select}}> Sampling weights.
This must evaluate to a vector of non-negative numbers the same length as
the input. Weights are automatically standardised to sum to 1.}
\item{.env}{DEPRECATED.}
\item{...}{ignored}
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#superseded}{\figure{lifecycle-superseded.svg}{options: alt='[Superseded]'}}}{\strong{[Superseded]}}
\code{sample_n()} and \code{sample_frac()} have been superseded in favour of
\code{\link[dplyr:slice_sample]{slice_sample()}}. While they will not be deprecated in the near future,
retirement means that we will only perform critical bug fixes, so we recommend
moving to the newer alternative.
These functions were superseded because we realised it was more convenient to
have two mutually exclusive arguments to one function, rather than two
separate functions. This also made it easier to clean up a few other smaller
design issues with \code{sample_n()}/\code{sample_frac()}:
\itemize{
\item The connection to \code{slice()} was not obvious.
\item The name of the first argument, \code{tbl}, is inconsistent with other
single table verbs which use \code{.data}.
\item The \code{size} argument uses tidy evaluation, which is surprising and
undocumented.
\item It was easier to remove the deprecated \code{.env} argument.
\item \code{...} was in a suboptimal position.
}
}
\examples{
data(pbmc_small)
pbmc_small |> sample_n(50)
pbmc_small |> sample_frac(0.1)
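# The recommended modern equivalent (a sketch; assumes slice_sample()
# support for Seurat objects)
pbmc_small |> slice_sample(n=50)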
}
================================================
FILE: man/select.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{select}
\alias{select}
\alias{select.Seurat}
\title{Keep or drop columns using their names and types}
\usage{
\method{select}{Seurat}(.data, ...)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{...}{<\code{\link[dplyr:dplyr_tidy_select]{tidy-select}}> One or more unquoted
expressions separated by commas. Variable names can be used as if they
were positions in the data frame, so expressions like \code{x:y} can
be used to select a range of variables.}
}
\value{
An object of the same type as \code{.data}. The output has the following
properties:
\itemize{
\item Rows are not affected.
\item Output columns are a subset of input columns, potentially with a different
order. Columns will be renamed if \code{new_name = old_name} form is used.
\item Data frame attributes are preserved.
\item Groups are maintained; you can't select off grouping variables.
}
}
\description{
Select (and optionally rename) variables in a data frame, using a concise
mini-language that makes it easy to refer to variables based on their name
(e.g. \code{a:f} selects all columns from \code{a} on the left to \code{f} on the
right) or type (e.g. \code{where(is.numeric)} selects all numeric columns).
\subsection{Overview of selection features}{
Tidyverse selections implement a dialect of R where operators make
it easy to select variables:
\itemize{
\item \code{:} for selecting a range of consecutive variables.
\item \code{!} for taking the complement of a set of variables.
\item \code{&} and \code{|} for selecting the intersection or the union of two
sets of variables.
\item \code{c()} for combining selections.
}
In addition, you can use \strong{selection helpers}. Some helpers select specific
columns:
\itemize{
\item \code{\link[tidyselect:everything]{everything()}}: Matches all variables.
\item \code{\link[tidyselect:everything]{last_col()}}: Select last variable, possibly with an offset.
\item \code{\link[dplyr:group_cols]{group_cols()}}: Select all grouping columns.
}
Other helpers select variables by matching patterns in their names:
\itemize{
\item \code{\link[tidyselect:starts_with]{starts_with()}}: Starts with a prefix.
\item \code{\link[tidyselect:starts_with]{ends_with()}}: Ends with a suffix.
\item \code{\link[tidyselect:starts_with]{contains()}}: Contains a literal string.
\item \code{\link[tidyselect:starts_with]{matches()}}: Matches a regular expression.
\item \code{\link[tidyselect:starts_with]{num_range()}}: Matches a numerical range like x01, x02, x03.
}
Or from variables stored in a character vector:
\itemize{
\item \code{\link[tidyselect:all_of]{all_of()}}: Matches variable names in a character vector. All
names must be present, otherwise an out-of-bounds error is
thrown.
\item \code{\link[tidyselect:all_of]{any_of()}}: Same as \code{all_of()}, except that no error is thrown
for names that don't exist.
}
Or using a predicate function:
\itemize{
\item \code{\link[tidyselect:where]{where()}}: Applies a function to all variables and selects those
for which the function returns \code{TRUE}.
}
}
}
\section{Methods}{
This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("select")}.
}
\section{Examples}{
Here we show the usage for the basic selection operators. See the
specific help pages to learn about helpers like \code{\link[dplyr:starts_with]{starts_with()}}.
The selection language can be used in functions like
\code{dplyr::select()}. Let's first attach
the tidyverse:
\if{html}{\out{}}\preformatted{library(tidyverse)
# For better printing
iris <- as_tibble(iris)
}\if{html}{\out{
}}
Select variables by name:
\if{html}{\out{}}\preformatted{starwars |> select(height)
#> # A tibble: 87 x 1
#> height
#> <int>
#> 1 172
#> 2 167
#> 3 96
#> 4 202
#> # i 83 more rows
iris |> select(Sepal.Length)
#> # A tibble: 150 x 1
#> Sepal.Length
#> <dbl>
#> 1 5.1
#> 2 4.9
#> 3 4.7
#> 4 4.6
#> # i 146 more rows
}\if{html}{\out{
}}
Select multiple variables by separating them with commas. Note how
the order of columns is determined by the order of inputs:
\if{html}{\out{}}\preformatted{starwars |> select(homeworld, height, mass)
#> # A tibble: 87 x 3
#> homeworld height mass
#> <chr> <int> <dbl>
#> 1 Tatooine 172 77
#> 2 Tatooine 167 75
#> 3 Naboo 96 32
#> 4 Tatooine 202 136
#> # i 83 more rows
iris |> select(Sepal.Length, Petal.Length)
#> # A tibble: 150 x 2
#> Sepal.Length Petal.Length
#> <dbl> <dbl>
#> 1 5.1 1.4
#> 2 4.9 1.4
#> 3 4.7 1.3
#> 4 4.6 1.5
#> # i 146 more rows
}\if{html}{\out{
}}
If you use a named vector to select columns, the output will have
its columns renamed:
\if{html}{\out{}}\preformatted{selection <- c(
new_homeworld = "homeworld",
new_height = "height",
new_mass = "mass"
)
starwars |> select(all_of(selection))
#> # A tibble: 87 x 3
#> new_homeworld new_height new_mass
#> <chr> <int> <dbl>
#> 1 Tatooine 172 77
#> 2 Tatooine 167 75
#> 3 Naboo 96 32
#> 4 Tatooine 202 136
#> # i 83 more rows
}\if{html}{\out{
}}
\subsection{Operators:}{
The \code{:} operator selects a range of consecutive variables:
\if{html}{\out{}}\preformatted{starwars |> select(name:mass)
#> # A tibble: 87 x 3
#> name height mass
#> <chr> <int> <dbl>
#> 1 Luke Skywalker 172 77
#> 2 C-3PO 167 75
#> 3 R2-D2 96 32
#> 4 Darth Vader 202 136
#> # i 83 more rows
}\if{html}{\out{
}}
The \code{!} operator negates a selection:
\if{html}{\out{}}\preformatted{starwars |> select(!(name:mass))
#> # A tibble: 87 x 11
#> hair_color skin_color eye_color birth_year sex gender homeworld species
#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 blond fair blue 19 male masculine Tatooine Human
#> 2 <NA> gold yellow 112 none masculine Tatooine Droid
#> 3 <NA> white, blue red 33 none masculine Naboo Droid
#> 4 none white yellow 41.9 male masculine Tatooine Human
#> # i 83 more rows
#> # i 3 more variables: films <list>, vehicles <list>, starships <list>
iris |> select(!c(Sepal.Length, Petal.Length))
#> # A tibble: 150 x 3
#> Sepal.Width Petal.Width Species
#> <dbl> <dbl> <fct>
#> 1 3.5 0.2 setosa
#> 2 3 0.2 setosa
#> 3 3.2 0.2 setosa
#> 4 3.1 0.2 setosa
#> # i 146 more rows
iris |> select(!ends_with("Width"))
#> # A tibble: 150 x 3
#> Sepal.Length Petal.Length Species
#> <dbl> <dbl> <fct>
#> 1 5.1 1.4 setosa
#> 2 4.9 1.4 setosa
#> 3 4.7 1.3 setosa
#> 4 4.6 1.5 setosa
#> # i 146 more rows
}\if{html}{\out{
}}
\code{&} and \code{|} take the intersection or the union of two selections:
\if{html}{\out{}}\preformatted{iris |> select(starts_with("Petal") & ends_with("Width"))
#> # A tibble: 150 x 1
#> Petal.Width
#> <dbl>
#> 1 0.2
#> 2 0.2
#> 3 0.2
#> 4 0.2
#> # i 146 more rows
iris |> select(starts_with("Petal") | ends_with("Width"))
#> # A tibble: 150 x 3
#> Petal.Length Petal.Width Sepal.Width
#> <dbl> <dbl> <dbl>
#> 1 1.4 0.2 3.5
#> 2 1.4 0.2 3
#> 3 1.3 0.2 3.2
#> 4 1.5 0.2 3.1
#> # i 146 more rows
}\if{html}{\out{
}}
To take the difference between two selections, combine the \code{&} and
\code{!} operators:
\if{html}{\out{}}\preformatted{iris |> select(starts_with("Petal") & !ends_with("Width"))
#> # A tibble: 150 x 1
#> Petal.Length
#> <dbl>
#> 1 1.4
#> 2 1.4
#> 3 1.3
#> 4 1.5
#> # i 146 more rows
}\if{html}{\out{
}}
}
}
\examples{
data(pbmc_small)
pbmc_small |> select(cell, orig.ident)
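# Selection helpers also work; this sketch assumes the standard
# nCount_RNA and nFeature_RNA metadata columns of pbmc_small
pbmc_small |> select(cell, contains("RNA"))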
}
\seealso{
Other single table verbs:
\code{\link[dplyr]{arrange}()},
\code{\link[dplyr]{filter}()},
\code{\link[dplyr]{mutate}()},
\code{\link[dplyr]{reframe}()},
\code{\link[dplyr]{rename}()},
\code{\link[dplyr]{slice}()},
\code{\link[dplyr]{summarise}()}
}
================================================
FILE: man/separate.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tidyr_methods.R
\name{separate}
\alias{separate}
\alias{separate.Seurat}
\title{Separate a character column into multiple columns with a regular
expression or numeric locations}
\usage{
\method{separate}{Seurat}(
data,
col,
into,
sep = "[^[:alnum:]]+",
remove = TRUE,
convert = FALSE,
extra = "warn",
fill = "warn",
...
)
}
\arguments{
\item{data}{A data frame.}
\item{col}{<\code{\link[tidyr:tidyr_tidy_select]{tidy-select}}> Column to expand.}
\item{into}{Names of new variables to create as character vector.
Use \code{NA} to omit the variable in the output.}
\item{sep}{Separator between columns.
If character, \code{sep} is interpreted as a regular expression. The default
value is a regular expression that matches any sequence of
non-alphanumeric values.
If numeric, \code{sep} is interpreted as character positions to split at. Positive
values start at 1 at the far-left of the string; negative values start at -1 at
the far-right of the string. The length of \code{sep} should be one less than
\code{into}.}
\item{remove}{If \code{TRUE}, remove input column from output data frame.}
\item{convert}{If \code{TRUE}, will run \code{\link[=type.convert]{type.convert()}} with
\code{as.is = TRUE} on new columns. This is useful if the component
columns are integer, numeric or logical.
NB: this will cause string \code{"NA"}s to be converted to \code{NA}s.}
\item{extra}{If \code{sep} is a character vector, this controls what
happens when there are too many pieces. There are three valid options:
\itemize{
\item \code{"warn"} (the default): emit a warning and drop extra values.
\item \code{"drop"}: drop any extra values without a warning.
\item \code{"merge"}: only splits at most \code{length(into)} times
}}
\item{fill}{If \code{sep} is a character vector, this controls what
happens when there are not enough pieces. There are three valid options:
\itemize{
\item \code{"warn"} (the default): emit a warning and fill from the right
\item \code{"right"}: fill with missing values on the right
\item \code{"left"}: fill with missing values on the left
}}
\item{...}{Additional arguments passed on to methods.}
}
\value{
\code{tidyseurat}
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#superseded}{\figure{lifecycle-superseded.svg}{options: alt='[Superseded]'}}}{\strong{[Superseded]}}
\code{separate()} has been superseded in favour of \code{\link[tidyr:separate_wider_position]{separate_wider_position()}}
and \code{\link[tidyr:separate_wider_delim]{separate_wider_delim()}} because the two functions make the two uses
more obvious, the API is more polished, and the handling of problems is
better. Superseded functions will not go away, but will only receive
critical bug fixes.
Given either a regular expression or a vector of character positions,
\code{separate()} turns a single character column into multiple columns.
}
\examples{
data(pbmc_small)
un <- pbmc_small |> unite("new_col", c(orig.ident, groups))
un |> separate(new_col, c("orig.ident", "groups"))
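# the default sep is a regex matching runs of non-alphanumeric
# characters; an explicit separator can be passed instead:
un |> separate(new_col, c("orig.ident", "groups"), sep="_")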
}
\seealso{
\code{\link[tidyr:unite]{unite()}}, the complement, \code{\link[tidyr:extract]{extract()}} which uses regular
expression capturing groups.
}
================================================
FILE: man/slice.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{slice}
\alias{slice}
\alias{slice.Seurat}
\alias{slice_head}
\alias{slice_tail}
\alias{slice_sample}
\alias{slice_min}
\alias{slice_max}
\alias{slice_sample.Seurat}
\alias{slice_head.Seurat}
\alias{slice_tail.Seurat}
\alias{slice_min.Seurat}
\alias{slice_max.Seurat}
\title{Subset rows using their positions}
\usage{
\method{slice}{Seurat}(.data, ..., .by = NULL, .preserve = FALSE)
\method{slice_sample}{Seurat}(
.data,
...,
n = NULL,
prop = NULL,
by = NULL,
weight_by = NULL,
replace = FALSE
)
\method{slice_head}{Seurat}(.data, ..., n, prop, by = NULL)
\method{slice_tail}{Seurat}(.data, ..., n, prop, by = NULL)
\method{slice_min}{Seurat}(
.data,
order_by,
...,
n,
prop,
by = NULL,
with_ties = TRUE,
na_rm = FALSE
)
\method{slice_max}{Seurat}(
.data,
order_by,
...,
n,
prop,
by = NULL,
with_ties = TRUE,
na_rm = FALSE
)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{...}{For \code{slice()}: <\code{\link[rlang:args_data_masking]{data-masking}}>
Integer row values.
Provide either positive values to keep, or negative values to drop.
The values provided must be either all positive or all negative.
Indices beyond the number of rows in the input are silently ignored.
For \verb{slice_*()}, these arguments are passed on to methods.}
\item{.by, by}{<\code{\link[dplyr:dplyr_tidy_select]{tidy-select}}> Optionally, a selection of columns to
group by for just this operation, functioning as an alternative to \code{\link[dplyr:group_by]{group_by()}}. For
details and examples, see \link[dplyr:dplyr_by]{?dplyr_by}.}
\item{.preserve}{Relevant when the \code{.data} input is grouped. If \code{.preserve = FALSE} (the default), the grouping structure is recalculated based on the
resulting data, otherwise the grouping is kept as is.}
\item{n, prop}{Provide either \code{n}, the number of rows, or \code{prop}, the
proportion of rows to select. If neither are supplied, \code{n = 1} will be
used. If \code{n} is greater than the number of rows in the group
(or \code{prop > 1}), the result will be silently truncated to the group size.
\code{prop} will be rounded towards zero to generate an integer number of
rows.
A negative value of \code{n} or \code{prop} will be subtracted from the group
size. For example, \code{n = -2} with a group of 5 rows will select 5 - 2 = 3
rows; \code{prop = -0.25} with 8 rows will select 8 * (1 - 0.25) = 6 rows.}
\item{weight_by}{<\code{\link[rlang:args_data_masking]{data-masking}}> Sampling
weights. This must evaluate to a vector of non-negative numbers the same
length as the input. Weights are automatically standardised to sum to 1.
See the \code{Details} section for more technical details regarding these
weights.}
\item{replace}{Should sampling be performed with (\code{TRUE}) or without
(\code{FALSE}, the default) replacement.}
\item{order_by}{<\code{\link[rlang:args_data_masking]{data-masking}}> Variable or
function of variables to order by. To order by multiple variables, wrap
them in a data frame or tibble.}
\item{with_ties}{Should ties be kept together? The default, \code{TRUE},
may return more rows than you request. Use \code{FALSE} to ignore ties,
and return the first \code{n} rows.}
\item{na_rm}{Should missing values in \code{order_by} be removed from the result?
If \code{FALSE}, \code{NA} values are sorted to the end (like in \code{\link[dplyr:arrange]{arrange()}}), so
they will only be included if there are insufficient non-missing values to
reach \code{n}/\code{prop}.}
}
\value{
An object of the same type as \code{.data}. The output has the following
properties:
\itemize{
\item Each row may appear 0, 1, or many times in the output.
\item Columns are not modified.
\item Groups are not modified.
\item Data frame attributes are preserved.
}
}
\description{
\code{slice()} lets you index rows by their (integer) locations. It allows you
to select, remove, and duplicate rows. It is accompanied by a number of
helpers for common use cases:
\itemize{
\item \code{slice_head()} and \code{slice_tail()} select the first or last rows.
\item \code{slice_sample()} randomly selects rows.
\item \code{slice_min()} and \code{slice_max()} select rows with the smallest or largest
values of a variable.
}
If \code{.data} is a \link[dplyr]{grouped_df}, the operation will be performed on each group,
so that (e.g.) \code{slice_head(df, n = 5)} will select the first five rows in
each group.
}
\details{
Slice does not work with relational databases because they have no
intrinsic notion of row order. If you want to perform the equivalent
operation, use \code{\link[dplyr:filter]{filter()}} and \code{\link[dplyr:row_number]{row_number()}}.
For \code{slice_sample()}, note that the weights provided in \code{weight_by} are
passed through to the \code{prob} argument of \code{\link[base:sample]{base::sample.int()}}. This means
they cannot be used to reconstruct summary statistics from the underlying
population. See \href{https://stats.stackexchange.com/q/639211/}{this discussion}
for more details.
}
\section{Methods}{
These functions are \strong{generic}s, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
\itemize{
\item \code{slice()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice")}.
\item \code{slice_head()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice_head")}.
\item \code{slice_tail()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice_tail")}.
\item \code{slice_min()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice_min")}.
\item \code{slice_max()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice_max")}.
\item \code{slice_sample()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice_sample")}.
}
}
\examples{
data(pbmc_small)
pbmc_small |> slice(1)
# Slice group-wise using .by
pbmc_small |> slice(1:2, .by=groups)
# slice_sample() allows you to randomly select cells with or without replacement
pbmc_small |> slice_sample(n=5)
# if using replacement, and duplicate cells are returned, a tibble will be
# returned because duplicate cells cannot exist in Seurat objects
pbmc_small |> slice_sample(n=1, replace=TRUE) # returns Seurat
pbmc_small |> slice_sample(n=100, replace=TRUE) # returns tibble
# weight by a variable
pbmc_small |> slice_sample(n=5, weight_by=nCount_RNA)
# sample by group
pbmc_small |> slice_sample(n=5, by=groups)
# sample using proportions
pbmc_small |> slice_sample(prop=0.10)
# First rows based on existing order
pbmc_small |> slice_head(n=5)
# Last rows based on existing order
pbmc_small |> slice_tail(n=5)
# Rows with minimum values of a metadata variable
pbmc_small |> slice_min(nFeature_RNA, n=5)
# slice_min() and slice_max() may return more rows than requested
# in the presence of ties.
pbmc_small |> slice_min(nFeature_RNA, n=2)
# Use with_ties=FALSE to return exactly n matches
pbmc_small |> slice_min(nFeature_RNA, n=2, with_ties=FALSE)
# Or use additional variables to break the tie:
pbmc_small |> slice_min(tibble::tibble(nFeature_RNA, nCount_RNA), n=2)
# Use by for group-wise operations
pbmc_small |> slice_min(nFeature_RNA, n=5, by=groups)
# Rows with maximum values of a metadata variable
pbmc_small |> slice_max(nFeature_RNA, n=5)
}
\seealso{
Other single table verbs:
\code{\link{arrange}()},
\code{\link{mutate}()},
\code{\link{rename}()},
\code{\link{summarise}()}
}
\concept{single table verbs}
================================================
FILE: man/summarise.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{summarise}
\alias{summarise}
\alias{summarise.Seurat}
\alias{summarize}
\alias{summarize.Seurat}
\title{Summarise each group down to one row}
\usage{
\method{summarise}{Seurat}(.data, ...)
\method{summarize}{Seurat}(.data, ...)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{...}{<\code{\link[rlang:args_data_masking]{data-masking}}> Name-value pairs of
summary functions. The name will be the name of the variable in the result.
The value can be:
\itemize{
\item A vector of length 1, e.g. \code{min(x)}, \code{n()}, or \code{sum(is.na(y))}.
\item A data frame with 1 row, to add multiple columns from a single expression.
}}
}
\value{
An object \emph{usually} of the same type as \code{.data}.
\itemize{
\item The rows come from the underlying \code{\link[dplyr:group_keys]{group_keys()}}.
\item The columns are a combination of the grouping keys and the summary
expressions that you provide.
\item The grouping structure is controlled by the \verb{.groups=} argument, the
output may be another \link[dplyr]{grouped_df}, a \link[dplyr]{tibble} or a \link[dplyr]{rowwise} data frame.
\item Data frame attributes are \strong{not} preserved, because \code{summarise()}
fundamentally creates a new data frame.
}
}
\description{
\code{summarise()} creates a new data frame. It returns one row for each
combination of grouping variables; if there are no grouping variables, the
output will have a single row summarising all observations in the input. It
will contain one column for each grouping variable and one column for each of
the summary statistics that you have specified.
\code{summarise()} and \code{summarize()} are synonyms.
}
\section{Useful functions}{
\itemize{
\item Center: \code{\link[=mean]{mean()}}, \code{\link[=median]{median()}}
\item Spread: \code{\link[=sd]{sd()}}, \code{\link[=IQR]{IQR()}}, \code{\link[=mad]{mad()}}
\item Range: \code{\link[=min]{min()}}, \code{\link[=max]{max()}},
\item Position: \code{\link[dplyr:first]{first()}}, \code{\link[dplyr:last]{last()}}, \code{\link[dplyr:nth]{nth()}},
\item Count: \code{\link[dplyr:n]{n()}}, \code{\link[dplyr:n_distinct]{n_distinct()}}
\item Logical: \code{\link[=any]{any()}}, \code{\link[=all]{all()}}
}
}
\section{Backend variations}{
The data frame backend supports creating a variable and using it in the
same summary. This means that previously created summary variables can be
further transformed or combined within the summary, as in \code{\link[dplyr:mutate]{mutate()}}.
However, it also means that summary variables with the same names as previous
variables overwrite them, making those variables unavailable to later summary
variables.
This behaviour may not be supported in other backends. To avoid unexpected
results, consider using new names for your summary variables, especially when
creating multiple summaries.
}
\section{Methods}{
This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("summarise")}.
}
\examples{
data(pbmc_small)
pbmc_small |> summarise(mean(nCount_RNA))
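# combine with group_by() for one summary row per group:
pbmc_small |> group_by(groups) |> summarise(mean(nCount_RNA))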
}
\seealso{
Other single table verbs:
\code{\link{arrange}()},
\code{\link{mutate}()},
\code{\link{rename}()},
\code{\link{slice}()}
}
\concept{single table verbs}
================================================
FILE: man/tbl_format_header.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/print_method.R
\name{tbl_format_header}
\alias{tbl_format_header}
\alias{tbl_format_header.tidySeurat}
\title{Format the header of a tibble}
\usage{
\method{tbl_format_header}{tidySeurat}(x, setup, ...)
}
\arguments{
\item{x}{A tibble-like object.}
\item{setup}{A setup object returned from \code{\link[pillar:tbl_format_setup]{tbl_format_setup()}}.}
\item{...}{These dots are for future extensions and must be empty.}
}
\value{
A character vector.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}
For easier customization, the formatting of a tibble is split
into three components: header, body, and footer.
The \code{tbl_format_header()} method is responsible for formatting the header
of a tibble.
Override this method if you need to change the appearance
of the entire header.
If you only need to change or extend the components shown in the header,
override or extend \code{\link[pillar:tbl_sum]{tbl_sum()}} for your class which is called by the
default method.
}
\examples{
data(pbmc_small)
# the header method is invoked automatically when printing
print(pbmc_small)
}
================================================
FILE: man/tidy.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/methods.R
\name{tidy}
\alias{tidy}
\alias{tidy.Seurat}
\title{tidy for Seurat objects}
\usage{
\method{tidy}{Seurat}(x, ...)
}
\arguments{
\item{x}{A Seurat object}
\item{...}{Additional arguments (not used)}
}
\value{
A tidyseurat object
}
\description{
tidy for Seurat objects
}
================================================
FILE: man/unite.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tidyr_methods.R
\name{unite}
\alias{unite}
\alias{unite.Seurat}
\title{Unite multiple columns into one by pasting strings together}
\usage{
\method{unite}{Seurat}(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)
}
\arguments{
\item{data}{A data frame.}
\item{col}{The name of the new column, as a string or symbol.
This argument is passed by expression and supports
\link[rlang:topic-inject]{quasiquotation} (you can unquote strings
and symbols). The name is captured from the expression with
\code{\link[rlang:defusing-advanced]{rlang::ensym()}} (note that this kind of interface where
symbols do not represent actual objects is now discouraged in the
tidyverse; we support it here for backward compatibility).}
\item{...}{<\code{\link[tidyr:tidyr_tidy_select]{tidy-select}}> Columns to unite}
\item{sep}{Separator to use between values.}
\item{remove}{If \code{TRUE}, remove input columns from output data frame.}
\item{na.rm}{If \code{TRUE}, missing values will be removed prior to uniting
each value.}
}
\value{
\code{tidyseurat}
}
\description{
Convenience function to paste together multiple columns into one.
}
\examples{
data(pbmc_small)
pbmc_small |> unite(
col="new_col",
c("orig.ident", "groups"))
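# use remove=FALSE to keep the input columns alongside the united one:
pbmc_small |> unite(
    col="new_col",
    c("orig.ident", "groups"),
    remove=FALSE)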
}
\seealso{
\code{\link[tidyr:separate]{separate()}}, the complement.
}
================================================
FILE: man/unnest.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tidyr_methods.R
\name{unnest}
\alias{unnest}
\alias{unnest.tidyseurat_nested}
\alias{unnest_seurat}
\title{Unnest a list-column of data frames into rows and columns}
\usage{
\method{unnest}{tidyseurat_nested}(
data,
cols,
...,
keep_empty = FALSE,
ptype = NULL,
names_sep = NULL,
names_repair = "check_unique",
.drop,
.id,
.sep,
.preserve
)
unnest_seurat(
data,
cols,
...,
keep_empty = FALSE,
ptype = NULL,
names_sep = NULL,
names_repair = "check_unique",
.drop,
.id,
.sep,
.preserve
)
}
\arguments{
\item{data}{A data frame.}
\item{cols}{<\code{\link[tidyr:tidyr_tidy_select]{tidy-select}}> List-columns to unnest.
When selecting multiple columns, values from the same row will be recycled
to their common size.}
\item{...}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}:
previously you could write \code{df |> unnest(x, y, z)}.
Convert to \code{df |> unnest(c(x, y, z))}. If you previously created a new
variable in \code{unnest()} you'll now need to do it explicitly with \code{mutate()}.
Convert \code{df |> unnest(y = fun(x, y, z))}
to \code{df |> mutate(y = fun(x, y, z)) |> unnest(y)}.}
\item{keep_empty}{By default, you get one row of output for each element
of the list that you are unchopping/unnesting. This means that if there's a
size-0 element (like \code{NULL} or an empty data frame or vector), then that
entire row will be dropped from the output. If you want to preserve all
rows, use \code{keep_empty = TRUE} to replace size-0 elements with a single row
of missing values.}
\item{ptype}{Optionally, a named list of column name-prototype pairs to
coerce \code{cols} to, overriding the default that will be guessed from
combining the individual values. Alternatively, a single empty ptype
can be supplied, which will be applied to all \code{cols}.}
\item{names_sep}{If \code{NULL}, the default, the outer names will come from the
inner names. If a string, the outer names will be formed by pasting
together the outer and the inner column names, separated by \code{names_sep}.}
\item{names_repair}{Used to check that output data frame has valid
names. Must be one of the following options:
\itemize{
\item \verb{"minimal"}: no name repair or checks, beyond basic existence,
\item \verb{"unique"}: make sure names are unique and not empty,
\item \verb{"check_unique"} (the default): no name repair, but check that they are unique,
\item \verb{"universal"}: make the names unique and syntactic,
\item a function: apply custom name repair.
\item \link[tidyr]{tidyr_legacy}: use the name repair from tidyr 0.8.
\item a formula: a purrr-style anonymous function (see \code{\link[rlang:as_function]{rlang::as_function()}})
}
See \code{\link[vctrs:vec_as_names]{vctrs::vec_as_names()}} for more details on these terms and the
strategies used to enforce them.}
\item{.drop, .preserve}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}:
all list-columns are now preserved; If there are any that you
don't want in the output use \code{select()} to remove them prior to
unnesting.}
\item{.id}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}:
convert \code{df |> unnest(x, .id = "id")} to \verb{df |> mutate(id = names(x)) |> unnest(x)}.}
\item{.sep}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}:
use \code{names_sep} instead.}
}
\value{
\code{tidyseurat}
}
\description{
Unnest expands a list-column containing data frames into rows and columns.
}
\section{New syntax}{
tidyr 1.0.0 introduced a new syntax for \code{nest()} and \code{unnest()} that's
designed to be more similar to other functions. Converting to the new syntax
should be straightforward (guided by the message you'll receive) but if
you just need to run an old analysis, you can easily revert to the previous
behaviour using \code{\link[tidyr:nest_legacy]{nest_legacy()}} and \code{\link[tidyr:unnest_legacy]{unnest_legacy()}} as follows:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(tidyr)
nest <- nest_legacy
unnest <- unnest_legacy
}\if{html}{\out{</div>}}
}
\examples{
data(pbmc_small)
pbmc_small |>
nest(data=-groups) |>
unnest(data)
}
\seealso{
Other rectangling:
\code{\link[tidyr]{hoist}()},
\code{\link[tidyr]{unnest_longer}()},
\code{\link[tidyr]{unnest_wider}()}
}
================================================
FILE: tests/testthat/test-dplyr.R
================================================
context('dplyr test')
library(Seurat)
data("pbmc_small")
set.seed(42)
test_that("arrange", {
pbmc_small |>
arrange(nFeature_RNA) |>
expect_warning(regexp = "`arrange\\(\\)` was deprecated in tidyseurat .*")
# pbmc_small_pca_arranged <- pbmc_small |> arrange(nFeature_RNA) |> Seurat::ScaleData() |> Seurat::FindVariableFeatures() |> Seurat::RunPCA()
# pbmc_small_pca <- pbmc_small |> Seurat::ScaleData() |> Seurat::FindVariableFeatures() |> Seurat::RunPCA()
# expect_equal(
# Seurat::VariableFeatures(pbmc_small_pca_arranged),
# Seurat::VariableFeatures(pbmc_small_pca)
# )
# # Failing only for ATLAS CRAN, but succeeding for the rest
# expect_equal(
# pbmc_small_pca_arranged[["pca"]]@cell.embeddings,
# pbmc_small_pca[["pca"]]@cell.embeddings,
# tolerance=0.1
# )
# expect_equal(
# pbmc_small_pca_arranged |> as_tibble() |>dplyr::slice_head(n = 1),
# pbmc_small_pca |> as_tibble() |> dplyr::slice_min(nFeature_RNA, n = 1)
# )
})
test_that("bind_cols", {
pbmc_small_bind <- pbmc_small |> select(nCount_RNA, nFeature_RNA)
pbmc_small |>
ttservice::bind_cols(pbmc_small_bind) |>
select(nCount_RNA...2, nFeature_RNA...3) |>
ncol() |>
expect_equal(2)
})
test_that("distinct", {
expect_equal(pbmc_small |> distinct(groups) |> ncol(), 1)
})
test_that("filter", {
expect_equal(
pbmc_small |> filter(groups == "g1") |> ncol(),
sum(pbmc_small[[]]$groups == "g1")
)
})
test_that("group_by", {
expect_equal(
pbmc_small |> group_by(groups) |> nrow(),
nrow(pbmc_small[[]])
)
})
test_that("summarise", {
expect_equal(pbmc_small |> summarise(mean(nCount_RNA)) |> nrow(), 1)
})
test_that("mutate", {
expect_equal(pbmc_small |> mutate(nFeature_RNA = 1) |> distinct(nFeature_RNA) |> nrow(), 1)
})
test_that("rename", {
expect_equal(pbmc_small |> rename(s_score = nFeature_RNA) |> select(s_score) |> ncol(), 1)
})
test_that("left_join", {
expect_equal(
pbmc_small |> left_join(pbmc_small |> distinct(groups) |> mutate(new_column = 1:2) |> slice(1)) |> ncol(),
nrow(pbmc_small[[]])
)
})
test_that("inner_join", {
expect_equal(
pbmc_small |> inner_join(pbmc_small |> distinct(groups) |> mutate(new_column = 1:2) |> slice(1)) |> ncol(),
sum(pbmc_small[[]]$groups == "g2")
)
})
test_that("right_join", {
expect_equal(
pbmc_small |> right_join(pbmc_small |> distinct(groups) |> mutate(new_column = 1:2) |> slice(1)) |> ncol(),
sum(pbmc_small[[]]$groups == "g2")
)
})
test_that("full_join", {
expect_equal(
pbmc_small |> full_join(tibble::tibble(groups = "g1", other = 1:4)) |> nrow(),
sum(pbmc_small[[]]$groups == "g1") * 4 + sum(pbmc_small[[]]$groups == "g2")
)
})
test_that("slice", {
expect_equal(pbmc_small |> slice(1) |> ncol(), 1)
expect_equal(
pbmc_small |> slice(1:6) |> colnames(),
colnames(pbmc_small) |> head(6))
})
test_that("sample_n", {
expect_equal(pbmc_small |> sample_n(50) |> ncol(), 50)
expect_equal(
pbmc_small |> sample_n(500, replace = TRUE) |> ncol(),
pbmc_small |> as_tibble() |> ncol()
)
})
test_that("slice_sample", {
pbmc_small |>
slice_sample(n = 50) |>
ncol() |>
expect_equal(50)
})
test_that("slice_head", {
pbmc_small |>
slice_head(n = 50) |>
ncol() |>
expect_equal(50)
expect_equal(
colnames(pbmc_small) |> head(n = 50),
pbmc_small |> slice_head(n = 50) |> colnames()
)
})
test_that("slice_tail", {
pbmc_small |>
slice_tail(n = 50) |>
ncol() |>
expect_equal(50)
expect_equal(
colnames(pbmc_small) |> tail(n = 50),
pbmc_small |> slice_tail(n = 50) |> colnames()
)
})
test_that("slice_min", {
pbmc_small |>
slice_min(nFeature_RNA, n = 5) |>
ncol() |>
expect_equal(5)
# Arrange is deprecated
# expect_equal(
# pbmc_small |> as_tibble() |> arrange(nFeature_RNA) |> head(n = 5) %>% pull(.cell),
# pbmc_small |> slice_min(nFeature_RNA, n = 5) |> colnames()
# )
})
test_that("slice_max", {
pbmc_small |>
slice_max(nFeature_RNA, n = 5) |>
ncol() |>
expect_equal(5)
# Arrange is deprecated
# expect_equal(
# pbmc_small |> as_tibble() |> arrange(desc(nFeature_RNA)) |> head(n = 5) %>% pull(.cell),
# pbmc_small |> slice_max(nFeature_RNA, n = 5) |> colnames()
# )
})
test_that("slice_min slice_max tibble input for order_by", {
pbmc_small |>
slice_min(tibble::tibble(nFeature_RNA, nCount_RNA), n = 5) |>
ncol() |>
expect_equal(5)
pbmc_small |>
slice_max(tibble::tibble(nFeature_RNA, nCount_RNA), n = 5) |>
ncol() |>
expect_equal(5)
})
test_that("select", {
expect_equal(pbmc_small |> select(cell, orig.ident) |> class() |> as.character(), "Seurat")
expect_equal(pbmc_small |> select(orig.ident) |> class() |> as.character() |> purrr::pluck(1), "tbl_df")
})
test_that("sample_frac", {
expect_equal(
pbmc_small |> sample_frac(0.1) |> ncol(),
nrow(pbmc_small[[]]) * 0.1
)
expect_equal(
pbmc_small |> sample_frac(10, replace = TRUE) |> ncol(),
pbmc_small |> as_tibble() |> ncol()
)
})
test_that("count", {
expect_equal(
pbmc_small |> count(groups) |> nrow(),
pbmc_small[[]]$groups |> unique() |> length()
)
})
test_that("add_count", {
expect_equal(
pbmc_small |> add_count(groups) |> nrow(),
pbmc_small |> rownames() |> length()
)
})
test_that("rowwise", {
expect_equal(
pbmc_small |> rowwise() |> mutate(m = mean(c(nCount_RNA, nFeature_RNA))) |> purrr::pluck("m", 1),
((pbmc_small[, 1]$nCount_RNA + pbmc_small[, 1]$nFeature_RNA) / 2) |> unname()
)
})
test_that("group_split() works for one variable", {
fd <- pbmc_small |>
group_split(groups)
expect_equal(length(fd), length(unique(pbmc_small$groups)))
})
test_that("group_split() works for combination of variables", {
fd <- pbmc_small |>
group_split(groups, letter.idents)
expect_equal(length(fd), length(unique(pbmc_small$groups)) *
length(unique(pbmc_small$letter.idents)))
})
test_that("group_split() works for one logical statement", {
fd_log <- pbmc_small |>
group_split(groups=="g1")
fd_var <- pbmc_small |>
mutate(is_g1 = groups=="g1") |>
group_split(is_g1)
expect_equal(lapply(fd_var, count), lapply(fd_log, count))
})
test_that("group_split() works for two logical statements", {
fd <- pbmc_small |>
group_split(PC_1 > 0 & groups == "g1")
fd_counts <- lapply(fd, count)
expect_equal(c(fd_counts[[1]], fd_counts[[2]], use.names = FALSE),
list(75, 5))
})
================================================
FILE: tests/testthat/test-ggplotly_methods.R
================================================
context('ggplot test')
data("pbmc_small")
df <- pbmc_small
df$number <- rnorm(ncol(df))
df$factor <- sample(gl(3, 1, ncol(df)))
test_that("ggplot", {
# cell metadata
p <- ggplot(df, aes(factor, number))
expect_silent(show(p))
expect_s3_class(p, "ggplot")
# assay data
g <- sample(rownames(df), 1)
fd <- join_features(df, g, shape="wide")
p <- ggplot(fd, aes(factor, .data[[g]]))
expect_silent(show(p))
expect_s3_class(p, "ggplot")
# reduced dimensions
p <- ggplot(df, aes(PC_1, PC_2, col=factor))
expect_silent(show(p))
expect_s3_class(p, "ggplot")
})
test_that("plotly", {
# cell metadata
p <- plot_ly(df, x=~factor, y=~number, type="violin")
expect_silent(show(p))
expect_s3_class(p, "plotly")
# assay data
g <- sample(rownames(df), 1)
fd <- join_features(df, g, shape="wide")
p <- plot_ly(fd, x=~factor, y=as.formula(paste0("~`", g, "`")), type="violin")
expect_silent(show(p))
expect_s3_class(p, "plotly")
# reduced dimensions
p <- plot_ly(fd, x=~PC_1, y=~PC_2, type="scatter", mode="markers")
expect_silent(show(p))
expect_s3_class(p, "plotly")
})
================================================
FILE: tests/testthat/test-methods.R
================================================
context('methods test')
data("pbmc_small")
test_that("join_features_long", {
pbmc_small |>
join_features("CD3D", shape="long") |>
slice(1) |>
pull(.abundance_RNA) |>
expect_equal(6.35, tolerance = 0.1)
})
test_that("join_features_wide", {
pbmc_small |>
join_features("CD3D", shape="wide") |>
slice(1) |>
pull(CD3D) |>
expect_equal(6.35, tolerance = 0.1)
})
test_that("join_features_default_wide", {
pbmc_small |>
join_features("CD3D") |>
slice(1) |>
pull(CD3D) |>
expect_equal(6.35, tolerance = 0.1)
})
test_that("aggregate_cells() returns expected values", {
# Create pseudo-bulk object for testing
pbmc_pseudo_bulk <-
pbmc_small |>
aggregate_cells(c(groups, letter.idents), assays = "RNA")
# Check row length is unchanged
pbmc_pseudo_bulk |>
distinct(.feature) |>
nrow() |>
expect_equal(pbmc_small |> nrow())
# Check column length is correctly modified
pbmc_pseudo_bulk |>
distinct(.sample) |>
nrow() |>
expect_equal(pbmc_small |>
as_tibble() |>
select(groups, letter.idents) |>
unique() |>
nrow()
)
# Spot check for correctly aggregated count value of ACAP1 gene
pbmc_pseudo_bulk |>
filter(.feature == "ACAP1" & .sample == "g1___A") |>
select(RNA) |>
as.numeric() |>
expect_equal(
Seurat::DietSeurat(pbmc_small, assays = "RNA", features = "ACAP1")[, pbmc_small |>
as_tibble() |>
filter(groups == "g1", letter.idents == "A") |>
pull(.cell)] |>
LayerData() |>
sum())
# Aggregate with tidyselect
pbmc_small |>
aggregate_cells(c(any_of("groups"), letter.idents), assays = "RNA") |>
expect_no_error()
})
test_that("get_abundance_sc_wide", {
expect_equal(
pbmc_small |> get_abundance_sc_wide() |> nrow(),
pbmc_small[[]] |> nrow()
)
expect_equal(
pbmc_small |> get_abundance_sc_wide() |> pull("S100A9") |> sum(),
pbmc_small |> FetchData("S100A9") |> sum(),
tolerance = 0.1
)
})
test_that("get_abundance_sc_long", {
expect_equal(pbmc_small |> get_abundance_sc_long() |> ncol(), 3)
expect_equal(
pbmc_small |> get_abundance_sc_long() |> filter(.feature == "S100A9") |> pull(".abundance_RNA") |> sum(),
pbmc_small |> FetchData("S100A9") |> sum(),
tolerance = 0.1
)
})
================================================
FILE: tests/testthat/test-pillar.R
================================================
context('pillar test')
test_string <- "A small string to test the function of pillar utilities."
test_that("pillar___format_comment", {
test_string |>
pillar___format_comment(width = 20) |>
stringr::str_count("# ") |>
expect_equal(5)
})
test_that("pillar___strwrap2", {
test_string |>
pillar___strwrap2(width = 20, indent = 4) |>
stringr::str_count(" ") |>
expect_equal(c(0, 1, 1, 1, 1))
})
test_that("pillar___wrap", {
test_string |>
pillar___wrap(width = 20) |>
stringr::str_count("\n") |>
expect_equal(3)
})
================================================
FILE: tests/testthat/test-print.R
================================================
context('print test')
data("pbmc_small")
test_that("print", {
text <- capture.output(print(pbmc_small))
expect_equal(grep("Seurat-tibble abstraction", text), 1)
str <- ".*Features=([0-9]+).*"
i <- grep(str, text)
expect_equal(gsub(str, "\\1", text[i]), paste(nrow(pbmc_small)))
str <- ".*Cells=([0-9]+).*"
i <- grep(str, text)
expect_equal(gsub(str, "\\1", text[i]), paste(ncol(pbmc_small)))
})
test_that("glimpse", {
text <- capture.output(glimpse(pbmc_small))
expect_equal(length(text), 37)
})
================================================
FILE: tests/testthat/test-tidyr.R
================================================
context('tidyr test')
data("pbmc_small")
tt <- GetAssayData(pbmc_small, layer = "counts", assay = "RNA") |>
    CreateSeuratObject() |>
    mutate(groups = sprintf("g%s", rep(1:2, dplyr::n() / 2)))
test_that("nest_unnest", {
col_names <- colnames(tt[[]]) |> c("cell")
x <- tt |>
    nest(data = -groups) |>
    unnest(data) |>
    Seurat::NormalizeData() |>
    Seurat::ScaleData() |>
    Seurat::FindVariableFeatures() |>
    Seurat::RunPCA()
y <- tt |>
    Seurat::NormalizeData() |>
    Seurat::ScaleData() |>
    Seurat::FindVariableFeatures() |>
    Seurat::RunPCA()
expect_equal(
x[["pca"]]@cell.embeddings |> as_tibble(rownames = "cell") |> arrange(cell) |> pull(PC_1),
y[["pca"]]@cell.embeddings |> as_tibble(rownames = "cell") |> arrange(cell) |> pull(PC_1)
)
})
test_that("fast_vs_slow_nest", {
expect_identical(
tt |> mutate(groups2 = groups) |> nest(data = -c(groups, groups2)) |> select(-groups2),
tt |> nest(data = -groups)
)
})
test_that("nest_unnest_slice_1", {
expect_equal(
tt |> nest(data = -groups) |> slice(1) |> unnest(data) |> ncol(),
sum(tt[[]]$groups == "g1")
)
})
test_that("unite separate", {
un <- tt |> unite("new_col", c(orig.ident, groups))
se <- un |> separate(col = new_col, into = c("orig.ident", "groups"))
expect_equal(un |> select(new_col) |> slice(1) |> pull(new_col), "SeuratProject_g1")
expect_equal(se |> select(orig.ident) |> ncol(), 1)
})
test_that("extract", {
expect_equal(
tt |> extract(groups, into = "g", regex = "g([0-9])", convert = TRUE) |> pull(g) |> class(),
"integer"
)
})
test_that("pivot_longer", {
expect_equal(
tt |> pivot_longer(c(orig.ident, groups), names_to = "name", values_to = "value") |> class() |> magrittr::extract2(1),
"tbl_df"
)
})
================================================
FILE: tests/testthat/test-utilities.R
================================================
context('utilities test')
data("pbmc_small")
test_that("get_special_column_name_symbol", {
expect_equal(get_special_column_name_symbol(".cell")$symbol, rlang::sym(".cell"))
expect_equal(get_special_column_name_symbol(".cell")$name, c(".cell"))
})
test_that("ping_old_special_column_into_metadata", {
ping_old_special_column_into_metadata(pbmc_small) |>
as_tibble() |>
colnames() |>
purrr::pluck(1) |>
expect_equal("cell")
})
================================================
FILE: tests/testthat.R
================================================
library(testthat)
library(tidyseurat)
test_check("tidyseurat")
================================================
FILE: vignettes/figures_article.Rmd
================================================
---
title: "Code for producing the figures in the article"
author: "Stefano Mangiola"
date: "`r Sys.Date()`"
package: tidyseurat
output:
  html_vignette:
    toc_float: true
vignette: >
  %\VignetteEngine{knitr::knitr}
  %\VignetteIndexEntry{Code for producing the figures in the article}
  %\usepackage[UTF-8]{inputenc}
---
```{r include=FALSE}
# Set path to the plotly screenshot. We don't run the plotly code chunk, as most build servers lack the JavaScript libraries needed for interactive plotting.
screenshot <- "../man/figures/plotly.png"
# The chunk below uses Rmd in man/fragments to avoid duplication, as the content is shared with the vignette and README. As suggested here: https://www.garrickadenbuie.com/blog/dry-vignette-and-readme/
visual_cue <- "../man/figures/logo_interaction-01.png"
```
```{r eval=FALSE}
# Article workflow
library(tidyverse)
library(Seurat)
library(SingleR)
library(plotly)
library(tidyHeatmap)
library(ggalluvial)
library(ggplot2)
library(ggridges) # provides geom_density_ridges()
library(ggrepel) # provides geom_text_repel()
library(tidyseurat)
options(future.globals.maxSize = 50068 * 1024^2)
# Use colourblind-friendly colours
friendly_cols <- dittoSeq::dittoColors()
# Set theme
custom_theme <-
list(
scale_fill_manual(values = friendly_cols),
scale_color_manual(values = friendly_cols),
theme_bw() +
theme(
panel.border = element_blank(),
axis.line = element_line(),
panel.grid.major = element_line(size = 0.2),
panel.grid.minor = element_line(size = 0.1),
text = element_text(size = 9),
legend.position = "bottom",
strip.background = element_blank(),
axis.title.x = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
axis.title.y = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1)
)
)
PBMC_clean_scaled_UMAP_cluster_cell_type <- readRDS("dev/PBMC_clean_scaled_UMAP_cluster_cell_type.rds")
```
```{r eval=FALSE}
p1 =
PBMC_clean_scaled_UMAP_cluster_cell_type %>%
pivot_longer(
c(mito.fraction, S.Score, G2M.Score),
names_to="property",
values_to="Value"
) %>%
mutate(property = factor(property, levels = c("mito.fraction", "G2M.Score", "S.Score"))) %>%
ggplot(aes(sample, Value)) +
geom_boxplot(outlier.size = 0.5) +
facet_wrap(~property, scales = "free_y") +
custom_theme +
theme(aspect.ratio=1)
```
```{r eval=FALSE}
p2 =
PBMC_clean_scaled_UMAP_cluster_cell_type %>%
sample_n(20000) %>%
ggplot(aes(UMAP_1, UMAP_2, color=seurat_clusters)) +
geom_point(size=0.05, alpha=0.2) +
custom_theme +
theme(aspect.ratio=1)
PBMC_clean_scaled_UMAP_cluster_cell_type %>%
sample_n(20000) %>%
plot_ly(
x = ~`UMAP_1`,
y = ~`UMAP_2`,
z = ~`UMAP_3`,
color = ~seurat_clusters,
colors = friendly_cols[1:24], sizes = 50, size = 1
)
markers = readRDS("dev/PBMC_marker_df.rds")
```
```{r eval=FALSE}
p3 =
PBMC_clean_scaled_UMAP_cluster_cell_type %>%
arrange(first.labels) %>%
mutate(seurat_clusters = fct_inorder(seurat_clusters)) %>%
join_features(features=c("CD3D", "HLA-DRB1")) %>%
ggplot(aes(y=seurat_clusters, x=.abundance_SCT, fill=first.labels)) +
geom_density_ridges(bandwidth = 0.2) +
facet_wrap(~ .feature, nrow = 2) +
coord_flip() +
custom_theme
```
```{r eval=FALSE}
# Plot heatmap
p4 =
PBMC_clean_scaled_UMAP_cluster_cell_type %>%
sample_n(2000) %>%
DoHeatmap(
features = markers$gene,
group.colors = friendly_cols
)
```
```{r eval=FALSE}
p5 =
PBMC_clean_scaled_UMAP_cluster_cell_type %>%
sample_n(1000) %>%
join_features(features=markers$gene) %>%
mutate(seurat_clusters = as.integer(seurat_clusters)) %>%
filter(seurat_clusters<10) %>%
group_by(seurat_clusters) %>%
# Plot heatmap
heatmap(
.row = .feature,
.column = .cell,
.value = .abundance_SCT,
palette_grouping = list(rep("black",9)),
palette_value = circlize::colorRamp2(c(-1.5, 0, 1.5), c("purple", "black", "yellow")),
# ComplexHeatmap parameters
row_gap = unit(0.1, "mm"), column_gap = unit(0.1, "mm")
) %>%
# Add annotation
add_tile(sample, palette = friendly_cols[1:7]) %>%
add_point(PC_1)
```
```{r eval=FALSE}
p6 =
PBMC_clean_scaled_UMAP_cluster_cell_type %>%
unite("cluster_cell_type", c(first.labels, seurat_clusters), remove=FALSE) %>%
pivot_longer(
c(seurat_clusters, first.labels_single),
names_to = "classification", values_to = "value"
) %>%
ggplot(aes(x = classification, stratum = value, alluvium = cell,
fill = first.labels, label = value)) +
scale_x_discrete(expand = c(1, 1)) +
geom_flow() +
geom_stratum(alpha = .5) +
# geom_text(stat = "stratum", size = 3) +
geom_text_repel(stat = "stratum", size = 3,
nudge_x = 0.05,
direction = "y",
angle = 0,
vjust = 0,
segment.size = 0.2
) +
scale_fill_manual(values = friendly_cols) +
#guides(fill = FALSE) +
coord_flip() +
theme_bw() +
theme(
panel.border = element_blank(),
axis.line = element_line(),
panel.grid.major = element_line(size = 0.2),
panel.grid.minor = element_line(size = 0.1),
text = element_text(size = 9),
legend.position = "bottom",
strip.background = element_blank(),
axis.title.x = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
axis.title.y = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1)
)
```
================================================
FILE: vignettes/introduction.Rmd
================================================
---
title: "Overview of the tidyseurat package"
author: "Stefano Mangiola"
date: "`r Sys.Date()`"
package: tidyseurat
output:
  html_vignette:
    toc_float: true
bibliography: tidyseurat.bib
vignette: >
  %\VignetteEngine{knitr::knitr}
  %\VignetteIndexEntry{Overview of the tidyseurat package}
  %\usepackage[UTF-8]{inputenc}
---
```{r include=FALSE}
# Set path to the plotly screenshot. We don't run the plotly code chunk, as most build servers lack the JavaScript libraries needed for interactive plotting.
screenshot <- "../man/figures/plotly.png"
# The chunk below uses Rmd in man/fragments to avoid duplication, as the content is shared with the vignette and README. As suggested here: https://www.garrickadenbuie.com/blog/dry-vignette-and-readme/
visual_cue <- "../man/figures/logo_interaction-01.png"
```
```{r child="../man/fragments/intro.Rmd"}
```
# Session Info
```{r}
sessionInfo()
```
# References
================================================
FILE: vignettes/tidyseurat.bib
================================================
@article{butler2018integrating,
title={Integrating single-cell transcriptomic data across different conditions, technologies, and species},
author={Butler, Andrew and Hoffman, Paul and Smibert, Peter and Papalexi, Efthymia and Satija, Rahul},
journal={Nature Biotechnology},
volume={36},
number={5},
pages={411--420},
year={2018},
publisher={Nature Publishing Group}
}
@article{stuart2019comprehensive,
title={Comprehensive integration of single-cell data},
author={Stuart, Tim and Butler, Andrew and Hoffman, Paul and Hafemeister, Christoph and Papalexi, Efthymia and Mauck III, William M and Hao, Yuhan and Stoeckius, Marlon and Smibert, Peter and Satija, Rahul},
journal={Cell},
volume={177},
number={7},
pages={1888--1902},
year={2019},
publisher={Elsevier}
}
@article{aran2019reference,
title={Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage},
author={Aran, Dvir and Looney, Agnieszka P and Liu, Leqian and Wu, Esther and Fong, Valerie and Hsu, Austin and Chak, Suzanna and Naikawadi, Ram P and Wolters, Paul J and Abate, Adam R and others},
journal={Nature Immunology},
volume={20},
number={2},
pages={163--172},
year={2019},
publisher={Nature Publishing Group}
}
@article{cabello2020singlecellsignalr,
title={SingleCellSignalR: inference of intercellular networks from single-cell transcriptomics},
author={Cabello-Aguilar, Simon and Alame, M{\'e}lissa and Kon-Sun-Tack, Fabien and Fau, Caroline and Lacroix, Matthieu and Colinge, Jacques},
journal={Nucleic Acids Research},
volume={48},
number={10},
pages={e55--e55},
year={2020},
publisher={Oxford University Press}
}
@article{wickham2019welcome,
title={Welcome to the Tidyverse},
author={Wickham, Hadley and Averick, Mara and Bryan, Jennifer and Chang, Winston and McGowan, Lucy D'Agostino and Fran{\c{c}}ois, Romain and Grolemund, Garrett and Hayes, Alex and Henry, Lionel and Hester, Jim and others},
journal={Journal of Open Source Software},
volume={4},
number={43},
pages={1686},
year={2019}
}